Attention-Fused Multi-Scale OSM Context for Spatio-Temporal GNNs in Urban Mobility Flow Forecasting
Abstract
Urban mobility systems require accurate short-term forecasts for effective resource management and infrastructure planning. While Graph Neural Networks (GNNs) have advanced spatio-temporal prediction in transportation, they largely neglect the explicit multi-scale urban context that drives mobility patterns. We introduce an enhanced Diffusion Convolutional Recurrent Neural Network (DCRNN) that enriches the standard architecture with multi-scale OpenStreetMap (OSM) features and an attention-based fusion mechanism. Our approach integrates urban infrastructure characteristics within 500m, 1000m, and 1500m radii of each station, enabling adaptive spatial scale selection based on station characteristics. Evaluated on 90,000 trips across 714 Swiss bike-sharing stations, our method reduces RMSE by approximately 13% relative to strong baselines (ConvLSTM, ST-GCN), reaching an RMSE of 2.52, an MAE of 1.74, and an R² of 0.723. The attention mechanism reveals interpretable patterns aligned with urban planning intuitions: downtown stations emphasize local accessibility, while peripheral stations focus on broader connectivity.
Introduction
Urban bike-sharing systems generate large mobility flows that cities must predict to optimize rebalancing operations, plan infrastructure investments, and improve user experience. The core challenge is to model spatial dependencies between stations and temporal dynamics simultaneously, while incorporating the urban context that shapes mobility patterns.
Current spatio-temporal forecasting methods face a critical limitation: they largely ignore the explicit multi-scale urban context available through comprehensive geographical databases. While Graph Neural Networks have advanced spatial modeling capabilities and Recurrent Neural Networks excel at temporal prediction, existing approaches fail to systematically leverage the hierarchical urban infrastructure information that drives mobility patterns at different spatial scales.
Our work addresses this gap by enhancing the Diffusion Convolutional Recurrent Neural Network (DCRNN) with two key innovations: (1) systematic multi-scale OpenStreetMap feature extraction capturing urban context at 500m, 1000m, and 1500m radii, and (2) a soft attention fusion mechanism that learns to select appropriate spatial scales for each station based on local urban characteristics.
This approach enables adaptive spatial reasoning: downtown stations can focus on immediate walkability features (500m radius), while peripheral stations emphasize broader connectivity patterns (1500m radius). The result is a more context-aware model that better captures the diverse urban morphologies present in modern cities.
Key Contributions
- Multi-scale urban context integration: Attention-based fusion of multi-scale OSM features for interpretable spatio-temporal GNN forecasts
- Adaptive scale selection: Soft attention mechanism that weights spatial scales based on station urban characteristics
- Strong empirical results: approximately 13% RMSE reduction relative to baselines (ConvLSTM, ST-GCN) on 90,000 trips across 714 stations
- Urban planning insights: Interpretable patterns aligned with urban morphology principles
Multi-Scale Urban Context Framework
Problem Formulation
Given a bike-sharing network represented as a graph $G = (V, E, A)$ with $N=714$ stations, we predict the future flow matrix from historical flow sequences and multi-scale urban features.
The spatial adjacency matrix uses Gaussian kernel weighting:
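A common form of such a kernel, following the thresholded Gaussian kernel used in the original DCRNN work, is sketched below. The symbols $d_{ij}$ (distance between stations $i$ and $j$), $\sigma$ (kernel bandwidth), and $\kappa$ (sparsity threshold) are assumptions introduced for illustration, not values stated in this post.

$$
A_{ij} =
\begin{cases}
\exp\left(-\dfrac{d_{ij}^2}{\sigma^2}\right), & d_{ij} \le \kappa \\
0, & \text{otherwise}
\end{cases}
$$

Under this form, nearby stations receive weights close to 1, and pairs farther apart than $\kappa$ are disconnected, which keeps the graph sparse.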
Each station is enriched with hierarchical OSM features extracted at each radius $r \in \{500\,\text{m}, 1000\,\text{m}, 1500\,\text{m}\}$, yielding 47 features per scale.
Attention-Based Multi-Scale Feature Fusion
We employ learnable attention with station-specific context awareness:
This enables urban core stations to emphasize local walkability while peripheral stations focus on broader connectivity.
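One way to realize this kind of soft attention over scales is to score each station's per-scale feature vector, normalize the scores with a softmax across the three radii, and take the weighted sum. The NumPy sketch below illustrates that idea under stated assumptions: the additive `tanh` scoring function, the parameter names `w_score` and `v_score`, and the hidden size of 16 are hypothetical choices, not the authors' exact formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(x_scales, w_score, v_score):
    """Fuse per-scale station features with soft attention.

    x_scales: (N, S, F) array, S scales of F features for N stations.
    w_score:  (F, H) projection used by the scoring function (assumed form).
    v_score:  (H,) vector mapping hidden activations to a scalar score.
    Returns the fused features (N, F) and attention weights (N, S).
    """
    hidden = np.tanh(x_scales @ w_score)        # (N, S, H)
    scores = hidden @ v_score                   # (N, S)
    alpha = softmax(scores, axis=1)             # weights sum to 1 per station
    fused = (alpha[:, :, None] * x_scales).sum(axis=1)  # (N, F)
    return fused, alpha

# Synthetic example with the shapes mentioned in the text:
# 714 stations, 3 radii (500m, 1000m, 1500m), 47 features per scale.
rng = np.random.default_rng(0)
N, S, F, H = 714, 3, 47, 16
x_scales = rng.normal(size=(N, S, F))
w_score = rng.normal(scale=0.1, size=(F, H))
v_score = rng.normal(scale=0.1, size=(H,))
fused, alpha = attention_fusion(x_scales, w_score, v_score)
```

Each row of `alpha` can be read as the station's preference over the three radii, which is the quantity one would inspect to compare downtown and peripheral stations.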
Enhanced DCRNN Architecture
Diffusion Convolution Operator:
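For reference, the diffusion convolution defined in the original DCRNN work can be written as below. It is assumed here that the enhanced model keeps this operator unchanged, with $A$ the adjacency matrix of $G$, $D_O$ and $D_I$ the diagonal out-degree and in-degree matrices, $K$ the number of diffusion steps, and $\theta_{k,1}, \theta_{k,2}$ learnable filter parameters.

$$
X \star_G f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^\top\right)^k \right) X
$$

The two terms model diffusion along and against the edge direction, so the operator aggregates information from stations up to $K-1$ hops away in both directions.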
DCGRU Cell:
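In the original DCRNN formulation, the DCGRU cell is a GRU whose matrix multiplications are replaced by diffusion convolutions. The equations below reproduce that standard form as a reference, on the assumption that the enhanced model uses the same cell with the fused OSM features included in the input $X^{(t)}$.

$$
\begin{aligned}
r^{(t)} &= \sigma\left(\Theta_r \star_G \left[X^{(t)}, H^{(t-1)}\right] + b_r\right) \\
u^{(t)} &= \sigma\left(\Theta_u \star_G \left[X^{(t)}, H^{(t-1)}\right] + b_u\right) \\
C^{(t)} &= \tanh\left(\Theta_C \star_G \left[X^{(t)}, r^{(t)} \odot H^{(t-1)}\right] + b_C\right) \\
H^{(t)} &= u^{(t)} \odot H^{(t-1)} + \left(1 - u^{(t)}\right) \odot C^{(t)}
\end{aligned}
$$

Here $r^{(t)}$ and $u^{(t)}$ are the reset and update gates, $C^{(t)}$ is the candidate state, $H^{(t)}$ is the hidden state, and $\odot$ denotes element-wise multiplication.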
Training Pipeline
Initialize parameters Θ and Θ_att, optimizer, schedulers; patience_counter ← 0
D_train, D_val, D_test ← temporal_split(F, [0.6, 0.2, 0.2])
For epoch = 1 to max_epochs:
    For batch in D_train:
        X_fused ← attention_fusion(X, Θ_att)
        Ŷ ← DCRNN_forward(batch, X_fused, G, Θ)
        J ← MSE(Ŷ, Y_true) + λ ||Θ||_2^2
        Θ ← Adam_update(∇_Θ J)
    J_val ← validate(D_val, Θ)
    If J_val improves:
        save_checkpoint()
        patience_counter ← 0
    Else:
        patience_counter += 1
        If patience_counter > 15: break
Return Θ from best checkpoint
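The control flow of this pipeline (chronological split, early stopping with a patience counter) can be sketched in plain Python. The function names `temporal_split` and `train_with_early_stopping`, the callback interface, and the toy validation curve are illustrative assumptions; the model update itself is left to the caller.

```python
def temporal_split(n_steps, fractions=(0.6, 0.2, 0.2)):
    """Return chronological index ranges for train, validation and test."""
    train_end = round(n_steps * fractions[0])
    val_end = train_end + round(n_steps * fractions[1])
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)

def train_with_early_stopping(train_epoch, validate, max_epochs=100, patience=15):
    """Run training epochs until validation loss stops improving.

    train_epoch(epoch): performs one pass over the training data.
    validate(epoch):    returns the validation loss after that pass.
    Returns the epoch index and loss of the best validation result.
    """
    best_val, best_epoch = float("inf"), -1
    patience_counter = 0
    for epoch in range(max_epochs):
        train_epoch(epoch)
        val_loss = validate(epoch)
        if val_loss < best_val:
            # A real pipeline would save a checkpoint here.
            best_val, best_epoch = val_loss, epoch
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter > patience:
                break
    return best_epoch, best_val

# Toy usage: a validation curve that improves, then degrades.
train_idx, val_idx, test_idx = temporal_split(100)
val_curve = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7]
best_epoch, best_val = train_with_early_stopping(
    train_epoch=lambda epoch: None,          # no-op stand-in for a training pass
    validate=lambda epoch: val_curve[epoch],
    max_epochs=len(val_curve),
    patience=2,
)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for forecasting tasks.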
Experimental Results
Evaluated on 90,000 trips across 714 Swiss bike-sharing stations:
- RMSE: 2.52 (approximately 13% lower than the ConvLSTM and ST-GCN baselines)
- MAE: 1.74
- R²: 0.723
The attention mechanism reveals interpretable patterns where downtown stations emphasize local accessibility (500m) while peripheral stations focus on broader connectivity (1500m).
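For readers who want to reproduce the three reported metrics on their own predictions, the standard definitions of RMSE, MAE, and R² can be computed as in the sketch below. The helper name `regression_metrics` and the toy arrays are illustrative and unrelated to the paper's data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two arrays of equal shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Toy example: three exact predictions and one that is off by 2.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
rmse, mae, r2 = regression_metrics(y_true, y_pred)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of how much of the error comes from occasional large misses.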
Implementation Details
Architecture Specifications:
- Optimizer: Adam (lr=, , , weight decay=)
- Batch size: 32, sequence length: 6 hours, prediction horizon: 1 hour
- Hidden dimensions: 64, dropout: 0.2, diffusion steps: 3
- Regularization: L2 penalty , gradient clipping: 5.0
Hyperparameter Selection:
- Learning rate:
- Hidden dimensions:
- Diffusion steps:
- Dropout:
Computational Requirements:
- Hardware: NVIDIA RTX 3080 (8GB)
- Training time: ~71 minutes with early stopping
- Memory usage: 2.4GB peak
Conclusion
Our enhanced DCRNN with attention-fused multi-scale OSM context improves urban mobility forecasting accuracy (RMSE of 2.52, approximately 13% below the baselines) while offering interpretable insights that align with urban planning principles. The adaptive spatial scale selection enables context-aware modeling that captures the diverse urban morphologies present in modern cities, making it particularly valuable for smart city applications and infrastructure planning.
The framework demonstrates how systematic integration of geographical databases with advanced neural architectures can yield both performance gains and actionable urban insights, paving the way for more intelligent and equitable urban mobility systems.
