Frankline Oyolo, Misango

Attention-Fused Multi-Scale OSM Context for Spatio-Temporal GNNs in Urban Mobility Flow Forecasting

Abstract

Urban mobility systems require accurate short-term forecasts for effective resource management and infrastructure planning. While Graph Neural Networks (GNNs) have advanced spatio-temporal prediction in transportation, they largely neglect the explicit multi-scale urban context that drives mobility patterns. We introduce an enhanced Diffusion Convolutional Recurrent Neural Network (DCRNN) that enriches the standard architecture with multi-scale OpenStreetMap (OSM) features and an attention-based fusion mechanism. Our approach integrates urban infrastructure characteristics within 500m, 1000m, and 1500m radii, enabling adaptive spatial scale selection based on station characteristics. Evaluated on 90,000 trips across 714 Swiss bike-sharing stations, our method reduces RMSE by approximately 13% relative to strong baselines (ConvLSTM, ST-GCN), reaching an RMSE of 2.52, an MAE of 1.74, and an R² of 0.723. The attention mechanism reveals interpretable patterns aligned with urban planning intuitions: downtown stations emphasize local accessibility, while peripheral stations focus on broader connectivity.

Introduction

Urban bike-sharing systems generate large mobility flows that cities must predict to optimize rebalancing operations, plan infrastructure investments, and improve user experience. The central challenge is to model spatial dependencies between stations and temporal dynamics jointly, while incorporating the rich urban context that shapes mobility patterns.

Current spatio-temporal forecasting methods face a critical limitation: they largely ignore the explicit multi-scale urban context available through comprehensive geographical databases. While Graph Neural Networks have advanced spatial modeling capabilities and Recurrent Neural Networks excel at temporal prediction, existing approaches fail to systematically leverage the hierarchical urban infrastructure information that drives mobility patterns at different spatial scales.

Our work addresses this gap by enhancing the Diffusion Convolutional Recurrent Neural Network (DCRNN) with two key innovations: (1) systematic multi-scale OpenStreetMap feature extraction capturing urban context at 500m, 1000m, and 1500m radii, and (2) a soft attention fusion mechanism that learns to select appropriate spatial scales for each station based on local urban characteristics.

This approach enables adaptive spatial reasoning: downtown stations can focus on immediate walkability features (500m radius), while peripheral stations emphasize broader connectivity patterns (1500m radius). The result is a more context-aware model that better captures the diverse urban morphologies present in modern cities.

Key Contributions

  • Multi-scale urban context integration: Attention-based fusion of multi-scale OSM features for interpretable spatio-temporal GNN forecasts
  • Adaptive scale selection: Soft attention mechanism that weights spatial scales based on station urban characteristics
  • Strong empirical results: 13% RMSE improvement over baselines on 90,000 trips across 714 stations
  • Urban planning insights: Interpretable patterns aligned with urban morphology principles

Multi-Scale Urban Context Framework

Problem Formulation

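Given a bike-sharing network represented as a graph $G = (V, E, A)$ with $N=714$ stations, where $V$ is the set of stations, $E$ the set of edges, and $A$ the weighted adjacency matrix, we predict the future flow matrix from historical flow sequences and multi-scale urban features.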

The spatial adjacency matrix $A$ uses Gaussian kernel weighting of inter-station distances.
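A minimal sketch of one common way to build such a matrix, following the thresholded Gaussian kernel used in the original DCRNN formulation: $A_{ij} = \exp(-d_{ij}^2 / \sigma^2)$, set to zero below a cutoff. The haversine distance, the default `sigma` (standard deviation of the distances), and the `threshold` value are assumptions for illustration, not settings taken from this work.

```python
import numpy as np

def haversine_matrix(lat, lon):
    """Pairwise great-circle distances in metres between stations."""
    earth_radius_m = 6_371_000.0
    lat = np.radians(np.asarray(lat, dtype=float))[:, None]
    lon = np.radians(np.asarray(lon, dtype=float))[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * earth_radius_m * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def gaussian_adjacency(dist, sigma=None, threshold=0.1):
    """Thresholded Gaussian kernel: exp(-d^2 / sigma^2), zeroed below `threshold`."""
    if sigma is None:
        sigma = dist.std()  # assumed default; the source does not state sigma
    A = np.exp(-(dist ** 2) / (sigma ** 2))
    A[A < threshold] = 0.0  # sparsify: drop weak long-range links
    return A

# Usage with three illustrative coordinates (two nearby, one far away).
lat = [47.37, 47.38, 46.95]
lon = [8.54, 8.55, 7.45]
A = gaussian_adjacency(haversine_matrix(lat, lon), sigma=2000.0)
```

Thresholding keeps the graph sparse, which matters for the cost of repeated diffusion steps on a 714-node network.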

Each station is enriched with hierarchical OSM features at multiple scales: one feature vector per radius $r \in \{500, 1000, 1500\}$ m, yielding 47 features per scale.
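To illustrate the idea of per-radius features, the sketch below counts points of interest by category within each radius around every station. It assumes coordinates already projected to metres and uses hypothetical names (`multiscale_counts`, `poi_cat`); the actual 47 features per scale are not enumerated here, so simple category counts stand in for them.

```python
import numpy as np

RADII_M = (500, 1000, 1500)  # radii named in the text

def multiscale_counts(stations_xy, poi_xy, poi_cat, n_categories):
    """Count POIs per category within each radius of each station.

    stations_xy: (N, 2) projected coordinates in metres
    poi_xy:      (P, 2) projected coordinates in metres
    poi_cat:     (P,)   integer category id per POI (hypothetical encoding)
    returns:     (N, len(RADII_M), n_categories) array of counts
    """
    # Euclidean distance from every station to every POI: shape (N, P)
    d = np.linalg.norm(stations_xy[:, None, :] - poi_xy[None, :, :], axis=-1)
    onehot = np.eye(n_categories)[poi_cat]  # (P, C)
    per_scale = [(d <= r).astype(float) @ onehot for r in RADII_M]
    return np.stack(per_scale, axis=1)

# Usage: one station at the origin, four POIs along the x-axis.
stations = np.array([[0.0, 0.0]])
pois = np.array([[300.0, 0.0], [800.0, 0.0], [1400.0, 0.0], [2000.0, 0.0]])
cats = np.array([0, 1, 0, 1])
feats = multiscale_counts(stations, pois, cats, n_categories=2)
# feats[0] -> [[1, 0], [1, 1], [2, 1]] for radii 500, 1000, 1500
```

Because the radii are nested, counts grow monotonically with scale; the larger scales add context rather than replace the local one.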

Attention-Based Multi-Scale Feature Fusion

We employ learnable soft attention over the three spatial scales, with weights conditioned on each station's own urban context.

This enables urban core stations to emphasize local walkability while peripheral stations focus on broader connectivity.
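A minimal sketch of soft attention over scales, assuming an additive scoring function (`tanh` projection followed by a learned vector `v`). The scoring form, the hidden size `H`, and the random parameters are assumptions for illustration; only the softmax-weighted sum over the three scales reflects the mechanism described above.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(X, W, b, v):
    """Fuse per-scale features with station-specific soft attention.

    X: (N, S, F) features for N stations, S scales, F features per scale
    W: (F, H), b: (H,), v: (H,) hypothetical attention parameters
    returns fused (N, F) features and attention weights alpha (N, S)
    """
    scores = np.tanh(X @ W + b) @ v             # (N, S) one score per scale
    alpha = softmax(scores, axis=1)             # weights sum to 1 per station
    fused = (alpha[..., None] * X).sum(axis=1)  # weighted sum over scales
    return fused, alpha

rng = np.random.default_rng(0)
N, S, F, H = 714, 3, 47, 16  # N, S, F from the text; H is an assumption
X = rng.normal(size=(N, S, F))
W = rng.normal(scale=0.1, size=(F, H))
b = np.zeros(H)
v = rng.normal(size=H)
X_fused, alpha = attention_fusion(X, W, b, v)
```

Inspecting `alpha` per station is what makes the model interpretable: a row concentrated on the first column corresponds to a station that relies mainly on its 500m context.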

Enhanced DCRNN Architecture

Diffusion Convolution Operator:
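As a reference point, the standard DCRNN diffusion convolution (Li et al., 2018) sums $K$ steps of forward and backward random walks on the graph: $\sum_{k=0}^{K-1} \left(\theta_{k,1} (D_O^{-1} A)^k + \theta_{k,2} (D_I^{-1} A^\top)^k\right) X$. The sketch below implements that form with scalar filter weights per step; whether the enhanced model uses exactly this parameterization is an assumption.

```python
import numpy as np

def random_walk(A):
    """Row-normalise A into a random-walk transition matrix D^{-1} A."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated nodes keep an all-zero row
    return A / deg

def diffusion_conv(X, A, theta_fwd, theta_bwd):
    """K-step bidirectional diffusion convolution of a graph signal.

    X: (N, F) node features; A: (N, N) weighted adjacency
    theta_fwd, theta_bwd: (K,) scalar filter weights (illustrative)
    """
    P_fwd = random_walk(A)    # D_O^{-1} A
    P_bwd = random_walk(A.T)  # D_I^{-1} A^T
    out = np.zeros_like(X, dtype=float)
    X_fwd = X_bwd = np.asarray(X, dtype=float)
    for k in range(len(theta_fwd)):
        out += theta_fwd[k] * X_fwd + theta_bwd[k] * X_bwd
        X_fwd, X_bwd = P_fwd @ X_fwd, P_bwd @ X_bwd  # advance one diffusion step
    return out

# Usage on a 3-node path graph with K = 3 diffusion steps.
A_path = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
signal = np.ones((3, 1))
out = diffusion_conv(signal, A_path, [0.5, 0.3, 0.2], [0.1, 0.1, 0.1])
```

A constant signal is unchanged by a random-walk step, so here every node's output equals the sum of the filter weights, which makes a convenient sanity check.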

DCGRU Cell:
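For reference, the standard DCGRU cell from the original DCRNN formulation (Li et al., 2018) replaces the matrix multiplications of a GRU with diffusion convolutions. Here $\star_G$ denotes the diffusion convolution, $X^{(t)}$ the input at time $t$, $H^{(t)}$ the hidden state, and $r^{(t)}$, $u^{(t)}$ the reset and update gates. How the fused OSM features enter the cell is not spelled out here; concatenating them to $X^{(t)}$ is one plausible choice and should be read as an assumption.

$$
\begin{aligned}
r^{(t)} &= \sigma\left(\Theta_r \star_G [X^{(t)}, H^{(t-1)}] + b_r\right) \\
u^{(t)} &= \sigma\left(\Theta_u \star_G [X^{(t)}, H^{(t-1)}] + b_u\right) \\
C^{(t)} &= \tanh\left(\Theta_C \star_G [X^{(t)}, r^{(t)} \odot H^{(t-1)}] + b_C\right) \\
H^{(t)} &= u^{(t)} \odot H^{(t-1)} + \left(1 - u^{(t)}\right) \odot C^{(t)}
\end{aligned}
$$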

Training Pipeline

Algorithm: Enhanced DCRNN Training Pipeline
Input: Flow data F, multi-scale features X, graph G
Output: Trained model parameters Θ

1. Initialize parameters Θ, optimizer, schedulers; patience_counter ← 0
   D_train, D_val, D_test ← temporal_split(F, [0.6, 0.2, 0.2])

2. For epoch = 1 to max_epochs:
       For batch in D_train:
           X_fused ← attention_fusion(X, Θ_att)
           Ŷ ← DCRNN_forward(batch, X_fused, G, Θ)
           J ← MSE(Ŷ, Y_true) + λ ||Θ||_2^2
           Θ ← Adam_update(∇_Θ J)
       J_val ← validate(D_val, Θ)
       If J_val improves:
           save_checkpoint()
           patience_counter ← 0
       Else:
           patience_counter ← patience_counter + 1
       If patience_counter > 15: break

3. Return Θ from best checkpoint
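The control flow of this pipeline (chronological split, L2-regularized MSE, early stopping, best-checkpoint restore) can be sketched on a toy problem. To keep the example self-contained, a linear model trained with plain gradient descent stands in for the DCRNN and Adam; `train`, `lr`, and the default `lam` are hypothetical choices, and only the patience of 15 and the 60/20/20 split come from the algorithm above.

```python
import numpy as np

def temporal_split(n, fractions=(0.6, 0.2, 0.2)):
    """Chronological split without shuffling, so later data is never seen in training."""
    i = round(n * fractions[0])
    j = i + round(n * fractions[1])
    return slice(0, i), slice(i, j), slice(j, n)

def train(X, y, lam=1e-4, lr=0.05, max_epochs=500, patience=15):
    """Toy stand-in for the pipeline: linear model, gradient descent, early stopping."""
    tr, va, _ = temporal_split(len(X))
    theta = np.zeros(X.shape[1])
    best_theta, best_val, patience_counter = theta.copy(), np.inf, 0
    for _ in range(max_epochs):
        # Gradient of MSE + lam * ||theta||^2 on the training slice
        residual = X[tr] @ theta - y[tr]
        grad = 2 * X[tr].T @ residual / len(residual) + 2 * lam * theta
        theta = theta - lr * grad
        val = np.mean((X[va] @ theta - y[va]) ** 2)
        if val < best_val:
            # "save_checkpoint": keep a copy and reset the patience counter
            best_theta, best_val, patience_counter = theta.copy(), val, 0
        else:
            patience_counter += 1
            if patience_counter > patience:
                break
    return best_theta, best_val  # parameters from the best checkpoint
```

Returning the checkpointed parameters rather than the final ones matters because the last epochs before stopping are, by construction, no better on validation data.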

Experimental Results

Evaluated on 90,000 trips across 714 Swiss bike-sharing stations:

  • RMSE: 2.52 (13% improvement over baselines)
  • MAE: 1.74
  • R²: 0.723
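For readers reproducing these metrics, the sketch below computes RMSE, MAE, and R² with their usual definitions. The function name `regression_metrics` is hypothetical, and the numbers in the usage example are illustrative, not data from the study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r2": float(1.0 - ss_res / ss_tot),
    }

metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
# metrics -> rmse 1.0, mae 0.5, r2 0.2 (up to floating-point rounding)
```

RMSE penalizes large errors more heavily than MAE, so the gap between the two hints at how heavy-tailed the forecast errors are.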

The attention mechanism reveals interpretable patterns where downtown stations emphasize local accessibility (500m) while peripheral stations focus on broader connectivity (1500m).

Implementation Details

Architecture Specifications:

  • Optimizer: Adam (lr=, , , weight decay=)
  • Batch size: 32, sequence length: 6 hours, prediction horizon: 1 hour
  • Hidden dimensions: 64, dropout: 0.2, diffusion steps: 3
  • Regularization: L2 penalty , gradient clipping: 5.0
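The sequence length and prediction horizon above imply a sliding-window construction of training samples. The sketch below assumes flows are aggregated hourly, so that 6 hours corresponds to 6 time steps and the target lies 1 step ahead; the hourly resolution and the name `make_windows` are assumptions.

```python
import numpy as np

def make_windows(flow, seq_len=6, horizon=1):
    """Slice a (T, N) flow series into input windows and future targets.

    returns X of shape (n, seq_len, N) and Y of shape (n, N), where
    Y[t] is the flow `horizon` steps after the end of window X[t].
    """
    n = flow.shape[0] - seq_len - horizon + 1
    X = np.stack([flow[t:t + seq_len] for t in range(n)])
    Y = np.stack([flow[t + seq_len + horizon - 1] for t in range(n)])
    return X, Y

# Usage: 10 time steps for 2 stations.
flow = np.arange(20, dtype=float).reshape(10, 2)
X_win, Y_win = make_windows(flow)
```

Windows should be built after the temporal split, not before, so that no input window straddles the boundary between training and validation periods.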

Hyperparameter Selection:

  • Learning rate:
  • Hidden dimensions:
  • Diffusion steps:
  • Dropout:

Computational Requirements:

  • Hardware: NVIDIA RTX 3080 (8GB)
  • Training time: ~71 minutes with early stopping
  • Memory usage: 2.4GB peak

Conclusion

Our enhanced DCRNN with attention-fused multi-scale OSM context reduces RMSE by approximately 13% relative to the ConvLSTM and ST-GCN baselines on the Swiss bike-sharing data. Beyond accuracy, the learned attention weights offer interpretable insights that align with urban planning principles. Adaptive spatial scale selection enables context-aware modeling of the diverse urban morphologies present in modern cities, which makes the approach particularly relevant for smart city applications and infrastructure planning.

The framework demonstrates how systematic integration of geographical databases with advanced neural architectures can yield both performance gains and actionable urban insights, paving the way for more intelligent and equitable urban mobility systems.