TGIS: the concept
A compound AI system that answers cross-source transport questions, with every number attributable to a source and the reasoning available for review.
What TGIS is, in one frame
TGIS (the Transport Global Intelligence System) is not a chatbot on top of transport data, and it is not a new data standard. It is a system whose purpose is to combine heterogeneous transport data across sources that were never designed to talk to each other, and to return answers whose trustworthiness is visible: every number attributable to a specific source, the reasoning available for review, and any disagreement between sources surfaced rather than silently collapsed.
Fig 1. TGIS in context. An orchestrator triggers domain experts (structured, geospatial, unstructured), a fusion expert that combines aligned evidence, and a reconciliation expert that surfaces disagreement where sources conflict. The system queries sources and remote compute, then returns an answer as text, tables, maps, figures, and a reconciliation log.
It builds on the foundations already in the space (the Transport Data Commons, SDMX, Oxford's infrastructure-resilience work, the Asian Transport Outlook, the Africa Transport Systems Database, and others) rather than replacing them. What it adds is the connective reasoning layer that lets them be used together.
The two sides: ecosystem and outcomes
Every question directed at TGIS falls on one of two sides of a primary cut, and the distinction runs through the whole system.
Ecosystem questions probe the data itself. Does the data exist for this country, mode, and period? Is it current? Is it comparable across sources? Can it be trusted? Who does it miss? These are the questions that determine what is answerable.
Outcomes questions are about real-world transport. What does mobility cost? How fast and reliable is it? How safe, sustainable, inclusive? These are the questions policymakers and analysts ultimately care about.
Existing transport dashboards answer outcomes questions silently, showing numbers without exposing the ecosystem constraints beneath them. TGIS treats the ecosystem layer as a first-class citizen, not as a methodological footnote. This is where the system's distinctive value sits.
The two sides of TGIS (upstream work on the data ecosystem, downstream work on outcomes) follow this cut.
Ecosystem: what TGIS does upstream
The upstream side of TGIS makes the transport data ecosystem more AI-ready and more usable. It serves the organisations that build and maintain the data: TDC, the Asian Transport Outlook, national transport agencies, research programmes.
Three capabilities run through the upstream work.
Agentic tools for SDMX structure-building and format conversion compress work that would traditionally occupy international committees for years. AI agents analyse existing national datasets, compare structures across them, and draft data-structure definitions for expert review rather than expert construction. This contributes directly to TDC's standards mission and to the SDMX community's AI-readiness initiative.
Coverage, freshness, and provenance become queryable properties of a dataset rather than annex material. An analyst or policymaker can ask whether a given indicator exists for a given country, how fresh the latest observation is, whether the figures are comparable with another country's reporting, or who is systematically missing from the data. The answer is retrievable, not assumed.
Gap analysis and prescription (recommending what a country or region needs to collect, harmonise, or open in order to support a specific decision) is a capability no existing transport platform attempts. This is the area where TGIS is most defensible as a public good for the UN Decade of Sustainable Transport.
Seven categories of data question
Ecosystem questions resolve to seven categories, distilled from established data-quality frameworks (IMF DQAF, UN NQAF, ESS CoP, FAIR, ISO 5259, ISO 19157, World Bank SPI, QA4EO, RAGAS, SDMX AI-readiness). Together, these seven categories map the question space for any transport dataset.
- Coverage. Does data exist for the country, period, mode, and segment I care about? e.g. Does this indicator exist for Kenya at county level?
- Freshness. How recent is the latest value? e.g. How recent is the latest road-fatality figure for Nigeria?
- Harmonisation. Is this comparable across sources, countries, definitions? e.g. Are Kenya's rail-freight figures comparable with UIC definitions?
- Provenance. Where does this number come from: official, estimated, or modelled? e.g. Is this road-fatality figure police-reported or WHO-modelled?
- Accuracy / Uncertainty. How reliable is this value? e.g. What is the uncertainty budget for WHO-modelled road deaths in Uganda?
- Representativeness. What is systematically missing? e.g. Is the informal sector (minibuses, motorcycle taxis) reflected in this dataset?
- Accessibility. Can I actually get to this data? e.g. Is it available via API, file, or only behind a paywall?
Fig 2. The seven categories of ecosystem question. Each is a dimension data partners can improve and TGIS can measure.
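As a rough illustration of what "queryable" could mean for the first two categories, the sketch below treats coverage and freshness as functions over a small catalogue of observations. The class names, field names, indicator code, and values are hypothetical placeholders, not TGIS's actual schema or real statistics.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class EcosystemCategory(Enum):
    """The seven categories of ecosystem question listed above."""
    COVERAGE = "coverage"
    FRESHNESS = "freshness"
    HARMONISATION = "harmonisation"
    PROVENANCE = "provenance"
    ACCURACY = "accuracy / uncertainty"
    REPRESENTATIVENESS = "representativeness"
    ACCESSIBILITY = "accessibility"


@dataclass(frozen=True)
class Observation:
    """One data point plus the metadata needed to answer ecosystem questions."""
    indicator: str
    country_iso3: str
    year: int
    value: float
    source: str
    method: str  # "official", "estimated", or "modelled" (provenance)


def coverage(observations, indicator, country_iso3):
    """Return the sorted years in which the indicator exists for the country."""
    return sorted({o.year for o in observations
                   if o.indicator == indicator and o.country_iso3 == country_iso3})


def freshness_years(observations, indicator, country_iso3, today):
    """Return the age in years of the latest observation, or None if there is none."""
    years = coverage(observations, indicator, country_iso3)
    return today.year - years[-1] if years else None


# Placeholder values for illustration only; these are not real statistics.
catalogue = [
    Observation("road_fatalities", "KEN", 2019, 100.0, "source_a", "official"),
    Observation("road_fatalities", "KEN", 2021, 120.0, "source_b", "modelled"),
]
```

With this catalogue, asking for coverage of `road_fatalities` in `KEN` returns the years that exist, while the same query for a country with no observations returns an empty list: the gap is retrieved, not assumed.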
Outcomes: what TGIS does downstream
The downstream side of TGIS answers real-world transport questions by combining evidence across sources. It serves decision-makers.
Two of TGIS's experts are dedicated to this work: a fusion expert that aligns evidence across modalities, and a reconciliation expert that surfaces disagreement when sources publish materially different values for the same indicator.
Combining sources is more than concatenation. Most of it is mechanical fusion: aligning units, normalising formats, translating geographic references (countries to ISO codes, regions to bounding boxes), aligning time periods that aren't natively comparable, joining results across data types so statistical indicators sit alongside geospatial features and policy documents. In most cross-source questions the sources are complementary rather than competing: different facets of the same picture. A smaller fraction of work involves reconciliation in the strict sense: when two or more sources publish the same indicator with materially different values. TGIS handles both.
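The mechanical fusion steps described above can be sketched in a few lines. This is a minimal illustration, assuming tiny hand-written lookup tables, length units only, and a crude "first four characters" rule for time periods; the function names, field names, and indicator code are hypothetical, and a real system would rely on full reference data for each step.

```python
from dataclasses import dataclass

# Illustrative lookup tables (assumptions); real reference data would be far larger.
ISO3 = {"kenya": "KEN", "nigeria": "NGA", "uganda": "UGA"}
TO_KM = {"km": 1.0, "m": 0.001, "mi": 1.609344}


@dataclass(frozen=True)
class Record:
    source: str
    country_iso3: str
    year: int
    indicator: str
    value: float
    unit: str


def align(raw, source):
    """Normalise one raw row: country name to ISO code, length to km, period to year."""
    country = ISO3[raw["country"].strip().lower()]
    year = int(str(raw["period"])[:4])  # "2021-Q3" -> 2021; deliberately crude
    value_km = float(raw["value"]) * TO_KM[raw["unit"]]
    return Record(source, country, year, raw["indicator"], value_km, "km")


def join_by_place_and_year(records):
    """Group aligned records so values from different sources sit side by side."""
    joined = {}
    for r in records:
        key = (r.country_iso3, r.year)
        joined.setdefault(key, {})[(r.indicator, r.source)] = (r.value, r.unit)
    return joined
```

Each record keeps its `source` through the join, so the combined view stays attributable. Rows that disagree are kept side by side rather than overwritten; deciding what to do with them is the reconciliation step, not the fusion step.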
Four properties run through every downstream answer.
Cross-source synthesis. A question that needs World Bank indicators, OPSIS infrastructure data, IATI spending records, and SLOCAT policy commitments gets all of them retrieved and composed into one answer. Where the answer also needs policy-document context (NDC commitments, project evaluations, research findings), that content is retrieved alongside the numbers. The bulk of the work is alignment: format conversion, unit normalisation, geographic and temporal harmonisation so the pieces can be combined honestly.
Reconciliation where sources do disagree. Some answers depend on indicators that two or more sources publish with materially different values (WHO-modelled versus police-reported road fatalities is the canonical case). When this happens, TGIS shows both with per-source numbers and a one-line explanation of why they differ (different methodology, different collection year, different definition). It does not silently pick a winner. This pattern matters where it applies, but it is one challenge among several in cross-source work, not the headline.
Provenance on every claim. Every number in every answer is attributable to a specific source. No unattributed figures.
The reasoning trace is visible. Alongside the narrative answer, a structured log records which source contributed which claim and how the pieces were combined. Readers can audit the answer, not just accept it.
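The reconciliation and provenance properties above can be illustrated with a small sketch that produces one structured log entry per indicator. The 10% relative threshold for "materially different" is an assumption made for the example (the concept does not define a cut-off), and the names and values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    indicator: str
    value: float
    source: str
    note: str  # one-line context, e.g. methodology or collection year


def reconcile(claims, rel_threshold=0.10):
    """Build a log entry that reports every source and flags disagreement.

    No value is chosen as the winner; the threshold is an illustrative assumption.
    """
    if not claims:
        raise ValueError("at least one claim is required")
    values = [c.value for c in claims]
    scale = max(abs(v) for v in values)
    spread = (max(values) - min(values)) / scale if scale else 0.0
    return {
        "indicator": claims[0].indicator,
        "disagreement": spread > rel_threshold,
        "relative_spread": round(spread, 3),
        "per_source": [
            {"source": c.source, "value": c.value, "note": c.note} for c in claims
        ],
    }


# Placeholder values for illustration only; these are not real statistics.
entry = reconcile([
    Claim("road_fatalities", 100.0, "police_reports", "reported deaths"),
    Claim("road_fatalities", 250.0, "who_model", "modelled estimate"),
])
```

Here `entry["disagreement"]` is true and both per-source values survive into the log with their notes, which is the behaviour the text describes: the divergence is surfaced and explained, and the reader can see which source contributed which number.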
Six categories of outcomes question
Decision-makers ask questions that fall into six categories, tracking the dimensions transport systems are judged on (in line with sectoral frameworks such as SuM4All, SLOCAT, and EMSA/EMTER for maritime). These are the dimensions against which TGIS's performance on the outcomes side is measured.
- Affordability. How much does mobility cost? e.g. What does a daily commute cost in Nairobi?
- Speed & efficiency. How fast, how congested, how reliable are travel times? e.g. How congested are the main corridors into Lagos?
- Reliability & resilience. Does the system deliver under shocks? e.g. Which corridors have the highest climate risk relative to trade importance?
- Safety. Fatalities and injuries by mode, per km, per capita. e.g. Has road safety in Kenya improved over the last decade?
- Sustainability. Emissions, energy intensity, modal share of low-carbon modes. e.g. Is Kenya on track for its transport NDC commitments?
- Inclusivity & access. Who reaches what (rural, gender, poverty quintile, disability). e.g. How many rural residents live within 2 km of an all-season road?
Fig 3. The six categories of outcomes question. Each is a dimension decision-makers can improve and TGIS can measure.
The question "Has road safety in Kenya improved over the last decade?" illustrates how the two sides interact. An honest answer is impossible without first interrogating the data itself (its accuracy, representativeness, freshness), because reported fatality figures and WHO-modelled figures diverge substantially for many sub-Saharan African countries (see the WHO Global Status Report on Road Safety 2023), and the divergence is part of the real answer. TGIS is designed to recognise this pattern: when the question is about outcomes but the answerable form depends on ecosystem reality, the orchestrator runs the ecosystem pre-checks before retrieving the outcomes data and folds the findings into a qualified answer. The mechanics are described in the detailed concept.
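The ordering described above (ecosystem pre-checks first, outcomes retrieval second, findings folded into the answer) can be sketched as follows. This is a simplified illustration, not the mechanism in the detailed concept: the metadata fields, the three-year staleness limit, and the `fetch` callable are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMeta:
    source: str
    latest_year: int
    method: str  # "official", "estimated", or "modelled"


def ecosystem_prechecks(metas, current_year, max_age_years=3):
    """Return caveats about the data behind an outcomes question."""
    caveats = []
    for m in metas:
        age = current_year - m.latest_year
        if age > max_age_years:
            caveats.append(f"{m.source}: latest observation is {age} years old")
    methods = sorted({m.method for m in metas})
    if len(methods) > 1:
        caveats.append("sources use different methods (" + ", ".join(methods)
                       + "); values may diverge")
    return caveats


def answer(question, metas, fetch, current_year):
    """Interrogate the data first, then retrieve it and qualify the answer."""
    caveats = ecosystem_prechecks(metas, current_year)   # step 1: ecosystem side
    values = {m.source: fetch(m.source) for m in metas}  # step 2: outcomes side
    return {"question": question, "values_by_source": values, "caveats": caveats}
```

The caveats travel with the values in the returned structure, so a qualified answer can state both what the sources say and why they should be read with care.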
The unity: one architecture, two audiences
The upstream and downstream sides are not parallel programmes. They are two faces of the same compound system. The structured-data expert that accelerates TDC's upstream data preparation is the same expert that answers an analyst's statistical question downstream. Partners who contribute data use the same upstream tools to prepare it. One architecture, two audiences.
This matters for effort and momentum. The organisations contributing data benefit from operational tools, not just paper deliverables. The ecosystem improves through use. And investment in upstream reliability is also investment in downstream reliability: one track to maintain, not two in parallel.
The RIDE programme and the Frontier Tech Hub bring the how to this coalition. The domain expertise (the why, the where, and the for whom) lives with TDC, Oxford, UNECE, WRI, FCDO, GIZ, the MDBs, and partner governments. What TGIS adds is the AI architecture that makes heterogeneous transport data reconcilable, the integration patterns that take twenty-five disparate APIs to one conversational interface, and the honest brokerage of what is and isn't technically feasible today. The sector brings the knowledge, the data, and the users. TGIS brings the way of working.
What becomes real when
TGIS is built in three phases, each adding a new capability, or view, to the same compound architecture. Analysts, policymakers, investors, and donors draw on all three views; what distinguishes the views is the type of evidence being queried.
- Statistical view. Cross-source statistical synthesis across SDMX-native sources. Compare, trend, and rank indicators across countries with per-source provenance, and where two sources publish materially different values for the same indicator, both are surfaced rather than silently collapsed. This is the current live capability.
- Spatial view. Adds geospatial reasoning: corridor and regional questions, climate-risk overlays, infrastructure exposure.
- Documentary view. Adds policy-document analysis. NDC commitments, sector strategies, research findings, and project evaluations become searchable alongside structured data.
Several properties run across all three views rather than belonging to one phase.
TGIS is designed to work at both global and local scales. A transport minister in Ghana cares about Ghana, not global averages; a UNDESA analyst tracking the Decade needs the global view. The system lets users zoom to their context while drawing on the full breadth of available data.
It is multimodal: most users approach through one mode (roads, maritime, rail, air), but the real value lies in the connections between them, such as port capacity affecting road freight costs, or climate vulnerability across networked infrastructure. The interface allows mode-specific entry points while enabling cross-modal synthesis.
And it is multilingual: transport policymakers in partner countries work in French, Spanish, Arabic, Portuguese, and other languages, English among them. TGIS supports simultaneous translation (questions in any language, answers in that language) to avoid the anglophone-only trap that excludes most of the intended users. Current LLMs handle this well for the major languages.
Each phase adds one expert to the compound system. No phase restructures what was built before.
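One way to picture "adding an expert without restructuring" is a registry that the orchestrator consults, where each phase only registers a new entry. The sketch below is a hypothetical illustration of that pattern; the class, method names, and stand-in expert functions are assumptions, not TGIS's actual design.

```python
from typing import Callable, Dict

Expert = Callable[[str], str]


class Orchestrator:
    """Routes a question to whichever expert is registered for an evidence type."""

    def __init__(self):
        self._experts: Dict[str, Expert] = {}

    def register(self, evidence_type: str, expert: Expert) -> None:
        # A new phase only adds an entry; existing experts are left untouched.
        self._experts[evidence_type] = expert

    def capabilities(self):
        return sorted(self._experts)

    def ask(self, evidence_type: str, question: str) -> str:
        if evidence_type not in self._experts:
            return f"no {evidence_type} expert registered yet"
        return self._experts[evidence_type](question)


tgis = Orchestrator()
tgis.register("statistical", lambda q: f"statistical answer to: {q}")  # first phase
tgis.register("spatial", lambda q: f"spatial answer to: {q}")          # later phase
```

Registering the spatial expert changes nothing about how statistical questions are answered, which is the property the phasing relies on.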
Relationship with TDC, SDMX, and the wider ecosystem
TGIS builds on TDC, not around it. TDC is the initiative building trusted metadata and SDMX-based standards for global transport data. TGIS uses agentic AI to accelerate that work along four axes.
Accelerate SDMX adoption. AI agents compress the preparatory work for new domain packages (analysing existing national datasets, comparing structures, drafting data-structure definitions), so expert committees review rather than build from scratch.
Extend into non-SDMX territory. SDMX is the right standard for statistical data. Geospatial infrastructure, financial flows, and unstructured policy knowledge live outside its scope. TGIS provides agent-accessible interfaces to those sources and combines them with SDMX data in a single analytical flow.
Make unstructured knowledge searchable. Policy reports, NDC commitments, research findings, and project evaluations become retrievable alongside structured data, with citation back to the original document on every claim.
Enable cross-source queries for any AI tool. TDC's own assistant and any other transport AI tool can draw on the same underlying reasoning layer for multi-source queries, rather than each building its own.
This aligns with the SDMX AI-readiness direction set by the eight sponsor organisations in September 2025 and with the pattern already demonstrated by the IMF's StatGPT 2.0.
The longer-term aim is for TGIS to serve as the cross-source data backbone of the UN Decade of Sustainable Transport: the connective layer that makes the Decade's accountability goals trackable across its six priority areas, from the data that already exists to the data that has yet to be gathered.
What TGIS is not
TGIS is not a new data standard. It uses SDMX and TDC's existing standards. It is not a centralised data platform; it queries sources where they live. It is not a real-time operational tool; it is designed for analysis and planning, not traffic management or control. It is not a replacement for any existing platform or tool; it is a layer that makes their data investments composable. It is not a chatbot on top of data; the conversational interface is one possible consumer, and the service layer underneath can serve any AI workflow. It is not a substitute for domain expertise; the sector brings the knowledge, the data, and the users. And it is not a guarantee that better evidence will change behaviour: giving people a tool does not mean they will use it, and a system designed around how people actually make decisions will always do more than one that hopes to change their practice. TGIS is designed to meet decisions where they are made, not to prescribe how they should be.