CrisisQuant - Humanitarian Aid Intelligence Platform
Inspiration
Every year, billions of dollars in humanitarian aid are mobilized across dozens of crisis-affected countries, yet chronic underfunding, misallocation, and coordination gaps mean millions of people in need go unserved. The problem isn't always a lack of funding; it's a lack of visibility. We asked: what if we could use machine learning to surface hidden patterns in humanitarian funding, identifying where money is going, where it isn't, and where every dollar would have the greatest impact?
What It Does
The Humanitarian Aid Intelligence Platform is an end-to-end data pipeline and interactive dashboard that:
- Detects anomalous funding allocations: an Isolation Forest flags country-cluster-year combinations that deviate significantly from expected patterns
- Visualizes global funding gaps: an interactive choropleth map covers 80+ countries and 15+ humanitarian sectors
- Benchmarks comparable allocations: K-Means clustering groups similar funding contexts for peer comparison
- Optimizes aid portfolio allocation: linear programming maximizes beneficiaries reached within a given budget
- Tracks donor contribution flows: pledge-vs-paid gaps across country-based pooled funds (CBPF)
How We Built It
Datasets
| Dataset | Source | Rows | Purpose |
|---|---|---|---|
| HPC HNO 2025 | OCHA | 306K | People in need + targeted beneficiaries |
| FTS Requirements | UN OCHA FTS | 10.5K | Funding requirements vs actual funding |
| CBPF Projects | UNOCHA | 109K | Project budgets, geo coords, org type |
| CBPF Contributions | UNOCHA | 4.2K | Donor pledge vs paid flows |
| COD Population | OCHA | 6.7K | Per-capita normalization |
| EM-DAT Disasters | CRED | 27.5K | Disaster severity context (5yr window) |
Feature Engineering (PySpark): 12 engineered features, including funding coverage rate, beneficiary-to-funding ratio, cost per beneficiary, disaster severity score, and sector-level efficiency benchmarks. Computed at country × cluster × year granularity.
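The actual job runs in PySpark, but the core aggregation can be sketched with a minimal pandas analogue. The column names and figures below are assumptions for illustration, not the real schema:

```python
import pandas as pd

# Hypothetical rows at country x cluster x year granularity (illustrative values).
df = pd.DataFrame({
    "country": ["SDN", "SDN", "YEM"],
    "cluster": ["Health", "WASH", "Health"],
    "year": [2025, 2025, 2025],
    "requirements": [50e6, 20e6, 80e6],   # USD required
    "funding": [30e6, 5e6, 60e6],         # USD actually funded
    "beneficiaries": [600_000, 150_000, 900_000],
})

# Aggregate to the country x cluster x year grain, then derive ratio features.
feat = df.groupby(["country", "cluster", "year"], as_index=False).sum(numeric_only=True)
feat["coverage_rate"] = feat["funding"] / feat["requirements"]
feat["cost_per_beneficiary"] = feat["funding"] / feat["beneficiaries"]

print(feat[["country", "cluster", "coverage_rate", "cost_per_beneficiary"]])
```

The same `groupBy(...).agg(...)` plus column expressions translate directly to the PySpark DataFrame API.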
ML Models (scikit-learn + MLflow):
- Isolation Forest — unsupervised anomaly detection on 9 funding features, contamination=5%, results logged to MLflow
- K-Means — K auto-selected via silhouette score optimization, used to benchmark comparable allocations
- Linear Programming (scipy) — portfolio optimizer maximizing expected beneficiaries subject to a budget constraint
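The portfolio optimizer reduces to a fractional knapsack solved with `scipy.optimize.linprog`. This sketch uses hypothetical per-project numbers and assumes beneficiaries scale linearly with the funded fraction, which may differ from the actual model:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical projects: expected beneficiaries if fully funded, and full cost (USD).
beneficiaries = np.array([120_000, 45_000, 300_000, 80_000])
cost = np.array([4_000_000, 1_000_000, 12_000_000, 2_500_000])
budget = 10_000_000

# linprog minimizes, so negate beneficiaries to maximize them.
# Decision variable x_i in [0, 1] is the funded fraction of project i.
res = linprog(
    c=-beneficiaries,
    A_ub=[cost],          # total spend must not exceed the budget
    b_ub=[budget],
    bounds=[(0, 1)] * len(cost),
    method="highs",
)

allocation = res.x * cost
print(f"Beneficiaries reached: {-res.fun:,.0f}")
print("Allocation (USD):", np.round(allocation))
```

With these numbers the solver fully funds the three cheapest-per-beneficiary projects and spends the remainder on a fraction of the largest one, which is the concrete answer to "where should the next dollar go?".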
Visualization (Streamlit): Five-tab interactive dashboard deployed as a Databricks App with live Delta table connectivity via the Databricks SQL Connector and OAuth M2M authentication.
Challenges We Ran Into
- Multilingual HNO data: sector names across 24 countries appeared in English, French, and Spanish, requiring a 60+ entry normalization mapping before joins were viable
- Key incompatibility: FTS uses numeric cluster codes while HNO uses alphabetic codes, making direct joins impossible and forcing a sector-name-based join strategy
- Databricks Serverless constraints: cache/unpersist, the MLflow Model Registry, and DBFS root access are all unavailable on Serverless, requiring workarounds for each
- OAuth conflicts: Databricks Apps auto-injects OAuth M2M credentials that conflict with PAT tokens; resolving this required switching entirely to the SQL Connector with OAuth
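A sector-name normalization mapping like the one described above can be sketched as an accent- and case-folding lookup. The specific French/Spanish labels here are illustrative stand-ins, not entries from the real 60+ entry table:

```python
import unicodedata

# Hypothetical excerpt of the normalization mapping (the real one has 60+ entries).
SECTOR_MAP = {
    "sante": "Health",
    "salud": "Health",
    "securite alimentaire": "Food Security",
    "seguridad alimentaria": "Food Security",
    "eau, assainissement et hygiene": "Water, Sanitation and Hygiene",
}

def normalize_sector(name: str) -> str:
    """Fold accents, case, and whitespace so 'Santé', 'SANTE', and 'sante' all match."""
    key = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")  # drop combining accent marks
        .decode()
        .strip()
        .lower()
    )
    return SECTOR_MAP.get(key, key.title())

print(normalize_sector("Santé"))
```

Applying this to both sides before the sector-name-based join makes the FTS/HNO merge viable despite the incompatible cluster codes.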
Accomplishments We're Proud Of
- Clean end-to-end pipeline from raw CSV/Excel → Delta Lake → ML → live dashboard in a single reproducible workflow
- Anomaly detection surfacing genuinely suspicious allocations — sectors receiving funding with zero recorded requirements, extreme cost-per-beneficiary outliers, and over-funded appeals
- Portfolio optimizer that gives humanitarian decision-makers a concrete, quantitative answer to "where should the next dollar go?"
- Handling messy real-world humanitarian data at scale (multilingual, multi-source, inconsistent schemas) without synthetic shortcuts
What We Learned
- Humanitarian datasets are far messier than expected: HXL tagging helps, but multilingual content, inconsistent codes, and aggregate summary rows require significant cleaning
- Serverless compute changes the rules: memory management, MLflow integration, and filesystem access all behave differently and require deliberate adaptation
- Anomaly detection on funding data is genuinely useful: the flagged rows consistently correspond to real data quality issues or unusual funding patterns worth investigating
What's Next
- Incorporate conflict data (ACLED) and displacement data (UNHCR) as additional anomaly features
- Add time-series forecasting to predict funding gaps 6–12 months ahead
- Expand HNO coverage beyond 2025 by ingesting historical HNO files (2020–2024)
- Build an alert system that flags new anomalies automatically when FTS data is updated
Built With
| Category | Technologies |
|---|---|
| Cloud Platform | Databricks, Delta Lake, Unity Catalog |
| Data Processing | PySpark, Pandas, PyArrow |
| ML & Optimization | scikit-learn, scipy, MLflow |
| Visualization | Streamlit, Plotly |
| Data Sources | OCHA FTS, HPC HNO, EM-DAT, CBPF, COD |
| Auth & Deployment | Databricks Apps, OAuth M2M, SQL Connector |
| Language | Python 3.12 |