CrisisQuant - Humanitarian Aid Intelligence Platform

Inspiration

Every year, billions of dollars in humanitarian aid are mobilized across dozens of crisis-affected countries, yet chronic underfunding, misallocation, and coordination gaps mean millions of people in need go unserved. The problem isn't always a lack of funding; it's a lack of visibility. We asked: what if we could use machine learning to surface hidden patterns in humanitarian funding, identifying where money is going, where it isn't, and where every dollar would have the greatest impact?

What It Does

The Humanitarian Aid Intelligence Platform is an end-to-end data pipeline and interactive dashboard that:

  • Detects anomalous funding allocations using Isolation Forest to flag country-cluster-year combinations that deviate significantly from expected patterns
  • Visualizes global funding gaps through an interactive choropleth map covering 80+ countries and 15+ humanitarian sectors
  • Benchmarks comparable allocations using K-Means clustering to group similar funding contexts for peer comparison
  • Optimizes aid portfolio allocation using linear programming to maximize beneficiaries reached within a given budget
  • Tracks donor contribution flows, showing pledge-vs-paid gaps across country-based pooled funds (CBPF)
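The portfolio optimizer can be sketched with scipy's `linprog`. The per-sector figures below (beneficiaries per dollar, unmet requirements, budget) are made up for illustration only, not taken from the platform's data:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-sector inputs: expected beneficiaries reached per dollar,
# and each sector's remaining unfunded requirement in dollars.
beneficiaries_per_dollar = np.array([0.8, 1.5, 0.4, 1.1])
unmet_requirement = np.array([40e6, 25e6, 60e6, 30e6])
budget = 50e6

# linprog minimizes, so negate the objective to maximize beneficiaries reached.
res = linprog(
    c=-beneficiaries_per_dollar,
    A_ub=np.ones((1, 4)),                               # total spend <= budget
    b_ub=[budget],
    bounds=list(zip(np.zeros(4), unmet_requirement)),   # can't overfund a sector
    method="highs",
)
allocation = res.x          # dollars per sector
reached = -res.fun          # expected beneficiaries at the optimum
```

With a single budget constraint the optimum fills the highest beneficiaries-per-dollar sectors first, which matches the intuition behind "where should the next dollar go?".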

How We Built It

Datasets

| Dataset | Source | Rows | Purpose |
| --- | --- | --- | --- |
| HPC HNO 2025 | OCHA | 306K | People in need + targeted beneficiaries |
| FTS Requirements | UN OCHA FTS | 10.5K | Funding requirements vs actual funding |
| CBPF Projects | UN OCHA | 109K | Project budgets, geo coordinates, org type |
| CBPF Contributions | UN OCHA | 4.2K | Donor pledge vs paid flows |
| COD Population | OCHA | 6.7K | Per-capita normalization |
| EM-DAT Disasters | CRED | 27.5K | Disaster severity context (5-year window) |

Feature Engineering (PySpark): 12 engineered features, including funding coverage rate, beneficiary-to-funding ratio, cost per beneficiary, disaster severity score, and sector-level efficiency benchmarks. Computed at country × cluster × year granularity.
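The actual pipeline runs in PySpark; a minimal pandas sketch of three of the feature definitions, with invented toy numbers at country × cluster × year granularity, looks like this:

```python
import pandas as pd

# Toy rows at country x cluster x year granularity (all figures illustrative).
df = pd.DataFrame({
    "country": ["TCD", "TCD", "SSD"],
    "cluster": ["Health", "WASH", "Health"],
    "year": [2025, 2025, 2025],
    "requirements_usd": [10e6, 4e6, 25e6],
    "funding_usd": [6e6, 1e6, 30e6],
    "people_targeted": [500_000, 200_000, 900_000],
})

# Three of the engineered features described above (column names illustrative):
df["funding_coverage_rate"] = df["funding_usd"] / df["requirements_usd"]
df["cost_per_beneficiary"] = df["funding_usd"] / df["people_targeted"]
df["beneficiary_to_funding_ratio"] = df["people_targeted"] / df["funding_usd"]
```

A coverage rate above 1.0 (as in the third toy row) is exactly the kind of over-funded appeal the anomaly detector later flags.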

ML Models (scikit-learn + MLflow):

  • Isolation Forest: unsupervised anomaly detection on 9 funding features with contamination=5%; results logged to MLflow
  • K-Means: K auto-selected via silhouette score optimization, used to benchmark comparable allocations
  • Linear Programming (scipy): portfolio optimizer maximizing expected beneficiaries subject to a budget constraint
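The two scikit-learn pieces can be sketched together on synthetic data standing in for the 9 funding features (the planted outliers and the K search range are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 9))   # stand-in for the 9 funding features
X[:5] += 8                      # plant a few obvious outliers

# Anomaly detection: contamination=0.05 flags roughly 5% of rows as -1.
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(X)

# Benchmarking: auto-select K by maximizing the silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    if score > best_score:
        best_k, best_score = k, score
```

In the real pipeline the fitted model, its parameters, and the flagged rows would then be logged to MLflow.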

Visualization (Streamlit): Five-tab interactive dashboard deployed as a Databricks App with live Delta table connectivity via the Databricks SQL Connector and OAuth M2M authentication.

Challenges We Ran Into

  • Multilingual HNO data: sector names across 24 countries appeared in English, French, and Spanish, requiring a 60+ entry normalization mapping before joins were viable
  • Key incompatibility: FTS uses numeric cluster codes while HNO uses alphabetic codes, making direct joins impossible and forcing a sector-name-based join strategy
  • Databricks Serverless constraints: cache/unpersist, the MLflow Model Registry, and DBFS root access are all unavailable on Serverless, requiring workarounds for each
  • OAuth conflicts: Databricks Apps auto-injects OAuth M2M credentials, which conflict with PAT tokens; resolving this required switching entirely to the SQL Connector with OAuth
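The sector-name normalization can be sketched as a lookup table plus a fallback. The entries below are a small hypothetical subset; the real mapping has 60+ entries and these exact keys and canonical names are assumptions:

```python
# Illustrative subset of the sector-name normalization mapping (the real
# table covers 60+ English/French/Spanish variants).
SECTOR_CANON = {
    "santé": "Health",
    "salud": "Health",
    "sécurité alimentaire": "Food Security",
    "seguridad alimentaria": "Food Security",
    "éducation": "Education",
    "education": "Education",
}

def normalize_sector(name: str) -> str:
    """Map a raw sector label to its canonical English name before joining."""
    key = name.strip().lower()
    return SECTOR_CANON.get(key, name.strip().title())

print(normalize_sector("Santé"))  # → Health
```

Normalizing to a canonical English name on both sides is what makes the sector-name-based FTS↔HNO join viable.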

Accomplishments We're Proud Of

  • Clean end-to-end pipeline from raw CSV/Excel → Delta Lake → ML → live dashboard in a single reproducible workflow
  • Anomaly detection surfacing genuinely suspicious allocations — sectors receiving funding with zero recorded requirements, extreme cost-per-beneficiary outliers, and over-funded appeals
  • Portfolio optimizer that gives humanitarian decision-makers a concrete, quantitative answer to "where should the next dollar go?"
  • Handling messy real-world humanitarian data at scale (multilingual, multi-source, inconsistent schemas) without synthetic shortcuts

What We Learned

  • Humanitarian datasets are far messier than expected: HXL tagging helps, but multilingual content, inconsistent codes, and aggregate summary rows require significant cleaning
  • Serverless compute changes the rules: memory management, MLflow integration, and filesystem access all behave differently and require deliberate adaptation
  • Anomaly detection on funding data is genuinely useful: the flagged rows consistently correspond to real data quality issues or unusual funding patterns worth investigating

What's Next

  • Incorporate conflict data (ACLED) and displacement data (UNHCR) as additional anomaly features
  • Add time-series forecasting to predict funding gaps 6–12 months ahead
  • Expand HNO coverage beyond 2025 by ingesting historical HNO files (2020–2024)
  • Build an alert system that flags new anomalies automatically when FTS data is updated

Built With

| Category | Technologies |
| --- | --- |
| Cloud Platform | Databricks, Delta Lake, Unity Catalog |
| Data Processing | PySpark, Pandas, PyArrow |
| ML & Optimization | scikit-learn, scipy, MLflow |
| Visualization | Streamlit, Plotly |
| Data Sources | OCHA FTS, HPC HNO, EM-DAT, CBPF, COD |
| Auth & Deployment | Databricks Apps, OAuth M2M, SQL Connector |
| Language | Python 3.12 |
