CrisisQuant - Humanitarian Aid Intelligence Platform

Inspiration

Every year, billions of dollars in humanitarian aid are mobilized across dozens of crisis-affected countries, yet chronic underfunding, misallocation, and coordination gaps mean millions of people in need go unserved. The problem isn't always a lack of funding; it's a lack of visibility. We asked: what if we could use machine learning to surface hidden patterns in humanitarian funding, identifying where money is going, where it isn't, and where every dollar would have the greatest impact?

What It Does

The Humanitarian Aid Intelligence Platform is an end-to-end data pipeline and interactive dashboard that:

  • Detects anomalous funding allocations using Isolation Forest to flag country-cluster-year combinations that deviate significantly from expected patterns
  • Visualizes global funding gaps through an interactive choropleth map covering 80+ countries and 15+ humanitarian sectors
  • Benchmarks comparable allocations using K-Means clustering to group similar funding contexts for peer comparison
  • Optimizes aid portfolio allocation using linear programming to maximize beneficiaries reached within a given budget
  • Tracks donor contribution flows, showing pledge-vs-paid gaps across country-based pooled funds (CBPF)
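The portfolio optimizer can be sketched with scipy's `linprog`. The per-sector figures below (beneficiaries per dollar, unmet requirements, budget) are made up for illustration only, not taken from the platform's data:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-sector inputs: expected beneficiaries reached per dollar,
# and each sector's remaining unfunded requirement in dollars.
beneficiaries_per_dollar = np.array([0.8, 1.5, 0.4, 1.1])
unmet_requirement = np.array([40e6, 25e6, 60e6, 30e6])
budget = 50e6

# linprog minimizes, so negate the objective to maximize beneficiaries reached.
res = linprog(
    c=-beneficiaries_per_dollar,
    A_ub=np.ones((1, 4)),                               # total spend <= budget
    b_ub=[budget],
    bounds=list(zip(np.zeros(4), unmet_requirement)),   # can't overfund a sector
    method="highs",
)
allocation = res.x          # dollars per sector
reached = -res.fun          # expected beneficiaries at the optimum
```

With a single budget constraint the optimum fills the highest beneficiaries-per-dollar sectors first, which matches the intuition behind "where should the next dollar go?".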

How We Built It

Datasets

| Dataset | Source | Rows | Purpose |
| --- | --- | --- | --- |
| HPC HNO 2025 | OCHA | 306K | People in need + targeted beneficiaries |
| FTS Requirements | UN OCHA FTS | 10.5K | Funding requirements vs actual funding |
| CBPF Projects | UN OCHA | 109K | Project budgets, geo coordinates, org type |
| CBPF Contributions | UN OCHA | 4.2K | Donor pledge vs paid flows |
| COD Population | OCHA | 6.7K | Per-capita normalization |
| EM-DAT Disasters | CRED | 27.5K | Disaster severity context (5-year window) |

Feature Engineering (PySpark): 12 engineered features, including funding coverage rate, beneficiary-to-funding ratio, cost per beneficiary, disaster severity score, and sector-level efficiency benchmarks. Computed at country × cluster × year granularity.
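The actual pipeline runs in PySpark; a minimal pandas sketch of three of the feature definitions, with invented toy numbers at country × cluster × year granularity, looks like this:

```python
import pandas as pd

# Toy rows at country x cluster x year granularity (all figures illustrative).
df = pd.DataFrame({
    "country": ["TCD", "TCD", "SSD"],
    "cluster": ["Health", "WASH", "Health"],
    "year": [2025, 2025, 2025],
    "requirements_usd": [10e6, 4e6, 25e6],
    "funding_usd": [6e6, 1e6, 30e6],
    "people_targeted": [500_000, 200_000, 900_000],
})

# Three of the engineered features described above (column names illustrative):
df["funding_coverage_rate"] = df["funding_usd"] / df["requirements_usd"]
df["cost_per_beneficiary"] = df["funding_usd"] / df["people_targeted"]
df["beneficiary_to_funding_ratio"] = df["people_targeted"] / df["funding_usd"]
```

A coverage rate above 1.0 (as in the third toy row) is exactly the kind of over-funded appeal the anomaly detector later flags.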

ML Models (scikit-learn + MLflow):

  • Isolation Forest: unsupervised anomaly detection on 9 funding features with contamination=5%; results logged to MLflow
  • K-Means: K auto-selected via silhouette score optimization, used to benchmark comparable allocations
  • Linear Programming (scipy): portfolio optimizer maximizing expected beneficiaries subject to a budget constraint
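The two scikit-learn pieces can be sketched together on synthetic data standing in for the 9 funding features (the planted outliers and the K search range are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 9))   # stand-in for the 9 funding features
X[:5] += 8                      # plant a few obvious outliers

# Anomaly detection: contamination=0.05 flags roughly 5% of rows as -1.
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(X)

# Benchmarking: auto-select K by maximizing the silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    if score > best_score:
        best_k, best_score = k, score
```

In the real pipeline the fitted model, its parameters, and the flagged rows would then be logged to MLflow.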

Visualization (Streamlit): Five-tab interactive dashboard deployed as a Databricks App with live Delta table connectivity via the Databricks SQL Connector and OAuth M2M authentication.

Challenges We Ran Into

  • Multilingual HNO data: sector names across 24 countries appeared in English, French, and Spanish, requiring a 60+ entry normalization mapping before joins were viable
  • Key incompatibility: FTS uses numeric cluster codes while HNO uses alphabetic codes, making direct joins impossible and forcing a sector-name-based join strategy
  • Databricks Serverless constraints: cache/unpersist, the MLflow Model Registry, and DBFS root access are all unavailable on Serverless, requiring workarounds for each
  • OAuth conflicts: Databricks Apps auto-injects OAuth M2M credentials, which conflict with PAT tokens; resolving this required switching entirely to the SQL Connector with OAuth
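The sector-name normalization can be sketched as a lookup table plus a fallback. The entries below are a small hypothetical subset; the real mapping has 60+ entries and these exact keys and canonical names are assumptions:

```python
# Illustrative subset of the sector-name normalization mapping (the real
# table covers 60+ English/French/Spanish variants).
SECTOR_CANON = {
    "santé": "Health",
    "salud": "Health",
    "sécurité alimentaire": "Food Security",
    "seguridad alimentaria": "Food Security",
    "éducation": "Education",
    "education": "Education",
}

def normalize_sector(name: str) -> str:
    """Map a raw sector label to its canonical English name before joining."""
    key = name.strip().lower()
    return SECTOR_CANON.get(key, name.strip().title())

print(normalize_sector("Santé"))  # → Health
```

Normalizing to a canonical English name on both sides is what makes the sector-name-based FTS↔HNO join viable.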

Accomplishments We're Proud Of

  • Clean end-to-end pipeline from raw CSV/Excel → Delta Lake → ML → live dashboard in a single reproducible workflow
  • Anomaly detection surfacing genuinely suspicious allocations — sectors receiving funding with zero recorded requirements, extreme cost-per-beneficiary outliers, and over-funded appeals
  • Portfolio optimizer that gives humanitarian decision-makers a concrete, quantitative answer to "where should the next dollar go?"
  • Handling messy real-world humanitarian data at scale (multilingual, multi-source, inconsistent schemas) without synthetic shortcuts

What We Learned

  • Humanitarian datasets are far messier than expected: HXL tagging helps, but multilingual content, inconsistent codes, and aggregate summary rows require significant cleaning
  • Serverless compute changes the rules: memory management, MLflow integration, and filesystem access all behave differently and require deliberate adaptation
  • Anomaly detection on funding data is genuinely useful: the flagged rows consistently correspond to real data quality issues or unusual funding patterns worth investigating

What's Next

  • Incorporate conflict data (ACLED) and displacement data (UNHCR) as additional anomaly features
  • Add time-series forecasting to predict funding gaps 6–12 months ahead
  • Expand HNO coverage beyond 2025 by ingesting historical HNO files (2020–2024)
  • Build an alert system that flags new anomalies automatically when FTS data is updated

Built With

| Category | Technologies |
| --- | --- |
| Cloud Platform | Databricks, Delta Lake, Unity Catalog |
| Data Processing | PySpark, Pandas, PyArrow |
| ML & Optimization | scikit-learn, scipy, MLflow |
| Visualization | Streamlit, Plotly |
| Data Sources | OCHA FTS, HPC HNO, EM-DAT, CBPF, COD |
| Auth & Deployment | Databricks Apps, OAuth M2M, SQL Connector |
| Language | Python 3.12 |
