Quantifying the Generalization Gap: A New Benchmark for Out-of-Distribution Graph-Based Android Malware Classification
This repository contains the code accompanying our paper, used to reproduce its results. Please refer to the `README.md` in each subfolder for further details.
To reconstruct the dataset, please download all the necessary APKs from the original repo, with permission from the original owners.
- The `splits` subfolder contains the splits of our two new datasets, in the same format as MalNet.
- Run `llm_inference_server.py` to start a server hosting a HuggingFace instance of the code embedding extractor.
- Run `create_graph.py` to generate the attributed function call graphs (FCGs). This script sends REST requests to the LLM server to generate function embeddings.
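The graph-generation step pairs a call graph with per-function embeddings obtained from the inference server. The sketch below illustrates that idea only; it is not the code in `create_graph.py`. The `/embed` route, the `{"inputs": ...}` / `{"embeddings": ...}` JSON schema, the port, and the helper names `request_embeddings` and `build_attributed_fcg` are all hypothetical, and dropping calls to functions outside the graph is an illustrative choice that the real script may not make.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical endpoint; the real route and payload schema of
# llm_inference_server.py may differ.
EMBED_URL = "http://localhost:8000/embed"


def request_embeddings(code_snippets: Sequence[str], url: str = EMBED_URL) -> List[List[float]]:
    """POST function code to an embedding server and return one vector per snippet.

    Assumes (hypothetically) the server accepts {"inputs": [...]} and replies
    with {"embeddings": [[...], ...]} in the same order.
    """
    payload = json.dumps({"inputs": list(code_snippets)}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["embeddings"]


def build_attributed_fcg(
    functions: Dict[str, str],
    calls: Sequence[Tuple[str, str]],
    embed: Callable[[Sequence[str]], List[List[float]]] = request_embeddings,
) -> Tuple[List[str], List[List[float]], List[Tuple[int, int]]]:
    """Turn a function call graph into (node names, node features, edge list).

    functions: maps each function name to its code text.
    calls: (caller, callee) pairs. In this sketch, edges touching functions
    not in `functions` are dropped and duplicate edges are removed.
    """
    names = sorted(functions)  # deterministic node ordering
    index = {name: i for i, name in enumerate(names)}
    features = embed([functions[n] for n in names])  # one embedding per node
    # dict.fromkeys keeps first-seen order while removing duplicate edges
    edges = list(dict.fromkeys(
        (index[src], index[dst])
        for src, dst in calls
        if src in index and dst in index
    ))
    return names, features, edges
```

Passing a stub for `embed` makes it possible to test the graph construction without a running server; with a live server, the default `request_embeddings` would be used instead.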
Move processed data (whether independently constructed or downloaded from our precomputed upload) into the appropriate directory structure, then run model training/evaluation.
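Before training, it may help to confirm that every graph listed in the split files actually exists in the processed data directory. The following is a minimal, hypothetical sanity check, assuming each split is a plain-text file (`train.txt`, `val.txt`, `test.txt`) with one graph identifier per line and that each processed graph is stored as `<data_dir>/<identifier><suffix>`. The file names, the `.pt` suffix, and the `missing_graphs` helper are illustrative assumptions; consult the subfolder `README.md` for the actual layout.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical split file names; adjust to the actual contents of `splits`.
SPLITS = ("train", "val", "test")


def missing_graphs(split_dir: Path, data_dir: Path, suffix: str = ".pt") -> Dict[str, List[str]]:
    """Return, per split, the identifiers with no processed file under data_dir.

    Assumes one identifier per line in <split>.txt and one file per graph at
    data_dir / (identifier + suffix). Split files that do not exist are skipped.
    """
    report: Dict[str, List[str]] = {}
    for split in SPLITS:
        split_file = split_dir / f"{split}.txt"
        if not split_file.exists():
            continue
        # Ignore blank lines and surrounding whitespace
        ids = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
        report[split] = [i for i in ids if not (data_dir / f"{i}{suffix}").exists()]
    return report
```

An empty list for every split means the processed data covers all split entries under the assumed layout.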
- Follow the instructions in `Exphormers.md` to set up the environment.
- Put the data in the `datasets` subfolder according to its `README.md`.
- Create the desired training configuration and run:

```bash
python main.py --cfg <yaml_file>
```

```bibtex
@misc{tran2026quantifyinggeneralizationgapnew,
      title={Quantifying the Generalization Gap: A New Benchmark for Out-of-Distribution Graph-Based Android Malware Classification},
      author={Ngoc N. Tran and Anwar Said and Waseem Abbas and Tyler Derr and Xenofon D. Koutsoukos},
      year={2026},
      eprint={2508.06734},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2508.06734},
}
```