GraphAllocBench is a benchmark and toolkit for Preference-Conditioned Policy Learning (PCPL) over multiple objectives. It provides a flexible resource-allocation environment, a set of configurable problem definitions, and a suite of evaluation utilities so researchers and practitioners can design scenarios that stress different trade-offs and Pareto fronts.
At its core, GraphAllocBench makes it easy to:
- Create customizable problems by varying numbers of demands, resources, objectives, and objective shapes (e.g. sinusoidal, concave, convex, bell-shaped, S-shaped).
- Produce diverse Pareto fronts and objective landscapes to evaluate PCPL and scalarization strategies.
- Run batch preference sweeps and standardized evaluations (Pareto extraction, hypervolume, proportion of non-dominated solutions, ordering score, inference helpers).
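As a standalone illustration of two of these metrics (this is a minimal sketch, not the package's own implementation), non-dominated filtering and 2-D hypervolume for maximized objectives can be computed with plain NumPy:

```python
import numpy as np

def non_dominated_mask(points: np.ndarray) -> np.ndarray:
    """Boolean mask of Pareto-optimal rows, assuming all objectives are maximized."""
    n = len(points)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if not mask[i]:
            continue
        # Point i is dominated if some other point is >= in every objective
        # and strictly > in at least one.
        dominated_by = np.all(points >= points[i], axis=1) & np.any(points > points[i], axis=1)
        if dominated_by.any():
            mask[i] = False
    return mask

def hypervolume_2d(points: np.ndarray, ref: np.ndarray) -> float:
    """Hypervolume of a 2-objective maximization front w.r.t. a reference point
    that lies below the front in both objectives (slab decomposition)."""
    front = points[non_dominated_mask(points)]
    front = front[np.argsort(-front[:, 0])]  # sort by first objective, descending
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv
```

The "proportion of non-dominated solutions" metric then follows directly as `non_dominated_mask(points).mean()`.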
We include an example PCPL setup using Stable Baselines3 PPO paired with a Smooth Tchebycheff scalarization to demonstrate how to train and evaluate agents with preference-conditioned rewards.
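For reference, Smooth Tchebycheff scalarization replaces the non-smooth max in the weighted Tchebycheff term with a log-sum-exp. A minimal sketch of one common formulation for minimized objectives (independent of the package's actual implementation):

```python
import numpy as np

def smooth_tchebycheff(objectives, weights, ideal, mu=0.1):
    """Smooth Tchebycheff scalarization for minimization:
        stch = mu * log(sum_i exp(w_i * (f_i - z_i) / mu)),
    which approaches max_i w_i * (f_i - z_i) as mu -> 0.
    """
    terms = np.asarray(weights) * (np.asarray(objectives) - np.asarray(ideal)) / mu
    # Log-sum-exp with max subtraction for numerical stability.
    m = terms.max()
    return mu * (m + np.log(np.exp(terms - m).sum()))
```

With a small smoothing parameter `mu`, the value is close to the classic Tchebycheff max, but remains differentiable, which is what makes it convenient as a preference-conditioned reward.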
This package requires Python 3.10 or higher.
Install in editable mode:

```
pip install -e .
```

Quick start:

```python
from graphallocbench import CityPlannerEnv
from graphallocbench.evaluation import run_experiments

# Path to your problem configuration YAML file
env = CityPlannerEnv("graphallocbench/config/problems/problem_0.yml")
obs, info = env.reset()
print(env.action_space, env.observation_space)
```

More examples can be found in `graphallocbench/examples/*`.
The package exposes four main modules:
- `graphallocbench.city_env` – Environment implementation (`CityPlannerEnv`) and neural architectures / feature extractors.
- `graphallocbench.evaluation` – Utilities for evaluating trained PCPL agents (Pareto front extraction, hypervolume, ordering score, inference helpers, etc.).
- `graphallocbench.train_utils` – Training helpers (single/parallel PPO training utilities).
- `graphallocbench.constants` – Centralized global constants (e.g., RL model class, allowed devices, inference batch size).
For full implementation details of the environment (observations, action space, requirements matrix, allocation matrix, productions, objective functions, and reward modes) see the companion document: GraphAllocBench.md.
Problem configuration YAMLs describe resource capacities, demands, objectives, and scalarization settings. You can create your own or adapt the examples shipped under graphallocbench/configs/problems/*.
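As a rough sketch of what such a file might contain (the field names below are illustrative, not the package's actual schema; consult the shipped examples for the real format):

```yaml
# Hypothetical problem definition -- field names are illustrative only.
n_demands: 8
n_resources: 4
objectives:
  - name: cost
    shape: convex
  - name: coverage
    shape: concave
scalarization:
  method: smooth_tchebycheff
  mu: 0.1
```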
```python
from stable_baselines3 import PPO

model = PPO("MultiInputPolicy", env)
model.learn(total_timesteps=10_000)
```

```python
from graphallocbench.evaluation import run_experiments

final_objectives, allocations = run_experiments(env, model=model, n_iter=32)
print(final_objectives.shape)
```

If you use GraphAllocBench, please cite the accompanying research paper. (BibTeX will be added when available.)
MIT License (see LICENSE).
The original research code is preserved under legacy/*.
