RLLaVA Icon

RLLaVA: An RL-central Framework for Language and Vision Assistants πŸš€

arXiv (RLLaVA) | πŸ€— Models (RLLaVA) | Blog (RLLaVA)

If you like our project, please give us a star ⭐ on GitHub.

✨ What's RLLaVA?

RLLaVA is a user-friendly framework for multi-modal RL. It features an RL-central design that decouples algorithm logic from distributed execution, enables modular customization of algorithms, models, and engines, and is optimized for resource-constrained setups to make advanced RL research more accessible.

RLLaVA Architecture

πŸ“° News

  • [2026-02-02] πŸš€ We have released implementations for a series of representative SFT-RL Fusion Algorithms (including SRFT, LUFFY, UFT and HPT). These methods are unified under our plugin system, demonstrating the flexibility of RLLaVA's architecture. Check out the examples!

✨ Why RLLaVA?

  • 🎯 RL-Centric: Implements an algorithm-driven approach tailored for RL, decoupling logic from distributed execution so researchers can focus on innovation without distributed system complexities.
  • πŸ“¦ Modular Design: Develop, extend, and customize RL algorithms and multi-modal architectures as easily as snapping together building blocks.
  • ⚑ Resource-Efficient: Optimized for resource-constrained teamsβ€”most tasks run on a single 24GB GPU, making multi-modal RL truly accessible.
  • πŸ› οΈ User-Friendly: Minimalist code with familiar HuggingFace & PyTorch APIs for seamless setup and extensions.

πŸš€ Quick Start

1. Installation

git clone https://github.com/TinyLoopX/RLLaVA && cd RLLaVA

conda create -n rllava python=3.12 && conda activate rllava

bash ./install.sh

2. Run Examples

We provide ready-to-run scripts for various algorithms and tasks in the examples/ directory.

# Example: Train with GRPO
bash examples/algorithms/qwen2_5_vl_3b_geoqa3k_grpo.sh

You can explore more examples in the directory structure:

examples/
β”œβ”€β”€ algorithms/       # Algorithm comparisons and ablations (GRPO, RLOO, DAPO, etc.)
└── tasks/            # End-to-end task scripts:
    β”œβ”€β”€ math/         # Geometry, reasoning, and equation solving
    β”œβ”€β”€ counting/     # Object counting and compositional queries
    β”œβ”€β”€ grounding/    # Visual grounding and detection-style tasks
    β”œβ”€β”€ agent_search/ # Web search–augmented agents
    β”œβ”€β”€ agent_code/   # Code-generation agents with tool use
    └── ...           # More real-world multi-modal benchmarks

3. Customize Your Experiment

RLLaVA makes it easy to define custom tasks. You only need 3 files:

  1. Reward function β†’ examples/reward_function/your_task.py (see the sketch below)
  2. Prompt template β†’ examples/format_prompt/your_task.jinja
  3. Launch script / command β†’ Point to dataset + reward + prompt (no need to modify YAML directly):
torchrun -m rllava.train.pipeline.rlvr \
  config=examples/config.yaml \
  data.train_files=your_org/dataset@train \
  data.format_prompt=./examples/format_prompt/your_task.jinja \
  reward.reward_function=./examples/reward_function/your_task.py:compute_score \
  algorithm.adv_estimator=grpo  # Switch algorithms here (rloo, remax, ppo, etc.)
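
As a reference for item 1, here is a minimal sketch of what examples/reward_function/your_task.py could look like. The file path and the compute_score entry point come from the launch command above; the function signature, the \boxed{...} answer convention, and the dict of sub-scores returned here are illustrative assumptions, so check the existing files under examples/reward_function/ for the interface RLLaVA actually expects.

# examples/reward_function/your_task.py -- illustrative sketch only;
# the real signature expected by RLLaVA may differ.
import re


def compute_score(predict: str, ground_truth: str) -> dict:
    """Hypothetical reward: exact-match accuracy plus a small format bonus."""
    # Format reward: the response is assumed to wrap its final answer
    # in \boxed{...}; adjust to whatever your prompt template enforces.
    boxed = re.search(r"\\boxed\{(.*?)\}", predict, flags=re.DOTALL)
    format_score = 1.0 if boxed else 0.0

    # Accuracy reward: compare the extracted answer with the ground truth.
    answer = boxed.group(1).strip() if boxed else predict.strip()
    accuracy = 1.0 if answer == str(ground_truth).strip() else 0.0

    # Returning named sub-scores is a common convention in RL fine-tuning
    # frameworks; a single float may be expected instead.
    return {
        "overall": 0.9 * accuracy + 0.1 * format_score,
        "accuracy": accuracy,
        "format": format_score,
    }

Register the function exactly as in the command above, i.e. reward.reward_function=./examples/reward_function/your_task.py:compute_score.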

For detailed usage instructions, please refer to examples/README.md.

πŸ“¦ Supported Scope

Algorithms

We support a broad family of RL methods, enabled by simple config switches:

  • GRPO, RLOO, REINFORCE++, OPO, REMAX, GPG, PPO, DAPO, GMPO, GSPO, DR-GRPO, CLIP-COV, KL-COV

Models

  • Qwen2-VL/Qwen2.5-VL/Qwen3-VL vision language models
  • TinyLLaVA-style architectures with customizable vision encoders, connectors, and LLMs
  • Support for LLMs (e.g., Qwen3, LLaMA) in text-only RL scenarios

Backends

  • Training: FSDP, FSDP2, DeepSpeed
  • Inference: SGLang, vLLM, HuggingFace

🀝 Contributing & Community

We welcome contributions! We're especially interested in new RL algorithms, new multi-modal tasks, and improvements for resource-constrained setups. Have questions? Join our WeChat group:

RLLaVA WeChat Group

πŸ™ Acknowledgements

Our RL algorithms and distributed training implementation draw inspiration from the open-source community, particularly veRL, EasyR1, and AReaL.

Citation

@misc{zhao2025rllavarlcentralframeworklanguage,
      title={RLLaVA: An RL-central Framework for Language and Vision Assistants}, 
      author={Lei Zhao and Zihao Ma and Boyu Lin and Yuhe Liu and Wenjun Wu and Lei Huang},
      year={2025},
      eprint={2512.21450},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21450}, 
}
