RLLaVA Icon

RLLaVA: An RL-central Framework for Language and Vision Assistants πŸš€

arXiv (RLLaVA) | πŸ€— Models (RLLaVA) | Blog (RLLaVA)

If you like our project, please give us a star ⭐ on GitHub.

✨ What's RLLaVA?

RLLaVA is a user-friendly framework for multi-modal RL. It features an RL-central design that decouples algorithm logic from distributed execution, enables modular customization of algorithms, models, and engines, and is optimized for resource-constrained setups to make advanced RL research more accessible.

RLLaVA Architecture

πŸ“° News

  • [2026-02-02] πŸš€ We have released implementations for a series of representative SFT-RL Fusion Algorithms (including SRFT, LUFFY, UFT and HPT). These methods are unified under our plugin system, demonstrating the flexibility of RLLaVA's architecture. Check out the examples!

✨ Why RLLaVA?

  • 🎯 RL-Centric: Implements an algorithm-driven approach tailored for RL, decoupling logic from distributed execution so researchers can focus on innovation without distributed system complexities.
  • πŸ“¦ Modular Design: Develop, extend, and customize RL algorithms and multi-modal architectures as easily as snapping together building blocks.
  • ⚑ Resource-Efficient: Optimized for resource-constrained teamsβ€”most tasks run on a single 24GB GPU, making multi-modal RL truly accessible.
  • πŸ› οΈ User-Friendly: Minimalist code with familiar HuggingFace & PyTorch APIs for seamless setup and extensions.

πŸš€ Quick Start

1. Installation

git clone https://github.com/TinyLoopX/RLLaVA && cd RLLaVA

conda create -n rllava python=3.12 && conda activate rllava

bash ./install.sh

2. Run Examples

We provide ready-to-run scripts for various algorithms and tasks in the examples/ directory.

# Example: Train with GRPO
bash examples/algorithms/qwen2_5_vl_3b_geoqa3k_grpo.sh

You can explore more examples in the directory structure:

examples/
β”œβ”€β”€ algorithms/       # Algorithm comparisons and ablations (GRPO, RLOO, DAPO, etc.)
└── tasks/            # End-to-end task scripts:
    β”œβ”€β”€ math/         # Geometry, reasoning, and equation solving
    β”œβ”€β”€ counting/     # Object counting and compositional queries
    β”œβ”€β”€ grounding/    # Visual grounding and detection-style tasks
    β”œβ”€β”€ agent_search/ # Web search–augmented agents
    β”œβ”€β”€ agent_code/   # Code-generation agents with tool use
    └── ...           # More real-world multi-modal benchmarks

3. Customize Your Experiment

RLLaVA makes it easy to define custom tasks. You only need 3 files:

  1. Reward function β†’ examples/reward_function/your_task.py (see the sketch below)
  2. Prompt template β†’ examples/format_prompt/your_task.jinja
  3. Launch script / command β†’ Point to dataset + reward + prompt (no need to modify YAML directly):
torchrun -m rllava.train.pipeline.rlvr \
  config=examples/config.yaml \
  data.train_files=your_org/dataset@train \
  data.format_prompt=./examples/format_prompt/your_task.jinja \
  reward.reward_function=./examples/reward_function/your_task.py:compute_score \
  algorithm.adv_estimator=grpo  # Switch algorithms here (rloo, remax, ppo, etc.)
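
As a reference for item 1, here is a minimal sketch of what examples/reward_function/your_task.py could look like. The file path and the compute_score entry point come from the launch command above; the function signature, the \boxed{...} answer convention, and the dict of sub-scores returned here are illustrative assumptions, so check the existing files under examples/reward_function/ for the interface RLLaVA actually expects.

# examples/reward_function/your_task.py -- illustrative sketch only;
# the real signature expected by RLLaVA may differ.
import re


def compute_score(predict: str, ground_truth: str) -> dict:
    """Hypothetical reward: exact-match accuracy plus a small format bonus."""
    # Format reward: the response is assumed to wrap its final answer
    # in \boxed{...}; adjust to whatever your prompt template enforces.
    boxed = re.search(r"\\boxed\{(.*?)\}", predict, flags=re.DOTALL)
    format_score = 1.0 if boxed else 0.0

    # Accuracy reward: compare the extracted answer with the ground truth.
    answer = boxed.group(1).strip() if boxed else predict.strip()
    accuracy = 1.0 if answer == str(ground_truth).strip() else 0.0

    # Returning named sub-scores is a common convention in RL fine-tuning
    # frameworks; a single float may be expected instead.
    return {
        "overall": 0.9 * accuracy + 0.1 * format_score,
        "accuracy": accuracy,
        "format": format_score,
    }

Register the function exactly as in the command above, i.e. reward.reward_function=./examples/reward_function/your_task.py:compute_score.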

For detailed usage instructions, please refer to examples/README.md.

πŸ“¦ Supported Scope

Algorithms

We support a broad family of RL methods, enabled by simple config switches:

  • GRPO, RLOO, REINFORCE++, OPO, REMAX, GPG, PPO, DAPO, GMPO, GSPO, DR-GRPO, CLIP-COV, KL-COV

Models

  • Qwen2-VL/Qwen2.5-VL/Qwen3-VL vision language models
  • TinyLLaVA-style architectures with customizable vision encoders, connectors, and LLMs
  • Support for LLMs (e.g., Qwen3, LLaMA) in text-only RL scenarios

Backends

  • Training: FSDP, FSDP2, DeepSpeed
  • Inference: SGLang, vLLM, HuggingFace

🀝 Contributing & Community

We welcome contributions! We're especially interested in new RL algorithms, new multi-modal tasks, and improvements for resource-constrained setups. Have questions? Join our WeChat group:

RLLaVA WeChat Group

πŸ™ Acknowledgements

Our RL algorithms and distributed training implementation draw inspiration from the open-source community, particularly veRL, EasyR1, and AReaL.

Citation

@misc{zhao2025rllavarlcentralframeworklanguage,
      title={RLLaVA: An RL-central Framework for Language and Vision Assistants}, 
      author={Lei Zhao and Zihao Ma and Boyu Lin and Yuhe Liu and Wenjun Wu and Lei Huang},
      year={2025},
      eprint={2512.21450},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21450}, 
}
