MAMM-Refine: Multi-Agent Multi-Model Refinement

A framework for improving factual consistency in text generation through multi-agent collaboration and debate.

Authors: David Wan, Justin Chen, Elias Stengel-Eskin, Mohit Bansal

🚀 Quick Start

Installation

git clone https://github.com/meetdavidwan/mammrefine.git
cd mammrefine
pip install -r requirements.txt

Setup API Keys

export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
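
Before launching a run, it can help to confirm that both keys are visible to Python. The following is a minimal sketch, assuming the scripts read exactly the two environment variables exported above; `missing_keys` is a hypothetical helper and not part of the repository.

```python
import os

# The two variable names come from the export commands above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the names of required API keys that are unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Both API keys are set.")
```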

Run Complete Pipeline

bash run_script.sh

Try a Simple Example

python example.py

📖 Overview

MAMM-Refine improves text generation faithfulness through a three-stage pipeline:

Detection: Identifies factually inconsistent sentences
Critique: Generates detailed feedback explaining inconsistencies
Refinement: Produces improved summaries based on critiques

Each stage supports both single-agent and multi-agent debate modes.
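
The three stages can be pictured as three functions chained over one input record. The sketch below is illustrative only: the function names, the prompt strings, and the "consistent"/"inconsistent" label convention are all assumptions made for this example and are not taken from `src/`. A model is represented as any callable from a prompt string to a response string, so the example runs offline with a toy stand-in.

```python
from typing import Callable, Dict, List

# Assumption: a "model" is any callable mapping a prompt to a response.
Model = Callable[[str], str]


def detect(model: Model, document: str, sentences: List[str]) -> List[str]:
    """Stage 1: return the summary sentences the model labels inconsistent."""
    flagged = []
    for sentence in sentences:
        answer = model(f"DETECT\nDocument: {document}\nSentence: {sentence}")
        if answer.strip().lower() == "inconsistent":
            flagged.append(sentence)
    return flagged


def critique(model: Model, document: str, flagged: List[str]) -> Dict[str, str]:
    """Stage 2: map each flagged sentence to feedback explaining the problem."""
    return {
        s: model(f"CRITIQUE\nDocument: {document}\nSentence: {s}")
        for s in flagged
    }


def refine(model: Model, document: str, summary: str, critiques: Dict[str, str]) -> str:
    """Stage 3: rewrite the summary using the critiques; no critiques, no change."""
    if not critiques:
        return summary
    feedback = "\n".join(f"- {s}: {c}" for s, c in critiques.items())
    return model(
        f"REFINE\nDocument: {document}\nSummary: {summary}\nFeedback:\n{feedback}"
    )


def run_pipeline(model: Model, record: dict) -> str:
    """Chain detection, critique, and refinement for one input record."""
    flagged = detect(model, record["document"], record["summary_sentences"])
    critiques = critique(model, record["document"], flagged)
    return refine(model, record["document"], record["summary"], critiques)


def toy_model(prompt: str) -> str:
    """Offline stand-in for an LLM, keyed on the task tag that opens the prompt."""
    if prompt.startswith("DETECT"):
        return "inconsistent" if prompt.endswith("Tuesday.") else "consistent"
    if prompt.startswith("CRITIQUE"):
        return "The document says Monday, not Tuesday."
    return "The meeting was on Monday. It lasted an hour."
```

Because each stage only depends on the previous stage's output, a different model can be used per stage, which mirrors how the usage commands take a model name per script.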

🔧 Usage

Single Agent Mode

# Detection
python src/detect.py gpt-4o data/mediasum.json output/detection.json

# Critique
python src/critique.py gpt-4o output/detection.json output/critiques.json

# Refinement
python src/refine.py claude-3-sonnet output/critiques.json output/refined.json
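
To keep the file paths consistent across the three steps, the commands can be generated rather than typed by hand. This is a sketch under two assumptions: each script takes its arguments positionally as `model input output`, as the commands above suggest, and each stage reads the file the previous stage wrote. `build_commands` and `run_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def build_commands(detect_model, critique_model, refine_model, data_path, out_dir="output"):
    """Return the three stage commands, wired so each reads the previous output."""
    out = PurePosixPath(out_dir)
    detection = str(out / "detection.json")
    critiques = str(out / "critiques.json")
    refined = str(out / "refined.json")
    return [
        [sys.executable, "src/detect.py", detect_model, str(data_path), detection],
        [sys.executable, "src/critique.py", critique_model, detection, critiques],
        [sys.executable, "src/refine.py", refine_model, critiques, refined],
    ]


def run_all(commands):
    """Run the stages in order; check=True stops at the first failing stage."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands("gpt-4o", "gpt-4o", "claude-3-sonnet", "data/mediasum.json"))` from the repository root would then run the single-agent pipeline end to end.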

Multi-Agent Debate Mode

# Run the complete debate pipeline
bash run_script.sh

📊 Input Format

The input JSON should be a list of records, each with this structure:

[
  {
    "document": "Source document text...",
    "summary": "Summary to be refined...",
    "summary_sentences": ["Sentence 1", "Sentence 2"],
    "topic": "Document topic"
  }
]
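
A quick structural check before running the pipeline can catch malformed records early. The sketch below treats all four fields shown above as required, which is an assumption (the scripts may tolerate a missing `topic`, for example). `validate_records` and `load_records` are hypothetical helpers, not part of the repository.

```python
import json

# Field names come from the structure above; requiring all of them is an assumption.
REQUIRED_FIELDS = {
    "document": str,
    "summary": str,
    "summary_sentences": list,
    "topic": str,
}


def validate_records(records):
    """Return a list of human-readable problems; empty if everything is well-formed."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in rec:
                problems.append(f"record {i}: missing '{field}'")
            elif not isinstance(rec[field], expected):
                problems.append(f"record {i}: '{field}' should be {expected.__name__}")
    return problems


def load_records(path):
    """Load the input file and raise ValueError if any record is malformed."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    problems = validate_records(records)
    if problems:
        raise ValueError("; ".join(problems))
    return records
```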

🤖 Supported Models

OpenAI: gpt-4o
Anthropic: claude-3-sonnet

📁 Project Structure

mammrefine/
├── src/                    # Core implementation
│   ├── detect.py          # Factual consistency detection
│   ├── critique.py        # Critique generation
│   ├── refine.py          # Summary refinement
│   ├── model.py           # Model interface
│   └── prompts.py         # System prompts
├── data/                  # Input datasets
├── output/                # Generated results
├── run_script.sh          # Complete pipeline script
└── requirements.txt       # Dependencies

🔬 Multi-Agent Debate Process

The debate system uses iterative rounds:

Initial: Multiple agents process input independently
Selection: Agents choose between different outputs
Debate: Agents review reasoning and make informed choices
Final: Consensus or best-selected output

📝 Citation

@inproceedings{wan-etal-2025-mamm,
    title = "{MAMM}-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration",
    author = "Wan, David and Chen, Justin and Stengel-Eskin, Elias and Bansal, Mohit",
    booktitle = "Proceedings of NAACL 2025",
    year = "2025"
}
