MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration


MAMM-Refine: Multi-Agent Multi-Model Refinement

A framework for improving factual consistency in text generation through multi-agent collaboration and debate.

Authors: David Wan, Justin Chen, Elias Stengel-Eskin, Mohit Bansal

🚀 Quick Start

Installation

git clone https://github.com/meetdavidwan/mammrefine.git
cd mammrefine
pip install -r requirements.txt

Setup API Keys

export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
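
Before launching a long run, it can help to confirm that both variables are visible to Python. The helper below is a minimal sketch and not part of the repository: `missing_api_keys` is a hypothetical name, and whether both keys are actually needed depends on which models you use.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_api_keys()` with no arguments checks the current environment and returns an empty list when both keys are set.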

Run Complete Pipeline

bash run_script.sh

Try a Simple Example

python example.py

📖 Overview

MAMM-Refine improves text generation faithfulness through a three-stage pipeline:

  1. Detection: Identifies factually inconsistent sentences
  2. Critique: Generates detailed feedback explaining inconsistencies
  3. Refinement: Produces improved summaries based on critiques

Each stage supports both single-agent and multi-agent debate modes.
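
The data flow between the three stages can be sketched as below. This is an illustration under assumptions, not the code in `src/`: the stage signatures, `run_pipeline`, and the toy stand-ins are all hypothetical, and in practice each stage would call a language model.

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; the real modules in src/ may differ.
Detector = Callable[[str, str], bool]              # (document, sentence) -> is it consistent?
Critic = Callable[[str, str], Dict[str, str]]      # (document, sentence) -> critique record
Refiner = Callable[[str, str, List[Dict[str, str]]], str]  # (document, summary, critiques) -> summary


def run_pipeline(example: Dict, detect: Detector, critique: Critic, refine: Refiner) -> Dict:
    """Detect inconsistent sentences, critique them, then refine the summary."""
    document = example["document"]
    # Stage 1: keep only the sentences the detector judges inconsistent.
    flagged = [s for s in example["summary_sentences"] if not detect(document, s)]
    # Stage 2: one critique per flagged sentence.
    critiques = [critique(document, s) for s in flagged]
    # Stage 3: refine only when there is something to fix.
    refined = refine(document, example["summary"], critiques) if critiques else example["summary"]
    return {"flagged": flagged, "critiques": critiques, "refined": refined}


# Toy stand-ins so the sketch runs without any API calls.
def toy_detect(document: str, sentence: str) -> bool:
    return sentence.lower() in document.lower()


def toy_critique(document: str, sentence: str) -> Dict[str, str]:
    return {"sentence": sentence, "feedback": "Not supported by the document."}


def toy_refine(document: str, summary: str, critiques: List[Dict[str, str]]) -> str:
    for item in critiques:
        summary = summary.replace(item["sentence"], "")
    return summary.strip()


example = {
    "document": "The cat sat on the mat. It was raining.",
    "summary": "The cat sat on the mat. The dog barked.",
    "summary_sentences": ["The cat sat on the mat.", "The dog barked."],
}
result = run_pipeline(example, toy_detect, toy_critique, toy_refine)
```

With the toy functions, the unsupported sentence is flagged and dropped from the refined summary. Passing the stages in as callables is one way to swap a single-agent stage for a debate-based one without changing the pipeline itself.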

🔧 Usage

Single Agent Mode

# Detection
python src/detect.py gpt-4o data/mediasum.json output/detection.json

# Critique
python src/critique.py gpt-4o output/detection.json output/critiques.json

# Refinement
python src/refine.py claude-3-sonnet output/critiques.json output/refined.json
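
The three commands follow a common pattern of script, model, input file, output file. The sketch below builds them in one place so the file paths stay in step. It assumes that each stage reads the file written by the previous one; `stage_commands` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import PurePosixPath


def stage_commands(input_path, out_dir="output", detect_model="gpt-4o",
                   critique_model="gpt-4o", refine_model="claude-3-sonnet"):
    """Return argv lists for the detect, critique, and refine scripts.

    Assumption: positional arguments are (model, input file, output file),
    as in the commands above.
    """
    out = PurePosixPath(out_dir)
    detection = str(out / "detection.json")
    critiques = str(out / "critiques.json")
    refined = str(out / "refined.json")
    return [
        ["python", "src/detect.py", detect_model, str(input_path), detection],
        ["python", "src/critique.py", critique_model, detection, critiques],
        ["python", "src/refine.py", refine_model, critiques, refined],
    ]


commands = stage_commands("data/mediasum.json")
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, so a failing stage stops the chain.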

Multi-Agent Debate Mode

# Run the complete debate pipeline
bash run_script.sh

📊 Input Format

The input file should be a JSON array with one object per document, structured as follows:

[
  {
    "document": "Source document text...",
    "summary": "Summary to be refined...",
    "summary_sentences": ["Sentence 1", "Sentence 2"],
    "topic": "Document topic"
  }
]
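
A quick check of an input file against this structure might look like the following. It is a sketch only: it assumes all four fields are required and that `summary_sentences` is a list of strings, which the scripts themselves may or may not enforce, and `validate_records` is a hypothetical name.

```python
import json

# Field names come from the structure above; treating all as required is an assumption.
REQUIRED_FIELDS = {
    "document": str,
    "summary": str,
    "summary_sentences": list,
    "topic": str,
}


def validate_records(records):
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be an array"]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: expected an object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"record {i}: missing '{field}'")
            elif not isinstance(record[field], expected_type):
                problems.append(f"record {i}: '{field}' should be a {expected_type.__name__}")
        sentences = record.get("summary_sentences")
        if isinstance(sentences, list) and not all(isinstance(s, str) for s in sentences):
            problems.append(f"record {i}: 'summary_sentences' must contain only strings")
    return problems


SAMPLE = json.loads("""
[
  {
    "document": "Source document text...",
    "summary": "Summary to be refined...",
    "summary_sentences": ["Sentence 1", "Sentence 2"],
    "topic": "Document topic"
  }
]
""")
problems = validate_records(SAMPLE)
```

For a file on disk, load it with `json.load` and pass the result to `validate_records` before starting the detection stage.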

🤖 Supported Models

  • OpenAI: gpt-4o
  • Anthropic: claude-3-sonnet

πŸ“ Project Structure

mammrefine/
├── src/                    # Core implementation
│   ├── detect.py          # Factual consistency detection
│   ├── critique.py        # Critique generation
│   ├── refine.py          # Summary refinement
│   ├── model.py           # Model interface
│   └── prompts.py         # System prompts
├── data/                  # Input datasets
├── output/                # Generated results
├── run_script.sh          # Complete pipeline script
└── requirements.txt       # Dependencies

🔬 Multi-Agent Debate Process

The debate system uses iterative rounds:

  1. Initial: Multiple agents process input independently
  2. Selection: Agents choose between different outputs
  3. Debate: Agents review reasoning and make informed choices
  4. Final: Consensus or best-selected output
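
The four steps above can be sketched as a small loop. This is a simplified illustration, not the repository's implementation: the `Agent` interface, the vote-sharing scheme, and the plurality fallback are all assumptions, and the toy agents stand in for model calls.

```python
from collections import Counter
from typing import Callable, List, Sequence


class Agent:
    """Hypothetical agent: `generate` proposes an output, `choose` votes for one."""

    def __init__(self, generate: Callable[[str], str],
                 choose: Callable[[str, Sequence[str], Sequence[int]], int]):
        self.generate = generate
        self.choose = choose


def debate(task: str, agents: List[Agent], max_rounds: int = 3) -> str:
    if max_rounds < 1 or not agents:
        raise ValueError("need at least one agent and one round")
    # 1. Initial: every agent answers independently; duplicates are merged.
    candidates = list(dict.fromkeys(agent.generate(task) for agent in agents))
    votes: List[int] = []
    for _ in range(max_rounds):
        previous = votes
        # 2-3. Selection and debate: each agent votes, seeing the previous round's votes.
        votes = [agent.choose(task, candidates, previous) for agent in agents]
        if len(set(votes)) == 1:  # consensus reached, stop early
            break
    # 4. Final: the consensus choice, or the plurality choice if rounds ran out.
    winner = Counter(votes).most_common(1)[0][0]
    return candidates[winner]


def make_toy_agent(output: str) -> Agent:
    """Toy agent: backs its own output first, then follows the previous majority."""
    def generate(task: str) -> str:
        return output

    def choose(task: str, candidates: Sequence[str], previous: Sequence[int]) -> int:
        if previous:
            return Counter(previous).most_common(1)[0][0]
        return list(candidates).index(output)

    return Agent(generate, choose)


agents = [make_toy_agent("A"), make_toy_agent("B"), make_toy_agent("A")]
final = debate("refine this summary", agents)
```

In this toy run the first round splits two to one, and the second round converges on the majority candidate. In a real system the `choose` step would also receive the other agents' reasoning, not just their votes.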

πŸ“ Citation

@inproceedings{wan-etal-2025-mamm,
    title = "{MAMM}-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration",
    author = "Wan, David and Chen, Justin and Stengel-Eskin, Elias and Bansal, Mohit",
    booktitle = "Proceedings of NAACL 2025",
    year = "2025"
}
