Synthetitastic is an accelerationist set of synthetic data.

KTibow/synthetitastic

Synthetitastic

Synthetitastic is an accelerationist set of synthetic data. It's packed with a variety of programmatically generated cases that LLMs can be evaled on and RLed with. Synthetitastic isn't ordinary: it wants to be saturated, so that LLMs can get better and more efficient at these seemingly easy tasks.

Basic tasks

These are the kinds of tasks that computers are great at, reasoning LLMs are okay at, and normal LLMs are bad at. With Synthetitastic, LLMs can hone their skills on these tasks, reaching perfect accuracy and speed.

The test cases are JSONL files, where each line is a JSON object with keys like:

  • input: Input text
  • output: Expected output text
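
As a rough illustration of how such a file could be consumed, here is a minimal sketch that parses JSONL lines and scores a model by exact match. The helper names (`load_cases`, `score`) and the exact-match rule are assumptions made for this example, not part of the repo; the sample line is hypothetical and only mirrors the `input` / `output` keys described above.

```python
import json
from typing import Callable, Iterable


def load_cases(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def score(cases: list[dict], answer: Callable[[str], str]) -> tuple[int, int]:
    """Return (correct, total), comparing trimmed strings exactly.

    Exact match is an assumption; the repo may grade differently.
    """
    correct = sum(
        answer(case["input"]).strip() == str(case["output"]).strip()
        for case in cases
    )
    return correct, len(cases)


# Hypothetical sample line, shaped like the format described above.
sample = [
    json.dumps(
        {
            "input": "How many 'e' are in 'bestseller'? Just say the number.",
            "output": "3",
        }
    )
]
cases = load_cases(sample)

# A stand-in "model" that always answers "3".
print(score(cases, lambda prompt: "3"))  # (1, 1)
```

To read a real file, something like `load_cases(open(path, encoding="utf-8"))` should work, where `path` points at one of the JSONL files.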

And we have these tests:

  • Day of week (e.g. "What day of the week is 2084-09-26? Just say the day.")
  • Epoch conversion (e.g. "I want to make a Discord timestamp for 19:24:17 UTC on 2129-11-29. Just say the Unix time (seconds) I should use.")
  • Large multiplication (e.g. "What is the product of 96933 and 90409? Just say the number.")
  • Largest number (e.g. "What is the largest number without the letter o? Reply with just the decimal number. Exclude numbers like googolplex.")
  • Letter counting (e.g. "How many 'e' are in 'bestseller'? Just say the number.")
  • Wordle (e.g. "I guessed raxes and got 🟨🟨🟨🟨⬛ - so what's the word?")
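
To show why these tasks are easy for a computer, here is a small sketch of how reference answers for several of them could be computed with the Python standard library. This is not the repo's generator: the function names are made up for this example, and the Wordle feedback follows the usual two-pass rule (greens first, then yellows limited by the remaining letter counts), which is assumed to match how the cases were produced.

```python
from collections import Counter
from datetime import date, datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_of_week(iso_date: str) -> str:
    """Day name for a YYYY-MM-DD date (proleptic Gregorian calendar)."""
    return DAYS[date.fromisoformat(iso_date).weekday()]


def unix_time(iso_date: str, hms: str) -> int:
    """Unix time in seconds for a UTC date and HH:MM:SS time."""
    dt = datetime.fromisoformat(f"{iso_date}T{hms}").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def count_letter(letter: str, word: str) -> int:
    """Number of occurrences of a letter in a word (case-sensitive)."""
    return word.count(letter)


def wordle_feedback(guess: str, answer: str) -> str:
    """Feedback string for a guess, assuming standard Wordle rules."""
    result = ["⬛"] * len(guess)
    remaining = Counter()
    # Pass 1: mark greens and count the answer letters that were not matched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "🟩"
        else:
            remaining[a] += 1
    # Pass 2: mark yellows while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] == "⬛" and remaining[g] > 0:
            result[i] = "🟨"
            remaining[g] -= 1
    return "".join(result)


print(day_of_week("2084-09-26"))              # Tuesday
print(unix_time("2129-11-29", "19:24:17"))
print(96933 * 90409)
print(count_letter("e", "bestseller"))        # 3
print(wordle_feedback("speed", "abide"))      # ⬛⬛🟨⬛🟨
```

A Wordle case like the one above could then be solved by keeping only the words from some word list whose `wordle_feedback` for the guess equals the given pattern.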

Here are the results:

Day of week

GPT-4.5: 8/10

Llama 70b: 2/10

Llama Maverick: 7/10

Qwen 235B (thinking): 5/10

Claude 3.7: 0/10

R1 0528: 8/10

R1: 8/10

Epoch conversion

GPT-4.5: 0/10

Llama 70b: 0/10

Llama Maverick: 0/10

Qwen 235B (thinking): 0/10

Claude 3.7: 0/10

R1 0528: 1/10

R1: 0/10

Large multiplication

GPT-4.5: 0/10

Llama 70b: 0/10

Llama Maverick: 2/10

Qwen 235B (thinking): 0/10

Claude 3.7: 0/10

R1 0528: 10/10

R1: 10/10

Largest number

GPT-4.5: 8/10

Llama 70b: 3/21

Llama Maverick: 6/21

Qwen 235B (after 6k tokens of thinking): 17/21

Claude 3.7: 5/10

R1 0528 (after 14.5k tokens of thinking): 20/21

R1: 7/10

Letter counting

GPT-4.5: 10/10

Llama 70b: 6/10

Llama Maverick: 5/10

Qwen 235B (thinking): 8/10

Claude 3.7: 6/10

R1 0528: 9/10

R1: 10/10

Wordle

GPT-4.5: 2/10

Llama 70b: 0/10

Llama Maverick: 0/10

Qwen 235B (thinking): 0/10

Claude 3.7: 2/10

R1 0528: 10/10

R1: 9/10

Multimodal tasks

These tasks are closer to real life. LLMs should be perfect at these in theory, but currently aren't that great.

  • Angle identification
  • Clock reading
  • Icon recognition
  • Object counting
  • Point identification

The test cases are JSONL files, where each line is a JSON object with keys like:

  • input: Input text
  • input_image: Base64-encoded input image (PNG)
  • output: Expected output text
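
For the multimodal cases, the image has to be decoded before it can be passed to a model or viewed. A minimal sketch, assuming `input_image` holds plain Base64 (no `data:` URL prefix) and using a hypothetical helper name, `decode_image`:

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image(case: dict) -> bytes:
    """Decode a case's input_image field and check that it looks like a PNG."""
    data = base64.b64decode(case["input_image"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("input_image does not decode to PNG data")
    return data


# Hypothetical case: the signature plus placeholder bytes, not a real image.
fake_case = {
    "input": "What time does the clock show?",
    "input_image": base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii"),
    "output": "3:15",
}
png_bytes = decode_image(fake_case)
print(len(png_bytes))  # 19
```

The returned bytes can be written to a `.png` file or handed to an image library for inspection.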

More

If you have an idea, PR it.

In the future, I might add more things, like test cases that mirror my workflow or reward functions for drawing things.
