@kacpersaw
Contributor

Summary

Add circuit breaker support for AI Bridge to protect against cascading failures when upstream AI providers are rate-limited or overloaded (HTTP 429, HTTP 503, and Anthropic's 529 "overloaded" responses).
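To make the failure criteria concrete, here is a minimal sketch of how the three status codes named above could be classified. The helper name `countsAsBreakerFailure` and the constant `statusOverloaded` are hypothetical and not taken from aibridge. Treating every other status (including 500) as a non-failure is an assumption drawn from the list in this description, not confirmed behaviour.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusOverloaded is the non-standard 529 status that the PR description
// attributes to Anthropic's "overloaded" responses. net/http defines no
// constant for it, so this name is a hypothetical one for illustration.
const statusOverloaded = 529

// countsAsBreakerFailure reports whether an upstream status code would be
// counted as a failure by the breaker. Assumption: only the three codes
// listed in the PR description count; everything else is ignored.
func countsAsBreakerFailure(status int) bool {
	switch status {
	case http.StatusTooManyRequests, http.StatusServiceUnavailable, statusOverloaded:
		return true
	default:
		return false
	}
}

func main() {
	for _, code := range []int{200, 400, 429, 500, 503, 529} {
		fmt.Printf("%d counts as failure: %v\n", code, countsAsBreakerFailure(code))
	}
}
```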

Changes

  • Add 5 new CLI options for circuit breaker configuration:
    • --aibridge-circuit-breaker-enabled (default: false)
    • --aibridge-circuit-breaker-failure-threshold (default: 5)
    • --aibridge-circuit-breaker-interval (default: 10s)
    • --aibridge-circuit-breaker-timeout (default: 30s)
    • --aibridge-circuit-breaker-max-requests (default: 3)
  • Update aibridge dependency to include circuit breaker support
  • Add tests for pool creation with circuit breaker providers

Notes

  • Circuit breaker is disabled by default for backward compatibility
  • When enabled, applies to both OpenAI and Anthropic providers
  • Uses sony/gobreaker internally via the aibridge library

Testing

make test RUN=TestPoolWithCircuitBreakerProviders

@kacpersaw kacpersaw force-pushed the kacpersaw/aibridge-circuit-breaker-setup branch 2 times, most recently from 930deb9 to 8fe246a Compare January 19, 2026 09:54
@kacpersaw kacpersaw changed the title feat(aibridge): add circuit breaker configuration support feat(codersdk): add circuit breaker configuration support for aibridge Jan 19, 2026
@kacpersaw kacpersaw force-pushed the kacpersaw/aibridge-circuit-breaker-setup branch from 8fe246a to 93cfb29 Compare January 19, 2026 09:59
@kacpersaw kacpersaw marked this pull request as ready for review January 19, 2026 11:09
@matifali
Member

Circuit breaker is disabled by default for backward compatibility

Do we not recommend this for prod deployments? If we do, then I think it's fine to ship a breaking change and call it out. Most customers will only use it after GA.

@dannykopping
Contributor

Circuit breaker is disabled by default for backward compatibility

Do we not recommend this for prod deployments? If we do, then I think it's fine to ship a breaking change and call it out. Most customers will only use it after GA.

I don't think we should frame it as disabled for BC reasons. We haven't validated this in production yet, so I think making it optional for now is the conservative approach. We should enable it in dogfood after this PR lands, and keep an eye on the metrics to assess whether the behaviour aligns with our expectations and requirements.

@dannykopping dannykopping left a comment

LGTM

Add circuit breaker support for AI Bridge to protect against cascading
failures from upstream AI provider rate limits (429, 503, 529 responses).

New CLI options:
- --aibridge-circuit-breaker-enabled (default: false)
- --aibridge-circuit-breaker-failure-threshold (default: 5)
- --aibridge-circuit-breaker-interval (default: 10s)
- --aibridge-circuit-breaker-timeout (default: 30s)
- --aibridge-circuit-breaker-max-requests (default: 3)

The circuit breaker is disabled by default for backward compatibility.
When enabled, it applies to both OpenAI and Anthropic providers.
@kacpersaw kacpersaw force-pushed the kacpersaw/aibridge-circuit-breaker-setup branch from dabab8a to f444e76 Compare January 20, 2026 13:36
@kacpersaw kacpersaw merged commit ed679bb into main Jan 20, 2026
33 of 35 checks passed
@kacpersaw kacpersaw deleted the kacpersaw/aibridge-circuit-breaker-setup branch January 20, 2026 13:59
@github-actions github-actions bot locked and limited conversation to collaborators Jan 20, 2026