-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: add boundary log forwarding from agent to coderd #21345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
334f1ca to
ab19300
Compare
ced7c18 to
adf8dda
Compare
Add receiving boundary logs via stream unix socket in the agent, forwarding of boundary audit logs from agent to coderd via agent API, and re-emission of boundary logs to coderd stderr. Log format example: [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..
adf8dda to
752536f
Compare
dannykopping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we proceed, we need to extract the message framing into its own package so it can be shared. There's a lot of duplication across this PR and https://github.com/coder/boundary/pull/124, and it will lead to inconsistencies.
We also need a more resilient approach towards the socket addressing; /tmp is not guaranteed to exist.
| s.wg.Add(1) | ||
| go func() { | ||
| defer s.wg.Done() | ||
| <-ctx.Done() | ||
| _ = conn.Close() | ||
| }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you not move the conn.Close to the same defer as cancel() but run it afterwards?
I'm not seeing why we need the extra goroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binary.Read() may be blocked waiting on I/O. When the context cancellation happens, something needs to interrupt binary.Read(). That's what the conn.Close() does here.
EDIT: this could be a read deadline, but this approach doesn't require the loop to spin on the deadline expiry to check for cancellation.
dannykopping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! A few minor points but we're close now.
dannykopping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing blocking, LGTM 👍
| // MaxMessageSizeV1 is the maximum size of the encoded protobuf messages sent | ||
| // over the wire for the TagV1 tag. While the wire format allows 24 bits for | ||
| // length, TagV1 only uses 15 bits. | ||
| MaxMessageSizeV1 uint32 = 1 << 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking nit: I think this detail should be hidden behind a Tag type to make adding new tags easy, and using tag settings expressively.
Right now there are a couple shortcomings with this design:
- You just have to know that
TagV1andMaxMessageSizeV1 - The logic to determine max size is scattered across both
{Read,Write}Frame - Adding future tags will require changes in a few places instead of just one (the definition)
Here's an idea you might consider: https://gist.github.com/dannykopping/2d7d512d4c4c6508003dc6efcf39cee0
Just a quick hack job but the codec package tests pass at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into refactoring after this initial PR merges. I debated just using a map keyed by a Tag because all that's really needed is knowing what maximum message size is associated with a particular tag.
Add the agent forwarding of boundary audit logs from workspaces to coderd via the agent API, and re-emission of boundary logs to coderd stderr.
Log format example:
Corresponding boundary PR: coder/boundary#124
RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
#21280