feat: implement boundary usage tracker and telemetry collection by zedkipp · Pull Request #21716 · coder/coder

zedkipp · 2026-01-27T22:09:00Z

Implements boundary usage tracking across all Coder replicas and reports the aggregated stats via telemetry.

Changes:

Implement Tracker with Track() and FlushToDB() methods
Add telemetry integration via collectBoundaryUsageSummary()
Use telemetry lock to ensure only one replica collects per period

The tracker accumulates unique workspaces, unique users, and request counts (allowed/denied) in memory, then flushes to the database periodically. During telemetry collection, stats are aggregated across all replicas and reset for the next period.
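As a rough illustration of the accumulate-then-reset behavior described above, here is a minimal in-memory sketch. It is not the code from this PR: the `Summary` type, the `Snapshot` and `Reset` methods, the `Track` signature, and the field layout are all assumptions made for the example, and the database write is left out entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// Summary is a hypothetical snapshot of the counters for one period.
type Summary struct {
	UniqueWorkspaces int
	UniqueUsers      int
	AllowedRequests  int64
	DeniedRequests   int64
}

// Tracker accumulates usage in memory. The field layout is assumed,
// not taken from coderd/boundaryusage.
type Tracker struct {
	mu         sync.Mutex
	workspaces map[string]struct{}
	users      map[string]struct{}
	allowed    int64
	denied     int64
}

func NewTracker() *Tracker {
	return &Tracker{
		workspaces: make(map[string]struct{}),
		users:      make(map[string]struct{}),
	}
}

// Track records one batch of requests for a workspace and its owner.
// Sets deduplicate the IDs, so repeated calls count each ID once.
func (t *Tracker) Track(workspaceID, userID string, allowed, denied int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.workspaces[workspaceID] = struct{}{}
	t.users[userID] = struct{}{}
	t.allowed += allowed
	t.denied += denied
}

// Snapshot returns the current totals without clearing them. A real
// flush would write these values to the database instead.
func (t *Tracker) Snapshot() Summary {
	t.mu.Lock()
	defer t.mu.Unlock()
	return Summary{
		UniqueWorkspaces: len(t.workspaces),
		UniqueUsers:      len(t.users),
		AllowedRequests:  t.allowed,
		DeniedRequests:   t.denied,
	}
}

// Reset clears all state so the next period starts from zero.
func (t *Tracker) Reset() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.workspaces = make(map[string]struct{})
	t.users = make(map[string]struct{})
	t.allowed, t.denied = 0, 0
}

func main() {
	tr := NewTracker()
	tr.Track("workspace-1", "user-1", 5, 2)
	tr.Track("workspace-1", "user-1", 2, 88)
	fmt.Printf("%+v\n", tr.Snapshot())
	tr.Reset()
	fmt.Printf("%+v\n", tr.Snapshot())
}
```

Keeping the IDs in sets rather than counters is what makes the "unique" figures insensitive to how many batches a single workspace reports.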

Relates to coder/boundary#138

coder-tasks · 2026-01-27T22:24:58Z

Documentation Check

No Changes Needed

This PR implements internal telemetry collection for boundary usage statistics. The changes are entirely internal:

Internal tracking: The boundaryusage.Tracker accumulates statistics (unique workspaces, users, allowed/denied requests) in memory and flushes them periodically to the database.
Telemetry integration: The BoundaryUsageSummary is collected during telemetry snapshots and aggregates data across all replicas. This follows the same pattern as other internal telemetry metrics already documented in the code.
No user-facing changes:
- No new CLI flags, API endpoints, or configuration options
- No changes to existing Agent Boundary behavior or configuration
- The telemetry collection is automatic and uses existing CODER_TELEMETRY_ENABLE controls
Existing documentation covers the feature:
- docs/admin/setup/telemetry.md explains telemetry collection and references the source code for details
- docs/ai-coder/boundary/agent-boundary.md documents Agent Boundaries and audit logs (which are distinct from this telemetry)

The telemetry data structure is properly defined in coderd/telemetry/telemetry.go (the BoundaryUsageSummary struct), which is the source of truth referenced by the telemetry documentation.

Automated review via Coder Tasks

zedkipp · 2026-01-27T22:29:07Z

coderd/boundaryusage/tracker_test.go

I have a more comprehensive test I will add after this PR merges that will stitch in BoundaryLogsAPI to ensure Track() is called properly when the logs are received from the workspace agent.

coderd/telemetry/telemetry.go

f0ssel

If possible, I'd like another engineer's eyes here, but everything looks clean to me.

f0ssel · 2026-01-27T22:47:48Z

coderd/boundaryusage/tracker.go

+		return nil
+	}
+
+	//nolint:gocritic // This is the actual package doing boundary usage tracking.


What is the lint complaining about here? Is there a more "proper" way for dbauthz?

A lint rule is flagging this as dangerous because there are some dbauthz.As<...> functions that are very powerful.

coder/scripts/rules.go

Lines 23 to 30 in 3eeeabf

// dbauthzAuthorizationContext is a lint rule that protects the usage of
// system contexts. This is a dangerous pattern that can lead to
// leaking database information as a system context can be essentially
// "sudo".
//
// Anytime a function like "AsSystem" is used, it should be accompanied by a comment
// explaining why it's ok and a nolint.
func dbauthzAuthorizationContext(m dsl.Matcher) {

This particular usage only gives access to boundary usage resources, and it's only being used for boundary usage tracking. I think the lint rule is pretty aggressive given some of the deprecated powerful functions, but there's not much risk here.

evgeniy-scherbina · 2026-01-28T00:03:26Z

coderd/telemetry/telemetry.go

+		r.options.Logger.Debug(ctx, "boundary usage telemetry lock already claimed by another replica, skipping", slog.F("period_ending_at", periodEndingAt))
+		return nil, nil //nolint:nilnil // This is simple to handle when dealing with telemetry.
+	}
+	if err != nil {


I'm a bit concerned that if any error other than unique_violation is returned, or if unique_violation is incorrectly wrapped, collectBoundaryUsageSummary will return an error, which could break the whole telemetry process.

But I see that aibridge uses the same approach, so it's probably okay.

Yeah, everything in the snapshot seems to be all-or-nothing. I debated making the snapshot proceed if the boundary usage fails for whatever reason, but I couldn't really come up with a good reason to deviate from the prior art because the boundary telemetry should always work assuming there's no unique violation.

coderd/coderd.go

zedkipp · 2026-01-28T02:10:00Z

For future readers: I tested this by launching the develop.sh script and pointing Coder at a local telemetry server with CODER_TELEMETRY=true CODER_TELEMETRY_URL=http://localhost:3001 ./scripts/develop.sh. Here’s an example of the telemetry snapshot the local telemetry server received:

{
  "aibridge_interceptions_summaries": null,
  <snip>
  "boundary_usage_summary": {
    "allowed_requests": 7,
    "denied_requests": 90,
    "unique_users": 1,
    "unique_workspaces": 1
  },
  "cli_invocations": null,
  "deployment_id": "bc7037e3-51f4-4ef8-9270-85535b93c3f8",
  "external_provisioners": null,
  <snip>
}
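For anyone writing a local telemetry receiver like the one used in this test, a small decoding sketch follows. The JSON keys come from the snapshot above and `BoundaryUsageSummary` is the struct name mentioned in this thread, but the Go field names and types, the `snapshot` wrapper, and `parseSnapshot` are assumptions for illustration. The sample payload omits the snipped fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BoundaryUsageSummary mirrors the JSON keys shown in the snapshot.
// Field names and types here are assumed, not copied from coderd.
type BoundaryUsageSummary struct {
	UniqueWorkspaces int64 `json:"unique_workspaces"`
	UniqueUsers      int64 `json:"unique_users"`
	AllowedRequests  int64 `json:"allowed_requests"`
	DeniedRequests   int64 `json:"denied_requests"`
}

// snapshot decodes only the fields this example cares about. The
// pointer lets a null or missing summary show up as nil.
type snapshot struct {
	DeploymentID         string                `json:"deployment_id"`
	BoundaryUsageSummary *BoundaryUsageSummary `json:"boundary_usage_summary"`
}

func parseSnapshot(data []byte) (*snapshot, error) {
	var s snapshot
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("decode snapshot: %w", err)
	}
	return &s, nil
}

func main() {
	payload := []byte(`{
		"boundary_usage_summary": {
			"allowed_requests": 7,
			"denied_requests": 90,
			"unique_users": 1,
			"unique_workspaces": 1
		},
		"deployment_id": "bc7037e3-51f4-4ef8-9270-85535b93c3f8"
	}`)

	s, err := parseSnapshot(payload)
	if err != nil {
		panic(err)
	}
	if s.BoundaryUsageSummary == nil {
		fmt.Println("no boundary usage in this snapshot")
		return
	}
	fmt.Printf("%+v\n", *s.BoundaryUsageSummary)
}
```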

github-actions bot assigned zedkipp Jan 27, 2026

zedkipp force-pushed the zedkipp/boundary-usage-telemetry-snapshot branch 4 times, most recently from f149d4f to dace6b5, January 27, 2026 22:21

zedkipp requested review from cstyan, evgeniy-scherbina and f0ssel January 27, 2026 22:22

zedkipp marked this pull request as ready for review January 27, 2026 22:23

zedkipp commented Jan 27, 2026

coderd/telemetry/telemetry.go (outdated, resolved)

zedkipp force-pushed the zedkipp/boundary-usage-telemetry-snapshot branch 2 times, most recently from 07eb313 to 778ad4f, January 27, 2026 22:44

zedkipp commented Jan 27, 2026

coderd/telemetry/telemetry.go (outdated, resolved)

zedkipp force-pushed the zedkipp/boundary-usage-telemetry-snapshot branch from 778ad4f to 0cf7cf9, January 27, 2026 22:45

f0ssel approved these changes Jan 27, 2026

evgeniy-scherbina approved these changes Jan 27, 2026

evgeniy-scherbina reviewed Jan 28, 2026

coderd/coderd.go (outdated, resolved)

evgeniy-scherbina reviewed Jan 28, 2026

coderd/coderd.go (outdated, resolved)

zedkipp added 3 commits January 27, 2026 18:17

refactor: make flush loop a method on Tracker

1e073f2

refactor: move flush loop start to enterprise

cf93019

refactor: cleanup starting flush loop

b03b11c

zedkipp merged commit 2204731 into main Jan 28, 2026
57 checks passed

zedkipp deleted the zedkipp/boundary-usage-telemetry-snapshot branch January 28, 2026 02:11

github-actions bot locked and limited conversation to collaborators Jan 28, 2026

zedkipp merged 4 commits into main from zedkipp/boundary-usage-telemetry-snapshot


3 participants
