feat(metrics): track thread pool backlog #358

Merged: yordis merged 3 commits into master from yordis/feat-thread-pool-backlog-monitor on May 14, 2026

@yordis (Member) commented May 14, 2026

  • Operators need visibility into work queued outside the node's named internal queues.
  • Thread-pool pressure should appear in the same telemetry surface operators already use for queue health.

Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>

coderabbitai Bot commented May 14, 2026

Warning

Rate limit exceeded

@yordis has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 51 minutes and 13 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99e70cea-920a-4070-8ca6-ac006689ab42

📥 Commits

Reviewing files that changed from the base of the PR and between 0c5af23 and 2d98f10.

📒 Files selected for processing (1)
  • src/EventStore.Core/Bus/QueueStatsCollector.cs

Walkthrough

This PR adds thread pool backlog monitoring to Event Store. It introduces a ThreadPoolBacklogMonitor class that periodically samples the CLR's pending work item count, extends peak tracking in QueueStatsCollector, and wires the monitor into the cluster node lifecycle, with corresponding metrics configuration and tests.

Changes

Thread Pool Backlog Monitoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Queue length peak tracking foundation | src/EventStore.Core/Bus/QueueStatsCollector.cs | ReportQueueLength(int) method centralizes peak updates using Math.Max. ProcessingStarted() and GetStatistics() both call this method to track the highest queue depth observed during message processing and statistics collection. |
| ThreadPoolBacklogMonitor implementation | src/EventStore.Core/Metrics/ThreadPoolBacklogMonitor.cs | New sealed class implements periodic CLR thread pool pending work sampling via System.Threading.Timer. Samples pending counts, records metrics through QueueStatsCollector and QueueTracker, and manages lifecycle (start/stop/dispose) with thread-safe atomic flags and locking for timer coordination. |
| ClusterVNode integration | src/EventStore.Core/ClusterVNode.cs | Instantiates and starts ThreadPoolBacklogMonitor during node construction using queue stats managers; disposes and clears the monitor reference during shutdown cleanup to ensure graceful resource release. |
| Metrics configuration and test coverage | src/EventStore.ClusterNode/metricsconfig.json, src/EventStore.Core.XUnit.Tests/Metrics/QueueStatsCollectorTests.cs | Metrics config maps queue names matching ThreadPoolBacklog to the ThreadPoolBacklog label. Three xUnit tests verify peak initialization, peak updates via ReportQueueLength, and idempotent disposal of the monitor. |
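
The peak tracking summarized for QueueStatsCollector can be pictured with a small sketch. This is not the PR's code: the class name `QueuePeakTracker`, its fields, and the reset-on-collect behavior in `CollectCurrentPeak` are assumptions made for illustration. It only shows the idea of routing every peak update through one `ReportQueueLength(int)` method that uses `Math.Max`.

```csharp
using System;

// Hypothetical stand-in for the peak-tracking part of a stats collector.
// Not thread-safe: each Math.Max line is a plain read-compare-write.
public sealed class QueuePeakTracker
{
    private int _lifetimePeak; // highest length seen since creation
    private int _currentPeak;  // highest length seen in the current window

    public int LifetimePeak => _lifetimePeak;
    public int CurrentPeak => _currentPeak;

    // Single place where both peaks are raised.
    public void ReportQueueLength(int queueLength)
    {
        _currentPeak = Math.Max(_currentPeak, queueLength);
        _lifetimePeak = Math.Max(_lifetimePeak, queueLength);
    }

    // Assumed behavior: fold the latest length into the peaks,
    // return the window peak, then start a new window.
    public int CollectCurrentPeak(int queueLength)
    {
        ReportQueueLength(queueLength);
        int peak = _currentPeak;
        _currentPeak = 0;
        return peak;
    }
}
```

Funnelling both call sites through one method keeps the peak rule in a single place, which is also what makes a later switch to atomic updates a local change.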

Sequence Diagram

sequenceDiagram
  participant VNode as ClusterVNode
  participant Monitor as ThreadPoolBacklogMonitor
  participant Timer as System.Threading.Timer
  participant QueueStats as QueueStatsCollector
  participant QueueMon as QueueMonitor.Default
  VNode->>Monitor: Start()
  Monitor->>QueueStats: Start()
  Monitor->>QueueMon: Register()
  Monitor->>Timer: Enqueue first work item
  loop Sample cycle (while not stopped)
    Timer->>Monitor: Execute()
    Monitor->>Monitor: Capture pending count
    Monitor->>QueueStats: ProcessingStarted()
    Monitor->>QueueStats: ReportQueueLength()
    Monitor->>QueueStats: ProcessingFinished()
    Monitor->>Timer: Re-arm if not stopped
  end
  VNode->>Monitor: Stop() / Dispose()
  Monitor->>Timer: Cancel
  Monitor->>QueueMon: Unregister()
  Monitor->>QueueStats: Stop()

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • TrogonStack/TrogonEventStore#355: Introduces queue-length backlog recording infrastructure via QueueTracker.RecordQueueLength and QueueTrackers, which this PR uses to report backlog metrics from the thread pool monitor.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(metrics): track thread pool backlog' directly and clearly summarizes the main change: adding thread pool backlog tracking to the metrics system. |
| Description check | ✅ Passed | The description explains the business need (visibility into work queued outside named queues) and how it's addressed (surfacing thread-pool pressure in existing telemetry), which directly relates to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

cursor Bot commented May 14, 2026

PR Summary

Medium Risk
Adds a new background monitor that schedules thread-pool work and reports its backlog via the queue telemetry system, plus adjusts queue peak-length tracking logic; errors here could add overhead or skew operational metrics.

Overview
Adds a new ThreadPoolBacklog monitored queue that samples ThreadPool.PendingWorkItemCount every second and reports it through the existing queue stats/telemetry surface, wiring it into ClusterVNode startup/shutdown and the queue label configuration.

Refactors QueueStatsCollector peak-length tracking to be thread-safe and to always include the currently reported queue length in peak calculations (with tests covering peak behavior and idempotent disposal of the new monitor).
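
As a rough illustration of the sampling loop described in this overview, the sketch below polls `ThreadPool.PendingWorkItemCount` on a one-shot `System.Threading.Timer` that is re-armed after each sample, with an idempotent `Dispose`. It is a minimal sketch under stated assumptions, not the PR's ThreadPoolBacklogMonitor: the class name `BacklogSampler` and the `Action<long>` sink are hypothetical and stand in for the real QueueStatsCollector and QueueTracker wiring.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a periodic thread pool backlog sampler.
public sealed class BacklogSampler : IDisposable
{
    private readonly Action<long> _report; // assumed sink for each sample
    private readonly TimeSpan _interval;
    private readonly object _timerLock = new object();
    private Timer _timer;
    private int _disposed; // 0 = live, 1 = disposed

    public BacklogSampler(Action<long> report, TimeSpan interval)
    {
        _report = report ?? throw new ArgumentNullException(nameof(report));
        _interval = interval;
    }

    public void Start()
    {
        lock (_timerLock)
        {
            if (Volatile.Read(ref _disposed) == 1 || _timer != null) return;
            // One-shot timer: re-armed after each sample so callbacks never overlap.
            _timer = new Timer(_ => Sample(), null, _interval, Timeout.InfiniteTimeSpan);
        }
    }

    private void Sample()
    {
        // Number of work items queued to the CLR thread pool but not yet started.
        _report(ThreadPool.PendingWorkItemCount);
        lock (_timerLock)
        {
            // Dispose may have run while the sample was being taken.
            if (_disposed == 0) _timer?.Change(_interval, Timeout.InfiniteTimeSpan);
        }
    }

    public void Dispose()
    {
        // Only the first caller proceeds; repeated calls are no-ops.
        if (Interlocked.Exchange(ref _disposed, 1) == 1) return;
        lock (_timerLock)
        {
            _timer?.Dispose();
            _timer = null;
        }
    }
}
```

A caller would construct it with a one-second interval, for example `new BacklogSampler(count => Console.WriteLine(count), TimeSpan.FromSeconds(1))`, call `Start()`, and dispose it on shutdown. Note that the timer callback itself runs on a thread pool thread, so under heavy backlog the samples can arrive late.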

Reviewed by Cursor Bugbot for commit 2d98f10.

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit d460a51.

Comment thread src/EventStore.Core/Metrics/ThreadPoolBacklogMonitor.cs Outdated
Comment thread src/EventStore.Core/Metrics/ThreadPoolBacklogMonitor.cs
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/EventStore.Core/Bus/QueueStatsCollector.cs`:
- Around lines 97-101: ReportQueueLength updates the peak fields
(_lifetimeQueueLengthPeak, _currentQueueLengthPeak) non-atomically and can race
with calls from ProcessingStarted and GetStatistics. Replace the Math.Max
read-compare-write with atomic Interlocked-based updates: use
Interlocked.CompareExchange loops (or Interlocked.Exchange/Read patterns) to
update _lifetimeQueueLengthPeak and _currentQueueLengthPeak only if the new
queueLength is greater. Leave the _statisticsLock usage in GetStatistics
unchanged. Reference ReportQueueLength, _lifetimeQueueLengthPeak,
_currentQueueLengthPeak, ProcessingStarted, GetStatistics and _statisticsLock
when making the change.
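
The Interlocked.CompareExchange loop suggested in this comment usually takes the following shape. This is a generic sketch of the atomic-maximum pattern, not the change that was merged; the helper name `AtomicMax.Update` is hypothetical.

```csharp
using System.Threading;

// Hypothetical helper: lock-free "raise to maximum" for an int field.
public static class AtomicMax
{
    // Raises `target` to `candidate` if `candidate` is larger.
    // Returns the value of `target` as seen after the call.
    public static int Update(ref int target, int candidate)
    {
        int observed = Volatile.Read(ref target);
        while (candidate > observed)
        {
            // Write only if nobody changed `target` since we read it.
            int previous = Interlocked.CompareExchange(ref target, candidate, observed);
            if (previous == observed) return candidate; // our write won
            observed = previous; // lost the race; retry against the newer value
        }
        return observed; // an equal or larger value is already stored
    }
}
```

With such a helper, a method like ReportQueueLength would call `AtomicMax.Update(ref _currentQueueLengthPeak, queueLength)` and the same for the lifetime peak. Each field is then updated atomically on its own, although the two fields are still not updated together as a single unit.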

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 50e56702-7fc8-40f7-9f66-1d95a4582e52

📥 Commits

Reviewing files that changed from the base of the PR and between 3860089 and 0c5af23.

📒 Files selected for processing (5)
  • src/EventStore.ClusterNode/metricsconfig.json
  • src/EventStore.Core.XUnit.Tests/Metrics/QueueStatsCollectorTests.cs
  • src/EventStore.Core/Bus/QueueStatsCollector.cs
  • src/EventStore.Core/ClusterVNode.cs
  • src/EventStore.Core/Metrics/ThreadPoolBacklogMonitor.cs

Comment thread src/EventStore.Core/Bus/QueueStatsCollector.cs
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
@yordis merged commit 7373777 into master on May 14, 2026
22 checks passed
@yordis deleted the yordis/feat-thread-pool-backlog-monitor branch on May 14, 2026 at 15:23