Conversation
|
Claude finished @ZanCorDX's task in 3m 28s —— View job 🔍 PR Review: Better ShutdownThis PR improves the shutdown mechanism by replacing sleep-based waiting with SummaryThe PR introduces a two-phase shutdown mechanism:
This replaces the previous approach that used arbitrary sleeps to wait for tasks to complete. ✅ Strengths
|
| Item | Status |
|---|---|
| Code compiles | ✅ Appears correct |
| No security issues | ✅ No concerns |
| Error handling | ✅ Properly handles JoinHandle errors |
| Documentation | ✅ Good doc comments on new structs/functions |
| Tests |
🎯 Overall Assessment
Approve with minor suggestions. This is a well-designed improvement to the shutdown mechanism. The two-phase approach with proper task tracking is much more robust than the previous sleep-based approach. The inline comments contain minor suggestions that can be addressed in a follow-up if desired.
| Duration::from_secs(RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME_SECONDS); | ||
| /// v3 might need to keep the server running for a while until the slot ends. | ||
| pub const OPTIMISTIC_V3_CLOSE_TIME_SECONDS: u64 = SLOT_DURATION_SECS; | ||
|
|
There was a problem hiding this comment.
Good addition of the OPTIMISTIC_V3_CLOSE_TIME_SECONDS constant. However, it appears that RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME_SECONDS (defined at line 14) is now unused since RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME was removed. Consider cleaning up that unused constant to avoid dead code.
| tokio::time::sleep(RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME).await; | ||
| task_manager.graceful_shutdown_with_timeout(Duration::from_secs(5)); | ||
| abort_token.cancelled().await; | ||
| task_manager.graceful_shutdown(); |
There was a problem hiding this comment.
Good design pattern: This task forwards the abort signal to the task_manager, which is cleaner than the previous approach that used a hardcoded sleep. However, ensure that task_manager.graceful_shutdown() properly signals all child tasks to complete their cleanup work before the clickhouse_shutdown_handle completes.
| let backup_handle = backup.spawn(task_executor, backup_table_name, tracing_target); | ||
|
|
||
| // Spawn a wrapper task that waits for both to complete | ||
| tokio::spawn(async move { |
There was a problem hiding this comment.
Good pattern: Wrapping both handles in a single task that waits for both to complete is a clean solution. This ensures the returned JoinHandle only completes when both the inserter and backup tasks have finished their cleanup.
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
|
Claude finished @ZanCorDX's task in 1m 41s —— View job 🔍 PR Review: Better Shutdown (Re-review after sync)SummaryReviewed the latest sync which added logging of remaining tasks when graceful shutdown timeout is reached. Found one issue that should be addressed.
|
| } | ||
| } | ||
| _ = tokio::time::sleep_until(graceful_deadline) => { | ||
| warn!("Graceful shutdown timeout reached, signaling abort"); |
There was a problem hiding this comment.
This line appears to be redundant now that the more informative warning with task counts was added below. Consider removing it:
| warn!("Graceful shutdown timeout reached, signaling abort"); |
|
Claude finished @ZanCorDX's task in 1m 19s —— View job 🔍 PR Review: Better Shutdown (Re-review after sync)SummaryReviewed the latest sync (commits Previous Review Status
|
| warn!("Graceful shutdown timeout reached, signaling abort"); | ||
| warn!( |
There was a problem hiding this comment.
This line is now redundant with the more informative warning that includes task counts on the next line. Consider removing the duplicate:
| warn!("Graceful shutdown timeout reached, signaling abort"); | |
| warn!( | |
| warn!( |
|
Claude finished @ZanCorDX's task in 1m 23s —— View job 🔍 PR Review: Better Shutdown (Re-review after latest sync)SummaryReviewed the latest state after commit ✅ Previous Issues Status
|
|
Claude finished @ZanCorDX's task in 1m 15s —— View job 🔍 PR Review: Better Shutdown (Final Re-review)SummaryReviewed the latest state after commit ✅ Previous Issues Status
📝 Latest Commit Analysis (
|
|
Claude finished @ZanCorDX's task in 1m 23s —— View job 🔍 PR Review: Better Shutdown (Final Review)SummaryReviewed the latest state after commit ✅ All Previous Issues Resolved
|
| } | ||
| _ = tokio::time::sleep_until(graceful_deadline) => { | ||
| warn!( | ||
| remaining_inner_jobs = non_critical_tasks.len(), |
There was a problem hiding this comment.
Minor: For consistency with the renamed variable (non_critical_tasks) and the field name at line 609 (remaining_non_critical_tasks), consider using the same naming pattern here:
| remaining_inner_jobs = non_critical_tasks.len(), | |
| remaining_non_critical_tasks = non_critical_tasks.len(), |
|
Claude finished @ZanCorDX's task in 1m 47s —— View job 🔍 PR Review: Better ShutdownProgress
✅ All Previous Issues Resolved
✅ Architecture ReviewThe two-phase shutdown mechanism is well-designed and significantly improves upon the previous sleep-based approach: Phase 1 (Graceful): Wait for all tasks (non-critical + critical) with Key Strengths:
📋 Final AssessmentApprove ✅ The PR is ready for merge. This is a significant improvement to the shutdown mechanism:
All previously identified issues have been addressed. The code is well-structured and the shutdown logic is clear and robust. |
📝 Summary
Shutdown is now performed in 2 steps:
1- Cancellation: We propagate a cancel signal so we stop building, data stops flowing and channels close.
We expect all critical threads to finish.
2- Abort: If some critical thread is still alive we trigger a second signal so critical tasks stop waiting for data (assuming the source is hanged) and perform any necessary clean up.
Sleeps were replaced by JoinHandles.
✅ I have completed the following steps:
make lintmake test