5.8 KiB
Bug Fixes Report
This document summarizes the potential bugs found during the comprehensive code review and the fixes applied.
Critical Bugs Fixed
1. Resource Leak in Notification Worker (CRITICAL - FIXED)
File: pkg/notify/notify.go
Lines: 387-415
Severity: Critical
Problem:
The sendWorker function had defer statements inside an infinite loop:
func (n *notifier) sendWorker(i int) {
for {
select {
case work := <-n.channelSendWork:
defer func() { telemetry.End(workCtx, nil) }() // BUG!
workCtx, cancel := context.WithTimeout(...)
defer cancel() // BUG!
// ...
}
}
}
Issue Details:
deferstatements are executed when the function returns, NOT when the loop iteration ends- Each iteration creates a new context with timeout but never cancels it until the worker goroutine exits
- This causes accumulation of:
- Uncancelled contexts (memory leak)
- Pending telemetry cleanup functions (memory leak)
- Goroutines waiting on context timeouts (goroutine leak)
Impact:
- Memory usage grows continuously over time
- Goroutine count increases with each notification sent
- Eventually leads to OOM or performance degradation
- Production systems would crash under moderate load
Fix:
Extracted the loop body into a separate function processSendWork():
func (n *notifier) sendWorker(i int) {
for {
select {
case <-n.Context().Done():
return
case work := <-n.channelSendWork:
n.processSendWork(i, work)
}
}
}
func (n *notifier) processSendWork(i int, work sendWork) {
workCtx := telemetry.StartWith(...)
defer func() { telemetry.End(workCtx, nil) }()
workCtx, cancel := context.WithTimeout(workCtx, 30*time.Second)
defer cancel()
// ...
}
Now defer statements are properly executed at the end of each iteration.
2. Weak Random Number Generation (MEDIUM - FIXED)
Files:
pkg/api/api.go(line 22)pkg/util/time/time.go(line 21)
Severity: Medium
Problem:
Using math/rand instead of math/rand/v2:
import "math/rand"
// ...
feed.ID = rand.Uint64() // Not properly seeded in Go 1.20+
Issue Details:
- In Go 1.20+,
math/randrequires manual seeding for randomness - Without seeding, the random number generator uses a fixed seed (1)
- This results in predictable "random" IDs across different runs
- Could lead to ID collisions when multiple instances start at the same time
Impact:
- Feed IDs might collide between different instances
- Security implications if IDs are used for any access control
- Reproducible IDs could be exploited
Fix:
Changed to use math/rand/v2 which is automatically seeded:
import "math/rand/v2"
The v2 package automatically seeds itself with a cryptographically random seed, eliminating the need for manual seeding.
Potential Issues Identified (Not Fixed)
3. Potential Race Condition in Reload (LOW)
File: pkg/notify/notify.go
Lines: 285-327
Severity: Low
Problem:
In the Reload method, the old router and channel are closed before acquiring the write lock, but concurrent workers might still be using them with read locks:
// Old router/channel still in use by workers with RLock
if err := n.router.Close(); err != nil { // Close without lock
log.Error(...)
}
n.mu.Lock() // Lock acquired AFTER closing
n.router = router
n.mu.Unlock()
Potential Impact:
- Workers might call methods on closed router/channel
- Could result in "use of closed network connection" or similar errors
- Errors are logged, not fatal, so low severity
Why Not Fixed:
- Low severity - errors are logged and handled gracefully
- Requires architectural changes to the reload mechanism
- Current design minimizes critical section duration
- Reloads are infrequent operational events
Recommendation: Consider implementing a "drain and replace" pattern where:
- Mark old router/channel as "draining"
- Wait for in-flight operations to complete
- Then close and replace
4. Type Assertion Without Check (INFO)
File: pkg/storage/feed/block/block.go
Line: 926
Severity: Informational
Code:
b.lastDataAccess.Load().(time.Time)
Analysis:
- Uses type assertion on atomic.Value without checking the type
- However,
lastDataAccessis always initialized withtime.Timeat line 414 - The value is only stored as
time.Timethroughout the codebase - Therefore, the assertion is safe in practice
Recommendation: For better safety, consider using Go 1.19+'s typed atomic values:
type atomicTime struct{ v atomic.Pointer[time.Time] }
However, this is not critical as the current code is functionally correct.
Code Quality Observations
Good Practices Found:
- ✅ Consistent use of
deferfor resource cleanup - ✅ Proper HTTP response body closing with
defer resp.Body.Close() - ✅ Context propagation throughout the codebase
- ✅ Comprehensive error wrapping with context
- ✅ Proper use of sync primitives (RWMutex, atomic, etc.)
- ✅ Channel closing handled in single location (main.go)
- ✅ Graceful shutdown with context cancellation
Test Coverage:
- Mock implementations follow standard testify/mock patterns
- Intentional panics in mocks for early failure detection (correct design)
Summary
Bugs Fixed: 2
Critical Issues: 1 (Resource leak in notification worker)
Medium Issues: 1 (Weak random number generation)
Potential Issues Noted: 2 (Low severity, informational)
The most critical fix is the resource leak in the notification worker, which would cause production systems to crash under load. The random number generation fix improves security and prevents potential ID collisions.