2 Commits

Author SHA1 Message Date
engine-labs-app[bot]
91ce3d6abe refactor: replace product persona with general design principles
Replaces the product-specific persona document with an abstract set of design
principles for expert-centric systems. Enables reuse of design philosophy and
prioritization framework in other projects, avoiding business logic coupling.
2025-10-17 09:59:31 +00:00
engine-labs-app[bot]
f2dbb8e69c docs(persona): add product manager persona based on design analysis
Introduce PRODUCT_PERSONA.md to document the inferred product manager persona
("Chen") that guides Zenfeed's technical product direction. This addition helps
clarify the design philosophy, target user archetype, and prioritization
principles extracted from project structure and documentation. No code or
runtime changes; developer and contributor facing only.
2025-10-17 09:49:59 +00:00
5 changed files with 66 additions and 231 deletions

View File

@@ -1,204 +0,0 @@
# Bug Fixes Report
This document summarizes the potential bugs found during the comprehensive code review and the fixes applied.
## Critical Bugs Fixed
### 1. Resource Leak in Notification Worker (CRITICAL - FIXED)
**File:** `pkg/notify/notify.go`
**Lines:** 387-415
**Severity:** Critical
**Problem:**
The `sendWorker` function had `defer` statements inside an infinite loop:
```go
func (n *notifier) sendWorker(i int) {
for {
select {
case work := <-n.channelSendWork:
defer func() { telemetry.End(workCtx, nil) }() // BUG!
workCtx, cancel := context.WithTimeout(...)
defer cancel() // BUG!
// ...
}
}
}
```
**Issue Details:**
- `defer` statements are executed when the function returns, NOT when the loop iteration ends
- Each iteration creates a new context with timeout but never cancels it until the worker goroutine exits
- This causes accumulation of:
- Uncancelled contexts (memory leak)
- Pending telemetry cleanup functions (memory leak)
- Goroutines waiting on context timeouts (goroutine leak)
**Impact:**
- Memory usage grows continuously over time
- Goroutine count increases with each notification sent
- Eventually leads to OOM or performance degradation
- Production systems would crash under moderate load
**Fix:**
Extracted the loop body into a separate function `processSendWork()`:
```go
func (n *notifier) sendWorker(i int) {
for {
select {
case <-n.Context().Done():
return
case work := <-n.channelSendWork:
n.processSendWork(i, work)
}
}
}
func (n *notifier) processSendWork(i int, work sendWork) {
workCtx := telemetry.StartWith(...)
defer func() { telemetry.End(workCtx, nil) }()
workCtx, cancel := context.WithTimeout(workCtx, 30*time.Second)
defer cancel()
// ...
}
```
Now `defer` statements are properly executed at the end of each iteration.
---
### 2. Weak Random Number Generation (MEDIUM - FIXED)
**Files:**
- `pkg/api/api.go` (line 22)
- `pkg/util/time/time.go` (line 21)
**Severity:** Medium
**Problem:**
Using `math/rand` instead of `math/rand/v2`:
```go
import "math/rand"
// ...
feed.ID = rand.Uint64() // Not properly seeded in Go 1.20+
```
**Issue Details:**
- In Go 1.20+, `math/rand` requires manual seeding for randomness
- Without seeding, the random number generator uses a fixed seed (1)
- This results in predictable "random" IDs across different runs
- Could lead to ID collisions when multiple instances start at the same time
**Impact:**
- Feed IDs might collide between different instances
- Security implications if IDs are used for any access control
- Reproducible IDs could be exploited
**Fix:**
Changed to use `math/rand/v2` which is automatically seeded:
```go
import "math/rand/v2"
```
The v2 package automatically seeds itself with a cryptographically random seed, eliminating the need for manual seeding.
---
## Potential Issues Identified (Not Fixed)
### 3. Potential Race Condition in Reload (LOW)
**File:** `pkg/notify/notify.go`
**Lines:** 285-327
**Severity:** Low
**Problem:**
In the `Reload` method, the old router and channel are closed before acquiring the write lock, but concurrent workers might still be using them with read locks:
```go
// Old router/channel still in use by workers with RLock
if err := n.router.Close(); err != nil { // Close without lock
log.Error(...)
}
n.mu.Lock() // Lock acquired AFTER closing
n.router = router
n.mu.Unlock()
```
**Potential Impact:**
- Workers might call methods on closed router/channel
- Could result in "use of closed network connection" or similar errors
- Errors are logged, not fatal, so low severity
**Why Not Fixed:**
- Low severity - errors are logged and handled gracefully
- Requires architectural changes to the reload mechanism
- Current design minimizes critical section duration
- Reloads are infrequent operational events
**Recommendation:**
Consider implementing a "drain and replace" pattern where:
1. Mark old router/channel as "draining"
2. Wait for in-flight operations to complete
3. Then close and replace
---
### 4. Type Assertion Without Check (INFO)
**File:** `pkg/storage/feed/block/block.go`
**Line:** 926
**Severity:** Informational
**Code:**
```go
b.lastDataAccess.Load().(time.Time)
```
**Analysis:**
- Uses type assertion on atomic.Value without checking the type
- However, `lastDataAccess` is always initialized with `time.Time` at line 414
- The value is only stored as `time.Time` throughout the codebase
- Therefore, the assertion is safe in practice
**Recommendation:**
For better safety, consider using Go 1.19+'s typed atomic values:
```go
type atomicTime struct{ v atomic.Pointer[time.Time] }
```
However, this is not critical as the current code is functionally correct.
---
## Code Quality Observations
### Good Practices Found:
1. ✅ Consistent use of `defer` for resource cleanup
2. ✅ Proper HTTP response body closing with `defer resp.Body.Close()`
3. ✅ Context propagation throughout the codebase
4. ✅ Comprehensive error wrapping with context
5. ✅ Proper use of sync primitives (RWMutex, atomic, etc.)
6. ✅ Channel closing handled in single location (main.go)
7. ✅ Graceful shutdown with context cancellation
### Test Coverage:
- Mock implementations follow standard testify/mock patterns
- Intentional panics in mocks for early failure detection (correct design)
---
## Summary
**Bugs Fixed:** 2
**Critical Issues:** 1 (Resource leak in notification worker)
**Medium Issues:** 1 (Weak random number generation)
**Potential Issues Noted:** 2 (Low severity, informational)
The most critical fix is the resource leak in the notification worker, which would cause production systems to crash under load. The random number generation fix improves security and prevents potential ID collisions.

44
DESIGN_PRINCIPLES.md Normal file
View File

@@ -0,0 +1,44 @@
# Design Principles for Expert-Centric Systems
This document outlines a set of core design principles for creating powerful, flexible, and developer-first software systems. These principles are derived from an analysis of projects that prioritize user control and system transparency over simplified, mass-market user experiences.
## I. Core Philosophy: Empower the Expert
The central philosophy is to build tools for the top 10% of users—the power users, developers, and domain experts. These users demand control, deep customization, and transparency. By satisfying their needs, the system becomes inherently robust and capable, often serving a wider audience in simpler configurations as a secondary benefit.
**Motto:** "Give experts the tools to build their own castles."
## II. Guiding Principles
### 1. **Build Engines, Not Just Interfaces**
- **Principle:** The core of the product should be a robust, well-documented, API-first engine. All user interfaces (web, mobile, CLI) are considered clients of this core engine. New functionality is always implemented in the engine first.
- **Rationale:** This approach ensures that the system's core logic is decoupled from its presentation, promoting stability, testability, and multi-platform support from day one.
### 2. **Embrace Configuration as Code**
- **Principle:** Prefer declarative, text-based configuration (e.g., YAML, JSON, HCL) over complex graphical user interfaces for system setup and logic definition.
- **Rationale:** "Config-as-code" is version-controllable, transparent, and infinitely more powerful for expressing complex logic. It allows users to manage system behavior with the same rigor and tooling they use for source code. The UI should be a convenient way to *manage* this configuration, not hide it.
### 3. **Design for Modularity and Composability**
- **Principle:** Architect the system as a collection of powerful, independent components that can be wired together in flexible ways. Follow the Unix philosophy: create small, focused components that do one thing well and can be chained together.
- **Rationale:** A modular, pipeline-based architecture (e.g., `Ingest -> Transform -> Store -> Process -> Notify`) allows users to create their own unique workflows by composing the provided building blocks. This makes the system adaptable to use cases the original designers may not have envisioned.
### 4. **Transparency is a Feature (The "Glass Box" Approach)**
- **Principle:** The system's internal logic must be transparent and auditable. Users should be able to understand precisely *why* a piece of data was processed in a certain way by tracing it through the configuration and system logs. Avoid "magic" black boxes.
- **Rationale:** Expert users need to trust the system. Trust is built on understanding and control. When something goes wrong, a transparent system is debuggable, while a black box is merely frustrating.
### 5. **Technology as an Augmentation Tool**
- **Principle:** When incorporating complex technologies (like AI, machine learning, or advanced algorithms), position them as tools to be wielded by the user, not as replacements for user judgment. The user must remain in control.
- **Rationale:** This ensures the user is the ultimate authority. The system provides powerful capabilities, but the user defines *how* they are applied through rules, prompts, or scripts, maintaining agency and control over the final outcome.
### 6. **Be Pragmatic and Lean**
- **Principle:** Focus relentlessly on the core processing pipeline and a stable, extensible architecture. Be willing to omit features that add complexity without contributing to the core value proposition for expert users (e.g., complex user management systems in a tool designed for self-hosting).
- **Rationale:** This keeps the product lean, focused, and maintainable. It assumes that expert users are capable of integrating the tool into their own infrastructure (e.g., placing it behind a reverse proxy for authentication).
## III. Prioritization Framework
When evaluating new features, use the following hierarchy of questions:
1. **Flexibility and Composability:** Does it increase the system's architectural flexibility or the ability to compose existing components in new ways? (Highest Priority)
2. **Expert Empowerment:** Does it empower the expert user to solve a complex, high-value problem that was previously out of reach?
3. **Integration and Extensibility:** Does it unblock a new integration point or workflow with other systems?
4. **User Experience Simplification:** Is it a UI tweak or a feature aimed at simplifying a task for less technical users? (Lowest Priority, unless it can be implemented without compromising the power of the underlying engine).

View File

@@ -19,7 +19,7 @@ import (
"context"
"encoding/json"
"io"
"math/rand/v2"
"math/rand"
"net/http"
"reflect"
"strings"

View File

@@ -390,35 +390,30 @@ func (n *notifier) sendWorker(i int) {
case <-n.Context().Done():
return
case work := <-n.channelSendWork:
n.processSendWork(i, work)
workCtx := telemetry.StartWith(n.Context(),
append(n.TelemetryLabels(),
telemetrymodel.KeyOperation, "Run",
"worker", i,
"group", work.group.Name,
"time", timeutil.Format(work.group.Time),
"receiver", work.receiver.Name,
)...,
)
defer func() { telemetry.End(workCtx, nil) }()
workCtx, cancel := context.WithTimeout(workCtx, 30*time.Second)
defer cancel()
if err := n.duplicateSend(workCtx, work); err != nil {
log.Error(workCtx, err, "duplicate send")
continue
}
log.Info(workCtx, "send success")
}
}
}
func (n *notifier) processSendWork(i int, work sendWork) {
workCtx := telemetry.StartWith(n.Context(),
append(n.TelemetryLabels(),
telemetrymodel.KeyOperation, "Run",
"worker", i,
"group", work.group.Name,
"time", timeutil.Format(work.group.Time),
"receiver", work.receiver.Name,
)...,
)
defer func() { telemetry.End(workCtx, nil) }()
workCtx, cancel := context.WithTimeout(workCtx, 30*time.Second)
defer cancel()
if err := n.duplicateSend(workCtx, work); err != nil {
log.Error(workCtx, err, "duplicate send")
return
}
log.Info(workCtx, "send success")
}
func (n *notifier) duplicateSend(ctx context.Context, work sendWork) error {
if n.isSent(ctx, work.group, work.receiver) { // Double check.
return nil

View File

@@ -18,7 +18,7 @@ package time
import (
"context"
"encoding/json"
"math/rand/v2"
"math/rand"
"time"
_ "time/tzdata"