This document describes WHAT we’re building — the target architecture.
For WHEN we’re building it, see the Roadmap (roadmap-2026.md).
For HOW we’re building it this phase, see the Build Plan (build-plan-phase-1.md).
Architecture Timeline — Where We Are
A consolidated view of every capability stream across all TOGAF architecture layers, from project inception (January 2026) through Phase 2 kickoff (July 2026). Today is Week 16 — 14 April 2026.
Business and delivery streams (Jan–Jul):
- Signal Collection (9 sources)
- Signal Scoring (H×W×D V2)
- Intelligence Layer (V5/V6)
- Trend Classification (rules)
- Delivery — Ops Dashboard
- Delivery — Client Reports
- Delivery — Email (pilots)
- Multi-Tenancy (client isolation)
- Client Onboarding
Application streams (Jan–Jul):
- n8n Orchestration
- Python Scoring Engine
- Gemini Intelligence Layer
- Plotly Dash Dashboard v2
- Pipeline API (FastAPI)
- /report Skill (Claude Code)
- Validation Feedback UI (Tim)
- Minimal Auth (Tim)
- Report Template Engine (Tim)
Data streams (Jan–Jul):
- PostgreSQL Schema (11+ tables)
- Collector Data Flows (9 sources)
- Fuzzy Dedup + Entity Resolution
- Trend Families + Clustering
- Multi-tenancy Tables (Tim)
- Calendar Events DB
- Client Folder Structure
- Vertical Intelligence Layer
Infrastructure streams (Jan–Jul):
- VPS + Docker Setup
- n8n Deployment + Upgrades
- Caddy Reverse Proxy + TLS
- Pipeline API Deployment
- Dashboard (dash.rumblings.io)
- Web Static (web.rumblings.io)
- Pipeline Observability Hooks
- Email Provider Setup (Tim)
Cross-cutting streams (Jan–Jul):
- Social Signal Validation Research
- Legal Docs (ToS, Privacy, DPA)
- Demo Environment + Script
- Pilot Prep Homework Process
Executive Summary
Rumblings is a cultural trend detection system that identifies emerging trends 2–12 weeks before mainstream media coverage. It ingests signals from 9+ data sources, scores them using a proprietary Height × Width × Depth model, classifies trends through deterministic rules, and generates intelligence layer outputs (So What context, Now What activation suggestions, narrative stories) that transform raw signal data into actionable brand intelligence.
Architecture in one sentence: n8n orchestrates 9+ data collectors feeding PostgreSQL, a Python scoring engine (H×W×D V2) classifies signals via deterministic rules, a Gemini 2.5 Flash intelligence layer generates cultural briefs with sector-specific context, and a Plotly Dash dashboard serves operational and client-facing views — all running on a single Hostinger VPS with Docker.
Target product tiers:
| Tier | What the Client Gets | Phase |
|---|---|---|
| Tier 1 | Weekly trend intelligence reports: So What (sector context) + lite Now What (vertical-level activation) + narrative stories | Phase 1–2 |
| Tier 2 | + Content briefs, client-specific Now What activation, creator matching, saturation alerts | Phase 3 |
| Tier 3 | + API access, trend attribution, trajectory modelling | Phase 4 |
1. Business Architecture — Capabilities Required
This section describes what the system must do, independent of technology choices.
1.1 Signal Collection Capability
The system must continuously ingest signals from diverse data sources to achieve multi-signal triangulation — confirming trends across independent platforms rather than relying on any single source.
| Capability | Description | Status |
|---|---|---|
| Multi-source ingestion | Collect from 9+ independent data sources (social, news, search, cultural platforms) on automated schedules | Deployed |
| Seed term management | Curated seed term lists per source, managed by founder-owners with weekly review cadence | Deployed |
| Source-specific parsing | Each collector normalises platform-specific data into a common signal schema (term, engagement, velocity, raw_data JSONB) | Deployed |
| Deduplication | Content-hash dedup within sources; fuzzy dedup across sources (Levenshtein + token-based) | Deployed |
| Entity resolution | Same entity appearing as different terms across sources resolved into canonical form | Built, integration pending |
| Rate limit resilience | Collectors handle rate limits gracefully — backoff, retry, partial collection rather than failure | Partial (GT pending) |
| ToS compliance | All data collection complies with source Terms of Service — no scraping, no prohibited use | Under review (W12, Grace) |
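Within-source content-hash dedup can be sketched as a minimal Python function. Which fields feed the hash, and the normalisation applied, are assumptions for illustration (the real collector code may differ); SHA-256's hex digest conveniently fills the 64-character content_hash column exactly:

```python
import hashlib

def content_hash(source_id: int, term: str, title: str) -> str:
    """Build a 64-char dedup hash (SHA-256 hex digest matches the
    content_hash VARCHAR(64) column). The field choice and the
    lowercase/strip normalisation are illustrative assumptions."""
    canonical = f"{source_id}|{term.strip().lower()}|{title.strip().lower()}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same content from the same source collides; case and whitespace
# differences are normalised away. Different sources never collide here,
# because cross-source dedup is handled separately (fuzzy matching).
h1 = content_hash(2, "Quiet Luxury", "quiet luxury is back")
h2 = content_hash(2, "quiet luxury ", "Quiet Luxury Is Back")
assert h1 == h2 and len(h1) == 64
```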
1.2 Signal Scoring Capability
The system must score every signal across three independent dimensions to enable reliable trend classification without LLM dependency.
| Capability | Description | Status |
|---|---|---|
| Height scoring (intensity) | Per-source metric extraction, percentile normalisation against calibrated distributions, recency-weighted decay, max-aggregated across sources. No source-count multiplier (Width’s job). | Deployed (V2) |
| Width scoring (breadth) | IW (intra-source diversity) + XW (cross-source spread) with taper. 7 profiles: Spike, Flash, Swell, Wave, Undercurrent, Seedling, Ripple. | Deployed (V2) |
| Depth scoring (substance) | 4 components: Evidence Quality (30pts), Temporal Dynamics (30pts), External Interest (20pts), Information Richness (20pts). Gating prevents hollow scores. | Deployed (V2) |
| Composite scoring | H×W×D combined into composite score with deterministic classification: Strong/Emerging/Possible/Noise | Deployed |
| Daily snapshots | Trend scores captured daily for time-series analysis and trajectory tracking | Deployed |
| Calibration | 151 historically-documented trends provide ground truth for scoring model calibration | Complete |
1.3 Intelligence Layer Capability
The system must generate natural-language intelligence that transforms raw trend data into actionable brand insights. This is the core product differentiator.
| Capability | Description | Status |
|---|---|---|
| Cultural Brief generation | Every scored trend gets a structured brief: what/why_now/who/so_what/category/brand_safety/confidence | Deployed |
| Sector-specific “So What” | Context tailored by sector (beauty, fashion, F&B, tech, etc.) — not generic “this trend is growing” | Deployed |
| Lite “Now What” (Tier 1) | 2–4 actionable activation suggestions at vertical level (not client-specific — that’s Tier 2) | Deployed |
| Report Narrative | 2–3 paragraph trend story suitable for weekly intelligence reports | Deployed |
| Trend Profiling | Auto-generated summary, sentiment, key_events, origin for each detected trend | Deployed |
| Trend Enrichment | 4-layer: internal signal analysis → LLM enrichment → Urban Dictionary → Google Trends | Deployed |
| Historical context | Wikipedia pageviews, GDELT volume, Google Trends baselines via TrendHistorian | Deployed |
| Data sufficiency gating | Below thresholds, intelligence outputs switch to qualitative observations or suppress quantitative claims | Not started |
| Client-specific “Now What” (Tier 2) | “YOUR brand should do X because of your positioning” — requires client matching | Phase 2 |
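Data sufficiency gating is not yet built; the sketch below shows one possible shape for it. The thresholds (min_signals, min_sources) are invented placeholders for illustration, not product values:

```python
def gate_intelligence(signal_count: int, source_count: int,
                      min_signals: int = 10, min_sources: int = 2) -> str:
    """Illustrative data-sufficiency gate: below thresholds, intelligence
    outputs downgrade from quantitative claims to qualitative observations,
    or are suppressed entirely. Thresholds are placeholders."""
    if source_count < min_sources:
        return "suppress"          # single-source: no quantitative claims
    if signal_count < min_signals:
        return "qualitative_only"  # enough spread, too little volume
    return "quantitative"

assert gate_intelligence(signal_count=50, source_count=3) == "quantitative"
assert gate_intelligence(signal_count=4, source_count=2) == "qualitative_only"
assert gate_intelligence(signal_count=100, source_count=1) == "suppress"
```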
1.4 Trend Classification Capability
The system must classify detected trends using deterministic rules (not LLM), with human validation feedback loop.
| Capability | Description | Status |
|---|---|---|
| Deterministic classification | Rules-based: Strong (H≥30, W≥40), Emerging, Possible, Noise. No LLM in classification path. | Deployed |
| Width gating | W≥40 requires 2+ sources. ~82% of terms are single-source (W=20) → always Noise. By design. | Deployed |
| Profile assignment | 7 trend profiles based on H×W×D shape (Spike, Flash, Swell, Wave, Undercurrent, Seedling, Ripple) | Deployed |
| Validation feedback loop | Expert corrections (confirm/reject/reclassify) stored in validation_feedback table | Schema deployed, UI pending |
| Trend families | Clustering related trends into families via trend_families + trend_family_members tables | Deployed |
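The deterministic rules can be sketched as a small pure function. The Strong thresholds (H≥30, W≥40) and the two-source width gate come from the table above; the Emerging and Possible branches use placeholder cut-offs, since their exact thresholds are not specified here:

```python
def classify(height: float, width: float, source_count: int) -> str:
    """Rules-based classification sketch -- no LLM in this path.
    Strong thresholds and the width gate are from the spec; the
    Emerging/Possible cut-offs are illustrative placeholders."""
    if source_count < 2:
        return "Noise"  # width gating: single-source terms cap at W=20
    if height >= 30 and width >= 40:
        return "Strong"
    if width >= 40:      # placeholder: broad but not yet intense
        return "Emerging"
    if height >= 30:     # placeholder: intense but narrow
        return "Possible"
    return "Noise"

assert classify(45, 60, source_count=3) == "Strong"
assert classify(45, 60, source_count=1) == "Noise"  # single-source gate
```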
1.5 Delivery Capability
The system must deliver trend intelligence to clients through multiple channels.
| Capability | Description | Status |
|---|---|---|
| Operational dashboard | Plotly Dash v2: Sankey pipeline view, UpSet collector view, H×W×D scatter, network graph | Deployed |
| Client-facing dashboard | Client-scoped views showing only relevant trends with intelligence layer outputs | Not started (May) |
| Slack Connect | Trend intelligence delivered to client Slack channels | Deferred to Phase 2 |
| Email reports | Formatted weekly intelligence emails with trend summaries | Manual for pilots |
| API access (Tier 3) | RESTful API for programmatic trend data access | Phase 4 |
| Content briefs (Tier 2) | 500-word structured briefs: angle, audience, key messages, format, timing, brand safety | Phase 3 |
1.6 Multi-Tenancy Capability
The system must support multiple clients with isolated views of relevant trends.
| Capability | Description | Status |
|---|---|---|
| Client data model | Minimum viable: clients, client_verticals, client_terms, client_preferences tables | Tim WS1 starting W16 (Apr 14) |
| Client-scoped queries | Every intelligence query becomes client-aware — clients see only trends relevant to them | Tim WS1 W17 |
| Client matching | LLM-based relevance scoring: given client profile + trend, how relevant? | Phase 2 |
| Authentication | Minimal auth (API key + unique URL), not full login. Approach simplified from original scope. | Tim WS1 W18 |
| Client onboarding | Discovery → seed terms → configuration → first delivery. AJ/Jen run kickoffs, Tom does technical config. | Tim WS1 W18 |
2. Application Architecture — Components & Interactions
This section describes which software components deliver the capabilities above, and how they interact.
2.1 Component Overview
+------------------------------------------------------------------+
| ORCHESTRATION (n8n) |
| 9 Collector Workflows (various schedules) |
| Pipeline: Preparer (hourly) -> Evaluator (5min) -> |
| Persister (30min) -> Health Monitor (15min) |
| Enrichment V2 Trigger (4h) |
+------+------------------------+-----------------------------------+
| |
v v
+--------------+ +----------------------------------------------+
| Collectors | | Pipeline Processing (Python) |
| | | |
| 9 sources: | | * Entity extraction (Gemini 2.5 Flash) |
| HN, BS, | | * HxWxD V2 scoring (deterministic) |
| GDELT, Wiki | | * Cascade pre-filter (efficiency) |
| GA, GT, | | * LLM evaluation (SOP-driven) |
| Pinterest, | | * Trend persistence + snapshots |
| Tumblr, | | * Intelligence layer (4 components) |
| Substack | | |
| | | Writes to: PostgreSQL |
| Writes to: | | LLM: Gemini 2.5 Flash (primary) |
| PostgreSQL | | Ollama Qwen 2.5 7B (fallback) |
| (raw_signals)| | |
+--------------+ +----------------------------------------------+
|
v
+------------------------------------------------------------------+
| PostgreSQL (Docker on VPS) |
| |
| 11 tables: sources, raw_signals, processed_signals, |
| detected_trends, trend_evidence, trend_snapshots, |
| validation_feedback, pipeline_runs, digest_logs, |
| digest_messages, digest_feedback |
| |
| + trend_families, trend_family_members |
| + evaluation_queue, pipeline_jobs (chunked processing) |
+-----------------------+-------------------------------------------+
|
v
+------------------------------------------------------------------+
| Plotly Dash Dashboard v2 (Python) |
| |
| 5 pages: Ops Home, Pipeline (Sankey), Collectors (UpSet), |
| Signals (HxWxD scatter), Trends (network graph) |
| |
| Design: Inter font, JetBrains Mono for data |
| DB: SQLAlchemy singleton |
| API: /api/seed-lookup, /api/trends, /api/collector-status, |
| /api/pipeline-status, /api/trends/families |
+------------------------------------------------------------------+
2.2 Data Collectors
Role: Ingest signals from 9 data sources into raw_signals table via n8n workflows.
| Collector | Source ID | Type | Schedule | n8n Workflow | Daily Volume | Owner |
|---|---|---|---|---|---|---|
| Hacker News | 1 | API | 10min | Algolia search | ~120 | Tom |
| Bluesky | 2 | API | 15min | AT Protocol | ~1,500+ | Tom |
| GDELT | 3 | API | 30min | GKG API | ~300 | Lori |
| Google Autocomplete | 5 | Unofficial | 1hr | Autocomplete API | ~500+ | AJ |
| Wikipedia | 6 | API | 2x daily | Pageviews API | ~40+ | Tom |
| Pinterest | 7 | API | 6hr | Trends + Seed Term | ~800+ | AJ |
| Tumblr | 10 | API | Hourly | Trending tags + search | ~3,000+ | Jen |
| Substack | 11 | RSS | Hourly | 41 publications | ~50+ | Jen |
| Trade Press | — | RSS | Daily | RSS consolidation | ~30+ | Lori |
Collector ownership model: Each founder reviews their assigned collectors weekly (Friday), curates seed terms, flags broken collectors. Goal: drop noise from 52.6% to ~28% via 8-week calibration.
What collectors do NOT do:
- Score or classify signals (pipeline’s job)
- Deduplicate across sources (entity resolution’s job)
- Generate intelligence (intelligence layer’s job)
2.3 Pipeline Processing
Role: Transform raw signals into scored, classified, enriched trends. Runs as 4 independent n8n workflows for resilience.
| Workflow | Schedule | Purpose | Duration |
|---|---|---|---|
| Preparer | Hourly | Entity extraction on new raw_signals via Gemini 2.5 Flash | ~5min |
| Evaluator | Every 5min | Process evaluation_queue: H×W×D scoring, cascade pre-filter, LLM evaluation, classification | ~2min |
| Persister | Every 30min | Persist evaluated signals to detected_trends, create/update trend snapshots | ~1min |
| Health Monitor | Every 15min | Pipeline health checks, staleness alerts, error rate monitoring | ~30sec |
| Enrichment V2 Trigger | Every 4hr | 4-layer enrichment: internal analysis → LLM → Urban Dictionary → Google Trends | ~10min |
Processing flow:
raw_signals (collector writes)
-> Entity extraction (Preparer)
-> evaluation_queue (chunked)
-> HxWxD V2 scoring (Evaluator)
-> Cascade pre-filter (skip low-quality)
-> LLM evaluation (SOP-driven, Gemini 2.5 Flash)
-> processed_signals (scored + classified)
-> detected_trends (persisted, deduplicated)
-> trend_snapshots (daily timeseries)
-> Intelligence layer (So What, Now What, narratives)
2.4 Intelligence Layer
Role: Transform scored trends into actionable brand intelligence. This is the core product differentiator — without quality intelligence outputs, trend detection alone is commodity.
4 Production Components
| Component | File | Purpose | Model |
|---|---|---|---|
| NarrativeGenerator | agents/narrative_generator.py | Cultural Briefs: what/why_now/who/so_what/category/brand_safety/confidence + sector So What + lite Now What + report narrative | Gemini 2.5 Flash, temp 0.15 |
| TrendProfiler | data/pipeline/trend_profiler.py | Auto-generates summary, sentiment, key_events, origin per trend | Gemini 2.5 Flash |
| TrendEnricherV2 | data/pipeline/trend_enrichment_v2.py | 4-layer enrichment: internal signals → LLM context → Urban Dictionary → Google Trends | Mixed |
| TrendHistorian | data/analysis/trend_historian.py | Historical baselines: Wikipedia pageviews, GDELT volume, Google Trends | API calls |
Why four separate components (not one)? Each component runs on a different schedule, has different failure modes, and serves different consumers. NarrativeGenerator runs on-demand (latency-sensitive, client-facing). TrendProfiler runs during persistence (batch). TrendEnricherV2 runs every 4 hours (expensive, rate-limited by external APIs). TrendHistorian runs on-demand for case studies (heavy API calls). Consolidating them would couple fast paths to slow paths and make partial failures cascade.
2.5 Dashboard (Plotly Dash v2)
Role: Serve operational and (future) client-facing views of trend intelligence.
Current 5 Pages
| Page | Purpose | Key Visualisation |
|---|---|---|
| Ops Home | Daily operational overview | KPI cards, recent trends, pipeline status |
| Pipeline | Data flow visualisation | Sankey diagram (signals → processing → trends) |
| Collectors | Source health monitoring | UpSet plot (cross-source overlap) + heatmap |
| Signals | Individual signal exploration | H×W×D 3D scatter with lasso select |
| Trends | Trend relationship mapping | dash-cytoscape network graph |
Design system: theme.py — Inter font, JetBrains Mono for data, source-specific colours, classification colours. Stephen Few / Tufte / FT Visual Vocabulary principles.
API Endpoints
| Endpoint | Purpose |
|---|---|
| /api/seed-lookup?term=X&days=7 | Look up signals for a specific term |
| /api/trends?days=7&classification=strong | List trends by classification |
| /api/collector-status | Collector health summary |
| /api/pipeline-status | Pipeline processing metrics |
| /api/trends/families | Trend family clusters |
2.6 Agent Architecture
Role: Execute SOP-defined logic at scale. One agent per SOP.
Base pattern: agents/base.py — RumblingsAgent class loads SOP markdown, extracts decision criteria into system prompt, executes with structured JSON output, validates output, logs execution with SOP version tracking.
| Agent | SOP Source | Purpose |
|---|---|---|
| TrendEvaluationAgent | sop-trend-evaluation.md | Trend vs. noise classification |
| CredibilityAgent | sop-credibility-assessment.md | Source credibility scoring |
| ThemeClassificationAgent | sop-theme-classification.md | Theme identification |
| ClientRelevanceAgent | sop-client-relevance.md | Client-specific filtering (Phase 2) |
| NarrativeGenerator | (embedded prompts) | Cultural brief generation |
Principles: Low temperature (0.1–0.2), structured JSON output, 100% SOP example pass rate required before deployment, flag uncertain cases for human review.
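A minimal sketch of the base pattern, assuming a simplified SOP format (decision criteria as markdown bullets) and an invented three-key output schema for illustration; the real RumblingsAgent class in agents/base.py is richer (SOP version tracking, execution logging):

```python
import json

REQUIRED_KEYS = {"classification", "confidence", "reasoning"}

def build_system_prompt(sop_markdown: str) -> str:
    """Extract decision criteria (bullet lines) from an SOP document
    into a system prompt. Simplified sketch of the agents/base.py
    pattern; the real extraction logic is richer."""
    criteria = [line.lstrip().lstrip("- ").strip()
                for line in sop_markdown.splitlines()
                if line.lstrip().startswith("-")]
    return ("Apply these criteria and answer with JSON "
            "{classification, confidence, reasoning}:\n"
            + "\n".join(f"* {c}" for c in criteria))

def validate_output(raw: str) -> dict:
    """Reject malformed structured output before it reaches the pipeline."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

sop = "# SOP\n- Requires 2+ independent sources\n- Velocity must be positive"
prompt = build_system_prompt(sop)
out = validate_output(
    '{"classification": "trend", "confidence": 0.8, "reasoning": "ok"}')
```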
2.7 Component Interaction Summary
Cloud (Seed Terms, Configuration)
-> n8n Orchestrator (VPS)
-> 9 Collectors (various APIs, RSS feeds)
-> PostgreSQL raw_signals table
-> Pipeline (4 workflows):
Preparer (entity extraction via Gemini)
-> Evaluator (HxWxD V2 scoring + LLM eval)
-> Persister (trend detection + snapshots)
-> Health Monitor (alerting)
-> Enrichment V2 (4-layer context enrichment)
-> Intelligence Layer (briefs, profiles, narratives)
-> Plotly Dash Dashboard (read from PostgreSQL)
-> Team members (browser)
-> API endpoints (curl/programmatic)
-> [Future] Email delivery (manual for pilots, automated Phase 2)
-> [Future] Client Dashboard (scoped views)
3. Data Architecture — Models, Schemas & Flows
This section describes the data entities, their relationships, and how data flows through the system.
3.1 Conceptual Data Model
+--------------+
| Sources |
| (9 active) |
+------+-------+
| collect
v
+--------------+
| Raw Signals |
| (~6K+/day) |
+------+-------+
| process
v
+------------------+
| Processed Signals|
| (HxWxD scored) |
+------+-----------+
| detect
+------------+------------+
v v v
+----------+ +----------+ +--------------+
| Detected | | Trend | | Trend |
| Trends | | Evidence | | Snapshots |
| (~166 | | (links) | | (daily ts) |
| active) | | | | |
+----+-----+ +----------+ +--------------+
|
+--- Trend Families (clustering)
|
+--- Intelligence Layer
| (briefs, profiles, enrichment)
|
+--- Validation Feedback
| (expert corrections)
|
+--- [Future] Client Matching
(relevance scoring per client)
Key relationships:
- A Source produces many Raw Signals via collectors
- A Raw Signal is scored into one Processed Signal (1:1)
- Multiple Processed Signals contribute to one Detected Trend via Trend Evidence (many:many)
- A Detected Trend has daily Trend Snapshots for time-series analysis
- Trend Families group related trends via trend_family_members (many:many)
- Validation Feedback records expert corrections on both signals and trends
- Pipeline Runs track execution metadata for observability
3.2 Physical Data Model — Current Tables
sources (Reference)
9 active sources. Fields: id, name, source_type, tier, poll_frequency, rate_limit, is_active.
raw_signals (Incoming Data)
All collected signals. ~6,000+/day across all sources.
| Column | Type | Purpose |
|---|---|---|
| id | BIGSERIAL | PK |
| source_id | INTEGER FK | Source reference |
| external_id | VARCHAR(255) | Platform-specific ID |
| collected_at | TIMESTAMPTZ | Collection timestamp |
| term | VARCHAR(255) | Primary search/seed term |
| primary_term | VARCHAR(255) | Normalised term |
| keywords | TEXT[] | Extracted keywords |
| title | TEXT | Content title |
| engagement | INTEGER | Platform engagement metric |
| velocity | FLOAT | Rate of change |
| raw_data | JSONB | Full platform-specific payload |
| content_hash | VARCHAR(64) | Dedup hash |
| enrichment_status | VARCHAR(20) | Processing status |
Unique constraint: (source_id, external_id) — prevents duplicate collection.
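The effect of that constraint can be demonstrated with an in-memory SQLite table standing in for PostgreSQL (the INSERT ... ON CONFLICT DO NOTHING shape is the same in both): re-collecting the same platform item is a no-op, while the same item arriving from another source is kept for cross-source triangulation. Table and values here are a trimmed-down illustration:

```python
import sqlite3

# In-memory stand-in for the PostgreSQL raw_signals table,
# reduced to the columns the uniqueness guarantee needs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_signals (
        id INTEGER PRIMARY KEY,
        source_id INTEGER,
        external_id TEXT,
        term TEXT,
        UNIQUE (source_id, external_id)
    )
""")
insert = ("INSERT INTO raw_signals (source_id, external_id, term) "
          "VALUES (?, ?, ?) ON CONFLICT DO NOTHING")
conn.execute(insert, (2, "at://post/123", "quiet luxury"))
conn.execute(insert, (2, "at://post/123", "quiet luxury"))  # re-collected: skipped
conn.execute(insert, (1, "at://post/123", "quiet luxury"))  # other source: kept
count = conn.execute("SELECT COUNT(*) FROM raw_signals").fetchone()[0]
assert count == 2
```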
processed_signals (Scored)
After H×W×D scoring and LLM evaluation.
| Column | Type | Purpose |
|---|---|---|
| height_score | FLOAT | Intensity (0–100) |
| width_score | FLOAT | Breadth (0–100) |
| depth_score | FLOAT | Substance (0–100) |
| composite_score | FLOAT | Combined score |
| classification | VARCHAR(50) | Strong/Emerging/Possible/Noise |
| hwd_components | JSONB | Detailed component breakdown |
| enrichment_data | JSONB | Archetype, alert priority, flags |
| evaluation_details | JSONB | Early detection signals, concern flags |
detected_trends (Confirmed Trends)
~166 active trends. Unique on term.
| Column | Type | Purpose |
|---|---|---|
| term | VARCHAR(255) UNIQUE | Canonical trend name |
| aliases | TEXT[] | Alternative names |
| height/width/depth_score | FLOAT | Current aggregate scores |
| composite_score | FLOAT | Combined score |
| trend_type | VARCHAR(50) | Classification |
| profile | VARCHAR(20) | Shape: spike/flash/swell/wave/undercurrent/seedling/ripple |
| enrichment_data | JSONB | Intelligence layer outputs (nested under enrichment_v2 key) |
| validation_status | VARCHAR(20) | pending/confirmed/rejected/review |
| summary, sentiment, key_events, origin | Various | Auto-generated trend profiles |
Other Tables
- trend_snapshots — One row per trend per day: H/W/D/composite scores, signal_count, source_count, profile.
- trend_evidence — Links processed_signals to detected_trends with contribution_score and is_primary flag.
- validation_feedback — Expert corrections: entity_type, action (confirm/reject/reclassify), reviewer, notes.
- pipeline_runs — Execution metadata: signals_fetched, trends_found, classifications breakdown, duration, LLM usage.
- digest_logs, digest_messages, digest_feedback — Slack digest delivery tracking with reaction/reply collection.
- evaluation_queue, pipeline_jobs — Chunked processing for the split pipeline architecture.
- trend_families, trend_family_members — Trend clustering via scripts/run_trend_families.py.
3.3 Data Flow Diagram
SEED TERMS (curated per source, per founder)
|
+-------------------+-------------------+
v v v
HN API Bluesky AT Pinterest API
GDELT GKG Wikipedia PV Tumblr API
Google AC Substack RSS Trade Press RSS
| | |
+-------------------+-------------------+
|
v
+-----------------+
| raw_signals | (~6K+/day)
| (PostgreSQL) |
+--------+--------+
|
+--------v--------+
| Preparer | Entity extraction
| (hourly, n8n) | via Gemini 2.5 Flash
+--------+--------+
|
+--------v--------+
| evaluation_ | Chunked queue
| queue |
+--------+--------+
|
+--------v--------+
| Evaluator | HxWxD V2 scoring
| (5min, n8n) | + Cascade pre-filter
| | + LLM evaluation
+--------+--------+
|
+--------v--------+
| processed_ | Scored + classified
| signals |
+--------+--------+
|
+--------v--------+
| Persister | Trend detection
| (30min, n8n) | + snapshot capture
+--------+--------+
|
+--------v--------+
| detected_ | ~166 active trends
| trends |
+--------+--------+
|
+--------v--------+
| Enrichment V2 | 4-layer context
| (4hr, n8n) | enrichment
+--------+--------+
|
+--------v--------+
| Intelligence | So What + Now What
| Layer | + Narratives + Profiles
+--------+--------+
|
+--------+--------+
v v
Dashboard [Future]
(Plotly Dash) Slack/Email
Delivery
3.4 Extensions (PostgreSQL)
- vector extension — pgvector for future embedding-based similarity search
- pg_trgm extension — trigram matching for fuzzy text search
3.5 Future Schema (Phase 2: Multi-Tenancy)
-- Minimum viable client model (May 2026)
CREATE TABLE clients (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    verticals TEXT[],
    preferences JSONB,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE client_terms (
    client_id INTEGER REFERENCES clients(id),
    term VARCHAR(255),
    relevance FLOAT,
    added_by VARCHAR(50)
);

CREATE TABLE client_preferences (
    client_id INTEGER REFERENCES clients(id),
    notification_channel VARCHAR(20), -- 'slack', 'email'
    delivery_schedule VARCHAR(20),    -- 'daily', 'weekly'
    slack_channel_id VARCHAR(50),
    email_recipients TEXT[]
);
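A client-scoped trend query against this schema might look like the sketch below, using in-memory SQLite as a stand-in for PostgreSQL. Joining on exact term match is the minimum viable version implied by client_terms; Phase 2 replaces it with LLM-based relevance scoring. All table contents here are invented sample data:

```python
import sqlite3

# Minimal stand-in schema: clients only see trends whose terms
# appear in their curated client_terms list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE client_terms (client_id INTEGER, term TEXT, relevance REAL);
    CREATE TABLE detected_trends (term TEXT, composite_score REAL);
""")
conn.execute("INSERT INTO clients VALUES (1, 'Acme Beauty')")
conn.executemany("INSERT INTO client_terms VALUES (?, ?, ?)",
                 [(1, "glass skin", 0.9), (1, "skin cycling", 0.8)])
conn.executemany("INSERT INTO detected_trends VALUES (?, ?)",
                 [("glass skin", 72.0), ("mecha anime", 55.0)])

# Client-aware query: only trends relevant to client 1 come back.
rows = conn.execute("""
    SELECT dt.term, dt.composite_score
    FROM detected_trends dt
    JOIN client_terms ct ON ct.term = dt.term
    WHERE ct.client_id = ?
    ORDER BY dt.composite_score DESC
""", (1,)).fetchall()
assert rows == [("glass skin", 72.0)]
```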
4. Infrastructure Architecture — Services, Security & Costs
This section describes the infrastructure, security boundaries, deployment topology, and cost estimates.
4.1 Infrastructure Services
| Service | Role | Config |
|---|---|---|
| Hostinger VPS (KVM 8) | Single server running all services | 72.62.195.132, SSH: rumblings-vps |
| PostgreSQL (Docker) | Primary data store — signals, trends, pipeline metadata | Container: rumblings-postgres |
| n8n (Docker) | Workflow orchestration — collectors, pipeline, health monitoring | Container: rumblings-n8n, UI: n8n.rumblings.io |
| Ollama (Docker) | Local LLM fallback — Qwen 2.5 7B | Container: rumblings-ollama |
| Caddy | Reverse proxy + automatic TLS | Routes to n8n, dashboard, web-static |
| Gemini 2.5 Flash | Primary production LLM (via HTTP from n8n) | Google AI API, temperature 0.15 |
| Google Drive | Knowledge base collaboration (shared drive) | rclone bisync every 15min |
| GitHub | Code repository | tcraw-rumblings/rumblings-code |
| Plotly Dash | Dashboard application | Container: pipeline-api |
4.2 Deployment Topology
+-----------------------------------------------------------+
| Hostinger VPS (72.62.195.132) |
| |
| +----------+ +----------+ +----------+ +----------+ |
| | rumblings| | rumblings| | rumblings| | pipeline | |
| | -n8n | | -postgres| | -ollama | | -api | |
| | (n8n) | | (PG 15) | | (Ollama) | | (Dash) | |
| +----------+ +----------+ +----------+ +----------+ |
| |
| +--------------------------------------------------------+|
| | Caddy (reverse proxy + TLS) ||
| | n8n.rumblings.io -> rumblings-n8n ||
| | dash.rumblings.io -> pipeline-api ||
| | web.rumblings.io -> /opt/rumblings/web-static/ ||
| +--------------------------------------------------------+|
| |
| Code: /home/tom/Rumblings/rumblings-code/ (git) |
| Build: /opt/rumblings/ (Docker context, code dirs |
| at TOP LEVEL: api/, data/, agents/) |
| Docker compose: /opt/rumblings/infra/docker-compose.yml |
+-----------------------------------------------------------+
|
| rclone bisync (15min)
v
+-----------------+
| Google Drive |
| (Shared Drive) |
| Knowledge Base |
+-----------------+
CRITICAL deployment gotcha: Git repo is at /home/tom/Rumblings/rumblings-code/. Docker build context is /opt/rumblings/. Code dirs must be rsynced INDIVIDUALLY. Syncing to /opt/rumblings/code/ does NOT update the build context.
n8n dual-table CRITICAL: n8n stores workflows in workflow_entity AND workflow_history. The engine reads from workflow_history, NOT workflow_entity.nodes. Both tables MUST be updated.
4.3 Security Boundaries
+-----------------------------------------------------------+
| Current: Internal Only (Team of 4) |
| |
| Team members |
| -> SSH to VPS (Tom only) |
| -> n8n UI (basic auth) |
| -> Dash UI (Caddy TLS) |
| -> API endpoints (no auth currently) |
| -> Aria (Claude Code, any team member) |
| |
| No client-facing access yet. |
| No public API access yet. |
+-----------------------------------------------------------+
+-----------------------------------------------------------+
| Phase 2 Target: Client-Facing (May-June) |
| |
| Client browser -> Caddy (TLS) |
| -> Auth middleware (approach TBD) |
| -> Dashboard (client-scoped views) |
| -> PostgreSQL (client-filtered queries) |
| |
| Email -> Client recipients (manual for pilots) |
| Pipeline -> Observability hooks (completion/failure) |
+-----------------------------------------------------------+
4.4 Cost Estimates
| Component | Monthly Cost | Notes |
|---|---|---|
| Hostinger VPS (KVM 8) | ~$25–30 | Single server for everything |
| Gemini 2.5 Flash (LLM API) | ~$15–40 | Entity extraction + evaluation + intelligence. Variable with volume. |
| Google Drive | Free | Shared drive within existing Workspace |
| GitHub | Free | Public repo (private available if needed) |
| Domain + DNS | ~$5 | rumblings.io |
| Squarespace (marketing site) | $27 | Marketing/landing page |
| Anthropic Claude (edge cases) | ~$5–10 | ~10% of LLM calls |
| Total | ~$80–115/month | |
Capacity Analysis
Current VPS: Hostinger KVM 8 — 8 vCPU, 16GB RAM, 200GB NVMe SSD.
| Resource | Current Load | At 3 Pilot Clients | At 10 Clients | Bottleneck Threshold |
|---|---|---|---|---|
| CPU | ~15% avg (spikes to 40%) | ~20% avg | ~35% avg | 80% sustained → upgrade VPS tier |
| RAM | ~8GB used (PG 3GB, n8n 2GB, Ollama 2GB, Dash 1GB) | ~9GB | ~11GB | 14GB → drop Ollama or upgrade |
| Disk | ~40GB used (PG 25GB, Docker 10GB, logs 5GB) | ~50GB | ~80GB | 160GB → archive old raw_signals |
| DB connections | ~15 concurrent | ~20 | ~30 | 100 (PG default) → not a concern |
| n8n executions | ~200/hour | ~220/hour | ~250/hour | n8n handles 500+/hour |
| LLM API | ~500 calls/day | ~600/day | ~800/day | Gemini 1500 RPM → nowhere near |
First bottleneck (likely): Disk space. At ~1GB/month with indexes, Postgres hits 100GB by month 8 with 10 clients. Mitigation: archive raw_signals older than 90 days.
Scaling trigger: When any resource sustains >75% for a week, evaluate: (1) vertical scale to KVM 12, or (2) separate Postgres to managed DB. Horizontal scaling only justified at 25+ clients.
4.5 Monitoring & Observability
| What | How | Frequency |
|---|---|---|
| Pipeline health | Health Monitor workflow + pipeline_runs table | Every 15min |
| Collector health | /collector-health skill — signal counts vs baselines | On demand / weekly review |
| Pipeline processing | /pipeline-health skill — 5-stage check | On demand |
| n8n executions | n8n UI execution history | Continuous |
| Docker container health | docker ps, container logs | On demand |
| Disk/memory | VPS monitoring (Hostinger panel) | On demand |
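The 15-minute pipeline-health cadence in the table implies a simple staleness rule over the latest pipeline_runs row. A sketch, where the grace period and function name are assumptions rather than the Health Monitor workflow's actual logic:

```python
from datetime import datetime, timedelta, timezone

# Staleness check sketch: flag the pipeline if the most recent run is
# older than one 15-minute interval plus a grace period (grace assumed).
def is_stale(last_run_at: datetime, now: datetime,
             interval: timedelta = timedelta(minutes=15),
             grace: timedelta = timedelta(minutes=5)) -> bool:
    return now - last_run_at > interval + grace

now = datetime(2026, 4, 14, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(minutes=10), now))  # → False
print(is_stale(now - timedelta(minutes=30), now))  # → True
```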
5. H×W×D Scoring Model — Detail
All scoring is deterministic (no LLM). Calibrated against 151 historically documented trends.
5.1 Height V2 (Intensity)
| Aspect | Detail |
|---|---|
| Per-source metrics | HN velocity, BS total_engagement, Tumblr velocity (fallback: note_count), Wiki spike_ratio (NOT ×100), Pinterest avg_growth_wow, GA dual (Trends velocity + rank inversion) |
| Normalisation | calibrate() builds per-source sorted distributions; _percentile_rank() converts raw → 0–100. Minimum 20 samples. |
| Recency decay | exp(-ln(2)/half_life × age_hours). Half-lives: HN=6h, BS=6h, Tumblr=12h, Wiki=4h, Pinterest=24h, GA=8h |
| Aggregation | Max across sources. No source-count multiplier (Width’s job). No weighted average. |
| Presence sources | GDELT: min(75, count×12.5), Substack: min(75, count×15). Linear, no cliffs. |
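The decay and presence formulas above can be transcribed directly; source keys are shorthand and the function names are not the engine's actual API:

```python
import math

# Half-lives from the table (hours); keys are shorthand for the sources.
HALF_LIFE_HOURS = {"hn": 6, "bs": 6, "tumblr": 12, "wiki": 4,
                   "pinterest": 24, "ga": 8}

def recency_decay(source: str, age_hours: float) -> float:
    """exp(-ln(2)/half_life * age_hours): intensity halves every half-life."""
    return math.exp(-math.log(2) / HALF_LIFE_HOURS[source] * age_hours)

# Presence sources: linear per-item scores capped at 75, no cliffs.
PER_ITEM = {"gdelt": 12.5, "substack": 15.0}

def presence_score(source: str, count: int) -> float:
    return min(75.0, count * PER_ITEM[source])

print(recency_decay("hn", 6))          # ≈ 0.5 (one half-life elapsed)
print(presence_score("gdelt", 4))      # → 50.0
print(presence_score("substack", 10))  # → 75.0 (capped)
```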
5.2 Width V2 (Breadth)
| Aspect | Detail |
|---|---|
| IW (intra-source) | Source-specific diversity metric (e.g., unique authors, engagement spread) |
| XW (cross-source) | Number of independent sources × taper function |
| Profiles | 7 shapes based on H×W×D signature: Spike, Flash, Swell, Wave, Undercurrent, Seedling, Ripple |
| Width gating | W≥40 requires 2+ sources. ~82% of terms are single-source → W=20 → always Noise. By design. |
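A minimal sketch of the single-source gate described above (hypothetical names; the real IW/XW and taper computation is richer):

```python
# Width gate sketch: single-source terms are held to W=20, so they can
# never reach the W>=40 required for a Strong classification.
def width_gate(base_width: float, source_count: int) -> float:
    if source_count < 2:
        return min(base_width, 20.0)  # single-source cap, by design
    return base_width

print(width_gate(65.0, 1))  # → 20.0
print(width_gate(65.0, 3))  # → 65.0
```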
5.3 Depth V2 (Substance)
| Component | Points | What It Measures |
|---|---|---|
| Evidence Quality (EQ) | 30 | Quality and diversity of evidence across sources |
| Temporal Dynamics (TD) | 30 | Velocity, acceleration, jerk — momentum indicators |
| External Interest (EI) | 20 | GDELT tone, Google Trends, external validation signals |
| Information Richness (IR) | 20 | Completeness of metadata, narrative quality |
Gating: {4 components: ×1.0, 3: ×0.9, 2: ×0.7, 1: ×0.4} — prevents hollow scores. Single-source can’t clear D≥40.
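The component-count gate can be transcribed directly from the table; the function name is assumed:

```python
# Depth V2 gating sketch: sum of the four components (EQ 30, TD 30, EI 20,
# IR 20) scaled by a multiplier keyed on how many components are non-zero,
# so a single strong component cannot produce a hollow score.
GATE = {4: 1.0, 3: 0.9, 2: 0.7, 1: 0.4}

def depth_score(eq: float, td: float, ei: float, ir: float) -> float:
    components = [eq, td, ei, ir]
    present = sum(1 for c in components if c > 0)
    if present == 0:
        return 0.0
    return sum(components) * GATE[present]

# Two components at full marks: (30 + 20) * 0.7 = 35, below the D>=40 bar.
print(depth_score(30, 0, 20, 0))  # → 35.0
```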
5.4 Classification Rules
| Classification | Criteria |
|---|---|
| Strong | H≥30 AND W≥40 (requires 2+ sources) |
| Emerging | H≥20 AND W≥30 |
| Possible | H≥10 AND W≥20 |
| Noise | Below Possible thresholds |
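The cascade above is deterministic and compact enough to express directly; names are illustrative:

```python
# Classification cascade from the table: first matching tier wins.
def classify(h: float, w: float) -> str:
    if h >= 30 and w >= 40:
        return "Strong"    # W>=40 implicitly requires 2+ sources
    if h >= 20 and w >= 30:
        return "Emerging"
    if h >= 10 and w >= 20:
        return "Possible"
    return "Noise"

print(classify(35, 45))  # → Strong
print(classify(25, 35))  # → Emerging
print(classify(50, 20))  # → Possible (high intensity, narrow breadth)
print(classify(5, 60))   # → Noise
```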
6. Intelligence Layer — Detail
6.1 NarrativeGenerator Output Structure
```json
{
  "what": "Description of the trend",
  "why_now": "Why this is emerging now",
  "who": "Who is driving/participating",
  "so_what": "Sector-specific implications",
  "category": "beauty|fashion|food|tech|lifestyle|...",
  "brand_safety": "safe|caution|risky",
  "confidence": 0.85,
  "now_what_lite": ["Activation suggestion 1", "Activation suggestion 2"],
  "report_narrative": "2-3 paragraph trend story..."
}
```
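A minimal consumer-side validation sketch for this payload; the helper name and checks are assumptions, not part of NarrativeGenerator itself:

```python
import json

# Required keys mirror the example payload above (helper name hypothetical).
REQUIRED_KEYS = {"what", "why_now", "who", "so_what", "category",
                 "brand_safety", "confidence", "now_what_lite",
                 "report_narrative"}

def validate_narrative(payload: str) -> dict:
    data = json.loads(payload)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["brand_safety"] not in {"safe", "caution", "risky"}:
        raise ValueError("invalid brand_safety")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return data
```

This kind of schema check catches malformed LLM output before it reaches report generation.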
6.2 Quality Thresholds (Planned)
| Signal Strength | Intelligence Output | Data Minimum |
|---|---|---|
| High confidence | Full brief with quantitative claims | 5+ sources, 50+ signals, 7+ days |
| Medium confidence | Brief with qualitative observations | 2+ sources, 10+ signals, 3+ days |
| Low confidence | Suppress quantitative claims, flag as emerging | 1 source or <10 signals |
| Insufficient | No intelligence output generated | <3 signals total |
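The planned thresholds map onto a small gating function; a hedged sketch (names assumed, since this layer is not yet built):

```python
# Data-sufficiency gate sketch for the planned quality thresholds above.
def confidence_tier(sources: int, signals: int, days: int) -> str:
    if signals < 3:
        return "insufficient"  # no intelligence output generated
    if sources >= 5 and signals >= 50 and days >= 7:
        return "high"          # full brief with quantitative claims
    if sources >= 2 and signals >= 10 and days >= 3:
        return "medium"        # qualitative observations only
    return "low"               # suppress quantitative claims, flag emerging

print(confidence_tier(6, 80, 10))  # → high
print(confidence_tier(1, 8, 2))    # → low
print(confidence_tier(2, 2, 1))    # → insufficient
```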
6.3 Enrichment V2 Pipeline
Layer 1: Internal signal analysis
-> Signal count, source diversity, temporal pattern, engagement distribution
Layer 2: LLM enrichment (Gemini 2.5 Flash)
-> Cultural context, demographic associations, industry implications
Layer 3: Urban Dictionary
-> Slang/cultural terminology context
Layer 4: Google Trends
-> Search interest baseline, regional distribution, related queries
Enrichment data stored in detected_trends.enrichment_data under enrichment_v2 key using COALESCE merge.
7. Delivery Phases
For detailed build timelines, hour budgets, and weekly task plans, see the Roadmap and Build Plan.
This blueprint is delivered across four phases. Each phase adds product tiers and validates with real clients before building more.
| Phase | Period | Gate | What Ships |
|---|---|---|---|
| Phase 1: Build | Mar–May | M2: Demo Ready (May 31) | V6 intelligence reports, /report skill, multi-tenancy foundation, demo environment |
| Phase 2: Pilot | Jun–Jul | M3: First Pilots (Jun 30) | 2–3 free pilots receiving weekly intelligence, client matching, validation feedback |
| Phase 3: Tier 2 | Aug–Sep | M5: First Revenue (Sep 30) | Content briefs, creator matching, saturation alerts, paid conversion |
| Phase 4: Tier 3 | Oct–Dec | M6: Tier 2 Complete (Nov 30) | API access, trend attribution, trajectory modelling |
Client Value Progression
| Stage | What the Client Sees |
|---|---|
| Pilot launch | Onboarded, seed terms configured, first intelligence delivery |
| Calibrated intelligence | 4 weeks of intelligence, matching tuned to their verticals |
| Full Tier 1 | Weekly intelligence with So What + lite Now What, validated quality |
| Tier 2 upgrade | Client-specific Now What, content briefs, creator matching |
| Tier 3 / API | Programmatic access to trend data for their own systems |
9. Decision Register
Moved to decisions-pending.md as of 2026-03-19. That file is the sole source of truth for all open and resolved decisions.
Summary: 10 resolved decisions (D1–D10), 4 open pre-Phase 2 (D11–D14), 4 open post-Phase 2 (D15–D18), plus pricing and prediction feasibility decisions.
10. Risk Register
Extracted to Lori’s risk register as of 2026-03-19. See planning-system-assessment.md for the risk extract.
10 risks identified at plan creation (Mar 19, 2026):
- Intelligence layer outputs are generic (Critical/High)
- Multi-tenancy scope creep (High/Medium)
- Co-founder availability gaps (Medium/Medium)
- Google Trends re-breaks (Medium/Medium)
- Single VPS failure (High/Low)
- Tom’s time constraint — 2 days/week (High/Certain)
- Noise rate doesn’t drop below 30% (Medium/Medium)
- Pilot clients don’t engage (High/Medium)
- n8n dual-table deployment bugs (Medium/Medium)
- LLM costs escalate (Medium/Low)
11. Known Limitations
| Limitation | Impact | Context |
|---|---|---|
| No Reddit data | Missing ~12% of trend signals | Commercial contract required. 88% coverage validated without it. |
| No Twitter/X data | Missing real-time social pulse | $5K/mo prohibitive. Bluesky partially compensates. |
| Single VPS architecture | No redundancy, single point of failure | Acceptable at <10 clients. Vertical scaling first lever. |
| Hourly batch processing | Signals are up to 1h stale | Real-time not needed for cultural trend detection (weeks-scale phenomena). |
| GDELT engagement always 0 | Depth V2 EI component dormant | GDELT provides volume/tone but no engagement. EI component ready when fixed. |
| No view-through attribution | Can’t track what users saw but didn’t click | Fundamental limitation of signal-based detection vs. ad tracking. |
| L1 identity resolution only | Same entity may appear as different terms | Fuzzy dedup + entity resolution both deployed. L2 probabilistic = Phase 3+. |
| Tom + Tim (2 developers) | Bus factor = 2, Tim is contractor (trial) | Tim Goerner (Augmentra) from W16. WS1 gate at W19. Tom sole deployer to VPS. |
| No client-facing auth | Dashboard/API currently open to anyone with URL | Pre-pilot blocker. Tim WS1 builds minimal auth W18. Sufficient for 2–3 pilots. |
| Google Trends rate-limited | Enrichment pipeline partially blocked | DataForSEO dropped; rate-limiting fix pending W12. |
12. Future Architecture (Phase 2+)
The target architecture extends beyond Phase 1 in these areas:
| Layer | Capability | Phase |
|---|---|---|
| Intelligence | Client-specific “Now What” activation, data sufficiency gating, prompt quality benchmarking, multi-language | 2–3 |
| Delivery | Content briefs (500-word structured), creator matching, saturation alerts, white-label reports | 2–3 |
| API | RESTful API access (rate-limited, API keys), trend attribution, trajectory modelling | 3–4 |
| Platform | Advanced multi-tenancy (RBAC, billing, usage tracking), SSO/OAuth, webhook integrations | 3–4 |
| Data | Additional collectors (Threads, TikTok if feasible), L2 entity resolution, ML trend prediction, pgvector | 3–4 |
| Scale | Horizontal infrastructure, dedicated DB server, CDN, load balancing | 4 (if demand) |
13. Architecture State (as of April 14, 2026)
This table maps every architectural component to its current state. For build sequence, see the Roadmap.
| Layer | Component | State | Notes |
|---|---|---|---|
| Business | Signal collection (10 sources) | Deployed | HN, Bluesky, GDELT, GA, Wikipedia, Pinterest, Tumblr, Substack, Trade Press, YouTube |
| | Signal scoring (H×W×D V2) | Deployed | 200+ tests passing. Deterministic classification. |
| | Intelligence layer (So What / Now What / Narrative) | Deployed (V5), V6 in progress | V5 shipped W13. V6 SOPs being wired W16–W17. |
| | Trend classification (7 profiles) | Deployed | Deterministic rules. Validation feedback schema deployed. |
| | Delivery — operational dashboard | Deployed | Plotly Dash v2, 5 pages, dash.rumblings.io |
| | Delivery — client reports (/report skill) | Not built | Phase 1 critical path (W18–W21) |
| | Delivery — email | Not built | Manual for pilots. Automated = Tim WS2 or Phase 2. |
| | Multi-tenancy | In progress | Tim WS1 starting W16. 4 tables + scoped queries. |
| | Client onboarding | Not built | Needs founder workshop. Tim’s onboarding script W18. |
| Application | n8n orchestration (9 collector WFs + 4 pipeline WFs) | Deployed | Upgraded to 2.13.3 (W13) |
| | Python scoring engine | Deployed | H×W×D V2, cascade pre-filter, LLM evaluation |
| | Gemini 2.5 Flash intelligence layer | Deployed | 4 components: briefs, profiles, narratives, enrichment |
| | Plotly Dash v2 dashboard | Deployed | 5 pages: Ops, Pipeline, Collectors, Signals, Trends |
| | Pipeline API (FastAPI) | Deployed | 5+ endpoints, Bearer token auth |
| | /report skill (Claude Code) | Not built | Phase 1 critical path |
| | Validation feedback UI | Not built | Tim WS1 W17–W18 |
| | Minimal auth (API key + URL) | Not built | Tim WS1 W18 |
| Data | PostgreSQL schema (15+ tables) | Deployed | raw_signals, processed_signals, detected_trends, trend_evidence, trend_snapshots, validation_feedback, pipeline_runs, digest_*, trend_families, evaluation_queue, pipeline_jobs |
| | Fuzzy dedup | Deployed | 20% term reduction |
| | Entity resolution | Deployed | Integrated W13 (commit ca61688) |
| | Trend families + clustering | Deployed | trend_families + trend_family_members tables |
| | Enrichment V2 (4-layer) | Deployed | Internal → LLM → Urban Dictionary → Google Trends |
| | Daily trend snapshots | Deployed | Time-series tracking |
| | Multi-tenancy tables | In progress | Tim WS1 W16: clients, client_verticals, client_terms, client_preferences |
| | Calendar events DB | Not built | Phase 1 (#2910) |
| | Client folder structure | Not built | Phase 1 (#2911) |
| | Vertical Intelligence Layer | Not built | Medium priority (#2920) |
| Infrastructure | VPS (Hostinger, 72.62.195.132) | Deployed | Docker, 4 containers |
| | Caddy reverse proxy + TLS | Deployed | dash.rumblings.io, web.rumblings.io |
| | Pipeline API deployment | Deployed | Docker service, port 8001 |
| | Web static hosting | Deployed | Planning docs, reports on web.rumblings.io |
| | Pipeline observability | Not built | Lightweight hooks, incremental (#2933) |
| Parallel | Social Signal Validation Research | In progress | Plan written, co-founder gate pending |
| | Legal docs (ToS, privacy, data agreement) | Not built | Phase 1 (#2928) |
14. Key Metrics (Current)
| Metric | Current Value | Target | Timeline |
|---|---|---|---|
| Noise rate | 52.6% | <30% | 8-week calibration cycle |
| Active collectors | 8/9 healthy | 9/9 | GT rate-limit fix pending |
| Intelligence layer components | 4/4 built + deployed | Quality-reviewed by Jen/AJ | April |
| Case studies | 5 candidates, 3 assigned | 3+ complete | May 31 |
| Signals/day | ~6,000+ | Stable | — |
| Active trends | ~166 | Growing with quality | — |
| H×W×D V2 tests passing | 176 | 100% | — |
| Lead time (detection before mainstream) | 2–12 weeks (estimated) | Validated via case studies | May 31 |
| Pipeline processing | 4 workflows, all running | Healthy | — |
| Monthly infrastructure cost | ~$80–115 | <$150 until 10+ clients | — |
15. Team & Responsibilities
| Person | Role | Rumblings Days | Current Focus |
|---|---|---|---|
| Tom Crawford | Chief of AI, technical lead | 2 days/week | V6 SOP wiring, /report skill, social research, Tim management |
| Tim Goerner | Contractor (Augmentra) | 2 days/week (W16+) | Multi-tenancy (WS1), report infrastructure (WS2 conditional) |
| Jen Ringland | Chief of Product & Impact | Part-time | V6 SOP quality review, Vertical Lens SOP, Tumblr/Substack |
| AJ Jones | Chief of Brand, Experience & Partnerships | Part-time | Pilot client identification, Pinterest/GA collectors, demo prep |
| Lori Susko | Chief of Operations | Part-time | GDELT/Trade Press collectors, legal, V6 SOP 02 session |
Architecture Blueprint created 2026-03-19 as “Implementation Plan v1”. Restructured 2026-04-14 to separate architecture (this doc) from sequencing (Roadmap) and tactical build plan (Build Plan). TOGAF architecture domain structure retained.