Rumblings — Architecture Blueprint

AI-Powered Cultural Trend Detection Platform
Company: Rumblings Pty Ltd
Technical Lead: Tom Crawford (2 days/week)
Contractor: Tim Goerner (Augmentra, 2 days/week from W16)
Chief of Product: Jen Ringland
Brand/Sales: AJ Jones
Operations: Lori Susko
Current Phase: Phase 1 Build (March–May), Week 16
Last updated: 2026-04-14
At a glance: 15 sections · 4 TOGAF layers · 9+ data sources · ~166 active trends

This document describes WHAT we’re building — the target architecture.

For WHEN we’re building it, see the Roadmap (roadmap-2026.md).

For HOW we’re building it this phase, see the Build Plan (build-plan-phase-1.md).

Architecture Timeline — Where We Are

A consolidated view of every capability stream across all TOGAF architecture layers, from project inception (January 2026) through Phase 2 kickoff (July). The red line marks today — Week 16, 14 April 2026.

Legend: Complete · In Progress · Not Started · Conditional (gate/decision pending) · NOW marker: W16, Apr 14
1. Business Architecture (Capabilities)
Streams (Jan–Jul; NOW = W16, Apr 14):
- Signal Collection (9 sources)
- Signal Scoring (HxWxD V2)
- Intelligence Layer (V5/V6)
- Trend Classification (rules)
- Delivery — Ops Dashboard
- Delivery — Client Reports
- Delivery — Email (pilots)
- Multi-Tenancy (client isolation)
- Client Onboarding
2. Application Architecture (Components)
Streams (Jan–Jul; NOW = W16, Apr 14):
- n8n Orchestration
- Python Scoring Engine
- Gemini Intelligence Layer
- Plotly Dash Dashboard v2
- Pipeline API (FastAPI)
- /report Skill (Claude Code)
- Validation Feedback UI (Tim)
- Minimal Auth (Tim)
- Report Template Engine (Tim)
3. Data Architecture
Streams (Jan–Jul; NOW = W16, Apr 14):
- PostgreSQL Schema (11+ tables)
- Collector Data Flows (9 sources)
- Fuzzy Dedup + Entity Resolution
- Trend Families + Clustering
- Multi-tenancy Tables (Tim)
- Calendar Events DB
- Client Folder Structure
- Vertical Intelligence Layer
4. Technology / Infrastructure Architecture
Streams (Jan–Jul; NOW = W16, Apr 14):
- VPS + Docker Setup
- n8n Deployment + Upgrades
- Caddy Reverse Proxy + TLS
- Pipeline API Deployment
- Dashboard (dash.rumblings.io)
- Web Static (web.rumblings.io)
- Pipeline Observability Hooks
- Email Provider Setup (Tim)
5. Parallel Tracks
Streams (Jan–Jul; NOW = W16, Apr 14):
- Social Signal Validation Research
- Legal Docs (ToS, Privacy, DPA)
- Demo Environment + Script
- Pilot Prep Homework Process
Reading the chart: Most of the foundation (collection, scoring, classification, dashboard, infrastructure) is complete. Active work is concentrated on V6 intelligence quality, Tim’s multi-tenancy build (W16+), and preparing the /report skill and client onboarding for May pilots. Grey bars to the right of the red line represent the remaining Phase 1 deliverables needed before first client pilots in June.

Executive Summary

Rumblings is a cultural trend detection system that identifies emerging trends 2–12 weeks before mainstream media coverage. It ingests signals from 9+ data sources, scores them using a proprietary Height × Width × Depth model, classifies trends through deterministic rules, and generates intelligence layer outputs (So What context, Now What activation suggestions, narrative stories) that transform raw signal data into actionable brand intelligence.

Architecture in one sentence: n8n orchestrates 9+ data collectors feeding PostgreSQL, a Python scoring engine (H×W×D V2) classifies signals via deterministic rules, a Gemini 2.5 Flash intelligence layer generates cultural briefs with sector-specific context, and a Plotly Dash dashboard serves operational and client-facing views — all running on a single Hostinger VPS with Docker.

The strategic bet: Cultural trend detection is a commodity. Intelligence layer quality (sector-specific “So What” + actionable “Now What” + narrative stories) is the differentiator. If the LLM outputs are generic, nothing else matters.

Target product tiers:

| Tier | What the Client Gets | Phase |
|---|---|---|
| Tier 1 | Weekly trend intelligence reports: So What (sector context) + lite Now What (vertical-level activation) + narrative stories | Phase 1–2 |
| Tier 2 | + Content briefs, client-specific Now What activation, creator matching, saturation alerts | Phase 3 |
| Tier 3 | + API access, trend attribution, trajectory modelling | Phase 4 |

1. Business Architecture — Capabilities Required

This section describes what the system must do, independent of technology choices.

1.1 Signal Collection Capability

The system must continuously ingest signals from diverse data sources to achieve multi-signal triangulation — confirming trends across independent platforms rather than relying on any single source.

| Capability | Description | Status |
|---|---|---|
| Multi-source ingestion | Collect from 9+ independent data sources (social, news, search, cultural platforms) on automated schedules | Deployed |
| Seed term management | Curated seed term lists per source, managed by founder-owners with weekly review cadence | Deployed |
| Source-specific parsing | Each collector normalises platform-specific data into a common signal schema (term, engagement, velocity, raw_data JSONB) | Deployed |
| Deduplication | Content-hash dedup within sources; fuzzy dedup across sources (Levenshtein + token-based) | Deployed |
| Entity resolution | Same entity appearing as different terms across sources resolved into canonical form | Built, integration pending |
| Rate limit resilience | Collectors handle rate limits gracefully — backoff, retry, partial collection rather than failure | Partial (GT pending) |
| ToS compliance | All data collection complies with source Terms of Service — no scraping, no prohibited use | Under review (W12, Grace) |
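The cross-source fuzzy dedup combines a character-level check with a token-based one. A minimal sketch of that shape, using the stdlib `SequenceMatcher` as a stand-in for Levenshtein distance; the thresholds and function names are illustrative, not the production values:

```python
from difflib import SequenceMatcher


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens (the token-based check)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def is_duplicate(term_a: str, term_b: str,
                 char_threshold: float = 0.85,   # illustrative threshold
                 token_threshold: float = 0.5) -> bool:
    """Two terms are fuzzy duplicates if either the character-level
    similarity or the token overlap clears its threshold."""
    char_sim = SequenceMatcher(None, term_a.lower(), term_b.lower()).ratio()
    return char_sim >= char_threshold or token_overlap(term_a, term_b) >= token_threshold
```

With this shape, “quiet luxury trend” and “quiet luxury” collapse on token overlap even though they differ at the character level.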

1.2 Signal Scoring Capability

The system must score every signal across three independent dimensions to enable reliable trend classification without LLM dependency.

| Capability | Description | Status |
|---|---|---|
| Height scoring (intensity) | Per-source metric extraction, percentile normalisation against calibrated distributions, recency-weighted decay, max-aggregated across sources. No source-count multiplier (Width’s job). | Deployed (V2) |
| Width scoring (breadth) | IW (intra-source diversity) + XW (cross-source spread) with taper. 7 profiles: Spike, Flash, Swell, Wave, Undercurrent, Seedling, Ripple. | Deployed (V2) |
| Depth scoring (substance) | 4 components: Evidence Quality (30pts), Temporal Dynamics (30pts), External Interest (20pts), Information Richness (20pts). Gating prevents hollow scores. | Deployed (V2) |
| Composite scoring | H×W×D combined into composite score with deterministic classification: Strong/Emerging/Possible/Noise | Deployed |
| Daily snapshots | Trend scores captured daily for time-series analysis and trajectory tracking | Deployed |
| Calibration | 151 historically-documented trends provide ground truth for scoring model calibration | Complete |

1.3 Intelligence Layer Capability

The system must generate natural-language intelligence that transforms raw trend data into actionable brand insights. This is the core product differentiator.

| Capability | Description | Status |
|---|---|---|
| Cultural Brief generation | Every scored trend gets a structured brief: what/why_now/who/so_what/category/brand_safety/confidence | Deployed |
| Sector-specific “So What” | Context tailored by sector (beauty, fashion, F&B, tech, etc.) — not generic “this trend is growing” | Deployed |
| Lite “Now What” (Tier 1) | 2–4 actionable activation suggestions at vertical level (not client-specific — that’s Tier 2) | Deployed |
| Report Narrative | 2–3 paragraph trend story suitable for weekly intelligence reports | Deployed |
| Trend Profiling | Auto-generated summary, sentiment, key_events, origin for each detected trend | Deployed |
| Trend Enrichment | 4-layer: internal signal analysis → LLM enrichment → Urban Dictionary → Google Trends | Deployed |
| Historical context | Wikipedia pageviews, GDELT volume, Google Trends baselines via TrendHistorian | Deployed |
| Data sufficiency gating | Below thresholds, intelligence outputs switch to qualitative observations or suppress quantitative claims | Not started |
| Client-specific “Now What” (Tier 2) | “YOUR brand should do X because of your positioning” — requires client matching | Phase 2 |

1.4 Trend Classification Capability

The system must classify detected trends using deterministic rules (not LLM), with human validation feedback loop.

| Capability | Description | Status |
|---|---|---|
| Deterministic classification | Rules-based: Strong (H≥30, W≥40), Emerging, Possible, Noise. No LLM in classification path. | Deployed |
| Width gating | W≥40 requires 2+ sources. ~82% of terms are single-source (W=20) → always Noise. By design. | Deployed |
| Profile assignment | 7 trend profiles based on H×W×D shape (Spike, Flash, Swell, Wave, Undercurrent, Seedling, Ripple) | Deployed |
| Validation feedback loop | Expert corrections (confirm/reject/reclassify) stored in validation_feedback table | Schema deployed, UI pending |
| Trend families | Clustering related trends into families via trend_families + trend_family_members tables | Deployed |
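The deterministic rules can be sketched as a plain function. The Strong thresholds (H≥30, W≥40) and the single-source-is-always-Noise gate come from this document; the Emerging/Possible cut-offs below are illustrative placeholders, not the production rules:

```python
def classify(height: float, width: float, depth: float, source_count: int) -> str:
    """Deterministic trend classification sketch - no LLM in the path.

    Depth feeds the composite score and profile assignment rather than
    this gate in the sketch; only H, W, and source count decide the tier.
    """
    # Width gating: single-source terms sit at W=20 and are always Noise.
    if source_count < 2:
        return "Noise"
    if height >= 30 and width >= 40:
        return "Strong"
    if width >= 40:        # placeholder cut-off, not from the source doc
        return "Emerging"
    if height >= 30:       # placeholder cut-off, not from the source doc
        return "Possible"
    return "Noise"
```

Keeping this path rule-based means a classification can always be explained by pointing at two numbers, which is what makes the validation feedback loop tractable.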

1.5 Delivery Capability

The system must deliver trend intelligence to clients through multiple channels.

| Capability | Description | Status |
|---|---|---|
| Operational dashboard | Plotly Dash v2: Sankey pipeline view, UpSet collector view, H×W×D scatter, network graph | Deployed |
| Client-facing dashboard | Client-scoped views showing only relevant trends with intelligence layer outputs | Not started (May) |
| Slack Connect | Trend intelligence delivered to client Slack channels | Deferred to Phase 2 |
| Email reports | Formatted weekly intelligence emails with trend summaries | Manual for pilots |
| API access (Tier 3) | RESTful API for programmatic trend data access | Phase 4 |
| Content briefs (Tier 2) | 500-word structured briefs: angle, audience, key messages, format, timing, brand safety | Phase 3 |

1.6 Multi-Tenancy Capability

The system must support multiple clients with isolated views of relevant trends.

| Capability | Description | Status |
|---|---|---|
| Client data model | Minimum viable: clients, client_verticals, client_terms, client_preferences tables | Tim WS1 starting W16 (Apr 14) |
| Client-scoped queries | Every intelligence query becomes client-aware — clients see only trends relevant to them | Tim WS1 W17 |
| Client matching | LLM-based relevance scoring: given client profile + trend, how relevant? | Phase 2 |
| Authentication | Minimal auth (API key + unique URL), not full login. Approach simplified from original scope. | Tim WS1 W18 |
| Client onboarding | Discovery → seed terms → configuration → first delivery. AJ/Jen run kickoffs, Tom does technical config. | Tim WS1 W18 |

2. Application Architecture — Components & Interactions

This section describes which software components deliver the capabilities above, and how they interact.

2.1 Component Overview

+------------------------------------------------------------------+
|                    ORCHESTRATION (n8n)                             |
|  9 Collector Workflows (various schedules)                        |
|  Pipeline: Preparer (hourly) -> Evaluator (5min) ->               |
|            Persister (30min) -> Health Monitor (15min)             |
|  Enrichment V2 Trigger (4h)                                       |
+------+------------------------+-----------------------------------+
       |                        |
       v                        v
+--------------+    +----------------------------------------------+
|  Collectors  |    |  Pipeline Processing (Python)                |
|              |    |                                               |
|  9 sources:  |    |  * Entity extraction (Gemini 2.5 Flash)      |
|  HN, BS,     |    |  * HxWxD V2 scoring (deterministic)          |
|  GDELT, Wiki |    |  * Cascade pre-filter (efficiency)           |
|  GA, GT,     |    |  * LLM evaluation (SOP-driven)               |
|  Pinterest,  |    |  * Trend persistence + snapshots             |
|  Tumblr,     |    |  * Intelligence layer (4 components)         |
|  Substack    |    |                                               |
|              |    |  Writes to: PostgreSQL                       |
|  Writes to:  |    |  LLM: Gemini 2.5 Flash (primary)             |
|  PostgreSQL  |    |       Ollama Qwen 2.5 7B (fallback)          |
|  (raw_signals)|   |                                               |
+--------------+    +----------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  PostgreSQL (Docker on VPS)                                       |
|                                                                   |
|  11 tables: sources, raw_signals, processed_signals,              |
|  detected_trends, trend_evidence, trend_snapshots,                |
|  validation_feedback, pipeline_runs, digest_logs,                 |
|  digest_messages, digest_feedback                                 |
|                                                                   |
|  + trend_families, trend_family_members                           |
|  + evaluation_queue, pipeline_jobs (chunked processing)           |
+-----------------------+-------------------------------------------+
                        |
                        v
+------------------------------------------------------------------+
|  Plotly Dash Dashboard v2 (Python)                                |
|                                                                   |
|  5 pages: Ops Home, Pipeline (Sankey), Collectors (UpSet),        |
|  Signals (HxWxD scatter), Trends (network graph)                  |
|                                                                   |
|  Design: Inter font, JetBrains Mono for data                     |
|  DB: SQLAlchemy singleton                                         |
|  API: /api/seed-lookup, /api/trends, /api/collector-status,       |
|       /api/pipeline-status, /api/trends/families                  |
+------------------------------------------------------------------+

2.2 Data Collectors

Role: Ingest signals from 9 data sources into raw_signals table via n8n workflows.

| Collector | Source ID | Type | Schedule | n8n Workflow | Daily Volume | Owner |
|---|---|---|---|---|---|---|
| Hacker News | 1 | API | 10min | Algolia search | ~120 | Tom |
| Bluesky | 2 | API | 15min | AT Protocol | ~1,500+ | Tom |
| GDELT | 3 | API | 30min | GKG API | ~300 | Lori |
| Google Autocomplete | 5 | Unofficial | 1hr | Autocomplete API | ~500+ | AJ |
| Wikipedia | 6 | API | 2x daily | Pageviews API | ~40+ | Tom |
| Pinterest | 7 | API | 6hr | Trends + Seed Term | ~800+ | AJ |
| Tumblr | 10 | API | Hourly | Trending tags + search | ~3,000+ | Jen |
| Substack | 11 | RSS | Hourly | 41 publications | ~50+ | Jen |
| Trade Press | | RSS | Daily | RSS consolidation | ~30+ | Lori |

Collector ownership model: Each founder reviews their assigned collectors weekly (Friday), curates seed terms, flags broken collectors. Goal: drop noise from 52.6% to ~28% via 8-week calibration.

What collectors do NOT do:

2.3 Pipeline Processing

Role: Transform raw signals into scored, classified, enriched trends. Runs as 4 independent n8n workflows for resilience.

| Workflow | Schedule | Purpose | Duration |
|---|---|---|---|
| Preparer | Hourly | Entity extraction on new raw_signals via Gemini 2.5 Flash | ~5min |
| Evaluator | Every 5min | Process evaluation_queue: H×W×D scoring, cascade pre-filter, LLM evaluation, classification | ~2min |
| Persister | Every 30min | Persist evaluated signals to detected_trends, create/update trend snapshots | ~1min |
| Health Monitor | Every 15min | Pipeline health checks, staleness alerts, error rate monitoring | ~30sec |
| Enrichment V2 Trigger | Every 4hr | 4-layer enrichment: internal analysis → LLM → Urban Dictionary → Google Trends | ~10min |

Processing flow:

raw_signals (collector writes)
  -> Entity extraction (Preparer)
    -> evaluation_queue (chunked)
      -> HxWxD V2 scoring (Evaluator)
        -> Cascade pre-filter (skip low-quality)
          -> LLM evaluation (SOP-driven, Gemini 2.5 Flash)
            -> processed_signals (scored + classified)
              -> detected_trends (persisted, deduplicated)
                -> trend_snapshots (daily timeseries)
                  -> Intelligence layer (So What, Now What, narratives)
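The cascade pre-filter step exists so that cheap deterministic scoring gates the expensive LLM evaluation. A sketch of that shape, with function names and the floor value as illustrative assumptions:

```python
def evaluate_batch(signals, hwd_score, llm_evaluate, prefilter_floor: float = 10.0):
    """Cascade pre-filter sketch: score every signal deterministically
    first, and only send signals above an illustrative composite floor
    to the (expensive) LLM evaluation step."""
    results = []
    for sig in signals:
        scores = hwd_score(sig)                 # cheap, deterministic HxWxD
        if scores["composite"] < prefilter_floor:
            results.append({**sig, **scores, "classification": "Noise"})
            continue                            # skip the LLM call entirely
        results.append({**sig, **scores, **llm_evaluate(sig, scores)})
    return results
```

Given that ~82% of terms are single-source noise, gating like this keeps LLM spend proportional to the interesting minority rather than total signal volume.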

2.4 Intelligence Layer

Role: Transform scored trends into actionable brand intelligence. This is the core product differentiator — without quality intelligence outputs, trend detection alone is a commodity.

4 Production Components

| Component | File | Purpose | Model |
|---|---|---|---|
| NarrativeGenerator | agents/narrative_generator.py | Cultural Briefs: what/why_now/who/so_what/category/brand_safety/confidence + sector So What + lite Now What + report narrative | Gemini 2.5 Flash, temp 0.15 |
| TrendProfiler | data/pipeline/trend_profiler.py | Auto-generates summary, sentiment, key_events, origin per trend | Gemini 2.5 Flash |
| TrendEnricherV2 | data/pipeline/trend_enrichment_v2.py | 4-layer enrichment: internal signals → LLM context → Urban Dictionary → Google Trends | Mixed |
| TrendHistorian | data/analysis/trend_historian.py | Historical baselines: Wikipedia pageviews, GDELT volume, Google Trends | API calls |
Quality is the #1 risk. If outputs are generic (“This trend is growing and brands should pay attention”), the product fails. April is dedicated to Jen/AJ quality review loops and prompt iteration.

Why four separate components (not one)? Each component runs on a different schedule, has different failure modes, and serves different consumers. NarrativeGenerator runs on-demand (latency-sensitive, client-facing). TrendProfiler runs during persistence (batch). TrendEnricherV2 runs every 4 hours (expensive, rate-limited by external APIs). TrendHistorian runs on-demand for case studies (heavy API calls). Consolidating them would couple fast paths to slow paths and make partial failures cascade.

2.5 Dashboard (Plotly Dash v2)

Role: Serve operational and (future) client-facing views of trend intelligence.

Current 5 Pages

| Page | Purpose | Key Visualisation |
|---|---|---|
| Ops Home | Daily operational overview | KPI cards, recent trends, pipeline status |
| Pipeline | Data flow visualisation | Sankey diagram (signals → processing → trends) |
| Collectors | Source health monitoring | UpSet plot (cross-source overlap) + heatmap |
| Signals | Individual signal exploration | H×W×D 3D scatter with lasso select |
| Trends | Trend relationship mapping | dash-cytoscape network graph |

Design system: theme.py — Inter font, JetBrains Mono for data, source-specific colours, classification colours. Stephen Few / Tufte / FT Visual Vocabulary principles.

API Endpoints

| Endpoint | Purpose |
|---|---|
| /api/seed-lookup?term=X&days=7 | Look up signals for a specific term |
| /api/trends?days=7&classification=strong | List trends by classification |
| /api/collector-status | Collector health summary |
| /api/pipeline-status | Pipeline processing metrics |
| /api/trends/families | Trend family clusters |
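The documented query parameters compose straightforwardly. A small helper for building a trends query URL; the dash.rumblings.io host is an assumption taken from the deployment section, since the endpoints are served by the dashboard container:

```python
from urllib.parse import urlencode

BASE = "https://dash.rumblings.io"  # assumed host (see deployment topology)


def trends_url(days: int = 7, classification: str = "") -> str:
    """Build a /api/trends query URL from the documented parameters."""
    params = {"days": days}
    if classification:
        params["classification"] = classification
    return f"{BASE}/api/trends?{urlencode(params)}"
```

For example, `trends_url(7, "strong")` reproduces the query shown in the table above.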

2.6 Agent Architecture

Role: Execute SOP-defined logic at scale. One agent per SOP.

Base pattern (agents/base.py): the RumblingsAgent class loads the SOP markdown, extracts decision criteria into the system prompt, executes with structured JSON output, validates that output, and logs each execution with SOP version tracking.

| Agent | SOP Source | Purpose |
|---|---|---|
| TrendEvaluationAgent | sop-trend-evaluation.md | Trend vs. noise classification |
| CredibilityAgent | sop-credibility-assessment.md | Source credibility scoring |
| ThemeClassificationAgent | sop-theme-classification.md | Theme identification |
| ClientRelevanceAgent | sop-client-relevance.md | Client-specific filtering (Phase 2) |
| NarrativeGenerator | (embedded prompts) | Cultural brief generation |

Principles: Low temperature (0.1–0.2), structured JSON output, 100% SOP example pass rate required before deployment, flag uncertain cases for human review.
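The base pattern can be sketched as below. The class shape, method names, and the required `classification` field are illustrative assumptions, not the production agents/base.py:

```python
import json


class RumblingsAgentSketch:
    """Illustrative sketch of the SOP-driven agent pattern: take SOP
    markdown, lift it into the system prompt, demand JSON back from
    the model, and validate before returning."""

    def __init__(self, sop_text: str, llm, temperature: float = 0.15):
        self.sop_text = sop_text        # SOP markdown (loaded from file in production)
        self.llm = llm                  # callable: (system, user, temperature) -> str
        self.temperature = temperature  # low temp for deterministic-ish output

    def system_prompt(self) -> str:
        return "Follow this SOP exactly. Respond with JSON only.\n\n" + self.sop_text

    def run(self, payload: dict) -> dict:
        raw = self.llm(self.system_prompt(), json.dumps(payload),
                       temperature=self.temperature)
        result = json.loads(raw)             # validate: must be parseable JSON
        if "classification" not in result:   # validate: required field (assumed name)
            raise ValueError("agent output missing 'classification'")
        return result
```

One agent per SOP keeps the decision criteria versioned in prose that founders can review, while the code stays a thin, uniform execution shell.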

2.7 Component Interaction Summary

Cloud (Seed Terms, Configuration)
  -> n8n Orchestrator (VPS)
    -> 9 Collectors (various APIs, RSS feeds)
      -> PostgreSQL raw_signals table
    -> Pipeline (4 workflows):
        Preparer (entity extraction via Gemini)
        -> Evaluator (HxWxD V2 scoring + LLM eval)
        -> Persister (trend detection + snapshots)
        -> Health Monitor (alerting)
    -> Enrichment V2 (4-layer context enrichment)
    -> Intelligence Layer (briefs, profiles, narratives)
  -> Plotly Dash Dashboard (read from PostgreSQL)
    -> Team members (browser)
    -> API endpoints (curl/programmatic)
  -> [Future] Email delivery (manual for pilots, automated Phase 2)
  -> [Future] Client Dashboard (scoped views)

3. Data Architecture — Models, Schemas & Flows

This section describes the data entities, their relationships, and how data flows through the system.

3.1 Conceptual Data Model

                    +--------------+
                    |   Sources    |
                    |  (9 active)  |
                    +------+-------+
                           | collect
                           v
                    +--------------+
                    | Raw Signals  |
                    | (~6K+/day)   |
                    +------+-------+
                           | process
                           v
                    +------------------+
                    | Processed Signals|
                    | (HxWxD scored)   |
                    +------+-----------+
                           | detect
              +------------+------------+
              v            v            v
        +----------+ +----------+ +--------------+
        | Detected | |  Trend   | |    Trend     |
        | Trends   | | Evidence | |  Snapshots   |
        | (~166    | | (links)  | |  (daily ts)  |
        |  active) | |          | |              |
        +----+-----+ +----------+ +--------------+
             |
             +--- Trend Families (clustering)
             |
             +--- Intelligence Layer
             |    (briefs, profiles, enrichment)
             |
             +--- Validation Feedback
             |    (expert corrections)
             |
             +--- [Future] Client Matching
                  (relevance scoring per client)

Key relationships:

3.2 Physical Data Model — Current Tables

sources (Reference)

9 active sources. Fields: id, name, source_type, tier, poll_frequency, rate_limit, is_active.

raw_signals (Incoming Data)

All collected signals. ~6,000+/day across all sources.

| Column | Type | Purpose |
|---|---|---|
| id | BIGSERIAL | PK |
| source_id | INTEGER FK | Source reference |
| external_id | VARCHAR(255) | Platform-specific ID |
| collected_at | TIMESTAMPTZ | Collection timestamp |
| term | VARCHAR(255) | Primary search/seed term |
| primary_term | VARCHAR(255) | Normalised term |
| keywords | TEXT[] | Extracted keywords |
| title | TEXT | Content title |
| engagement | INTEGER | Platform engagement metric |
| velocity | FLOAT | Rate of change |
| raw_data | JSONB | Full platform-specific payload |
| content_hash | VARCHAR(64) | Dedup hash |
| enrichment_status | VARCHAR(20) | Processing status |

Unique constraint: (source_id, external_id) — prevents duplicate collection.
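A collector write that leans on this constraint looks like the sketch below: the ON CONFLICT clause makes re-collection idempotent, so a retried or overlapping poll cannot duplicate rows. Column names come from the table above; the surrounding psycopg plumbing is assumed:

```python
# Illustrative collector insert. The UNIQUE (source_id, external_id)
# constraint turns duplicate collection into a no-op rather than an error.
INSERT_SIGNAL = """
INSERT INTO raw_signals (source_id, external_id, collected_at, term,
                         engagement, raw_data, content_hash)
VALUES (%(source_id)s, %(external_id)s, NOW(), %(term)s,
        %(engagement)s, %(raw_data)s, %(content_hash)s)
ON CONFLICT (source_id, external_id) DO NOTHING;
"""
```

This is why collectors can simply re-run after rate-limit backoff: partial collections are safe to repeat.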

processed_signals (Scored)

After H×W×D scoring and LLM evaluation.

| Column | Type | Purpose |
|---|---|---|
| height_score | FLOAT | Intensity (0–100) |
| width_score | FLOAT | Breadth (0–100) |
| depth_score | FLOAT | Substance (0–100) |
| composite_score | FLOAT | Combined score |
| classification | VARCHAR(50) | Strong/Emerging/Possible/Noise |
| hwd_components | JSONB | Detailed component breakdown |
| enrichment_data | JSONB | Archetype, alert priority, flags |
| evaluation_details | JSONB | Early detection signals, concern flags |

detected_trends (Confirmed Trends)

~166 active trends. Unique on term.

| Column | Type | Purpose |
|---|---|---|
| term | VARCHAR(255) UNIQUE | Canonical trend name |
| aliases | TEXT[] | Alternative names |
| height/width/depth_score | FLOAT | Current aggregate scores |
| composite_score | FLOAT | Combined score |
| trend_type | VARCHAR(50) | Classification |
| profile | VARCHAR(20) | Shape: spike/flash/swell/wave/undercurrent/seedling/ripple |
| enrichment_data | JSONB | Intelligence layer outputs (nested under enrichment_v2 key) |
| validation_status | VARCHAR(20) | pending/confirmed/rejected/review |
| summary, sentiment, key_events, origin | Various | Auto-generated trend profiles |

Other Tables

3.3 Data Flow Diagram

         SEED TERMS (curated per source, per founder)
                        |
    +-------------------+-------------------+
    v                   v                   v
 HN API           Bluesky AT          Pinterest API
 GDELT GKG        Wikipedia PV        Tumblr API
 Google AC         Substack RSS        Trade Press RSS
    |                   |                   |
    +-------------------+-------------------+
                        |
                        v
              +-----------------+
              |  raw_signals    |  (~6K+/day)
              |  (PostgreSQL)   |
              +--------+--------+
                       |
              +--------v--------+
              |  Preparer       |  Entity extraction
              |  (hourly, n8n)  |  via Gemini 2.5 Flash
              +--------+--------+
                       |
              +--------v--------+
              |  evaluation_    |  Chunked queue
              |  queue          |
              +--------+--------+
                       |
              +--------v--------+
              |  Evaluator      |  HxWxD V2 scoring
              |  (5min, n8n)    |  + Cascade pre-filter
              |                 |  + LLM evaluation
              +--------+--------+
                       |
              +--------v--------+
              |  processed_     |  Scored + classified
              |  signals        |
              +--------+--------+
                       |
              +--------v--------+
              |  Persister      |  Trend detection
              |  (30min, n8n)   |  + snapshot capture
              +--------+--------+
                       |
              +--------v--------+
              |  detected_      |  ~166 active trends
              |  trends         |
              +--------+--------+
                       |
              +--------v--------+
              |  Enrichment V2  |  4-layer context
              |  (4hr, n8n)     |  enrichment
              +--------+--------+
                       |
              +--------v--------+
              |  Intelligence   |  So What + Now What
              |  Layer          |  + Narratives + Profiles
              +--------+--------+
                       |
              +--------+--------+
              v                 v
         Dashboard         [Future]
         (Plotly Dash)     Slack/Email
                           Delivery

3.4 Extensions (PostgreSQL)

3.5 Future Schema (Phase 2: Multi-Tenancy)

-- Minimum viable client model (May 2026)
clients (
  id              SERIAL PRIMARY KEY,
  name            VARCHAR(255) NOT NULL,
  verticals       TEXT[],
  preferences     JSONB,
  is_active       BOOLEAN DEFAULT true,
  created_at      TIMESTAMPTZ DEFAULT NOW()
)

client_terms (
  client_id       INTEGER REFERENCES clients(id),
  term            VARCHAR(255),
  relevance       FLOAT,
  added_by        VARCHAR(50)
)

client_preferences (
  client_id       INTEGER REFERENCES clients(id),
  notification_channel VARCHAR(20),  -- 'slack', 'email'
  delivery_schedule    VARCHAR(20),  -- 'daily', 'weekly'
  slack_channel_id     VARCHAR(50),
  email_recipients     TEXT[]
)
Scope discipline is critical. No RBAC, no billing integration, no usage tracking. Just “who is this client and what do they care about?”
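Under this schema, a client-scoped query (capability 1.6) reduces to a join against client_terms. The join shape below is a sketch of that idea, not the planned implementation; column names come from the schemas in this document:

```python
# Illustrative client-scoped read: a client sees only detected trends
# matching their registered terms, strongest first. Matching on exact
# term equality is an assumption (production may also use aliases).
CLIENT_TRENDS_SQL = """
SELECT dt.term, dt.composite_score, dt.trend_type
FROM detected_trends dt
JOIN client_terms ct ON ct.term = dt.term
WHERE ct.client_id = %(client_id)s
ORDER BY dt.composite_score DESC;
"""
```

Keeping isolation at the query layer (rather than separate databases) is what keeps the minimum viable client model minimal.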

4. Infrastructure Architecture — Services, Security & Costs

This section describes the infrastructure, security boundaries, deployment topology, and cost estimates.

4.1 Infrastructure Services

| Service | Role | Config |
|---|---|---|
| Hostinger VPS (KVM 8) | Single server running all services | 72.62.195.132, SSH: rumblings-vps |
| PostgreSQL (Docker) | Primary data store — signals, trends, pipeline metadata | Container: rumblings-postgres |
| n8n (Docker) | Workflow orchestration — collectors, pipeline, health monitoring | Container: rumblings-n8n, UI: n8n.rumblings.io |
| Ollama (Docker) | Local LLM fallback — Qwen 2.5 7B | Container: rumblings-ollama |
| Caddy | Reverse proxy + automatic TLS | Routes to n8n, dashboard, web-static |
| Gemini 2.5 Flash | Primary production LLM (via HTTP from n8n) | Google AI API, temperature 0.15 |
| Google Drive | Knowledge base collaboration (shared drive) | rclone bisync every 15min |
| GitHub | Code repository | tcraw-rumblings/rumblings-code |
| Plotly Dash | Dashboard application | Container: pipeline-api |

4.2 Deployment Topology

+-----------------------------------------------------------+
|  Hostinger VPS (72.62.195.132)                             |
|                                                            |
|  +----------+  +----------+  +----------+  +----------+   |
|  | rumblings|  | rumblings|  | rumblings|  | pipeline |   |
|  | -n8n     |  | -postgres|  | -ollama  |  | -api     |   |
|  | (n8n)    |  | (PG 15)  |  | (Ollama) |  | (Dash)   |   |
|  +----------+  +----------+  +----------+  +----------+   |
|                                                            |
|  +--------------------------------------------------------+|
|  | Caddy (reverse proxy + TLS)                            ||
|  |  n8n.rumblings.io   -> rumblings-n8n                   ||
|  |  dash.rumblings.io  -> pipeline-api                    ||
|  |  web.rumblings.io   -> /opt/rumblings/web-static/      ||
|  +--------------------------------------------------------+|
|                                                            |
|  Code: /home/tom/Rumblings/rumblings-code/ (git)          |
|  Build: /opt/rumblings/ (Docker context, code dirs        |
|         at TOP LEVEL: api/, data/, agents/)               |
|  Docker compose: /opt/rumblings/infra/docker-compose.yml  |
+-----------------------------------------------------------+
         |
         | rclone bisync (15min)
         v
+-----------------+
| Google Drive     |
| (Shared Drive)   |
| Knowledge Base   |
+-----------------+

CRITICAL deployment gotcha: Git repo is at /home/tom/Rumblings/rumblings-code/. Docker build context is /opt/rumblings/. Code dirs must be rsynced INDIVIDUALLY. Syncing to /opt/rumblings/code/ does NOT update the build context.

n8n dual-table CRITICAL: n8n stores workflows in workflow_entity AND workflow_history. The engine reads from workflow_history, NOT workflow_entity.nodes. Both tables MUST be updated.

4.3 Security Boundaries

+-----------------------------------------------------------+
|  Current: Internal Only (Team of 4)                        |
|                                                            |
|  Team members                                              |
|    -> SSH to VPS (Tom only)                                |
|    -> n8n UI (basic auth)                                  |
|    -> Dash UI (Caddy TLS)                                  |
|    -> API endpoints (no auth currently)                    |
|    -> Aria (Claude Code, any team member)                  |
|                                                            |
|  No client-facing access yet.                              |
|  No public API access yet.                                 |
+-----------------------------------------------------------+

+-----------------------------------------------------------+
|  Phase 2 Target: Client-Facing (May-June)                  |
|                                                            |
|  Client browser -> Caddy (TLS)                             |
|    -> Auth middleware (approach TBD)                        |
|      -> Dashboard (client-scoped views)                    |
|        -> PostgreSQL (client-filtered queries)             |
|                                                            |
|  Email -> Client recipients (manual for pilots)            |
|  Pipeline -> Observability hooks (completion/failure)      |
+-----------------------------------------------------------+

4.4 Cost Estimates

| Component | Monthly Cost | Notes |
|---|---|---|
| Hostinger VPS (KVM 8) | ~$25–30 | Single server for everything |
| Gemini 2.5 Flash (LLM API) | ~$15–40 | Entity extraction + evaluation + intelligence. Variable with volume. |
| Google Drive | Free | Shared drive within existing Workspace |
| GitHub | Free | Public repo (private available if needed) |
| Domain + DNS | ~$5 | rumblings.io |
| Squarespace (marketing site) | $27 | Marketing/landing page |
| Anthropic Claude (edge cases) | ~$5–10 | ~10% of LLM calls |
| Total | ~$80–115/month | |

Capacity Analysis

Current VPS: Hostinger KVM 8 — 8 vCPU, 16GB RAM, 200GB NVMe SSD.

| Resource | Current Load | At 3 Pilot Clients | At 10 Clients | Bottleneck Threshold |
|---|---|---|---|---|
| CPU | ~15% avg (spikes to 40%) | ~20% avg | ~35% avg | 80% sustained → upgrade VPS tier |
| RAM | ~8GB used (PG 3GB, n8n 2GB, Ollama 2GB, Dash 1GB) | ~9GB | ~11GB | 14GB → drop Ollama or upgrade |
| Disk | ~40GB used (PG 25GB, Docker 10GB, logs 5GB) | ~50GB | ~80GB | 160GB → archive old raw_signals |
| DB connections | ~15 concurrent | ~20 | ~30 | 100 (PG default) → not a concern |
| n8n executions | ~200/hour | ~220/hour | ~250/hour | n8n handles 500+/hour |
| LLM API | ~500 calls/day | ~600/day | ~800/day | Gemini 1500 RPM → nowhere near |

First bottleneck (likely): Disk space. At ~1GB/month per client with indexes, Postgres hits ~100GB by month 8 with 10 clients. Mitigation: archive raw_signals older than 90 days.

Scaling trigger: When any resource sustains >75% for a week, evaluate: (1) vertical scale to KVM 12, or (2) separate Postgres to managed DB. Horizontal scaling only justified at 25+ clients.
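As a back-of-envelope check on that projection, assuming the ~1GB/month figure is per client (an assumption based on the 10-client framing above):

```python
current_pg_gb = 25               # Postgres footprint today (from the table above)
growth_gb_per_client_month = 1   # assumed: ~1GB/month per client, incl. indexes
clients = 10
months = 8

# Linear growth projection: when does the archive trigger (~100GB) arrive?
projected_gb = current_pg_gb + growth_gb_per_client_month * clients * months
# projected_gb == 105, i.e. past the ~100GB mark that triggers archiving
```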

4.5 Monitoring & Observability

| What | How | Frequency |
|---|---|---|
| Pipeline health | Health Monitor workflow + pipeline_runs table | Every 15min |
| Collector health | /collector-health skill — signal counts vs baselines | On demand / weekly review |
| Pipeline processing | /pipeline-health skill — 5-stage check | On demand |
| n8n executions | n8n UI execution history | Continuous |
| Docker container health | docker ps, container logs | On demand |
| Disk/memory | VPS monitoring (Hostinger panel) | On demand |

5. H×W×D Scoring Model — Detail

All scoring is deterministic (no LLM). Calibrated against 151 historically documented trends.

5.1 Height V2 (Intensity)

| Aspect | Detail |
|---|---|
| Per-source metrics | HN velocity, BS total_engagement, Tumblr velocity (fallback: note_count), Wiki spike_ratio (NOT ×100), Pinterest avg_growth_wow, GA dual (Trends velocity + rank inversion) |
| Normalisation | calibrate() builds per-source sorted distributions; _percentile_rank() converts raw → 0–100. Minimum 20 samples. |
| Recency decay | exp(-ln(2)/half_life × age_hours). Half-lives: HN=6h, BS=6h, Tumblr=12h, Wiki=4h, Pinterest=24h, GA=8h |
| Aggregation | Max across sources. No source-count multiplier (Width’s job). No weighted average. |
| Presence sources | GDELT: min(75, count×12.5), Substack: min(75, count×15). Linear, no cliffs. |
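The recency-decay and presence terms above can be sketched as follows (function and key names are illustrative, not the engine's actual identifiers):

```python
import math

# Half-lives in hours, as listed in the table above.
HALF_LIVES = {"hn": 6, "bluesky": 6, "tumblr": 12,
              "wikipedia": 4, "pinterest": 24, "ga": 8}

def recency_decay(source: str, age_hours: float) -> float:
    """Weight halves every half_life hours: exp(-ln(2)/half_life * age_hours)."""
    half_life = HALF_LIVES[source]
    return math.exp(-math.log(2) / half_life * age_hours)

def presence_height(source: str, count: int) -> float:
    """Linear presence scoring with a cap and no cliffs: min(75, count * step)."""
    step = {"gdelt": 12.5, "substack": 15.0}[source]
    return min(75.0, count * step)
```

A 6-hour-old HN signal carries exactly half weight; six GDELT mentions already hit the 75-point cap.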

5.2 Width V2 (Breadth)

| Aspect | Detail |
|---|---|
| IW (intra-source) | Source-specific diversity metric (e.g., unique authors, engagement spread) |
| XW (cross-source) | Number of independent sources × taper function |
| Profiles | 7 shapes based on H×W×D signature: Spike, Flash, Swell, Wave, Undercurrent, Seedling, Ripple |
| Width gating | W≥40 requires 2+ sources. ~82% single-source terms → W=20 → always Noise. By design. |
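A minimal sketch of the gating rule only (assuming single-source terms are floored at W=20; the IW/XW combination and taper function are omitted):

```python
def gate_width(raw_width: float, source_count: int) -> float:
    """Single-source terms cannot reach W >= 40: they are capped at W = 20,
    which always classifies as Noise under the rules in section 5.4."""
    if source_count < 2:
        return min(raw_width, 20.0)
    return raw_width
```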

5.3 Depth V2 (Substance)

| Component | Points | What It Measures |
|---|---|---|
| Evidence Quality (EQ) | 30 | Quality and diversity of evidence across sources |
| Temporal Dynamics (TD) | 30 | Velocity, acceleration, jerk — momentum indicators |
| External Interest (EI) | 20 | GDELT tone, Google Trends, external validation signals |
| Information Richness (IR) | 20 | Completeness of metadata, narrative quality |

Gating: {4 components: ×1.0, 3: ×0.9, 2: ×0.7, 1: ×0.4} — prevents hollow scores. Single-source can’t clear D≥40.
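The component-count multiplier can be sketched as follows (treating any non-zero component as "present" is an assumption; the engine's actual presence test may differ):

```python
GATING_MULTIPLIER = {4: 1.0, 3: 0.9, 2: 0.7, 1: 0.4}

def depth_score(eq: float, td: float, ei: float, ir: float) -> float:
    """Sum of EQ(30) + TD(30) + EI(20) + IR(20), scaled down when fewer
    components contribute; a hollow single-component score of 30 becomes
    12, well under the D >= 40 bar."""
    components = [eq, td, ei, ir]
    present = sum(1 for c in components if c > 0)
    if present == 0:
        return 0.0
    return sum(components) * GATING_MULTIPLIER[present]
```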

5.4 Classification Rules

| Classification | Criteria |
|---|---|
| Strong | H≥30 AND W≥40 (requires 2+ sources) |
| Emerging | H≥20 AND W≥30 |
| Possible | H≥10 AND W≥20 |
| Noise | Below Possible thresholds |
Key insight: Width is the real gatekeeper. Width 40 requires 2 sources minimum. This enforces multi-signal triangulation by design.
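Under those rules, classification reduces to a short deterministic cascade (a sketch; the source-count check on Strong is made explicit here, though in practice the Width gate already enforces it):

```python
def classify(height: float, width: float, source_count: int) -> str:
    """Apply the section 5.4 thresholds from strongest to weakest."""
    if height >= 30 and width >= 40 and source_count >= 2:
        return "Strong"
    if height >= 20 and width >= 30:
        return "Emerging"
    if height >= 10 and width >= 20:
        return "Possible"
    return "Noise"
```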

6. Intelligence Layer — Detail

6.1 NarrativeGenerator Output Structure

{
  "what": "Description of the trend",
  "why_now": "Why this is emerging now",
  "who": "Who is driving/participating",
  "so_what": "Sector-specific implications",
  "category": "beauty|fashion|food|tech|lifestyle|...",
  "brand_safety": "safe|caution|risky",
  "confidence": 0.85,
  "now_what_lite": ["Activation suggestion 1", "Activation suggestion 2"],
  "report_narrative": "2-3 paragraph trend story..."
}

6.2 Quality Thresholds (Planned)

| Signal Strength | Intelligence Output | Data Minimum |
|---|---|---|
| High confidence | Full brief with quantitative claims | 5+ sources, 50+ signals, 7+ days |
| Medium confidence | Brief with qualitative observations | 2+ sources, 10+ signals, 3+ days |
| Low confidence | Suppress quantitative claims, flag as emerging | 1 source or <10 signals |
| Insufficient | No intelligence output generated | <3 signals total |
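These planned thresholds would reduce to a simple cascade, checked from strongest to weakest (a sketch; names are illustrative and the thresholds are still marked as planned):

```python
def intelligence_tier(source_count: int, signal_count: int, days_observed: int) -> str:
    """Map signal strength to the planned intelligence output tier."""
    if signal_count < 3:
        return "insufficient"   # no intelligence output generated
    if source_count >= 5 and signal_count >= 50 and days_observed >= 7:
        return "high"           # full brief with quantitative claims
    if source_count >= 2 and signal_count >= 10 and days_observed >= 3:
        return "medium"         # qualitative observations only
    return "low"                # suppress quantitative claims, flag as emerging
```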

6.3 Enrichment V2 Pipeline

Layer 1: Internal signal analysis
  -> Signal count, source diversity, temporal pattern, engagement distribution

Layer 2: LLM enrichment (Gemini 2.5 Flash)
  -> Cultural context, demographic associations, industry implications

Layer 3: Urban Dictionary
  -> Slang/cultural terminology context

Layer 4: Google Trends
  -> Search interest baseline, regional distribution, related queries

Enrichment data is stored in detected_trends.enrichment_data under the enrichment_v2 key, using a COALESCE merge so existing keys are preserved.


7. Delivery Phases

For detailed build timelines, hour budgets, and weekly task plans, see the Roadmap and Build Plan.

This blueprint is delivered across four phases. Each phase adds product tiers and validates with real clients before building more.

| Phase | Period | Gate | What Ships |
|---|---|---|---|
| Phase 1: Build | Mar–May | M2: Demo Ready (May 31) | V6 intelligence reports, /report skill, multi-tenancy foundation, demo environment |
| Phase 2: Pilot | Jun–Jul | M3: First Pilots (Jun 30) | 2–3 free pilots receiving weekly intelligence, client matching, validation feedback |
| Phase 3: Tier 2 | Aug–Sep | M5: First Revenue (Sep 30) | Content briefs, creator matching, saturation alerts, paid conversion |
| Phase 4: Tier 3 | Oct–Dec | M6: Tier 2 Complete (Nov 30) | API access, trend attribution, trajectory modelling |

Client Value Progression

| Stage | What the Client Sees |
|---|---|
| Pilot launch | Onboarded, seed terms configured, first intelligence delivery |
| Calibrated intelligence | 4 weeks of intelligence, matching tuned to their verticals |
| Full Tier 1 | Weekly intelligence with So What + lite Now What, validated quality |
| Tier 2 upgrade | Client-specific Now What, content briefs, creator matching |
| Tier 3 / API | Programmatic access to trend data for their own systems |

9. Decision Register

Moved to decisions-pending.md as of 2026-03-19. That file is the sole source of truth for all open and resolved decisions.

Summary: 10 resolved decisions (D1–D10), 4 open pre-Phase 2 (D11–D14), 4 open post-Phase 2 (D15–D18), plus pricing and prediction feasibility decisions.


10. Risk Register

Extracted to Lori’s risk register as of 2026-03-19. See planning-system-assessment.md for the risk extract.

10 risks identified at plan creation (Mar 19, 2026):

  1. Intelligence layer outputs are generic (Critical/High)
  2. Multi-tenancy scope creep (High/Medium)
  3. Co-founder availability gaps (Medium/Medium)
  4. Google Trends re-breaks (Medium/Medium)
  5. Single VPS failure (High/Low)
  6. Tom’s time constraint — 2 days/week (High/Certain)
  7. Noise rate doesn’t drop below 30% (Medium/Medium)
  8. Pilot clients don’t engage (High/Medium)
  9. n8n dual-table deployment bugs (Medium/Medium)
  10. LLM costs escalate (Medium/Low)

11. Known Limitations

| Limitation | Impact | Context |
|---|---|---|
| No Reddit data | Missing ~12% of trend signals | Commercial contract required. 88% coverage validated without it. |
| No Twitter/X data | Missing real-time social pulse | $5K/mo prohibitive. Bluesky partially compensates. |
| Single VPS architecture | No redundancy, single point of failure | Acceptable at <10 clients. Vertical scaling first lever. |
| Hourly batch processing | Signals are up to 1h stale | Real-time not needed for cultural trend detection (weeks-scale phenomena). |
| GDELT engagement always 0 | Depth V2 EI component dormant | GDELT provides volume/tone but no engagement. EI component ready when fixed. |
| No view-through attribution | Can’t track what users saw but didn’t click | Fundamental limitation of signal-based detection vs. ad tracking. |
| L1 identity resolution only | Same entity may appear as different terms | Fuzzy dedup + entity resolution both deployed. L2 probabilistic = Phase 3+. |
| Tom + Tim (2 developers) | Bus factor = 2, Tim is contractor (trial) | Tim Goerner (Augmentra) from W16. WS1 gate at W19. Tom sole deployer to VPS. |
| No client-facing auth | Dashboard/API currently open to anyone with URL | Pre-pilot blocker. Tim WS1 builds minimal auth W18. Sufficient for 2–3 pilots. |
| Google Trends rate-limited | Enrichment pipeline partially blocked | DataForSEO dropped; rate-limiting fix pending W12. |

12. Future Architecture (Phase 2+)

For sequencing and timelines, see the Roadmap.

The target architecture extends beyond Phase 1 in these areas:

| Layer | Capability | Phase |
|---|---|---|
| Intelligence | Client-specific “Now What” activation, data sufficiency gating, prompt quality benchmarking, multi-language | 2–3 |
| Delivery | Content briefs (500-word structured), creator matching, saturation alerts, white-label reports | 2–3 |
| API | RESTful API access (rate-limited, API keys), trend attribution, trajectory modelling | 3–4 |
| Platform | Advanced multi-tenancy (RBAC, billing, usage tracking), SSO/OAuth, webhook integrations | 3–4 |
| Data | Additional collectors (Threads, TikTok if feasible), L2 entity resolution, ML trend prediction, pgvector | 3–4 |
| Scale | Horizontal infrastructure, dedicated DB server, CDN, load balancing | 4 (if demand) |

13. Architecture State (as of April 14, 2026)

This table maps every architectural component to its current state. For build sequence, see the Roadmap.

| Layer | Component | State | Notes |
|---|---|---|---|
| Business | Signal collection (10 sources) | Deployed | HN, Bluesky, GDELT, GA, Wikipedia, Pinterest, Tumblr, Substack, Trade Press, YouTube |
| | Signal scoring (H×W×D V2) | Deployed | 200+ tests passing. Deterministic classification. |
| | Intelligence layer (So What / Now What / Narrative) | Deployed (V5), V6 in progress | V5 shipped W13. V6 SOPs being wired W16–W17. |
| | Trend classification (7 profiles) | Deployed | Deterministic rules. Validation feedback schema deployed. |
| | Delivery — operational dashboard | Deployed | Plotly Dash v2, 5 pages, dash.rumblings.io |
| | Delivery — client reports (/report skill) | Not built | Phase 1 critical path (W18–W21) |
| | Delivery — email | Not built | Manual for pilots. Automated = Tim WS2 or Phase 2. |
| | Multi-tenancy | In progress | Tim WS1 starting W16. 4 tables + scoped queries. |
| | Client onboarding | Not built | Needs founder workshop. Tim’s onboarding script W18. |
| Application | n8n orchestration (9 collector WFs + 4 pipeline WFs) | Deployed | Upgraded to 2.13.3 (W13) |
| | Python scoring engine | Deployed | H×W×D V2, cascade pre-filter, LLM evaluation |
| | Gemini 2.5 Flash intelligence layer | Deployed | 4 components: briefs, profiles, narratives, enrichment |
| | Plotly Dash v2 dashboard | Deployed | 5 pages: Ops, Pipeline, Collectors, Signals, Trends |
| | Pipeline API (FastAPI) | Deployed | 5+ endpoints, Bearer token auth |
| | /report skill (Claude Code) | Not built | Phase 1 critical path |
| | Validation feedback UI | Not built | Tim WS1 W17–W18 |
| | Minimal auth (API key + URL) | Not built | Tim WS1 W18 |
| Data | PostgreSQL schema (15+ tables) | Deployed | raw_signals, processed_signals, detected_trends, trend_evidence, trend_snapshots, validation_feedback, pipeline_runs, digest_*, trend_families, evaluation_queue, pipeline_jobs |
| | Fuzzy dedup | Deployed | 20% term reduction |
| | Entity resolution | Deployed | Integrated W13 (commit ca61688) |
| | Trend families + clustering | Deployed | trend_families + trend_family_members tables |
| | Enrichment V2 (4-layer) | Deployed | Internal → LLM → Urban Dictionary → Google Trends |
| | Daily trend snapshots | Deployed | Time-series tracking |
| | Multi-tenancy tables | In progress | Tim WS1 W16: clients, client_verticals, client_terms, client_preferences |
| | Calendar events DB | Not built | Phase 1 (#2910) |
| | Client folder structure | Not built | Phase 1 (#2911) |
| | Vertical Intelligence Layer | Not built | Medium priority (#2920) |
| Infrastructure | VPS (Hostinger, 72.62.195.132) | Deployed | Docker, 4 containers |
| | Caddy reverse proxy + TLS | Deployed | dash.rumblings.io, web.rumblings.io |
| | Pipeline API deployment | Deployed | Docker service, port 8001 |
| | Web static hosting | Deployed | Planning docs, reports on web.rumblings.io |
| | Pipeline observability | Not built | Lightweight hooks, incremental (#2933) |
| Parallel | Social Signal Validation Research | In progress | Plan written, co-founder gate pending |
| | Legal docs (ToS, privacy, data agreement) | Not built | Phase 1 (#2928) |

14. Key Metrics (Current)

| Metric | Current Value | Target | Timeline |
|---|---|---|---|
| Noise rate | 52.6% | <30% | 8-week calibration cycle |
| Active collectors | 8/9 healthy | 9/9 | GT rate-limit fix pending |
| Intelligence layer components | 4/4 built + deployed | Quality-reviewed by Jen/AJ | April |
| Case studies | 5 candidates, 3 assigned | 3+ complete | May 31 |
| Signals/day | ~6,000+ | Stable | |
| Active trends | ~166 | Growing with quality | |
| H×W×D V2 tests passing | 176 | 100% | |
| Lead time (detection before mainstream) | 2–12 weeks (estimated) | Validated via case studies | May 31 |
| Pipeline processing | 4 workflows, all running | Healthy | |
| Monthly infrastructure cost | ~$80–115 | <$150 until 10+ clients | |

15. Team & Responsibilities

| Person | Role | Rumblings Days | Current Focus |
|---|---|---|---|
| Tom Crawford | Chief of AI, technical lead | 2 days/week | V6 SOP wiring, /report skill, social research, Tim management |
| Tim Goerner | Contractor (Augmentra) | 2 days/week (W16+) | Multi-tenancy (WS1), report infrastructure (WS2 conditional) |
| Jen Ringland | Chief of Product & Impact | Part-time | V6 SOP quality review, Vertical Lens SOP, Tumblr/Substack |
| AJ Jones | Chief of Brand, Experience & Partnerships | Part-time | Pilot client identification, Pinterest/GA collectors, demo prep |
| Lori Susko | Chief of Operations | Part-time | GDELT/Trade Press collectors, legal, V6 SOP 02 session |
The separation: Domain experts (Jen, AJ, Lori) own the “what” and “why”. Tom owns the “how”. Tim builds infrastructure under Tom’s direction. SOPs are the contract between domain and technical.

Architecture Blueprint created 2026-03-19 as “Implementation Plan v1”. Restructured 2026-04-14 to separate architecture (this doc) from sequencing (Roadmap) and tactical build plan (Build Plan). TOGAF architecture domain structure retained.