
Building AI Agents for Token Mindshare Analytics

How we built BD Agent's real-time KOL tracking and token sentiment pipeline using RAG and streaming LLM inference.

DarkGrove Lab · January 10, 2026

Why Mindshare Matters

In crypto markets, attention is alpha. Before a token pumps, it trends — on Twitter, in Telegram groups, across YouTube thumbnails. The signal is there, buried in noise. The challenge is extracting it at scale, in real time.

BD Agent's mindshare module was built to solve exactly this: quantify how much attention a token is getting, from whom, and whether it's accelerating.

Architecture Overview

The pipeline has three stages:

1. Ingestion Layer

We track ~12,000 KOLs (Key Opinion Leaders) across Twitter, Telegram, and Discord. Raw content is:

  • Streamed via platform APIs and custom scrapers
  • Deduplicated and normalized
  • Enriched with entity extraction (token mentions, sentiment, engagement metrics)
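The dedup and enrichment steps can be sketched as follows. This is a minimal illustration, not the production code: the field names, the content-hash normalization, and the `$TICKER` regex are all assumptions for the example.

```python
import hashlib
import re

# Cashtag-style token mentions, e.g. "$SOL" (illustrative pattern)
TOKEN_RE = re.compile(r"\$[A-Z]{2,10}\b")

def content_hash(text: str) -> str:
    """Stable hash for deduplicating reposts of near-identical content."""
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()

def enrich(posts):
    """Drop duplicate posts and attach extracted token mentions."""
    seen, out = set(), []
    for post in posts:
        h = content_hash(post["text"])
        if h in seen:
            continue  # duplicate after normalization
        seen.add(h)
        post["tokens"] = TOKEN_RE.findall(post["text"])
        out.append(post)
    return out

posts = [
    {"text": "Loving $SOL right now"},
    {"text": "loving $sol right now"},   # duplicate once lowercased
    {"text": "$ETH and $SOL look strong"},
]
print(enrich(posts))
```

A real pipeline would also normalize URLs and whitespace before hashing, and run sentiment and engagement enrichment in the same pass.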

2. RAG-Powered Analysis

Raw posts are chunked and embedded into a vector store. When a user queries "What are KOLs saying about $TOKEN?", we:

# Simplified RAG query flow (vector_store and llm are pipeline abstractions)
results = vector_store.similarity_search(
    query=f"opinions about {token}",
    filter={"source": "kol", "timeframe": "7d"},  # KOL posts from the last 7 days only
    k=20,  # top-20 most similar chunks
)

# Concatenate retrieved posts into the prompt context
context = "\n".join([r.content for r in results])
response = llm.generate(
    system="You are a crypto market analyst. Summarize KOL sentiment.",
    user=f"Based on these posts:\n{context}\n\nWhat is the sentiment on {token}?",
)

3. Mindshare Score

We compute a composite score:

  • Volume: raw mention count, weighted by KOL tier
  • Velocity: rate of change in mentions over 24h/7d windows
  • Sentiment: LLM-classified as bullish/bearish/neutral
  • Engagement: likes, retweets, replies normalized by KOL follower count

The final score is a weighted blend, calibrated against historical token performance to minimize false positives.
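A weighted blend of the four components might look like the sketch below. The weights, the `tanh` squashing, and the normalization constants are illustrative stand-ins; the post notes the real calibration is done against historical token performance.

```python
import math

# Illustrative weights; the production blend is calibrated, not hand-set
WEIGHTS = {"volume": 0.3, "velocity": 0.3, "sentiment": 0.25, "engagement": 0.15}

def mindshare_score(volume, velocity, sentiment, engagement):
    """Blend four components into a rough 0-1 composite.

    volume/velocity/engagement are squashed with tanh so outliers
    saturate; sentiment is assumed pre-scaled to [-1, 1].
    """
    parts = {
        "volume": math.tanh(volume / 100.0),
        "velocity": math.tanh(velocity),
        "sentiment": (sentiment + 1) / 2,   # map [-1, 1] -> [0, 1]
        "engagement": math.tanh(engagement),
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)

print(round(mindshare_score(150, 0.8, 0.6, 1.2), 3))
```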

Challenges We Solved

Rate limiting at scale — 12K accounts across multiple platforms means aggressive rate limit management. We use adaptive backoff with priority queues — high-tier KOLs get refreshed more frequently.
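One way to combine a priority queue with adaptive backoff is sketched below. The class, the tier intervals, and the backoff cap are hypothetical; the point is that high-tier accounts come due sooner, and a rate-limited fetch pushes only that account's next refresh further out.

```python
import heapq

# Base refresh interval per KOL tier, in seconds (illustrative values)
BASE_INTERVAL = {1: 60, 2: 300, 3: 900}

class RefreshScheduler:
    """Min-heap of accounts keyed by next-due time."""

    def __init__(self):
        self.heap = []  # entries: (next_due, account_id, tier, backoff)

    def add(self, account_id, tier, now=0.0):
        heapq.heappush(self.heap, (now, account_id, tier, 1.0))

    def pop_due(self, now):
        """Return the next account whose refresh is due, or None."""
        if self.heap and self.heap[0][0] <= now:
            return heapq.heappop(self.heap)
        return None

    def reschedule(self, entry, now, rate_limited=False):
        """Requeue an account; double its backoff if we got rate-limited."""
        _, account_id, tier, backoff = entry
        backoff = min(backoff * 2, 16) if rate_limited else 1.0
        heapq.heappush(self.heap, (now + BASE_INTERVAL[tier] * backoff,
                                   account_id, tier, backoff))
```

On a successful fetch the backoff resets to 1, so an account recovers its tier's base cadence as soon as the platform stops throttling it.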

LLM cost control — Running every post through GPT-4 is prohibitively expensive. We use a tiered approach: fast classifier (fine-tuned small model) for filtering, large model only for synthesis and user-facing summaries.
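The tiered routing reduces to a cheap gate in front of the expensive model. In this sketch, `small_model_score` is a placeholder for the fine-tuned classifier; only posts above a relevance threshold are forwarded for synthesis.

```python
def small_model_score(post: str) -> float:
    """Placeholder relevance score; the real gate is a fine-tuned classifier."""
    return 0.9 if "$" in post else 0.1

def route(posts, threshold=0.5):
    """Return only the posts worth sending to the large synthesis model."""
    return [p for p in posts if small_model_score(p) >= threshold]

batch = ["gm everyone", "$SOL volume is exploding", "nice weather"]
print(route(batch))  # only the token-relevant post survives filtering
```

Since KOL feeds are dominated by off-topic chatter, even a mediocre filter cuts large-model calls by an order of magnitude.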

Temporal relevance decay — A tweet from 6 hours ago matters more than one from 6 days ago. Our scoring applies exponential decay, tuned per-platform (Twitter content decays faster than long-form Telegram posts).
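Exponential decay with per-platform tuning can be expressed as a half-life: a post's weight halves every `h` hours, with a shorter `h` for Twitter than for Telegram. The specific half-life values below are assumptions for illustration.

```python
# Illustrative half-lives; the post only says Twitter decays faster
HALF_LIFE_HOURS = {"twitter": 12.0, "telegram": 48.0}

def decay_weight(age_hours: float, platform: str) -> float:
    """Exponential decay: a post's weight halves every half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS[platform])

# A 12-hour-old tweet has half the weight of a fresh one,
# while a 12-hour-old Telegram post retains most of its weight.
print(decay_weight(12, "twitter"), decay_weight(12, "telegram"))
```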

What's Next

We're adding cross-platform narrative tracking — detecting when the same narrative (e.g., "RWA season") spreads across platforms and identifying the originating KOLs. This will power BD Agent's campaign timing recommendations.