IOC Hunter – Automated Threat‑Intelligence Extraction on Cloudflare Workers

Project Overview

IOC Hunter is an open‑source threat‑intelligence pipeline that turns unstructured security‑blog content into actionable Indicators of Compromise (IOCs).
Built entirely on Cloudflare Workers, the service is serverless, scales automatically, and incurs only usage‑based costs, which suits security teams that need fast, reliable enrichment without managing infrastructure.

The pipeline continuously watches public sources (RSS feeds, Google Alerts, manual submissions), extracts IOCs such as malicious domains, IP addresses, file hashes, and crypto wallets, validates them, and then delivers the results in formats ready for SIEMs, SOAR platforms, or custom scripts. By automating what is traditionally a manual, error‑prone hunt, IOC Hunter lets analysts focus on response instead of data collection.

Key Features

Feature	What it does
Multi‑source ingestion	RSS feeds, Google Alert e‑mails, and a simple manual‑submission endpoint.
Cloudflare relevance scoring	Articles mentioning Cloudflare‑specific keywords are prioritized for extraction.
16+ IOC types	IPv4/IPv6, domains, URLs, MD5/SHA‑1/256/512, CVE IDs, MITRE ATT&CK technique IDs, and major cryptocurrency wallet addresses.
False‑positive reduction	Public‑Suffix‑List (PSL) domain validation, safe‑domain blocklist, and detection‑name filtering.
Multiple export formats	JSON, CSV, STIX 2.1, and RSS feeds for easy integration.
Configurable notifications	Webhook and generic API‑call channels with per‑event filtering (type, count, IOC categories).
Server‑less storage	Cloudflare D1 for relational data and R2 for raw article caching.
Event‑driven architecture	Cloudflare Queues decouple the stages, so each one processes work asynchronously and failed messages can be retried.
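
To make the STIX 2.1 export format more concrete, here is a minimal sketch of how a single IOC could be mapped to a STIX indicator object. The `Ioc` shape and the `toStixIndicator` helper are hypothetical names chosen for illustration and are not taken from the repository. The UUID and timestamp are passed in so the function stays deterministic.

```typescript
// Hypothetical IOC shape for illustration; the project's real types may differ.
type IocKind = 'domain' | 'ipv4' | 'url' | 'sha256';

interface Ioc {
  kind: IocKind;
  value: string;
}

// Required properties of a STIX 2.1 indicator object.
interface StixIndicator {
  type: 'indicator';
  spec_version: '2.1';
  id: string;
  created: string;
  modified: string;
  pattern: string;
  pattern_type: 'stix';
  valid_from: string;
}

// Escape backslashes and single quotes inside a STIX pattern string literal.
function escapeStixString(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Build the STIX comparison expression for one IOC.
function toStixPattern(ioc: Ioc): string {
  const v = escapeStixString(ioc.value);
  switch (ioc.kind) {
    case 'domain':
      return `[domain-name:value = '${v}']`;
    case 'ipv4':
      return `[ipv4-addr:value = '${v}']`;
    case 'url':
      return `[url:value = '${v}']`;
    case 'sha256':
      return `[file:hashes.'SHA-256' = '${v}']`;
  }
}

// The caller supplies the UUID and observation time.
function toStixIndicator(ioc: Ioc, uuid: string, seenAt: Date): StixIndicator {
  const timestamp = seenAt.toISOString();
  return {
    type: 'indicator',
    spec_version: '2.1',
    id: `indicator--${uuid}`,
    created: timestamp,
    modified: timestamp,
    pattern: toStixPattern(ioc),
    pattern_type: 'stix',
    valid_from: timestamp,
  };
}
```

A full export would typically wrap these indicators in a STIX bundle; that step is omitted here.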

Architecture Overview

A lightweight ASCII diagram captures the data flow:

[Sources] ──► Argus ──► Alexandria ──► Hermes ──► [Notifications]
                │            │               │
                ▼            ▼               ▼
             D1 DB       R2 Cache        Event Queue

Argus – monitors feeds, scores articles, and queues URLs.
Alexandria – fetches the article, extracts IOCs, validates them, and stores the results.
Hermes – consumes extraction events, applies user‑defined filters, and dispatches alerts.

All components run as independent Cloudflare Workers and communicate through Cloudflare Queues, keeping the system resilient to spikes in traffic.
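
As a rough illustration of how a queue-connected stage can stay resilient, the sketch below processes a batch message by message, acknowledging each success and requesting redelivery for each failure. The `QueueMessage` and `QueueBatch` interfaces are local stand-ins modelled on the `body` / `ack()` / `retry()` shape of a Workers queue consumer. The `FeedMessage` payload and `consumeFeedBatch` are assumptions, not code from the repository.

```typescript
// Local stand-ins for the message and batch shapes a queue consumer receives.
// Declared here so the sketch is self-contained.
interface QueueMessage<T> {
  body: T;
  ack(): void;
  retry(): void;
}

interface QueueBatch<T> {
  messages: QueueMessage<T>[];
}

// Hypothetical payload that Argus places on feed-queue.
interface FeedMessage {
  url: string;
  score: number;
}

// Handle each message independently so one bad article
// does not force the whole batch to be redelivered.
async function consumeFeedBatch(
  batch: QueueBatch<FeedMessage>,
  handle: (message: FeedMessage) => Promise<void>,
): Promise<void> {
  for (const message of batch.messages) {
    try {
      await handle(message.body);
      message.ack(); // processed: do not redeliver
    } catch {
      message.retry(); // failed: ask the queue to deliver it again
    }
  }
}
```

Acknowledging per message rather than per batch is one possible design choice. It trades a little bookkeeping for the ability to retry only the items that actually failed.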

How It Works – Technical Deep‑Dive

1. Argus – Feed Monitoring & Scoring

Argus runs on a 15‑minute cron trigger. It fetches each RSS entry (or processes a Google‑Alert e‑mail), extracts the article body, and scores it against a small Cloudflare‑keyword list.

// argus/src/services/article-scorer.ts
const CLOUDFLARE_KEYWORDS = ['cloudflare', 'workers.dev', 'trycloudflare'];

/**
 * Returns a numeric relevance score based on keyword occurrences.
 */
export function scoreArticle(content: string): number {
  let score = 0;
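  for (const kw of CLOUDFLARE_KEYWORDS) {
    // Escape regex metacharacters so the '.' in 'workers.dev' matches literally
    const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const matches = content.match(new RegExp(escaped, 'gi'));
    if (matches) {
      score += matches.length;          // one point per occurrence
    }
  }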
  return score;
}

If scoreArticle returns ≥ 1, the article URL is placed onto the feed-queue for Alexandria to consume.
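
The threshold-and-enqueue step described above could be sketched as follows. The `QueueProducer` interface is a minimal stand-in for a queue binding with a `send` method. `ScoredArticle`, `MIN_SCORE`, and `enqueueIfRelevant` are hypothetical names. The scorer is passed in as a parameter so the example does not depend on the real `scoreArticle` module.

```typescript
// Hypothetical message body for feed-queue.
interface ScoredArticle {
  url: string;
  score: number;
}

// Minimal stand-in for a queue producer binding.
interface QueueProducer<T> {
  send(body: T): Promise<void>;
}

// Threshold taken from the text: a score of at least 1 qualifies.
const MIN_SCORE = 1;

// Scores the article and enqueues its URL only when it is relevant.
// Returns true when a message was sent.
async function enqueueIfRelevant(
  url: string,
  content: string,
  score: (content: string) => number,
  queue: QueueProducer<ScoredArticle>,
): Promise<boolean> {
  const value = score(content);
  if (value < MIN_SCORE) {
    return false;
  }
  await queue.send({ url, score: value });
  return true;
}
```

Carrying the score in the message is an assumption; it would let a downstream consumer prioritize higher-scoring articles without re-reading the content.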

2. Alexandria – Fetch, Extract, Validate

Alexandria’s /api/extract endpoint receives a URL, fetches the page (following redirects and sending a browser-compatible User-Agent header), and runs a series of type‑specific extractors.

a) Fetching the article

// alexandria/src/services/fetcher.ts
export async function fetchArticle(url: string): Promise<string> {
  const resp = await fetch(url, {
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; IOC-Hunter/1.0)' },
    redirect: 'follow',
  });
  if (!resp.ok) throw new Error(`Unable to fetch ${url}: HTTP ${resp.status}`);
  return resp.text();
}

b) IOC extraction (example: URLs)

// alexandria/src/ioc-types/network.ts
const URL_REGEX = /\bhttps?:\/\/[^\s/$.?#].[^\s]*\b/gi;

/**
 * Returns every URL found in the supplied text.
 */
export function extractUrls(text: string): string[] {
  const matches = text.match(URL_REGEX);
  return matches ? Array.from(new Set(matches)) : [];
}

The main extractor aggregates results from all type modules:

// alexandria/src/services/ioc-extractor.ts
import { extractUrls, extractIps } from '../ioc-types/network';   // extractIps is a similar helper
import { extractHashes } from '../ioc-types/file';

export async function extractIOCs(content: string): Promise<string[]> {
  const iocs: string[] = [];
  iocs.push(...extractUrls(content));
  iocs.push(...extractIps(content));
  iocs.push(...extractHashes(content));
  // add other type‑specific calls as needed
  return iocs;
}
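
The `extractIps` and `extractHashes` helpers are referenced above but not shown. The following is one plausible sketch of them, not the repository's implementation. It identifies hashes purely by hex length (32, 40, 64, 128 characters) and validates IPv4 octets after matching. `classifyHash` and `extractIpv4` are hypothetical names.

```typescript
type HashAlgorithm = 'md5' | 'sha1' | 'sha256' | 'sha512';

// Hex digest length for each supported algorithm.
const HASH_LENGTHS: Record<number, HashAlgorithm> = {
  32: 'md5',
  40: 'sha1',
  64: 'sha256',
  128: 'sha512',
};

// Returns the algorithm implied by the digest length, if any.
function classifyHash(hash: string): HashAlgorithm | undefined {
  return /^[a-fA-F0-9]+$/.test(hash) ? HASH_LENGTHS[hash.length] : undefined;
}

// Finds standalone hex runs and keeps those with a known digest length.
// Results are lower-cased and de-duplicated.
function extractHashes(text: string): string[] {
  const candidates: string[] = text.match(/\b[a-fA-F0-9]{32,128}\b/g) ?? [];
  const hashes = candidates
    .filter((candidate) => classifyHash(candidate) !== undefined)
    .map((candidate) => candidate.toLowerCase());
  return Array.from(new Set(hashes));
}

// Finds dotted quads and keeps those whose octets are all <= 255.
function extractIpv4(text: string): string[] {
  const candidates: string[] = text.match(/\b\d{1,3}(?:\.\d{1,3}){3}\b/g) ?? [];
  const valid = candidates.filter((ip) =>
    ip.split('.').every((octet) => Number(octet) <= 255),
  );
  return Array.from(new Set(valid));
}
```

Length-based matching is a heuristic: a four-part version string such as 1.2.3.4 would still be reported as an IP address, which is one reason a separate false-positive filter is needed.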

c) False‑positive filtering

Domain validation uses the tldts library, which incorporates the PSL and an ICANN‑TLD check.

// alexandria/src/services/psl-validator.ts
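import { parse } from 'tldts';

export function isValidDomain(candidate: string): boolean {
  // isIcann is true only when the public suffix comes from the
  // ICANN section of the Public Suffix List.
  const parsed = parse(candidate);
  return parsed.domain !== null && parsed.isIcann === true;
}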

The filter pipeline discards any IOC that fails validation:

// alexandria/src/services/ioc-filter.ts
import { isValidDomain } from './psl-validator';

export function filterIOCs(iocs: string[]): string[] {
  const ipv4 = /^\d{1,3}(\.\d{1,3}){3}$/;
  return iocs.filter(ioc => {
    let host = ioc;
    if (/^https?:\/\//i.test(ioc)) {
      try {
        host = new URL(ioc).hostname;
      } catch {
        return false; // malformed URL
      }
    }
    // IP literals are not domains, so they skip the PSL check
    if (ipv4.test(host) || host.includes(':')) return true;
    if (host.includes('.')) return isValidDomain(host);
    return true; // hashes and other non-domain strings pass through
  });
}
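
The safe-domain list mentioned under Key Features is not shown in this section. A minimal sketch of that check, assuming it matches a hostname against the list entry itself and every parent domain, might look like this. The `SAFE_DOMAINS` entries and both function names are illustrative only.

```typescript
// Hypothetical entries; the project's real list is larger.
const SAFE_DOMAINS: ReadonlySet<string> = new Set([
  'github.com',
  'microsoft.com',
  'virustotal.com',
]);

// True when the hostname is a listed domain or a subdomain of one.
function isSafeHost(
  hostname: string,
  safe: ReadonlySet<string> = SAFE_DOMAINS,
): boolean {
  const labels = hostname.toLowerCase().replace(/\.$/, '').split('.');
  // Test the full name, then each parent with at least two labels.
  for (let i = 0; i < labels.length - 1; i++) {
    if (safe.has(labels.slice(i).join('.'))) {
      return true;
    }
  }
  return false;
}

// Removes hostnames that belong to a known-safe domain.
function dropSafeDomains(hosts: string[]): string[] {
  return hosts.filter((host) => !isSafeHost(host));
}
```

Comparing whole labels rather than using a substring test keeps a lookalike such as notgithub.com from being treated as safe.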

After filtering, the final IOC list is persisted to the D1 database and an event is placed on the ioc-event-queue for Hermes.

3. Hermes – Notification Dispatch

Hermes reads events from ioc-event-queue, applies user‑defined routing rules, and sends the payload to configured channels.

// hermes/src/dispatcher.ts
export async function dispatch(event: IocEvent): Promise<void> {
  for (const channel of event.targets) {
    const payload = buildPayload(event, channel);
    await fetch(channel.webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
  }
}

Channels can be filtered by IOC type, count thresholds, or custom tags, giving teams precise control over which alerts surface in their SOC.
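
One way to picture the per-channel filtering is a small function that narrows an event's IOCs to the categories a channel subscribes to and then applies a count threshold. The types and the `selectIocsForChannel` name below are assumptions made for this sketch; custom tags are left out for brevity.

```typescript
type IocCategory = 'domain' | 'ip' | 'url' | 'hash';

interface CategorizedIoc {
  category: IocCategory;
  value: string;
}

// Hypothetical event shape read from ioc-event-queue.
interface ExtractionEvent {
  articleUrl: string;
  iocs: CategorizedIoc[];
}

// Hypothetical per-channel filter settings.
interface ChannelFilter {
  categories?: IocCategory[]; // omit to accept every category
  minCount?: number;          // omit to notify on a single IOC
}

// Returns the IOCs to send to a channel, or an empty list
// when the event does not meet the channel's threshold.
function selectIocsForChannel(
  event: ExtractionEvent,
  filter: ChannelFilter,
): CategorizedIoc[] {
  const categories = filter.categories;
  const wanted = categories
    ? event.iocs.filter((ioc) => categories.includes(ioc.category))
    : event.iocs;
  const threshold = filter.minCount ?? 1;
  return wanted.length >= threshold ? wanted : [];
}
```

In this sketch the threshold is applied after the category filter, so a channel that asks for at least two domains is not triggered by one domain plus several hashes.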

Getting Started

Prerequisites – Cloudflare account, wrangler CLI (v2+), and a local clone of the GitHub repository.
Environment – Copy the example .dev.vars.example to .dev.vars and fill in:
- D1_DATABASE_ID – ID of the Cloudflare D1 instance.
- R2_BUCKET – Name of the R2 bucket for article caching.
- Optional secret keys for Hermes webhook authentication.
Run locally

# Install dependencies for each component
cd argus && npm ci
cd ../alexandria && npm ci
cd ../hermes && npm ci

# Start a worker in dev mode; run this from each component's directory,
# in its own terminal (--local runs the worker on your machine)
wrangler dev --local

Deploy – After testing, push to Cloudflare with wrangler publish for each worker (Wrangler v3 and later rename this command to wrangler deploy). The README in the repo contains full deployment steps and sample configuration files.

Recent Developments

Safe‑domain list expansion – The internal blocklist now includes dozens of known CDNs and cloud‑service domains, cutting false alerts by ~15 %.
Event logging – Alexandria now writes every extraction request to a dedicated log table, improving observability and simplifying troubleshooting.
Notification enhancements – Hermes gained per‑event filtering (e.g., “only notify when ≥ 5 new IOCs are found”) and retry logic for transient webhook failures.

These refinements tighten the pipeline’s accuracy while keeping the public API unchanged.
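
The retry logic for transient webhook failures is only described here, not shown. The sketch below is one hedged way to implement it: exponential backoff that retries on network errors, 5xx responses, and 429, but gives up immediately on other 4xx responses. `sendWithRetry`, `RetryOptions`, and the injectable `sleep` are hypothetical names. The actual Hermes policy may differ.

```typescript
interface SendResult {
  ok: boolean;
  status: number; // 0 is used here to represent a network error
}

interface RetryOptions {
  attempts: number;     // total tries, including the first
  baseDelayMs: number;  // delay before the second try
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Calls send() until it succeeds, hits a non-retryable status,
// or runs out of attempts. Delays double after each failure.
async function sendWithRetry(
  send: () => Promise<SendResult>,
  options: RetryOptions,
): Promise<SendResult> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let last: SendResult = { ok: false, status: 0 };

  for (let attempt = 0; attempt < options.attempts; attempt++) {
    try {
      last = await send();
      const clientError =
        last.status >= 400 && last.status < 500 && last.status !== 429;
      // Stop on success, or on a client error that a retry will not fix.
      if (last.ok || clientError) {
        return last;
      }
    } catch {
      last = { ok: false, status: 0 }; // network failure: retry
    }
    if (attempt < options.attempts - 1) {
      await sleep(options.baseDelayMs * 2 ** attempt);
    }
  }
  return last;
}
```

A caller would wrap its webhook request in `send` and map the response to `{ ok, status }`. Passing `sleep` explicitly makes the backoff schedule easy to verify without real delays.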

Conclusion

IOC Hunter demonstrates how a fully server‑less stack can deliver a production‑grade threat‑intelligence pipeline. By combining Cloudflare Workers, D1, R2, and Queues, the project provides continuous, low‑cost ingestion, high‑fidelity IOC extraction, and flexible, real‑time alerting without the overhead of traditional infrastructure.

Whether you’re a SOC looking to augment existing feeds or a researcher needing a reusable extraction engine, IOC Hunter offers a solid, extensible foundation. Contributions are welcome: fork the repository, add new IOC types, or integrate it with your own alerting platform.

Explore the code, try it out, and help make threat intelligence more automated for everyone.