IOC Hunter – Automated Threat‑Intelligence Extraction on Cloudflare Workers
Project Overview
IOC Hunter is an open‑source threat‑intelligence pipeline that turns unstructured security‑blog content into actionable Indicators of Compromise (IOCs).
Built entirely on Cloudflare Workers, the service runs server‑less, scales automatically, and incurs only usage‑based costs – a perfect fit for security teams that need fast, reliable enrichment without managing infrastructure.
The pipeline continuously watches public sources (RSS feeds, Google Alerts, manual submissions), extracts IOCs such as malicious domains, IP addresses, file hashes, and crypto wallets, validates them, and then delivers the results in formats ready for SIEMs, SOAR platforms, or custom scripts. By automating what is traditionally a manual, error‑prone hunt, IOC Hunter lets analysts focus on response instead of data collection.
Key Features
| Feature | What it does |
|---|---|
| Multi‑source ingestion | RSS feeds, Google Alert e‑mails, and a simple manual‑submission endpoint. |
| Cloudflare relevance scoring | Articles mentioning Cloudflare‑specific keywords are prioritized for extraction. |
| 16+ IOC types | IPv4/IPv6, domains, URLs, MD5/SHA‑1/256/512, CVE IDs, MITRE ATT&CK technique IDs, and major cryptocurrency wallet addresses. |
| False‑positive reduction | Public‑Suffix‑List (PSL) domain validation, a safe‑domain list that suppresses known‑benign domains, and detection‑name filtering. |
| Multiple export formats | JSON, CSV, STIX 2.1, and RSS feeds for easy integration. |
| Configurable notifications | Webhook and generic API‑call channels with per‑event filtering (type, count, IOC categories). |
| Server‑less storage | Cloudflare D1 for relational data and R2 for raw article caching. |
| Event‑driven architecture | Cloudflare Queues decouple the stages, so each one processes work asynchronously with at‑least‑once delivery. |
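
To make the STIX 2.1 export format more concrete, the following sketch wraps a single extracted IOC in a STIX 2.1 Indicator object. It is not taken from the IOC Hunter codebase: the `Ioc` type, the `toStixIndicator` function, and the decision to pass in the UUID and timestamp are assumptions made for illustration. Only the field names and the pattern syntax follow the STIX 2.1 specification.

```typescript
// Hypothetical shape of an extracted IOC (not the project's actual type).
type IocKind = 'ipv4' | 'domain' | 'url' | 'md5' | 'sha1' | 'sha256';
interface Ioc {
  kind: IocKind;
  value: string;
}

// Escape backslashes and single quotes for use inside a STIX pattern string literal.
function escapePatternValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Build the STIX patterning expression for one IOC.
function toStixPattern(ioc: Ioc): string {
  const v = escapePatternValue(ioc.value);
  switch (ioc.kind) {
    case 'ipv4':   return `[ipv4-addr:value = '${v}']`;
    case 'domain': return `[domain-name:value = '${v}']`;
    case 'url':    return `[url:value = '${v}']`;
    case 'md5':    return `[file:hashes.MD5 = '${v}']`;
    case 'sha1':   return `[file:hashes.'SHA-1' = '${v}']`;
    case 'sha256': return `[file:hashes.'SHA-256' = '${v}']`;
  }
}

// Wrap an IOC in a STIX 2.1 Indicator. The caller supplies the UUID and
// timestamp so the function stays deterministic and easy to test.
export function toStixIndicator(ioc: Ioc, uuid: string, now: Date) {
  const ts = now.toISOString();
  return {
    type: 'indicator',
    spec_version: '2.1',
    id: `indicator--${uuid}`,
    created: ts,
    modified: ts,
    pattern: toStixPattern(ioc),
    pattern_type: 'stix',
    valid_from: ts,
  };
}
```

In a Worker, the UUID could come from `crypto.randomUUID()`; a set of such indicators would normally be wrapped in a STIX bundle before export.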
Architecture Overview
A lightweight ASCII diagram captures the data flow:
[Sources] ──► Argus ──► Alexandria ──► Hermes ──► [Notifications]
                │           │            │
                ▼           ▼            ▼
              D1 DB     R2 Cache    Event Queue
- Argus – monitors feeds, scores articles, and queues URLs.
- Alexandria – fetches the article, extracts IOCs, validates them, and stores the results.
- Hermes – consumes extraction events, applies user‑defined filters, and dispatches alerts.
All components run as independent Cloudflare Workers and communicate through Cloudflare Queues, keeping the system resilient to spikes in traffic.
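
The queue-based hand-off between stages can be sketched as follows. The `FeedMessage` payload and the `handleBatch` helper are assumptions made for illustration, and the minimal `Message` and `MessageBatch` interfaces only mirror the parts of the Cloudflare Queues consumer API that the sketch uses (`body`, `ack()`, `retry()`); in a real Worker these types would come from the Workers type definitions.

```typescript
// Minimal stand-ins for the Cloudflare Queues consumer types used below.
interface Message<T> {
  readonly body: T;
  ack(): void;
  retry(): void;
}
interface MessageBatch<T> {
  readonly messages: readonly Message<T>[];
}

// Hypothetical payload that Argus sends to Alexandria.
interface FeedMessage {
  url: string;
  score: number;
}

// Process each message independently: acknowledge on success, request
// redelivery on failure so that one bad article does not block the batch.
export async function handleBatch(
  batch: MessageBatch<FeedMessage>,
  processArticle: (msg: FeedMessage) => Promise<void>,
): Promise<void> {
  for (const message of batch.messages) {
    try {
      await processArticle(message.body);
      message.ack();
    } catch {
      message.retry();
    }
  }
}
```

A Worker would call a helper like this from its `queue(batch, env)` handler. Because a retried message is delivered again, `processArticle` should be idempotent, for example by keying stored results on the article URL.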
How It Works – Technical Deep‑Dive
1. Argus – Feed Monitoring & Scoring
Argus runs on a 15‑minute cron trigger. It fetches each RSS entry (or processes a Google‑Alert e‑mail), extracts the article body, and scores it against a small Cloudflare‑keyword list.
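// argus/src/services/article-scorer.ts
const CLOUDFLARE_KEYWORDS = ['cloudflare', 'workers.dev', 'trycloudflare'];
// Escape regex metacharacters so that the '.' in 'workers.dev' matches literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
/**
 * Returns a numeric relevance score based on keyword occurrences.
 * Matching is case-insensitive and substring-based, so 'trycloudflare'
 * also counts as an occurrence of 'cloudflare'.
 */
export function scoreArticle(content: string): number {
  let score = 0;
  for (const kw of CLOUDFLARE_KEYWORDS) {
    const matches = content.match(new RegExp(escapeRegExp(kw), 'gi'));
    if (matches) {
      score += matches.length; // one point per occurrence
    }
  }
  return score;
}
If scoreArticle returns a score of at least 1 (that is, any keyword occurs at least once), the article URL is placed on the feed-queue for Alexandria to consume.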
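
The threshold-and-enqueue step described above might look like the following sketch. The `Article` type, the `FeedQueue` interface, the `enqueueRelevant` function, and the `minScore` parameter are illustrative names rather than part of the published code. `send` mirrors the producer method on a Cloudflare Queues binding, and the scorer is passed in as a parameter so the example stays self-contained.

```typescript
// Hypothetical article record produced by the feed reader.
interface Article {
  url: string;
  content: string;
}

// Stand-in for the producer side of a Cloudflare Queues binding.
interface FeedQueue {
  send(body: { url: string; score: number }): Promise<void>;
}

// Score every article and enqueue those that reach the threshold.
// Returns the number of articles that were queued.
export async function enqueueRelevant(
  articles: Article[],
  score: (content: string) => number,
  queue: FeedQueue,
  minScore = 1,
): Promise<number> {
  let queued = 0;
  for (const article of articles) {
    const s = score(article.content);
    if (s >= minScore) {
      await queue.send({ url: article.url, score: s });
      queued++;
    }
  }
  return queued;
}
```

Carrying the score in the message is a design choice of this sketch: it would let a downstream consumer prioritize or log articles without re-scoring them.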
2. Alexandria – Fetch, Extract, Validate
Alexandria’s /api/extract endpoint receives a URL, fetches the page (following redirects and applying a realistic browser header), and runs a series of type‑specific extractors.
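
As an illustration of what a type-specific extractor can look like, the sketch below pulls hex-encoded file hashes and IPv4 addresses out of free text. The function names echo the `extractHashes` and `extractIps` helpers imported by the main extractor in this section, but the bodies and return shapes are assumptions written for this article, not the project's implementation.

```typescript
type HashKind = 'md5' | 'sha1' | 'sha256' | 'sha512';

// Hex digest lengths: MD5 = 32, SHA-1 = 40, SHA-256 = 64, SHA-512 = 128.
const HASH_LENGTHS: Record<number, HashKind> = {
  32: 'md5',
  40: 'sha1',
  64: 'sha256',
  128: 'sha512',
};

// Find hex runs and keep only those whose length matches a known digest size.
export function extractHashes(text: string): { kind: HashKind; value: string }[] {
  const seen = new Set<string>();
  const out: { kind: HashKind; value: string }[] = [];
  // \b on both sides prevents matching a slice of a longer hex run.
  for (const m of text.match(/\b[a-fA-F0-9]{32,128}\b/g) ?? []) {
    const value = m.toLowerCase();
    const kind = HASH_LENGTHS[value.length];
    if (kind && !seen.has(value)) {
      seen.add(value);
      out.push({ kind, value });
    }
  }
  return out;
}

// Find dotted-quad candidates, then reject any with an octet above 255.
export function extractIps(text: string): string[] {
  const candidates = text.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/g) ?? [];
  const valid = candidates.filter(ip => ip.split('.').every(octet => Number(octet) <= 255));
  return Array.from(new Set(valid));
}
```

Classifying hashes by length alone cannot distinguish algorithms that share a digest size, so a production extractor would likely combine it with surrounding context such as a "SHA-256:" label.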
a) Fetching the article
// alexandria/src/services/fetcher.ts
/**
 * Downloads the article body, following redirects.
 * Throws if the server responds with a non-2xx status.
 */
export async function fetchArticle(url: string): Promise<string> {
  const resp = await fetch(url, {
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; IOC-Hunter/1.0)' },
    redirect: 'follow',
  });
  if (!resp.ok) throw new Error(`Unable to fetch ${url}: HTTP ${resp.status}`);
  return resp.text();
}
b) IOC extraction (example: URLs)
// alexandria/src/ioc-types/network.ts
const URL_REGEX = /\bhttps?:\/\/[^\s/$.?#].[^\s]*\b/gi;
/**
 * Returns every URL found in the supplied text.
 */
export function extractUrls(text: string): string[] {
  const matches = text.match(URL_REGEX);
  return matches ? Array.from(new Set(matches)) : [];
}
The main extractor aggregates results from all type modules:
// alexandria/src/services/ioc-extractor.ts
import { extractUrls, extractIps } from '../ioc-types/network';
import { extractHashes } from '../ioc-types/file';
export async function extractIOCs(content: string): Promise<string[]> {
  const iocs: string[] = [];
  iocs.push(...extractUrls(content));
  iocs.push(...extractIps(content));
  iocs.push(...extractHashes(content));
  // add other type-specific calls as needed
  return iocs;
}
c) False‑positive filtering
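Domain validation uses the tldts library, which parses hostnames against the PSL and reports whether the matched suffix comes from the ICANN section of the list.
// alexandria/src/services/psl-validator.ts
import { parse } from 'tldts';
/**
 * Accepts a hostname or URL and returns true only when it has a
 * registrable domain under an ICANN-managed public suffix.
 */
export function isValidDomain(candidate: string): boolean {
  const parsed = parse(candidate);
  return parsed.domain !== null && parsed.isIcann === true;
}
The filter pipeline discards any IOC that fails validation: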
// alexandria/src/services/ioc-filter.ts
import { isValidDomain } from './psl-validator';
const IPV4_REGEX = /^(?:\d{1,3}\.){3}\d{1,3}$/;
export function filterIOCs(iocs: string[]): string[] {
  return iocs.filter(ioc => {
    // IPv4 addresses contain dots but are not domains; keep them as-is
    if (IPV4_REGEX.test(ioc)) return true;
    if (ioc.startsWith('http')) {
      try {
        const host = new URL(ioc).hostname;
        return IPV4_REGEX.test(host) || isValidDomain(host);
      } catch {
        return false; // not a parseable URL
      }
    }
    // simple heuristic: any other dotted string is treated as a domain
    if (ioc.includes('.')) return isValidDomain(ioc);
    return true; // hashes, IPv6 addresses, CVE IDs, etc. pass through
  });
}
After filtering, the final IOC list is persisted to the D1 database and an event is placed on the ioc-event-queue for Hermes.
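
The persist-then-notify step could be sketched as below. Everything specific here is an assumption: the `iocs` table and its columns, the `IocEventMessage` payload, and the `persistAndNotify` function are invented for illustration. The small `Database`, `PreparedStatement`, and `EventQueue` interfaces stand in for the `prepare`/`bind`/`batch` methods of a D1 binding and the `send` method of a Queues binding.

```typescript
// Minimal stand-ins for the parts of the D1 and Queues bindings used here.
interface PreparedStatement {
  bind(...values: unknown[]): PreparedStatement;
}
interface Database {
  prepare(sql: string): PreparedStatement;
  batch(statements: PreparedStatement[]): Promise<unknown>;
}
interface EventQueue {
  send(body: unknown): Promise<void>;
}

// Hypothetical event payload consumed by Hermes.
interface IocEventMessage {
  articleUrl: string;
  iocCount: number;
}

export async function persistAndNotify(
  db: Database,
  queue: EventQueue,
  articleUrl: string,
  iocs: string[],
): Promise<void> {
  if (iocs.length === 0) return; // nothing to store, nothing to announce

  // Assumed schema: iocs(article_url TEXT, value TEXT)
  const insert = db.prepare('INSERT INTO iocs (article_url, value) VALUES (?, ?)');
  // One bound statement per IOC, sent to the database in a single batch.
  await db.batch(iocs.map(value => insert.bind(articleUrl, value)));

  // Only announce the extraction once the rows have been written.
  const event: IocEventMessage = { articleUrl, iocCount: iocs.length };
  await queue.send(event);
}
```

Writing to the database before sending the event means Hermes never sees an event whose IOCs are missing from storage; the trade-off is that a failed `send` leaves stored IOCs without a notification.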
3. Hermes – Notification Dispatch
Hermes reads events from ioc-event-queue, applies user‑defined routing rules, and sends the payload to configured channels.
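// hermes/src/dispatcher.ts
// IocEvent and buildPayload are not shown here; they are assumed to come from other Hermes modules.
export async function dispatch(event: IocEvent): Promise<void> {
  for (const channel of event.targets) {
    const payload = buildPayload(event, channel);
    const resp = await fetch(channel.webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    // fetch does not reject on HTTP errors, so surface failed deliveries explicitly
    if (!resp.ok) throw new Error(`Webhook ${channel.webhookUrl} returned HTTP ${resp.status}`);
  }
}
Channels can be filtered by IOC type, count thresholds, or custom tags, giving teams precise control over which alerts surface in their SOC.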
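
A per-channel filter of the kind described above could be expressed as a small predicate. The `ChannelFilter` and `ExtractionEvent` shapes and the `shouldNotify` function are hypothetical; the actual Hermes configuration format may differ.

```typescript
// Hypothetical rule and event shapes, for illustration only.
interface ChannelFilter {
  iocTypes?: string[]; // only count IOCs of these types
  minCount?: number;   // notify only when at least this many IOCs match
  tags?: string[];     // require at least one of these tags on the event
}
interface ExtractionEvent {
  iocs: { type: string; value: string }[];
  tags: string[];
}

// Returns true when the event passes every condition set on the filter.
export function shouldNotify(event: ExtractionEvent, filter: ChannelFilter): boolean {
  const types = filter.iocTypes;
  const relevant = types
    ? event.iocs.filter(ioc => types.includes(ioc.type))
    : event.iocs;

  // With no explicit threshold, a single matching IOC is enough.
  if (relevant.length < (filter.minCount ?? 1)) return false;

  const tags = filter.tags;
  if (tags && !tags.some(tag => event.tags.includes(tag))) return false;

  return true;
}
```

A dispatcher would evaluate this predicate once per channel and skip the webhook call whenever it returns false.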
Getting Started
- Prerequisites – Cloudflare account, wrangler CLI (v2+), and a local clone of the repository.
- Environment – Copy the example .dev.vars.example to .dev.vars and fill in:
  - D1_DATABASE_ID – ID of the Cloudflare D1 instance.
  - R2_BUCKET – Name of the R2 bucket for article caching.
  - Optional secret keys for Hermes webhook authentication.
- Run locally
# Install dependencies for each component
cd argus && npm ci
cd ../alexandria && npm ci
cd ../hermes && npm ci
# Start a worker in dev mode (run from that worker's directory; repeat for each worker)
wrangler dev --local
- Deploy – After testing, push to Cloudflare with wrangler publish for each worker (the command is named wrangler deploy from Wrangler v3 onward). The README in the repo contains full deployment steps and sample configuration files.
Recent Developments
- Safe‑domain list expansion – The internal safe‑domain list now includes dozens of known CDNs and cloud‑service domains, cutting false alerts by about 15%.
- Event logging – Alexandria now writes every extraction request to a dedicated log table, improving observability and simplifying troubleshooting.
- Notification enhancements – Hermes gained per‑event filtering (e.g., “only notify when ≥ 5 new IOCs are found”) and retry logic for transient webhook failures.
These refinements tighten the pipeline’s accuracy while keeping the public API unchanged.
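
The write-up does not show how Hermes retries transient webhook failures, so the following is only one plausible shape for it: a bounded retry loop with exponential backoff. The `sendWithRetry` name, its parameters, and the rule that treats network errors, HTTP 429, and 5xx responses as transient are all assumptions of this sketch.

```typescript
// A single delivery attempt; in practice this would wrap a fetch() call.
type Send = () => Promise<{ ok: boolean; status: number }>;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Returns true once a delivery succeeds, false when attempts are exhausted
// or the endpoint answers with a non-transient error.
export async function sendWithRetry(
  send: Send,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const resp = await send();
      if (resp.ok) return true;
      const transient = resp.status >= 500 || resp.status === 429;
      if (!transient) return false; // e.g. 400 or 404: retrying will not help
    } catch {
      // Network-level failure: treated as transient, fall through to the backoff.
    }
    // Wait 1x, 2x, 4x, ... the base delay between attempts.
    if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  return false;
}
```

The `sleep` parameter exists so that tests can substitute a no-op timer. In a queue consumer, an alternative is to skip in-process retries and call `retry()` on the message, letting the queue handle redelivery.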
Conclusion
IOC Hunter demonstrates how a fully server‑less stack can deliver a production‑grade threat‑intelligence pipeline. By combining Cloudflare Workers, D1, R2, and Queues, the project provides continuous, low‑cost ingestion, high‑fidelity IOC extraction, and flexible, real‑time alerting without the overhead of traditional infrastructure.
Whether you’re a SOC looking to augment existing feeds or a researcher needing a reusable extraction engine, IOC Hunter offers a solid, extensible foundation. Contributions are welcome—feel free to fork, add new IOC types, or integrate with your own alerting platform.
Explore the code, try it out, and help make threat intelligence more automated for everyone.