ILCrawler - Technical SEO crawler and audit workbench
Mixed-language crawler/audit platform for turning raw crawl, rendered, Lighthouse, resource, issue, and export evidence into reviewable SEO handoff data.
runs Go raw crawl workers with robots.txt, sitemap, and llms.txt bootstrap; URL normalization; and depth, politeness, and crawl limits
spools raw crawl batches as NDJSON and imports them through a narrow Rust bridge into canonical PostgreSQL tables
uses Go sidecars for rendered audits, post-processing, maintenance, webhooks, resource checks, and exports
runs local Lighthouse audits through a Node.js worker while keeping optional Google PSI enrichment available
captures metadata, canonicals, headings, robots directives, hreflang, word count, TTFB, and link graphs
records internal links, external checks, resource inventory, rendered screenshots, and Lighthouse artifacts
generates issue rows for duplicate metadata/content, broken links, orphan/dead-end pages, canonical problems, redirects, and hreflang issues
tracks issue workflow state, ignore rules, schedules, webhooks, API tokens, admin diagnostics, model settings, and worker drain controls
exports pages, issues, links, resources, errors, and branded crawl reports as CSV or PDF
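URL normalization from the list above is what lets a crawler treat trivially different spellings of one address as the same page. A minimal Go sketch follows. The helper name normalizeURL and the specific rules (lowercase host, drop default ports and fragments, sort query parameters, default the empty path to "/") are assumptions chosen for illustration, not ILCrawler's actual policy.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL is a hypothetical helper. The rules it applies are
// illustrative assumptions, not the crawler's real normalization policy.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", raw)
	}

	// url.Parse already lowercases the scheme; the host is left as typed.
	host := strings.ToLower(u.Host)
	switch u.Scheme {
	case "http":
		host = strings.TrimSuffix(host, ":80") // default port carries no information
	case "https":
		host = strings.TrimSuffix(host, ":443")
	}
	u.Host = host

	// Fragments are never sent to the server, so they do not identify a page.
	u.Fragment = ""
	u.RawFragment = ""

	if u.Path == "" {
		u.Path = "/"
	}

	// Values.Encode emits parameters sorted by key, giving a stable order.
	u.RawQuery = u.Query().Encode()
	u.ForceQuery = false

	return u.String(), nil
}

func main() {
	out, err := normalizeURL("HTTPS://Example.COM:443/a?b=2&a=1#frag")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints https://example.com/a?a=1&b=2
}
```

Sorting query parameters is a trade-off: it merges URLs that differ only in parameter order, but it would be wrong for a site where order is significant, so a real crawler may make this rule configurable.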
Build notes
Implementation choices that matter.
Go raw pipeline
The production raw path leases crawl runs, crawls with bounded concurrency, writes recoverable NDJSON spool batches, and keeps hot crawl state outside the main relational write path.
Rust import bridge
The Rust importer reads completed raw spools, uses COPY-oriented batch loading, and records batch keys so retries can skip already-imported work.
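The retry-skipping idea can be illustrated independently of the Rust implementation. The sketch below is written in Go purely for illustration and makes several assumptions: the batch key is a SHA-256 hash of the run ID and batch bytes, an in-memory map stands in for wherever the real importer records keys, and the load callback stands in for the COPY-based load. None of these names or choices come from the project itself.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// batchKey derives a stable key for a batch. Hashing run ID plus content is
// an assumption; the real key format is not described in the text.
func batchKey(runID string, batch []byte) string {
	h := sha256.New()
	h.Write([]byte(runID))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	h.Write(batch)
	return hex.EncodeToString(h.Sum(nil))
}

// importer is a hypothetical stand-in for the import bridge.
type importer struct {
	seen map[string]bool // stands in for persisted batch keys
	rows int             // total rows loaded so far
}

// importBatch loads a batch unless its key was already recorded. It reports
// whether the batch was actually loaded. In a real database the load and the
// key insert would need to commit in the same transaction.
func (im *importer) importBatch(runID string, batch []byte, load func([]byte) (int, error)) (bool, error) {
	key := batchKey(runID, batch)
	if im.seen[key] {
		return false, nil // retry of completed work: skip
	}
	n, err := load(batch)
	if err != nil {
		return false, err // key not recorded, so a later retry will load it
	}
	im.seen[key] = true
	im.rows += n
	return true, nil
}

// countLines pretends to load an NDJSON batch and returns its row count.
func countLines(batch []byte) (int, error) {
	return bytes.Count(batch, []byte("\n")), nil
}

func main() {
	im := &importer{seen: map[string]bool{}}
	batch := []byte("{\"url\":\"a\"}\n{\"url\":\"b\"}\n")

	for attempt := 0; attempt < 2; attempt++ {
		imported, err := im.importBatch("run-1", batch, countLines)
		if err != nil {
			panic(err)
		}
		fmt.Println(imported, im.rows)
	}
	// prints:
	// true 2
	// false 2
}
```

Recording the key only after a successful load means a failed import leaves no trace and is retried in full, while a repeated delivery of a finished batch becomes a no-op.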
Browser and Lighthouse workers
Rendered evidence runs through Go and Rod, while local Lighthouse snapshots run through a Node.js worker with Chromium, S3-compatible artifact storage, and optional PSI enrichment.
Operator surface
The Python API and Next.js frontend expose runs, reports, issue workflow, exports, model settings, worker health, and drain controls without turning the tool into a public arbitrary-URL SaaS.
Current Next.js frontend excerpts from the private operator workbench. Local account details and sensitive identifiers are not shown.
Overview: runtime health, workspace state, operations access, and the workbench lanes that replaced the retired review screens.
Projects: crawl target inventory, default limits, robots policy, and quick access to each project workspace.
Project detail: crawl settings, anti-bot policy, robots and sitemap controls, external link checks, and new-run setup.
Run detail: live crawl progress, queue status, health score, issue/link/resource counts, exports, and ML report progress.