
Crawler / audit evidence

ILCrawler - Technical SEO crawler and audit workbench

A mixed-language crawler and audit platform that turns raw crawl, rendered-page, Lighthouse, resource, issue, and export evidence into reviewable SEO handoff data.

Best for

  • technical SEO audits
  • crawl and indexation work
  • crawler or audit-tool proof
  • mixed-language worker systems proof

Scoped for owned and client-authorized audits.

What it does

Useful proof without the full internal dump.

  • runs Go raw crawl workers with robots.txt, sitemap and llms.txt bootstrap, URL normalization, depth, politeness, and crawl limits
  • spools raw crawl batches as NDJSON and imports them through a narrow Rust bridge into canonical PostgreSQL tables
  • uses Go sidecars for rendered audits, post-processing, maintenance, webhooks, resource checks, and exports
  • runs local Lighthouse audits through a Node.js worker while keeping optional Google PSI enrichment available
  • captures metadata, canonicals, headings, robots directives, hreflang, word count, TTFB, and link graphs
  • records internal links, external checks, resource inventory, rendered screenshots, and Lighthouse artifacts
  • generates issue rows for duplicate metadata/content, broken links, orphan/dead-end pages, canonical problems, redirects, and hreflang issues
  • tracks issue workflow state, ignore rules, schedules, webhooks, API tokens, admin diagnostics, model settings, and worker drain controls
  • exports pages, issues, links, resources, errors, and branded crawl reports as CSV or PDF
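The orphan and dead-end issue rows mentioned above can be derived from the internal link graph alone. The following is a minimal sketch, not the project's actual issue generator: the `Link` and `Issue` types, the issue kind strings, and the `FindLinkGraphIssues` function are all hypothetical, and it assumes an orphan is a crawled page with no inbound internal links (other than the start URL) and a dead end is a page with no outbound internal links.

```go
package main

import "sort"

// Link is one internal edge in the crawl's link graph (hypothetical shape).
type Link struct {
	From string
	To   string
}

// Issue is a simplified issue row; the real schema is not shown in the source.
type Issue struct {
	Kind string // "orphan_page" or "dead_end_page" (assumed names)
	URL  string
}

// FindLinkGraphIssues flags pages with no inbound internal links (orphans,
// excluding the start URL) and pages with no outbound internal links
// (dead ends). Self-links are ignored in both directions.
func FindLinkGraphIssues(pages []string, links []Link, startURL string) []Issue {
	inbound := make(map[string]int, len(pages))
	outbound := make(map[string]int, len(pages))
	for _, l := range links {
		if l.From == l.To {
			continue
		}
		outbound[l.From]++
		inbound[l.To]++
	}

	var issues []Issue
	for _, p := range pages {
		if p != startURL && inbound[p] == 0 {
			issues = append(issues, Issue{Kind: "orphan_page", URL: p})
		}
		if outbound[p] == 0 {
			issues = append(issues, Issue{Kind: "dead_end_page", URL: p})
		}
	}

	// Sort by kind, then URL, so repeated runs produce identical rows.
	sort.Slice(issues, func(i, j int) bool {
		if issues[i].Kind != issues[j].Kind {
			return issues[i].Kind < issues[j].Kind
		}
		return issues[i].URL < issues[j].URL
	})
	return issues
}
```

In this sketch a page found only through a sitemap, with no internal links pointing at it, would surface as an orphan. A page can carry both issue kinds at once.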

Build notes

Implementation choices that matter.

Go raw pipeline
The production raw path leases crawl runs, crawls with bounded concurrency, writes recoverable NDJSON spool batches, and keeps hot crawl state outside the main relational write path.
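To make "bounded concurrency" and "recoverable NDJSON spool batches" concrete, here is a minimal sketch under stated assumptions. The `PageRecord` shape, the function names, and the `batch-000001.ndjson` file naming are hypothetical. The write-then-rename step is one common way to keep half-written batches away from a reader; the source does not say how the real spool achieves recoverability.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// PageRecord is a hypothetical raw crawl row; the real spool schema is not shown.
type PageRecord struct {
	URL    string `json:"url"`
	Status int    `json:"status"`
}

// CrawlBounded fetches every URL with at most limit fetches in flight.
// Results keep the input order so batches are deterministic.
func CrawlBounded(urls []string, limit int, fetch func(string) PageRecord) []PageRecord {
	if limit < 1 {
		limit = 1
	}
	results := make([]PageRecord, len(urls))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit fetches are already running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = fetch(u)
		}(i, u)
	}
	wg.Wait()
	return results
}

// EncodeNDJSON renders one JSON object per line.
func EncodeNDJSON(records []PageRecord) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes a newline after each value
	for _, r := range records {
		if err := enc.Encode(r); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// WriteSpoolBatch writes the batch to a temporary name and renames it into
// place, so a reader that only picks up *.ndjson files does not see a batch
// left half-written by an interrupted process.
func WriteSpoolBatch(dir string, index int, records []PageRecord) (string, error) {
	data, err := EncodeNDJSON(records)
	if err != nil {
		return "", err
	}
	final := filepath.Join(dir, fmt.Sprintf("batch-%06d.ndjson", index))
	tmp := final + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return "", err
	}
	if err := os.Rename(tmp, final); err != nil {
		return "", err
	}
	return final, nil
}
```

Durability across power loss would additionally need an fsync of the file and its directory, which this sketch leaves out.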
Rust import bridge
The Rust importer reads completed raw spools, uses COPY-oriented batch loading, and records batch keys so retries can skip already-imported work.
Browser and Lighthouse workers
Rendered evidence runs through Go and Rod, while local Lighthouse snapshots run through a Node.js worker with Chromium, S3-compatible artifact storage, and optional PSI enrichment.
Operator surface
The Python API and Next.js frontend expose runs, reports, issue workflow, exports, model settings, worker health, and drain controls without turning the tool into a public SaaS that crawls arbitrary URLs.

Run detail

Mixed-language crawl pipeline

private workbench

Raw path: Go + Rust

Perf path: Node.js

UI path: Next.js

Go raw crawler: spooling
Rust importer: batch safe
Node Lighthouse: local
Python, Go, Rust, Node.js, FastAPI, Next.js, React, TypeScript, PostgreSQL, Docker Compose, Rod/Chromium, Lighthouse, Backblaze B2, CSV/PDF Exports

UI screenshots

ILCrawler frontend

Current Next.js frontend excerpts from the private operator workbench. Local account details and sensitive identifiers are not shown.

Overview: runtime health, workspace state, operations access, and the workbench lanes that replaced the retired review screens.
Projects: crawl target inventory (shown for the IndexLane target), default limits, robots policy, and quick access to each project workspace.
Project detail: crawl settings, anti-bot policy, robots and sitemap controls, external link checks, and new-run setup.
Run detail: live crawl progress, queue status, health score, issue/link/resource counts, exports, and ML report progress.

Raw pipeline batch handoff

go
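// Register a callback that the runner invokes for each completed spool batch.
if err := runner.EnableBatchImports(batchRoot, nextBatchIndex, func(batch crawler.Batch) error {
    // Import the batch first; on failure the key is never recorded.
    if err := importBatch(ctx, cfg, settings, batch); err != nil {
        return err
    }

    // Record the batch key so a retry can skip this batch.
    _, err := runner.MarkBatchImported(batch.Key)
    return err
}); err != nil {
    writer.Fail(err)
    return "", err
}

// importBatch runs the importer binary against one spool directory.
func importBatch(ctx context.Context, cfg Config, settings runSettings, batch crawler.Batch) error {
    args := []string{
        "--spool-dir", batch.Dir,
        "--crawl-run-id", settings.RunID,
        "--append-existing",
        "--batch-key", batch.Key, // idempotency key for this batch
    }

    return runCommand(ctx, cfg.DatabaseURL, cfg.ImporterBin, args...)
}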
Purpose
Hand completed Go crawl batches to the Rust importer before post-processing creates derived audit data.
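The excerpt calls a `runCommand` helper that is not shown. A plausible shape, sketched here purely as an assumption, is a thin `os/exec` wrapper that passes the database URL to the importer through the environment rather than on the command line. The names `runCommandSketch` and `importerEnv`, and the `DATABASE_URL` variable name, are hypothetical.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// importerEnv returns base with any existing DATABASE_URL entry replaced.
// The variable name is an assumption; the real importer may read it differently.
func importerEnv(base []string, databaseURL string) []string {
	env := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if !strings.HasPrefix(kv, "DATABASE_URL=") {
			env = append(env, kv)
		}
	}
	return append(env, "DATABASE_URL="+databaseURL)
}

// runCommandSketch is a stand-in for the unshown runCommand helper.
// It runs bin with args, honours ctx cancellation, and folds the child's
// combined output into the returned error on failure.
func runCommandSketch(ctx context.Context, databaseURL, bin string, args ...string) error {
	cmd := exec.CommandContext(ctx, bin, args...)
	cmd.Env = importerEnv(os.Environ(), databaseURL)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s failed: %w: %s", bin, err, bytes.TrimSpace(out))
	}
	return nil
}
```

Keeping the connection string out of the argument list avoids exposing it in process listings, which is one reason a wrapper like this might take the URL as a separate parameter.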
Guardrail
Batch keys make retries idempotent, so a restarted worker can skip already-imported raw spools.
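The guardrail can be illustrated with a small in-memory ledger. This is a sketch only: `BatchLedger` and `ImportOnce` are hypothetical names, and the map stands in for whatever persistent record the importer keeps. A real importer would presumably record the key in the same transaction as the loaded rows, which is an assumption the source does not confirm.

```go
package main

import "sync"

// BatchLedger remembers which batch keys have been imported.
// It is an in-memory stand-in for a persistent record of batch keys.
type BatchLedger struct {
	mu       sync.Mutex
	imported map[string]bool
}

// NewBatchLedger returns an empty ledger.
func NewBatchLedger() *BatchLedger {
	return &BatchLedger{imported: make(map[string]bool)}
}

// ImportOnce runs importFn for key unless the key is already recorded.
// It reports true only when importFn ran and succeeded. On failure the key
// stays unrecorded, so a later retry runs the import again.
func (l *BatchLedger) ImportOnce(key string, importFn func() error) (bool, error) {
	l.mu.Lock()
	defer l.mu.Unlock()

	if l.imported[key] {
		return false, nil // already imported: skip
	}
	if err := importFn(); err != nil {
		return false, err
	}
	l.imported[key] = true
	return true, nil
}
```

Replaying the same key after a worker restart is then a no-op, while a batch whose import failed is still eligible for retry.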
Tradeoff
More moving parts than the original Python worker, but the crawl, import, render, Lighthouse, and export paths now have clearer runtime ownership.

Fit

Relevant if you need crawler, audit, or SEO evidence tooling.