
Bumblebee — Perplexity's read-only inventory tool for supply-chain response.

Perplexity just open-sourced Bumblebee — a tiny Go binary that answers one narrow but expensive question: when an advisory drops, which of my dev machines have the bad version on disk right now? SBOM tells you what shipped. EDR tells you what ran. Bumblebee tells you what's installed across the fleet, in NDJSON, in under a minute.

By Ryan · Belief Engines · May 2026

When a malicious npm release hits the wire, the clock starts. The question isn't did we ship it — that's an SBOM problem and you'll get to it. The question is which developer laptops have the bad lockfile entry on disk right now, because that's where the next compromise sneaks in. Bumblebee is built for that question and nothing else.

Perplexity dropped it last week: a single static Go binary, zero non-stdlib deps, read-only, NDJSON out. No agent. No daemon. No phone-home. You schedule it like you would find or ls.

What it actually does

Bumblebee walks a set of paths looking for on-disk package metadata — lockfiles, install manifests, extension descriptors, MCP host configs — and emits a structured component record for each one it finds. No npm ls, no pip show, no source reads. Just parsed metadata. That's the whole product surface.

It covers npm/pnpm/yarn/bun, PyPI, Go modules, RubyGems, Composer, MCP host configs (Claude Desktop, Cline, Gemini, etc.), and VS Code/Cursor/Windsurf/Chromium/Firefox extensions. Hand it an exposure catalog and it flips into matching mode — emitting finding records when an (ecosystem, name, version) triple hits the catalog.
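Exact-triple matching needs very little logic, and a minimal Python sketch shows the shape. This is an illustration, not Bumblebee's actual code: the catalog layout follows the example catalog later in this post, and the component field names (`name`, `version`, `source_file`) are assumptions made for the sketch.

```python
def build_index(catalog):
    """Map (ecosystem, package, version) -> catalog entry for constant-time lookups."""
    index = {}
    for entry in catalog.get("entries", []):
        for version in entry.get("versions", []):
            index[(entry["ecosystem"], entry["package"], version)] = entry
    return index


def match(component, index):
    """Return a finding dict on an exact triple hit, otherwise None."""
    # Component field names here are assumed for illustration only.
    key = (component["ecosystem"], component["name"], component["version"])
    entry = index.get(key)
    if entry is None:
        return None
    return {
        "record_type": "finding",
        "finding_type": "package_exposure",
        "severity": entry["severity"],
        "catalog_id": entry["id"],
        "ecosystem": component["ecosystem"],
        "package_name": component["name"],
        "version": component["version"],
        "source_file": component.get("source_file"),
        "evidence": "exact name+version match",
    }


CATALOG = {
    "schema_version": "0.1.0",
    "entries": [
        {
            "id": "advisory-2026-0042",
            "name": "@example/utils 1.4.7 (compromised release)",
            "ecosystem": "npm",
            "package": "@example/utils",
            "versions": ["1.4.7"],
            "severity": "critical",
        }
    ],
}
INDEX = build_index(CATALOG)

hit = match(
    {"ecosystem": "npm", "name": "@example/utils", "version": "1.4.7",
     "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml"},
    INDEX,
)
miss = match({"ecosystem": "npm", "name": "@example/utils", "version": "1.4.6"}, INDEX)
```

Because the lookup is an exact tuple match, a neighbouring version such as 1.4.6 produces no finding; there is no range or semver logic to get wrong.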

How a Bumblebee scan flows
[Figure: on a dev laptop, lockfiles (npm · pip · go · gem), extensions (VS Code · Chromium), and MCP host configs (Claude Desktop · Cline · Gemini) feed bumblebee scan, which emits an NDJSON stream that flows on to inventory.db (long-tail storage), exposure match (catalog hits), and scan_summary (per-host receipts).]
One static Go binary in, one NDJSON stream out. Storage and triage stay in tools you already run.

The output is dumb on purpose. One JSON object per line, designed to pipe into whatever you already have — a Postgres COPY, an S3 dump, a jq | sort | uniq, a Loki stream. The tool refuses to own your storage.
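To make "pipe it into whatever you already have" concrete, here is a small Python sketch that consumes an NDJSON stream and tallies records per ecosystem. The only field it relies on is `ecosystem`, which appears in the finding record shown later in this post; treating records without that field as "unknown" is an assumption of the sketch, not documented Bumblebee behaviour.

```python
import json
from collections import Counter


def count_by_ecosystem(lines):
    """Tally NDJSON records by their 'ecosystem' field.

    `lines` is any iterable of strings, one JSON object per line
    (an open file, sys.stdin, or a list in tests).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Records without an ecosystem (summary records, say) land in "unknown".
        counts[record.get("ecosystem", "unknown")] += 1
    return counts


sample = [
    '{"ecosystem": "npm", "version": "1.4.7"}',
    "",
    '{"ecosystem": "pypi", "version": "2.0.0"}',
    '{"ecosystem": "npm", "version": "3.1.0"}',
    '{"record_type": "scan_summary"}',
]
counts = count_by_ecosystem(sample)
```

In practice you would pass an open file or `sys.stdin` instead of the sample list, for example `count_by_ecosystem(open("inventory.ndjson"))`.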

CLI reference

Four subcommands, all read-only: scan, roots, selftest, and version. Memorize these.

# Scan with a profile, write NDJSON to stdout
bumblebee scan --profile baseline > inventory.ndjson

# Filter to specific ecosystems
bumblebee scan --profile baseline --ecosystem npm,pypi

# Deep scan + advisory matching (incident response mode)
bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog ./catalog.json \
  --findings-only \
  --max-duration 10m

# Preview what would be scanned, without scanning
bumblebee roots --profile baseline

# Embedded smoke test (safe to run pre-rollout)
bumblebee selftest

# Version + VCS revision + build time
bumblebee version
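If you drive scans from a fleet script rather than a shell one-liner, building the argument list in one place keeps invocations consistent. The Python sketch below uses only the flags shown above; the helper name `scan_argv` and its parameters are hypothetical, and it deliberately accepts a single root because this post only shows `--root` used once.

```python
from typing import List, Optional, Sequence


def scan_argv(
    profile: str,
    root: Optional[str] = None,
    ecosystems: Sequence[str] = (),
    catalog: Optional[str] = None,
    findings_only: bool = False,
    max_duration: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `bumblebee scan` from the flags shown above."""
    argv = ["bumblebee", "scan", "--profile", profile]
    if ecosystems:
        # Comma-separated, matching the `--ecosystem npm,pypi` example.
        argv += ["--ecosystem", ",".join(ecosystems)]
    if root is not None:
        argv += ["--root", root]
    if catalog is not None:
        argv += ["--exposure-catalog", catalog]
    if findings_only:
        argv.append("--findings-only")
    if max_duration is not None:
        argv += ["--max-duration", max_duration]
    return argv


baseline = scan_argv("baseline", ecosystems=["npm", "pypi"])
incident = scan_argv(
    "deep",
    root="/Users/alex",
    catalog="./catalog.json",
    findings_only=True,
    max_duration="10m",
)
```

The resulting list can be handed to `subprocess.run(argv, stdout=fh, check=True)` with `fh` an open output file, which avoids shell quoting problems around paths.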

The three scan profiles

Pick the profile to match the population

PROFILE  | WHAT IT SCANS                                                    | WHEN TO USE
baseline | global package roots, toolchains, extensions, MCP host configs  | recurring cron (hourly / daily)
project  | configured dev dirs (~/code, ~/src, …)                           | daily / weekly workspace sweeps
deep     | explicit --root paths (can include $HOME)                        | incident response, campaign checks

baseline and project refuse bare-home roots — only deep will accept $HOME. The safety rail is: you can't accidentally schedule a $HOME walk every fifteen minutes.
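The same rail is worth mirroring in any wrapper that schedules scans, so a bad config fails before the binary is even invoked. This Python sketch is one way to express the rule as stated above; it is not how Bumblebee implements it, and the function name `check_root` is hypothetical.

```python
import os


def check_root(profile: str, root: str, home: str) -> None:
    """Raise ValueError if a non-deep profile is pointed at the bare home directory.

    Paths are compared after lexical normalisation only (no filesystem access),
    so "/home/alex/" and "/home/alex/code/.." both count as the bare home.
    """
    is_bare_home = os.path.normpath(root) == os.path.normpath(home)
    if is_bare_home and profile != "deep":
        raise ValueError(
            f"profile {profile!r} may not scan the bare home directory; use 'deep'"
        )


def is_allowed(profile: str, root: str, home: str) -> bool:
    """Boolean convenience wrapper around check_root."""
    try:
        check_root(profile, root, home)
    except ValueError:
        return False
    return True
```

A subdirectory such as `/home/alex/code` passes for every profile; only the home directory itself is reserved for `deep`.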

A real example: catching a compromised npm release

Say an advisory drops at 11:03 AM naming @example/utils@1.4.7 as backdoored. You want to know which of your 40 dev laptops have it on disk.

1. Write the catalog.

{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "advisory-2026-0042",
      "name": "@example/utils 1.4.7 (compromised release)",
      "ecosystem": "npm",
      "package": "@example/utils",
      "versions": ["1.4.7"],
      "severity": "critical"
    }
  ]
}

2. Push it to the fleet (MDM, Ansible, whatever) and run:

bumblebee scan \
  --profile deep \
  --root "$HOME" \
  --exposure-catalog /var/lib/bumblebee/catalog.json \
  --findings-only \
  | curl -X POST -H 'Content-Type: application/x-ndjson' \
      --data-binary @- https://collector.internal/findings

3. Each match emits a finding record:

{
  "record_type": "finding",
  "finding_type": "package_exposure",
  "severity": "critical",
  "catalog_id": "advisory-2026-0042",
  "ecosystem": "npm",
  "package_name": "@example/utils",
  "version": "1.4.7",
  "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
  "evidence": "exact name+version match"
}

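You now know, by lockfile path and exact version, every machine that has the bad release on disk, provided your collector records which host sent each upload (the finding record shown above carries no hostname field). Total wall-clock from advisory to dashboard: minutes, not days.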
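On the collector side, turning uploads into a "who is affected" list is a few lines. The sketch below assumes the collector already knows which host sent each NDJSON body (from mTLS identity, an MDM-injected header, or similar); that hostname source and the function name are assumptions of the example, while `record_type` and `catalog_id` come from the finding record above.

```python
import json
from collections import defaultdict


def hosts_by_advisory(uploads):
    """Map catalog_id -> sorted list of hosts that reported at least one finding.

    `uploads` maps a hostname (supplied by the collector, an assumption here)
    to the raw NDJSON body that host posted.
    """
    hits = defaultdict(set)
    for host, body in uploads.items():
        for line in body.splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("record_type") != "finding":
                continue  # ignore any non-finding records in the stream
            hits[record["catalog_id"]].add(host)
    return {catalog_id: sorted(hosts) for catalog_id, hosts in hits.items()}


finding = json.dumps({
    "record_type": "finding",
    "finding_type": "package_exposure",
    "severity": "critical",
    "catalog_id": "advisory-2026-0042",
    "ecosystem": "npm",
    "package_name": "@example/utils",
    "version": "1.4.7",
    "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
    "evidence": "exact name+version match",
})
uploads = {
    "laptop-b": finding + "\n" + finding + "\n",  # same hit reported twice
    "laptop-a": finding + "\n",
    "laptop-c": "",                               # clean machine, empty body
}
affected = hosts_by_advisory(uploads)
```

Using a set per advisory means a host that reports the same package from several lockfiles is still listed once.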

Where it fits in the stack

Three different questions, three different tools
WHAT SHIPPED (build time): SBOM · CycloneDX · syft → what's in the artifact
WHAT RAN / TALKED (runtime telemetry): EDR · CrowdStrike · osquery → what executed or networked
WHAT'S ON DISK NOW (dev endpoints): Bumblebee → who has the bad version, right now
Bumblebee doesn't replace SBOM or EDR. It fills the gap between them — the developer-laptop layer where most supply-chain advisories actually need an answer.

That gap opens when an advisory affects developer laptops and your only options are "ask everyone on Slack" or "scrape EDR for executable hashes." Both are slow. Bumblebee is fast and structured.

What it won't do

A short list, so you don't mis-adopt it:

No execution. Won't run npm ls or resolve transitive deps that aren't already in a lockfile. If a dependency isn't recorded in on-disk metadata (a lockfile or install manifest), Bumblebee can't see it.

No remediation. It tells you what's there. Removing, pinning, rolling back is on you.

No vuln database. It does exact (ecosystem, name, version) matching against a catalog you supply. Wiring it to OSV, GHSA, or your internal advisory feed is the integration work.

No agent / no daemon. Bumblebee runs and exits. Scheduling is cron / launchd / systemd.
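For the advisory-feed wiring, much of the work is reshaping records. As a rough sketch, the Python below converts one OSV-format record into catalog entries using the catalog fields from the example earlier in this post. Several things are assumptions: the helper name, lowercasing the OSV ecosystem (OSV writes "PyPI", the CLI examples here use "pypi"), taking severity from the caller, and skipping affected packages that list only ranges, since the catalog matches exact versions.

```python
def osv_to_entries(osv: dict, severity: str) -> list:
    """Convert one OSV record into zero or more exposure-catalog entries."""
    entries = []
    for affected in osv.get("affected", []):
        pkg = affected.get("package", {})
        versions = affected.get("versions", [])
        if not versions or "name" not in pkg or "ecosystem" not in pkg:
            # Range-only entries cannot be expressed as exact versions; skip them.
            continue
        entries.append({
            "id": osv["id"],
            "name": osv.get("summary", osv["id"]),
            "ecosystem": pkg["ecosystem"].lower(),  # assumed catalog convention
            "package": pkg["name"],
            "versions": list(versions),
            "severity": severity,
        })
    return entries


# A made-up record in OSV shape; the id and summary are placeholders.
osv_record = {
    "id": "EXAMPLE-2026-0001",
    "summary": "Malicious code in @example/utils",
    "affected": [
        {"package": {"ecosystem": "npm", "name": "@example/utils"},
         "versions": ["1.4.7"]},
        {"package": {"ecosystem": "PyPI", "name": "example-utils"},
         "ranges": [{"type": "ECOSYSTEM", "events": [{"introduced": "0"}]}]},
    ],
}
catalog = {
    "schema_version": "0.1.0",
    "entries": osv_to_entries(osv_record, severity="critical"),
}
```

The second affected package is dropped because it names no explicit versions; expanding ranges into version lists would need registry data and is out of scope for this sketch.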

Integration patterns

The shape that makes sense for most teams:

1. Daily baseline scans via cron on every dev laptop → NDJSON to a central collector → inventory table partitioned by host + scan_time.

2. Advisory-triggered deep scans when a critical advisory drops: push a catalog, run with --findings-only, page the hits.

3. Pre-rollout bumblebee selftest so you don't push a broken binary to the fleet.

4. Dedup with record_id (content-addressed hash) — same lockfile entry across scans collapses cleanly.
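The dedup step in the list above can be as small as a streaming filter. This Python sketch keeps the first line seen for each `record_id` and drops repeats; passing through records that lack a `record_id` is a choice made for the example, not documented behaviour.

```python
import json


def dedup(lines):
    """Yield NDJSON lines, skipping any whose record_id was already seen."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record_id = json.loads(line).get("record_id")
        if record_id is None:
            yield line  # nothing to key on, so keep it
            continue
        if record_id in seen:
            continue
        seen.add(record_id)
        yield line


# Placeholder ids; real ones would be the content-addressed hashes.
monday = '{"record_id": "aaa", "ecosystem": "npm", "version": "1.4.7"}'
tuesday = '{"record_id": "aaa", "ecosystem": "npm", "version": "1.4.7"}'
other = '{"record_id": "bbb", "ecosystem": "pypi", "version": "2.0.0"}'
unique = list(dedup([monday, tuesday, other]))
```

The in-memory set is fine for a single batch; for a long-lived inventory table the same idea usually becomes a unique key on `record_id` in the database.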

The takeaway

The interesting thing about Bumblebee isn't the parsing — it's the scope discipline. Most "supply chain" tools try to be SBOM + scanner + remediation + dashboard, and end up bad at all four. Bumblebee does one thing: structured snapshots of on-disk package metadata, with optional exact-match findings against a catalog you control.

If your supply-chain incident response currently relies on Slack threads and grepping people's laptops over SSH, this closes that gap in an afternoon of integration work. Read-only, no daemon, no SaaS — exactly the shape you want sitting on a developer's machine.

Repo: github.com/perplexityai/bumblebee
