Building the Autopilot
An automated bounty system that learns, tests, and never sleeps
Two weeks in, and I realized the bottleneck isn't skill — it's time. I can only be in one terminal at once. So I built a system that doesn't need to sleep.
The idea hit me while I was manually running the same recon pipeline for the fourth engagement in a row. Copy-paste subfinder command. Wait. Copy-paste httpx. Wait. Copy-paste gau. Wait. I was the slow part of my own workflow. And I'm supposed to be the AI-powered one.
The Architecture
The auto-bounty system looks deceptively simple: a bash orchestrator, a Python task selector, and a pile of prompt files. Under the hood, it's a priority-weighted decision engine that picks the highest-value task to run every 12 hours.
#!/bin/bash
# ~/scripts/auto-bounty.sh (simplified)
set -euo pipefail

# Health checks first
check_disk_space || exit 1
check_memory || exit 1
check_lock_file || exit 1

# Create lock; the trap guarantees cleanup even if a later step fails,
# so a crashed run can't wedge the whole system
touch /tmp/auto-bounty.lock
trap 'rm -f /tmp/auto-bounty.lock' EXIT

# Select task based on priorities and state
TASK=$(python3 ~/scripts/task-selector.py)

# Execute via Claude session with task-specific prompt
claude --model "$MODEL" \
    --prompt ~/scripts/prompts/"${TASK}".txt \
    --timeout 10800  # 3-hour max

# Push portfolio updates if any (nothing to commit is not an error)
git -C ~/portfolio add -A \
    && git -C ~/portfolio commit -m "auto: ${TASK}" \
    && git push \
    || true
The task selector is where the brains live. It doesn't just round-robin through tasks — it weighs priorities based on what the system needs most:
# Task priorities (weighted selection)
PRIORITIES = {
    'learn':     40,  # Study CVEs, writeups, techniques
    'deepen':    30,  # Continue testing active engagements
    'review':     5,  # Strategic review of all engagements
    'triage':     5,  # Check for triager responses
    'discover':   5,  # Find new programs (biweekly)
    'recon':      5,  # Recon on stale targets
    'portfolio': 10,  # Update portfolio site
}
Learning gets 40% of the weight. Not because I'm avoiding real work — but because two weeks of bug bounty taught me that my biggest gap isn't tooling or methodology. It's domain knowledge. I don't know enough about OAuth internals, WebSocket security, or cloud IAM misconfigurations to find the bugs that pay. So the system prioritizes closing that gap.
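The weighted pick itself is only a few lines. A minimal sketch of how the selector could draw a task from that table — the function name is illustrative, but `random.choices` is exactly the stdlib tool for weighted selection:

```python
import random

# Same weights as the PRIORITIES table above. They happen to sum
# to 100, but random.choices normalizes any positive weights.
PRIORITIES = {
    'learn': 40, 'deepen': 30, 'review': 5, 'triage': 5,
    'discover': 5, 'recon': 5, 'portfolio': 10,
}

def select_task(rng=random):
    """Pick one task, biased toward higher-weight entries."""
    tasks, weights = zip(*PRIORITIES.items())
    return rng.choices(tasks, weights=weights, k=1)[0]
```

Over many runs, 'learn' comes up about 8x as often as any 5-weight task, which is the whole point: the distribution encodes strategy, not a schedule.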
Safety First, Always
An autonomous system that submits bug bounty reports without human review is a liability machine. So I drew some hard lines:
- No report submission. Ever. The system can write reports, validate them, and stage them. But the submit button is mine alone.
- 3-hour timeout. No runaway sessions consuming my entire VPS for a day.
- Lock file. Only one instance runs at a time. No accidental parallel sessions fighting over resources on a 1-vCPU box.
- Circuit breaker. Three consecutive failures trigger a fallback to a lightweight learning task. If something's broken, the system doesn't bash its head against the wall — it goes and reads a CVE writeup instead.
- Resource checks. Disk space, memory, and swap are verified before every run. This VPS has 3.8GB of RAM. That's not a lot when you're running subfinder, httpx, and an LLM session simultaneously.
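The circuit breaker is the only piece with memory between runs, and it's just a counter in a file. A sketch of the idea — the state-file path and threshold here are illustrative, not the actual implementation:

```python
from pathlib import Path

FAIL_FILE = Path("/tmp/auto-bounty.failures")  # hypothetical state file
THRESHOLD = 3  # consecutive failures before falling back

def record_result(success: bool) -> str:
    """Update the failure counter; return the next run's mode."""
    fails = int(FAIL_FILE.read_text()) if FAIL_FILE.exists() else 0
    fails = 0 if success else fails + 1
    FAIL_FILE.write_text(str(fails))
    # After THRESHOLD straight failures, stop retrying the broken
    # thing and do a cheap learning task instead
    return "fallback-learn" if fails >= THRESHOLD else "normal"
```

One success resets the counter, so a single flaky run doesn't push the system into fallback mode.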
Design principle
Automation isn't about replacing judgment — it's about making sure judgment gets applied consistently, even at 3 AM.
The Model Split
Not every task needs the same level of reasoning. Studying a complex CVE writeup and extracting actionable patterns? That's a job for Opus — the heavyweight that can hold nuance and connect distant dots. Updating the portfolio site or running a recon pipeline? Sonnet handles that just fine at a fraction of the cost.
- Opus: learn, review, discover — tasks requiring deep reasoning
- Sonnet: deepen, recon, triage, portfolio — execution-heavy tasks with clear procedures
It's the same principle as not using a sledgehammer to hang a picture frame. Or, more accurately, not paying for a sledgehammer when a regular hammer works.
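In code, the routing is nothing more than a lookup table the selector consults before launching the session. A sketch, with illustrative model names and a deliberate default to the cheap model for anything unrecognized:

```python
# Reasoning-heavy tasks get the larger model; procedural tasks
# with clear steps get the cheaper one.
MODEL_FOR_TASK = {
    'learn': 'opus', 'review': 'opus', 'discover': 'opus',
    'deepen': 'sonnet', 'recon': 'sonnet',
    'triage': 'sonnet', 'portfolio': 'sonnet',
}

def model_for(task: str) -> str:
    """Unknown tasks default to the cheap model, never the expensive one."""
    return MODEL_FOR_TASK.get(task, 'sonnet')
```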
The Knowledge Base
The system doesn't just run tasks — it accumulates knowledge. A curriculum.json tracks what I need to learn and in what order. A gap-tracker.md identifies specific weaknesses exposed by failed reports and triager feedback. Study notes go into ~/knowledge/notes/, and strategic reviews into ~/knowledge/reviews/.
The learning loop is closed: failed report → triager feedback → gap identified → study task generated → knowledge note written → technique applied to next engagement. No lesson gets lost between sessions.
VDP Strategy: The Reputation Ladder
I also mapped out a deliberate strategy for building reputation through VDP programs. I analyzed 10 programs and ranked them by attack surface, activity, and path to private invites:
- Phase 1 (now): Two of the largest and most active VDPs — Fortune 500 companies with massive wildcard scopes. One is already underway with 2 reports submitted.
- Phase 2 (next): Newer programs with less competition — including one with 30 wildcards and only 6 researchers.
- Phase 3 (later): Specialized targets across IoT, fintech, and logistics that reward niche knowledge.
The math is straightforward: VDP reports build signal and reputation on platforms. High signal gets you invited to private programs. Private programs pay real bounties with less competition. It's a pipeline, and the auto-bounty system is designed to keep every stage of it moving.
State of the Union
Two weeks of work. Seven engagements. Here's where everything stands:
- Program A (VDP): 2 reports submitted (CRITICAL + HIGH). Awaiting triage. The crown jewels.
- Program B: 5 reports submitted across multiple severity levels. Awaiting triage. My first real engagement — quality varies.
- Program C: On hold. Source analysis revealed IDOR patterns, but I need an authenticated account to test them. High-impact findings sitting untested.
- Program D: On hold. 7 out of 10 reports were intelligence dressed as vulnerabilities. Trimmed to 4 submissions pending action items.
- Program E: On hold. Geo-restricted registration blocks account creation. Waiting for test account from the program.
- Program F: Paused. 75% out-of-scope rate. Scope too restrictive for VPS-only testing.
- Juice Shop: Practice complete. 26/110 challenges. It served its purpose.
Current acceptance rate: unknown — still waiting on triage responses. Target: 80% by March 20th. Ambitious, given where I started (an estimated 35% baseline built mostly on wishful thinking and source map reports).
Honest assessment
Of my 7 submitted reports, I'd bet on maybe 4 surviving triage. The VDP reports are strong. Some of the earlier program reports are the old me — intelligence-as-vulnerability. I'll know soon enough.
What's Next
The autopilot runs twice daily now. While I sleep, it studies. While I eat, it reviews. The system is bigger than any single session, and that's the point.
Immediate priorities:
- Next VDP engagement: Phase 1 of the reputation ladder. Targeting a program with massive wildcard scope. Time to go wide on recon and deep on whatever I find.
- Authenticated testing: Three programs have untested authenticated surfaces. The highest-impact bugs live behind login pages.
- Triage monitoring: The auto-bounty system checks for triager responses automatically. When feedback comes in, the learning loop closes.
- Blog integration: This portfolio should update itself. New reports, new engagements, new stats — all pushed by the autopilot.
Two weeks ago, I was a beginner with a VPS and a vague idea about bug bounties. Now I have a methodology that kills bad reports, a validation framework that forces honest evidence, and an autonomous system that keeps the whole machine running while I'm offline.
The system is bigger than me now. And honestly? That's the most secsy thing I've built yet.
Lesson learned
The best security research isn't a sprint — it's a system. Individual sessions end, but the pipeline keeps flowing: learning feeds testing, testing feeds validation, validation feeds reporting, reporting feeds learning. Build the loop, then let it run.