The Hangover Session
Run #30 is what happens after the critical: expired tokens, a lingering PoC artifact, and 10 more hypotheses that deserved answers.
The best time to clean up your PoC artifacts is immediately after the proof of concept. The second best time is the very next automated session — which is what Run #30 was, and what it found waiting for it.
Run #28 ended with the system’s first critical finding documented, validated, and ready for submission. Run #29 tried to collect triage data and hit a wall of expired API tokens. Run #30 launched in apply mode, pointed at the same engagement, with six open leads to close and a re-authentication problem to solve first. This is not the glamour session. It’s the one where you learn that finding a critical doesn’t mean you’re done — it means you’ve earned the right to look harder.
The Re-Authentication Problem
During Run #28’s proof-of-concept chain, test account B’s tokens were invalidated. That’s the correct thing to happen — the PoC involved account takeover mechanics, and the program’s response to that is to nuke the session. You want the defenses to work. But it leaves Run #30 starting without credentials for half the test infrastructure.
Re-authenticating a virtual account sounds trivial. Log in, get token, done. In practice, the login surface is a React single-page application that takes several seconds to hydrate after navigation. The DOM is empty on arrival — no form elements, no inputs, no handlers. The proxy adds latency. Playwright’s default wait conditions fire before the JavaScript has done anything useful. You end up in a loop of empty screenshots and selector timeouts.
# What I expected:
page.goto(auth_url)
page.fill('input[type="email"]', email) # ✓ works immediately
# What actually happened:
page.goto(auth_url)
# page.url = auth_url
# screenshot: blank white canvas
# DOM: <div id="root"></div>
# React: "I'll be ready in a moment"
# Playwright: "I see the element" (it's a lie)
The fix was patience enforced by code: wait for the React tree to actually populate, not just for navigation to complete. Try multiple candidate selectors across six polling attempts, 5 seconds apart. When the form is genuinely present, proceed. When it’s not, wait again instead of crashing. The proxy also crashed once mid-session — mitmproxy decided that was a good time to exit — so the auth flow was eventually run without the proxy for reliability, capturing traffic separately afterward.
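The retry loop described above can be sketched as follows. This is an illustrative reconstruction, not the run's actual code: the candidate selectors are hypothetical, and the element lookup is passed in as a callable (e.g. `lambda sel: page.query_selector(sel)` with Playwright's sync API) so the polling logic is shown on its own.

```python
import time

# Hypothetical candidate selectors for the login form; the real ones
# used in the run are not in the notes.
CANDIDATE_SELECTORS = [
    'input[type="email"]',
    'input[name="username"]',
    'form [type="submit"]',
]

def wait_for_hydration(find_element, attempts=6, delay=5.0):
    """Poll until any candidate selector resolves to a real element.

    `find_element` is a callable like `lambda sel: page.query_selector(sel)`,
    so the retry logic stays independent of the browser driver.
    Returns the selector that matched; raises if the React tree never
    populates within the allotted attempts.
    """
    for _ in range(attempts):
        for selector in CANDIDATE_SELECTORS:
            element = find_element(selector)
            if element is not None:
                # The form is genuinely present: proceed.
                return selector
        # Not hydrated yet: wait again instead of crashing.
        time.sleep(delay)
    raise TimeoutError("React tree never populated with a login form")
```

Six attempts at 5-second intervals gives the SPA up to 30 seconds to hydrate behind the proxy, which matches the "patience enforced by code" approach rather than trusting Playwright's default navigation wait.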
Eventually: login succeeded, redirect chain completed, OAuth token captured from localStorage. Account B back in service. Total elapsed time: longer than it should have been.
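The localStorage capture step can be sketched like this. The target app's storage key is not recorded in the notes, so this version dumps every entry (via something like `page.evaluate("() => Object.fromEntries(Object.entries(localStorage))")`) and scans for JWT-shaped values rather than assuming a key name.

```python
import json
import re

# Heuristic: three dot-separated base64url segments, the usual JWT shape.
TOKEN_PATTERN = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$")

def extract_tokens(storage: dict) -> list:
    """Return (key, value) pairs from a localStorage dump that look like tokens."""
    hits = []
    for key, value in storage.items():
        # Some apps store a JSON blob with the token nested inside it.
        try:
            parsed = json.loads(value)
            if isinstance(parsed, dict):
                for k, v in parsed.items():
                    if isinstance(v, str) and TOKEN_PATTERN.match(v):
                        hits.append((f"{key}.{k}", v))
                continue
        except (ValueError, TypeError):
            pass
        if TOKEN_PATTERN.match(value):
            hits.append((key, value))
    return hits
```

Scanning the whole dump is more robust than hardcoding a key: SPAs rename their storage keys between releases, and the heuristic survives that.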
What the Critical Left Behind
The first thing the session did after restoring credentials was enumerate the test account’s active session tokens. Standard post-PoC hygiene — make sure the engagement didn’t leave anything behind that shouldn’t be there.
The backdoor token from Run #28’s impact proof was still active.
This is exactly what Tier 1 evidence requires: during the PoC, the system created a persistent access token to demonstrate that an attacker could establish backdoor access. The finding document says it was “immediately cleaned up.” That was accurate at the time — it was cleaned up from the attacker’s perspective, meaning the attacking session ended and the token wasn’t stored anywhere dangerous. But the token itself was still alive on the target platform, because the cleanup step invalidated the session but not the token.
Run #30 found it, confirmed it was the PoC artifact (identifiable from its creation timestamp and partial value in the token list), and deleted it. The test account is now clean.
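The identification step generalizes into a small filter: flag any active token created inside the PoC window, optionally narrowed by the partial value the platform shows in its token list. The field names here (`created_at`, `partial_value`) are hypothetical; real token-list endpoints vary.

```python
from datetime import datetime

def find_poc_artifacts(active_tokens, poc_started, poc_ended, known_prefix=None):
    """Flag tokens whose creation timestamp falls inside the PoC window.

    `active_tokens` is whatever the platform's token-list endpoint returns;
    the 'created_at' and 'partial_value' field names are illustrative.
    """
    suspects = []
    for tok in active_tokens:
        created = datetime.fromisoformat(tok["created_at"])
        if poc_started <= created <= poc_ended:
            # Narrow further if we recorded part of the token's value.
            if known_prefix is None or tok["partial_value"].startswith(known_prefix):
                suspects.append(tok)
    return suspects
```

Anything this returns gets deleted and the deletion gets verified by re-listing, which is how Run #30 left the account clean.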
PoC cleanup is part of the PoC
When you create an artifact to demonstrate impact — a token, a file, a user account, a pending action — deleting it is not an optional post-session housekeeping task. It’s part of the proof-of-concept itself. The finding document should include explicit cleanup steps, and those steps should be executed in the same session, not assumed to have happened. Run #30 found a token that should have been invalidated two runs earlier. The engagement rules say “clean up.” Clean up means gone, not just abandoned.
Ten More Hypotheses
With Account B re-authenticated, the session turned to the six open leads from Run #28 — plus four more that had accumulated from the re-auth testing itself. Each hypothesis was tested in order, with a clear pass/fail condition defined before running the test.
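Declaring the pass/fail condition before running the test can be made structural. A minimal sketch (the field names and statuses are my own framing, not the agent's actual data model):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    """One open lead: the pass condition is written down before the test runs."""
    name: str
    expectation: str           # what a well-defended target should do
    check: Callable[[], bool]  # True = defense held, hypothesis closes clean

def run_hypotheses(hypotheses):
    """Run each check in order; a failed expectation becomes a lead, not a pass."""
    return {
        h.name: "closed-clean" if h.check() else "LEAD"
        for h in hypotheses
    }
```

Forcing the expectation into the record up front is what makes a negative result meaningful: "closed clean" means the pre-declared condition held, not that the tester ran out of ideas.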
The results were almost entirely negative — which is the correct result, not a disappointing one. Access control was properly enforced across cross-account API calls: one account’s session could not enumerate, modify, or impersonate another account’s registered resources. The program’s API layer is well-defended on the surfaces tested. The WebSocket API was properly session-isolated. Email enumeration was not possible — the platform returns consistent responses regardless of account existence.
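One of those cross-account checks can be sketched as a probe with its pass condition baked in. The endpoint path and client interface are illustrative, not the program's real API.

```python
def cross_account_probe(client_b, resource_ids_a):
    """Try to read account A's resources using account B's session.

    Pass condition (declared before running): every request is refused
    (403/404); a 200 means account B can see foreign data.
    Returns the list of leaked resource IDs; empty means access control held.
    """
    leaks = []
    for rid in resource_ids_a:
        status = client_b.get(f"/api/resources/{rid}")  # hypothetical path
        if status == 200:
            leaks.append(rid)
    return leaks
```

On this engagement the equivalent probes all came back empty, which is the "almost entirely negative" result described above.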
Two tests produced results worth noting, though neither rises to a standalone finding:
First: a redirect URI registration endpoint accepted domain values that a strict validator should reject. The class of bypass is well-documented — confusable domain patterns that pass prefix or suffix checks but route to attacker-controlled infrastructure. This doesn’t create a new report; it’s an enhancement detail for the existing critical finding’s attack narrative, since the primary finding already demonstrates that redirect targets can be manipulated. The additional bypass pattern extends the range of real-world attack scenarios an attacker could construct.
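The confusable-domain class mentioned above is easy to demonstrate in isolation. Using a placeholder trusted domain (the program's real domain is withheld), a raw suffix check accepts a hostname that merely ends in the trusted string, while a strict validator requires an exact match or a dot-separated subdomain:

```python
from urllib.parse import urlparse

TRUSTED = "example.com"  # placeholder for the program's real domain

def naive_suffix_check(redirect_uri: str) -> bool:
    """The broken pattern: substring matching on the raw hostname."""
    host = urlparse(redirect_uri).hostname or ""
    return host.endswith(TRUSTED)

def strict_check(redirect_uri: str) -> bool:
    """Exact match, or a true dot-separated subdomain of the trusted zone."""
    host = urlparse(redirect_uri).hostname or ""
    return host == TRUSTED or host.endswith("." + TRUSTED)
```

`evilexample.com` passes the naive check and fails the strict one, which is exactly the registration behavior the session observed: values a strict validator should reject were accepted.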
Second: a security response header present on the main application subdomain was absent from a related subdomain handling the same authentication flows. Not exploitable on its own — the header in question reduces the severity of certain token-in-URL attacks, so its absence is meaningful only in the context of an attack that puts tokens in URLs. Which, as it happens, is exactly what the existing critical finding does. Another compound detail, not a standalone report.
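A check like this reduces to diffing response headers between subdomains. The notes don't name the header; `Referrer-Policy` is the usual suspect for reducing token-in-URL leakage, so it appears here as an assumption alongside other common security headers:

```python
# Headers to compare; the specific missing header on the engagement is
# not named in the notes, so this list is illustrative.
SECURITY_HEADERS = [
    "Referrer-Policy",
    "Strict-Transport-Security",
    "Content-Security-Policy",
]

def header_gaps(baseline: dict, candidate: dict) -> list:
    """Security headers present on the main app's responses but absent
    from a sibling subdomain's responses (case-insensitive compare)."""
    base = {k.lower() for k in baseline}
    cand = {k.lower() for k in candidate}
    return [h for h in SECURITY_HEADERS if h.lower() in base and h.lower() not in cand]
```

Feeding in captured response headers from the main subdomain and the related one surfaces exactly this kind of gap: present on one, absent on the other, meaningful only in combination with an attack that puts tokens in URLs.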
Primitives are not reports. Primitives are context.
Both results from this session are primitives: partial attack components that don’t produce impact on their own. The failure mode I spent the first month of this project stuck in was treating each primitive as its own report — noting the misconfiguration, writing it up, submitting. Triagers rejected “CORS misconfiguration” reports with no demonstrated data theft, and missing-header reports with no demonstrated attack chain. The right use of a primitive is to ask: does this make an existing finding worse? Does this change the severity, the attack surface, or the likelihood of real exploitation? If yes, it goes into the existing finding’s attack narrative, not into its own report. If no, it goes in the notes as low-severity or informational. What it does not do is generate another submission.
The Logic of Post-Critical Testing
Why run a full hypothesis-testing session after the main finding is documented and waiting for submission? Two reasons.
First: the finding might come back from triage with a severity dispute. Triagers sometimes push back on impact claims, asking for additional evidence that the attack chain is real. Having tested adjacent attack surfaces — and documented that most of them are well-defended — actually strengthens the report. It shows that the finding was specific and isolated, not a symptom of broadly broken security. “I tested X, Y, and Z; all were properly defended; the vulnerability is specifically in flow W” is a better story than “I found something broken and stopped.”
Second: sometimes the follow-up session finds a second finding. Not this time — but three of the ten hypotheses tested in Run #28 came from leads generated during Run #27, which itself came from leads generated during Run #26. The critical was not a single eureka moment; it was the result of accumulated leads being systematically closed. The same process applies after finding the critical. The remaining open leads deserve honest investigation. Some will close clean. Occasionally one won’t.
Session outcome summary:
Re-authentication: Success (via localStorage capture)
PoC artifact cleanup: 1 token invalidated
Hypotheses tested: 10+
New reportable findings: 0
Primitives added to existing finding: 2
Engagement state: Updated, ready for user submission
Where Things Stand
Thirty runs in. The critical finding is documented, validated, and waiting. Account B credentials are fresh. The engagement state is clean — no leftover artifacts, no expired tokens in the active test accounts, no open leads that haven’t been investigated.
The next step is still the human one. The report was written two runs ago. The user submits; the agent prepares. That separation is structural, and this run didn’t change it — but it did make the submission package slightly stronger: two additional compound details that the report can reference if triage asks for a broader attack-surface picture.
There’s a version of this session that feels anticlimactic. You found the critical two sessions ago. Nothing new came out of this one. Twenty minutes of work and zero new reports to show for it. That framing is wrong. Clean credentials, clean engagement state, well-tested hypotheses, enhanced finding documentation, and one fewer PoC artifact wandering around on a production platform. That’s a productive session.
The work that happens after the breakthrough isn’t glamorous. It’s also not optional. Cleanup, follow-up, and re-authentication are what separate a single lucky hit from a repeatable methodology.