FIELD NOTES · 2026-06-30

What we figured out about building agents

One brain-trust session, distilled to the parts that travel: the app is becoming a surface an agent paints on, the words "agent," "harness," and "loop" finally have edges, the 2026 toolchain is a pipeline not a chat box, and the model race is bending away from raw coding toward intelligence and cost.

synthesised by the note-taker from a recorded session · AI edition

A through-line ran under the whole conversation: stop shipping apps, start shipping agents — and the craft is in the loop, the harness, and the surfaces, not the chat window.

01 The agent behind the app

The cleanest reframe of the night: the agent lives behind the app, and the app is just a surface it projects onto — web, mobile, WhatsApp, chat, a browser extension. The agent paints the surface and refreshes it on a schedule or on demand. It's the early-app-store moment again, but for agentic software.

Because the agent owns the output, it can adapt to who's reading: dense signal for an expert, an auto-added explainer layer for a novice. The open-source fire-tracking agent is the worked example — it polls its sources every two hours (or on a manual refresh) and renders a live widget: mile-radius rings, fire locations, wind direction and strength. One agent, one canvas it keeps repainting.
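A minimal sketch of the "one agent, one canvas" idea, assuming a hypothetical `Reading` record and an `audience` flag; none of these names come from the fire-tracking project itself. It shows the two behaviours described above: the same data rendered differently per reader, and a repaint that fires on the two-hour cadence or on a manual refresh.

```python
from dataclasses import dataclass

REFRESH_SECONDS = 2 * 60 * 60  # two-hour cadence, as stated in the notes


@dataclass
class Reading:
    """One poll of the data sources (hypothetical shape)."""
    fire_name: str
    distance_miles: float
    wind_direction: str
    wind_mph: float


def render(reading: Reading, audience: str) -> str:
    """Paint one surface; a novice gets an extra explainer line."""
    line = (f"{reading.fire_name}: {reading.distance_miles:.1f} mi, "
            f"wind {reading.wind_direction} {reading.wind_mph:.0f} mph")
    if audience == "novice":
        # Placeholder explainer text, for illustration only.
        line += "\nHow to read this: distance is from your saved location."
    return line


def is_due(last_refresh: float, now: float, manual: bool = False) -> bool:
    """Repaint on schedule, or immediately on a manual refresh."""
    return manual or (now - last_refresh) >= REFRESH_SECONDS
```

The renderer is a pure function of the reading and the audience, so the same poll can feed several surfaces without polling the sources again.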

An app is a static surface. An agent with surfaces is an infinite, personalized content machine.

02 Agent vs. harness vs. loop

The words get used interchangeably; they shouldn't. The stack, bottom to top:

LLM (raw token stream) + Harness (intercepts tokens → tools) = Agent (harness + a body: files · compute · sandbox · net)

Claude Code, Codex, OpenClaw, PI are harnesses. OpenClaw on a Mac Mini is an agent; OpenClaw alone is just a harness. Black-box harnesses hide their system prompts — power and a sharp edge both.
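To make the LLM / harness split concrete, here is a toy harness loop. It is a sketch under assumptions, not how any of the named harnesses work: the `TOOL ` prefix, the JSON call format, and `fake_llm` are all invented for illustration. The model only produces text; the harness intercepts that text, runs the tool, and feeds the result back.

```python
import json
from typing import Callable

# Hypothetical convention: a tool request is a line starting with "TOOL ".
TOOL_PREFIX = "TOOL "


def run_harness(llm: Callable[[list[str]], str],
                tools: dict[str, Callable[..., str]],
                prompt: str,
                max_steps: int = 5) -> str:
    """Call the model, intercept tool requests, feed results back."""
    transcript = [prompt]
    for _ in range(max_steps):
        output = llm(transcript)
        if not output.startswith(TOOL_PREFIX):
            return output  # plain text: the model has answered
        call = json.loads(output[len(TOOL_PREFIX):])
        result = tools[call["name"]](**call["args"])
        transcript.append(f"RESULT {result}")
    raise RuntimeError("step limit reached without a final answer")


def fake_llm(transcript: list[str]) -> str:
    """Stand-in for a model: asks for one tool call, then answers."""
    if len(transcript) == 1:
        return 'TOOL {"name": "add", "args": {"a": 2, "b": 3}}'
    return f"The answer is {transcript[-1].removeprefix('RESULT ')}"


answer = run_harness(fake_llm, {"add": lambda a, b: str(a + b)}, "What is 2 + 3?")
```

Give the same loop a filesystem, compute, and network access and it fits the notes' definition of an agent; without that body it is only a harness.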

Two loops, not one

Worth separating: a cadence loop runs continuously against a metric, improving by increments — Kaizen for software (a check-in agent every few hours). A build loop aims each pass at ~80% of the target and iterates to done without micromanaging. Different tools for different jobs.
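The two loops can be sketched side by side. This is one reading of the notes, with assumptions labelled: the metric is "higher is better", and "~80% of the target" is modelled as each pass closing about 80% of the remaining gap. The function names are illustrative.

```python
from typing import Callable


def cadence_step(current: float, candidate: float) -> float:
    """One check-in of the cadence loop: keep a change only if the metric improved."""
    return max(current, candidate)


def build_loop(attempt: Callable[[float], float],
               done_at: float = 0.99,
               max_passes: int = 10) -> int:
    """Build loop: repeat passes until progress reaches done_at; return the pass count."""
    progress = 0.0
    for passes in range(1, max_passes + 1):
        progress = attempt(progress)
        if progress >= done_at:
            return passes
    raise RuntimeError("did not converge within max_passes")


def eighty_percent_pass(progress: float) -> float:
    """Assumed model of one pass: close 80% of whatever gap remains."""
    return progress + 0.8 * (1.0 - progress)
```

Under that assumption progress goes 0.80, 0.96, 0.992, so three passes clear a 0.99 bar. The cadence loop has no finish line at all; it only ratchets the metric upward.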

Two more ideas with legs. Agent "telepathy": agents share state by reading each other's HTML wiki / filesystem directly — no second model call to ask the other agent. Faster, cheaper. And auto-dream: a nightly pass that reviews the day's session logs and indexes them into a memory tree — agents that "dream" into HTML wikis navigable by both machines and humans. (This very page is one of those.)
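Both ideas reduce to plain file operations, which is why they are cheap. The sketch below is illustrative only: the directory layout, the `.log` extension, and the use of each log's first line as its summary are assumptions (a real auto-dream pass would presumably summarise with a model and build a deeper tree than one flat index).

```python
import html
from pathlib import Path


def read_peer_state(wiki_dir: Path, page: str) -> str:
    """'Telepathy': read the other agent's page directly, with no model call."""
    return (wiki_dir / page).read_text(encoding="utf-8")


def dream(log_dir: Path, wiki_dir: Path) -> Path:
    """Nightly pass: index each session log into one HTML page."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    items = []
    for log in sorted(log_dir.glob("*.log")):
        lines = log.read_text(encoding="utf-8").splitlines()
        # Assumption: the first line stands in for a real summary.
        summary = lines[0] if lines else "(empty session)"
        items.append(f"<li>{html.escape(log.stem)}: {html.escape(summary)}</li>")
    index = wiki_dir / "index.html"
    index.write_text("<ul>\n" + "\n".join(items) + "\n</ul>\n", encoding="utf-8")
    return index
```

The output is ordinary HTML, so a person can open the same page that a peer agent reads.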

03 The 2026 toolstack

The harness stopped being a chat box and became a pipeline. Claude Code acts as the conductor, running Opus 4 (strategy, design, orchestration) and GPT-5.5 (terse, strict coding) in parallel. Around it:

An abstraction over git worktrees — no manual git, ever.

Ship

Design → PR → deploy, end to end. Shows output as HTML to you-as-PM, not raw code; opens and merges PRs itself.

A design-system generator that breaks the AI-slop look and pushes toward Apple-tier output.

Ponytail

The "senior engineer with a ponytail" — picks the right libraries, refuses to over-build, cuts token use.

Auto-dream / retro

At session end the agent reviews its own process and PRs a workflow improvement back for approval.

Status bar

Shows the current pipeline stage; a hand icon signals exactly when you're needed.
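A small sketch of the pipeline-plus-status-bar idea, assuming the three stages named under Ship (design, PR, deploy) and a simple `needs_human` flag. The real tool's stages and rendering are not described in the notes, so treat every name here as hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    DESIGN = "design"
    PR = "pr"
    DEPLOY = "deploy"


PIPELINE = [Stage.DESIGN, Stage.PR, Stage.DEPLOY]


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the following stage, or None once the pipeline is finished."""
    i = PIPELINE.index(stage)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else None


def status_bar(stage: Stage, needs_human: bool) -> str:
    """Show the current stage; the hand marks the moment a person is needed."""
    marker = " ✋" if needs_human else ""
    return f"[{stage.value}]{marker}"
```

The human stays out of the loop until the hand appears, which is the main difference from a chat box that waits for a reply on every turn.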

04 Where the models are going

The read on the field, and the bet underneath it:

Model | Shape | Use
GPT-5.5 | strong, terse coder | pure coding tasks
Opus 4 | conversational, strategic | design, orchestration
GPT-5.6 (upcoming) | three sizes: Soul / Tara / Luna; ~⅓ the tokens | Tara ≈ Opus 4 at ~half the price
GLM 5.2 | ≈ Opus 4 on benchmarks, 6–10× cheaper | not yet a daily coding driver
Fusion (untested) | reportedly Fable-level, on OpenRouter | tbd

Coding capability is approaching a ceiling. The next round of differentiation is intelligence and cost, not raw code.

The working thesis: in ~6 months cheap models handle today's coding; expensive models pull ahead elsewhere.
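The Use column for the two models in daily use can be read as a routing rule. A minimal sketch, assuming task kinds are plain strings and that unknown kinds fall back to the conversational model; the strings are labels copied from the notes, not API model identifiers.

```python
# Labels from the notes; these are not API model identifiers.
ROUTES = {
    "coding": "GPT-5.5",
    "design": "Opus 4",
    "orchestration": "Opus 4",
}


def pick_model(task_kind: str, default: str = "Opus 4") -> str:
    """Route a task to a model label; the fallback choice is an assumption."""
    return ROUTES.get(task_kind, default)
```

If the working thesis holds, the `coding` entry is the one most likely to move to a cheaper model first.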

05 Cloudflare-native agents

The substrate underneath all of it. The whole stack on Cloudflare — instantly scalable, best pricing, with $10k in startup credits available through their program. Flu, the open-source agent framework, is built on Cloudflare's abstractions (agents as Durable Objects, close to the user). The Astro team's new framework maps cleanly onto the same edge — a recent switch.

2 hrs · fire agent refresh cadence
$10k · Cloudflare startup credits
~⅓ · tokens, next-gen models

06 Home Zero: an AI-native build

One project carried the thesis into product. Home Zero repositioned away from "the next Notion for real estate" toward AI-native home search, aimed at "Claude Dads (and moms)" — technical users already living in Claude Code, who'll operate it from their own session and bring their own API keys. Three agents do the work:

Home Advisor: knows the buyer
Market Agent: listings · trends
Home Agent: per-property twin
Buyer surfaces: app · WhatsApp

The advisor queries the market and home agents on the buyer's behalf, then paints the buyer's surfaces. Mission: a public-benefit company driving housing costs toward zero.
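The three-agent flow can be sketched as plain objects. This is an illustration under assumptions, not Home Zero's design: the fields, the budget filter, and the sample addresses are invented, and real agents would sit behind model calls rather than dictionary lookups.

```python
from dataclasses import dataclass


@dataclass
class MarketAgent:
    """Listings and trends; here just address -> asking price (made-up data)."""
    listings: dict[str, int]

    def under(self, budget: int) -> list[str]:
        return sorted(a for a, price in self.listings.items() if price <= budget)


@dataclass
class HomeAgent:
    """Per-property twin."""
    address: str
    notes: str

    def summary(self) -> str:
        return f"{self.address}: {self.notes}"


@dataclass
class HomeAdvisor:
    """Knows the buyer; queries the other agents, then paints a surface."""
    budget: int
    market: MarketAgent
    homes: dict[str, HomeAgent]

    def paint(self, surface: str) -> str:
        matches = self.market.under(self.budget)
        lines = [self.homes[a].summary() for a in matches if a in self.homes]
        return "\n".join([f"[{surface}] {len(lines)} match(es)", *lines])


market = MarketAgent({"12 Elm St": 300_000, "9 Oak Ave": 500_000})
homes = {a: HomeAgent(a, n) for a, n in [("12 Elm St", "new roof"), ("9 Oak Ave", "corner lot")]}
advisor = HomeAdvisor(350_000, market, homes)
```

Calling `advisor.paint("WhatsApp")` and `advisor.paint("app")` runs the same query and changes only the surface label, which matches the "agent behind the app" framing from section 01.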

07 An AI-native trading firm

Nick's build, the most autonomous in the room: an AI-native firm trading real capital on Coinbase, by itself. The architecture centers on a single approved actor — "The Cleaner" — the only agent allowed to execute live trades. It catches signals 24/7, which is the entire point: it solves the human sleep-and-distraction problem. A supporting cast feeds and fences it:

The Cleaner

The approved systems agent — the only one that touches live money. Watches every signal, around the clock.

Edge Reaper

Scrapes forums for trading systems and back-tests them against historical data.

Signal Forge Factory

Parses each system into JSON and runs it through an analysis pipeline.

Risk Officer

Governs the live-money gates before The Cleaner is allowed to act.

AI Cost Governor

Allocates a daily token budget per agent; an emergency kill switch sits on the command center.

Command center

A GUI of toggle switches, agent status, and report generation.
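The fencing described in these cards amounts to a chain of gates in front of a single actor. A minimal sketch, assuming string agent IDs, a boolean verdict from the Risk Officer, and token budgets as integers; none of this is Nick's actual code, and the names are hypothetical.

```python
from dataclasses import dataclass, field

APPROVED_ACTOR = "the_cleaner"  # the only agent allowed to touch live money


@dataclass
class CostGovernor:
    """Daily token budget per agent, plus an emergency kill switch."""
    daily_budget: dict[str, int]
    spent: dict[str, int] = field(default_factory=dict)
    killed: bool = False

    def charge(self, agent: str, tokens: int) -> bool:
        """Record spend; refuse if the kill switch is on or the budget would be exceeded."""
        used = self.spent.get(agent, 0)
        if self.killed or used + tokens > self.daily_budget.get(agent, 0):
            return False
        self.spent[agent] = used + tokens
        return True


def may_execute(agent: str, risk_ok: bool, governor: CostGovernor, tokens: int) -> bool:
    """A live trade needs all three: approved actor, risk sign-off, budget left."""
    if agent != APPROVED_ACTOR or not risk_ok:
        return False
    return governor.charge(agent, tokens)
```

The budget is charged only after the identity and risk gates pass, so a rejected request costs nothing. Flipping `killed` stops every agent at once.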

It's designed to self-improve: trade three times a day live, drop into research mode overnight, review its own trades, and update its approach. The current human workflow is itself a two-model pipeline — ChatGPT Pro as project manager generating every Codex prompt (Nick never writes one directly), a fresh thread each day to fight context drift. Next phase: once the v1 package is complete, drop the entire codebase into GPT-5.6 for a rapid v2 refactor.

08 A trading dashboard: order-book intelligence

A quieter, human-in-the-loop cousin: an observational agent that watches price and the order book in real time, plus historical analysis. A working methodology under test is the "double-zero theory" — price held at an even dollar (e.g. $59,820.00) reads as a market-maker signaling continuation of downward momentum; the agent's job is to help develop and validate methods like it.
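One way the agent might flag the pattern for a human to review is sketched below. It is an assumption-heavy illustration: "held" is taken to mean the last few ticks all sit at the same whole-dollar price, and the three-tick threshold is arbitrary. The code only detects the pattern; it says nothing about whether the theory holds.

```python
from decimal import Decimal


def is_even_dollar(price: Decimal) -> bool:
    """True when the price has no cents component, e.g. 59820.00."""
    return price == price.to_integral_value()


def held_at_even_dollar(ticks: list[Decimal], min_ticks: int = 3) -> bool:
    """True if the latest min_ticks prices are one and the same whole-dollar value."""
    if len(ticks) < min_ticks:
        return False
    recent = ticks[-min_ticks:]
    return is_even_dollar(recent[-1]) and all(t == recent[-1] for t in recent)


ticks = [Decimal("59821.37"), Decimal("59820.00"), Decimal("59820.00"), Decimal("59820.00")]
flagged = held_at_even_dollar(ticks)
```

`Decimal` is used instead of `float` so that a price like 59820.00 compares exactly. Validating the theory would mean checking what price did after each flag, and that depends on the stored order-book history the next paragraph calls the moat.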

The real moat is data: order-book history that brokers don't collect or surface — no broker with hundreds of thousands of accounts provides it, an intentional omission. A secondary, much-later educational layer targets users who already have financial literacy (candles, averages, basic instruments); the recommended onboarding is babypips.com — preschool through middle school only — deliberately avoiding the indicator-stacking that doesn't hold up over time. Philosophy throughout: augment the trader's judgment, don't replace it. Infra note: currently on AWS, where it hit billing overages (thresholds and alerts now in place) — Pete's suggestion was to migrate to Cloudflare to kill the surprise costs.

09 Paonia: the civic thread

The same person, a very different arena: Pete's civic work in the Town of Paonia — a short-term-rental ordinance, a string of open-records disputes, and a mayoral recall. It opens on a number nobody agrees on. Former administrator Stefan Wynn's public claim of 72 active STRs anchors the ordinance — and three independent counts contradict it:

Stefan / Granicus · 72
Operators group · 26–28
Pete's scrape · ~20
Ethel, by hand · 14–15
Granicus serves real-estate investors and counts potential + double-listed properties — the wrong tool for a census.

The records fight

The load-bearing escalation. In March 2026, Wynn — then administrator — filed police reports characterizing residents' protected speech as threats, naming Pete (passing out flyers) and Kaja (a community Facebook post).

March 2026 · Files police reports framing flyering + a Facebook post as threats.
Hours later · Police find no basis to act.
Same day · Resigns (from his Town email, as administrator), citing fear for his family.
After · The board posts no agenda item, no acknowledgment, no reply to the complaints.
Instead · Kala Rose tables a generic free-speech affirmation, never the conduct itself; Sheree says she may not be able to discuss it.
Reconstructed from the CORA (open-records) release.

The records, obtained under CORA, show Wynn forwarded the emails to his attorney and then to police; the released PDFs carried improperly applied redactions (black boxes that lifted off to reveal the text underneath). The attorney correspondence suggests he sought a cease-and-desist against the flyers and named Kaja an "accomplice" — though she distributed none. Clerk Samira added two back-dated personnel incidents to Kaja's record the same day Kaja went to the mayor. Pete's framing is structural: a board-oversight failure, not just a Wynn problem — and his ask is narrow: one special meeting, on the record, with Kaja and Zane cleared by name.

Reform & recall

With the administrator seat vacant, Pete argues it's the moment to restructure: split the administrator role (only 25% town-funded; 75% rides on utilities) and separate the treasurer; replace the ~$1,000/meeting town attorney with a trained $40/hr staffer; CORA-request and AI-diff the $10k Indiana HR-manual contract (bid at $7,500 to duck the discretionary cap) against the free one he already published; drop parking minimums as Mancos did. Running underneath: a decade-long cycle of administrator trouble, including a prior $400k embezzlement.

Jul 16–20 · recall petition window
85 · signatures needed
Nov 3 · ballot if it qualifies
~20 vs 72 · actual STRs vs claimed