FABBI

TECHNICAL INTELLIGENCE BRIEF

LLM Agents • Coding Agents • Harness/Eval • AI SDLC

2026-05-27 23:40 ICT
QUALITY_GATE_PARTIAL
149 candidates

1 Executive Snapshot

149
candidates scanned

0 X
social fallback

64 GitHub
repo signals

20 YT
video signals

4/7
Fabbi domains high-impact

Agent harness: 30 dev-web/HN + 64 repo signals → the CTO should measure with internal tasks rather than trust any single leaderboard.
CLI/IDE agents: 6 product baselines (Claude Code/Codex/Cursor/Copilot/Devin/OpenCode proxy) → the NEXA adapter needs to be vendor-neutral.
Context engineering: Serena/ctx/Repowise/Mneme signals → FARE prioritizes repo memory + architectural rules.
Terminal eval: Dirac 393 pts/148 comments; Terminal-Bench-3 200 stars/234 forks → add 20 Fabbi terminal tasks.
Governance: X unauthenticated direct access N/A, Facebook 0 usable → confidence medium, though social completeness reaches 3/4.

2 KOL/OG Feed Watch

Platform	Author	Time	Engagement	Title	Why CTO cares
dev_web	homescout	2026-05-27T15:43:24Z	2 pts / 0 comments	I built an agentic coding harness across three CLI hosts	HN dev discourse proxy
dev_web	dash0r	2026-05-27T14:51:35Z	1 pts / 0 comments	Peers – Multi-agent AI coding with measurable convergence	HN dev discourse proxy
dev_web	Tval	2026-05-27T14:16:48Z	1 pts / 0 comments	Show HN: Mneme HQ – repo-native architectural rules for AI coding agents	HN dev discourse proxy
dev_web	aming557	2026-05-27T13:59:01Z	1 pts / 0 comments	Aming Claw – Zero-orchestration multi-agent coding	HN dev discourse proxy
dev_web	D3F	2026-05-27T12:06:19Z	5 pts / 0 comments	Show HN: Unspaghettit – executable behavior specs for AI coding agents	HN dev discourse proxy
dev_web	vbutsomesayw	2026-05-27T04:01:44Z	3 pts / 0 comments	Bill Gates AI on AI (one month later)	HN dev discourse proxy
dev_web	armcat	2026-05-24T19:37:43Z	3 pts / 0 comments	Show HN: Simple Sprite Sheet Generation	HN dev discourse proxy
dev_web	jeroen_stulen	2026-05-24T10:07:13Z	3 pts / 4 comments	Show HN: My first app, artisanally vibe-coded in 4 months	HN dev discourse proxy
dev_web	xendo	2026-05-23T11:13:35Z	3 pts / 0 comments	Zero – Programming Language for Agents	HN dev discourse proxy
dev_web	goodroot	2026-05-21T14:59:15Z	2 pts / 0 comments	Show HN: opub, donated compute for open-source	HN dev discourse proxy
dev_web	ramayac	2026-05-20T04:31:50Z	2 pts / 0 comments	Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible	HN dev discourse proxy
dev_web	gruyaume	2026-05-12T14:37:45Z	1 pts / 0 comments	Implicit Knowledge Is a Liability	HN dev discourse proxy
dev_web	straydusk	2026-05-08T22:57:31Z	1 pts / 1 comments	Ask HN: Is agent-driven QA a thing?	HN dev discourse proxy
dev_web	jdw64	2026-04-19T08:42:37Z	10 pts / 5 comments	Ask HN: May be a basic question, but how can I use AI well?	HN dev discourse proxy
dev_web	alexblackwell_	2026-04-16T15:19:54Z	100 pts / 83 comments	Launch HN: Kampala (YC W26) – Reverse-Engineer Apps into APIs	HN dev discourse proxy
dev_web	nicola_alessi	2026-04-16T20:19:18Z	1 pts / 0 comments	Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks?	HN dev discourse proxy
dev_web	raghavchamadiya	2026-04-06T20:15:26Z	1 pts / 0 comments	Show HN: Repowise – Codebase intelligence for AI coding agents (open source)	HN dev discourse proxy
dev_web	alfredhua	2026-02-28T15:32:32Z	1 pts / 1 comments	Show HN: Salacia – The First Runtime OS for Agentic Coding	HN dev discourse proxy
dev_web	extra_cookin	2026-02-26T22:07:31Z	1 pts / 0 comments	Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks	HN dev discourse proxy
dev_web	jyoung105	2026-02-25T10:03:54Z	1 pts / 0 comments	Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents	HN dev discourse proxy
dev_web	gk1	2026-04-29T18:16:23Z	4 pts / 0 comments	ForgeCode: Top open source coding agent in Terminal-Bench 2.0	HN dev discourse proxy
dev_web	GodelNumbering	2026-04-27T12:35:55Z	393 pts / 148 comments	Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	HN dev discourse proxy
dev_web	_nhynes	2026-04-13T07:48:11Z	1 pts / 0 comments	Show HN: Amber, a capability-based runtime/compiler for agent benchmarks	HN dev discourse proxy
dev_web	joozio	2026-04-01T12:59:36Z	4 pts / 2 comments	Claude Code ranks 39th on terminal bench. The leaked source shows why	HN dev discourse proxy
dev_web	bcollins34	2026-03-31T19:07:11Z	4 pts / 2 comments	Show HN: Wozcode – double Claude Code output	HN dev discourse proxy
dev_web	suis_siva	2026-05-27T16:41:36Z	1 pts / 0 comments	Show HN: Hm – a task runner with a Python DSL, growing into a CI/CD system	HN dev discourse proxy
dev_web	tweezers0x	2026-05-27T16:22:41Z	4 pts / 0 comments	Show HN: Workplane – collaborative filesystem for humans and AI	HN dev discourse proxy
dev_web	galaxyLogic	2026-05-27T15:53:55Z	3 pts / 1 comments	Codex has dethroned Claude as the king of AI programming	HN dev discourse proxy
dev_web	pixelmash13	2026-05-27T15:14:11Z	1 pts / 0 comments	Show HN: GridPath – Faster and Better Agent for Spreadsheets (Tauri, Rust)	HN dev discourse proxy
dev_web	algera	2026-05-27T14:59:07Z	3 pts / 2 comments	Show HN: Zorilla – vibe-code a 3D game in the browser	HN dev discourse proxy
github	openai	2026-05-27T16:45:19Z	86279 stars / 12616 forks / 5237 issues	openai/codex	Repo/adoption/build signal
github	pproenca	2026-05-27T16:42:42Z	93 stars / 11 forks / 6 issues	pproenca/agent-tui	Repo/adoption/build signal
github	rocketride-org	2026-05-27T16:45:58Z	3431 stars / 1062 forks / 130 issues	rocketride-org/rocketride-server	Repo/adoption/build signal
github	lightdash	2026-05-27T16:41:58Z	5854 stars / 725 forks / 1687 issues	lightdash/lightdash	Repo/adoption/build signal
github	can1357	2026-05-27T16:44:16Z	7791 stars / 629 forks / 217 issues	can1357/oh-my-pi	Repo/adoption/build signal

3 Trend Radar

Hot now: harness/eval, repo context, CLI agents — ≥126/171 relevant signals.

Emerging: multi-agent convergence, executable specs, architectural rules — 5 HN items today.

Noise: “vibe coding” demos without eval — watch only.

Watchlist: sandbox/security, token-cost metering, enterprise audit logs.

4 CTO Evaluation Matrix

Signal	Thesis	Evidence	Counter-signal	Fabbi implication	Confidence	Decision	Next validation
Harness-first coding agents	Internal evals drive ROI more than model hype	149 scanned; Terminal-Bench/HN/GitHub links	Public benchmarks easily drift off-domain	NEXA+SYNCA test harness	82%	trial	20 tasks / 2 weeks
Context/codebase layer	Repo memory reduces hallucination	Serena/ctx/Repowise/Mneme signals	Deltas N/A due to missing snapshot	FARE context pack	78%	adopt	3-repo pilot
Vendor-neutral CLI adapter	Claude/Codex/Cursor/Copilot change quickly	6 product baselines + 64 repos	APIs/ToS differ	AIOS/NEXA abstraction	75%	trial	4-adapter smoke test
Governed HITL workflow	Enterprises need logs + approvals	GitHub Copilot docs + social skepticism	Slower dev flow	SYNCA risk gates	72%	adopt	policy PR template
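
The "4-adapter smoke test" validation for the vendor-neutral CLI adapter row could be sketched roughly as follows. This is a minimal illustration, not NEXA's actual design: `AgentAdapter`, `CallableAdapter`, and `smoke_test` are hypothetical names, and the backends are plain Python callables standing in for real agent CLIs so that no vendor-specific flags or APIs are assumed.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class AgentAdapter(Protocol):
    """Hypothetical vendor-neutral interface: every agent exposes the same call."""

    name: str

    def run(self, prompt: str) -> str: ...


@dataclass
class CallableAdapter:
    """Wraps any backend behind the common interface.

    In a real setup the backend would invoke a vendor CLI or API; here it is
    an injected callable so the sketch stays vendor-agnostic and testable.
    """

    name: str
    backend: Callable[[str], str]

    def run(self, prompt: str) -> str:
        return self.backend(prompt)


def smoke_test(adapters: list[AgentAdapter], prompt: str, expect: str) -> dict[str, str]:
    """Send one trivial prompt to each adapter and record a coarse status."""
    report: dict[str, str] = {}
    for adapter in adapters:
        try:
            output = adapter.run(prompt)
        except Exception as exc:  # a smoke test should survive one broken vendor
            report[adapter.name] = f"error: {type(exc).__name__}"
            continue
        report[adapter.name] = "ok" if expect in output else "unexpected output"
    return report


# Usage with stand-in backends (placeholders, not real agent integrations).
fake_adapters = [
    CallableAdapter("agent-a", lambda prompt: "pong"),
    CallableAdapter("agent-b", lambda prompt: "something else"),
]
print(smoke_test(fake_adapters, "ping", expect="pong"))
```

Keeping the interface to a single `run` method is a deliberate simplification: the narrower the shared surface, the less rework is needed when a vendor changes its CLI or terms.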

5 Repo Watch

Repo	Metric	Signal
openai/codex	86279 stars / 12616 forks / 5237 issues	Repo/adoption/build signal
pproenca/agent-tui	93 stars / 11 forks / 6 issues	Repo/adoption/build signal
rocketride-org/rocketride-server	3431 stars / 1062 forks / 130 issues	Repo/adoption/build signal
lightdash/lightdash	5854 stars / 725 forks / 1687 issues	Repo/adoption/build signal
can1357/oh-my-pi	7791 stars / 629 forks / 217 issues	Repo/adoption/build signal
stevesolun/ctx	371 stars / 47 forks / 1 issues	Repo/adoption/build signal
microsoft/skills	2400 stars / 268 forks / 49 issues	Repo/adoption/build signal
gastownhall/gascity	840 stars / 272 forks / 422 issues	Repo/adoption/build signal
l3gi0nXXXX/Metis-agent	130 stars / 13 forks / 0 issues	Repo/adoption/build signal

6 Impact Coverage

Domain	Now (0-2 weeks)	Next (1-2 months)	Later (3-6 months)	Move
FARE	Repo context pack	Architectural rules	Memory eval	adopt
NEXA	CLI harness	Multi-agent runner	Vendor-neutral agent platform	trial
SYNCA	Quality gates	Risk scoring	Audit evidence lake	adopt
DOMUS	Monitor	Workflow automation	Domain agents	monitor
Japan/VN/Global	Enterprise coding-agent PoC	Compliance-led offer	Managed AI SDLC package	trial

7 CTO Recommendations

1. Build NEXA eval harness
ROI/time-saving 18-25%; risk 2/5; owner Head of AI Eng; TTV 10 days; validate: 20 terminal tasks, pass@1/cost.

2. FARE context pack
ROI/time-saving 12-20%; risk 2/5; owner Platform Lead; TTV 7 days; validate: 3 repos, bugfix accuracy.

3. Vendor-neutral CLI adapter
ROI/time-saving 10-15%; risk 3/5; owner DevEx Lead; TTV 14 days; validate: Claude/Codex/Cursor/OpenCode smoke tests.

4. SYNCA governance gate
ROI/time-saving 8-12%; risk 2/5; owner QA/Security Lead; TTV 10 days; validate: PR policy + audit logs.
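
The "pass@1/cost" validation named for the eval harness could be computed along these lines. This is a sketch under stated assumptions: `TaskResult` and `summarize` are hypothetical names, pass@1 is taken to mean the share of tasks solved on the first attempt, and cost is assumed to be a metered dollar amount per attempt. None of these definitions come from the brief itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one first attempt at one internal task (hypothetical schema)."""

    task_id: str
    passed: bool     # did the first attempt satisfy the task's checks?
    cost_usd: float  # assumed: metered spend for that attempt


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Return pass@1 and cost metrics for a batch of first attempts."""
    if not results:
        raise ValueError("no results to summarize")
    passed = sum(r.passed for r in results)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "pass_at_1": passed / len(results),
        "cost_per_task": total_cost / len(results),
        # Cost per solved task penalizes agents that are cheap but rarely succeed.
        "cost_per_pass": total_cost / passed if passed else float("inf"),
    }


# Usage with made-up numbers, purely for illustration.
demo = [
    TaskResult("task-01", True, 0.25),
    TaskResult("task-02", True, 0.25),
    TaskResult("task-03", True, 0.25),
    TaskResult("task-04", False, 0.25),
]
print(summarize(demo))
```

Reporting cost per solved task alongside pass@1 makes it harder for a single headline number to hide an expensive or unreliable agent, which is the same concern the brief raises about single leaderboards.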

8 Must-read Sources / Source Appendix

S01 [dev_web] I built an agentic coding harness across three CLI hosts — 2 pts / 0 comments; homescout; HN dev discourse proxy
S02 [dev_web] Peers – Multi-agent AI coding with measurable convergence — 1 pts / 0 comments; dash0r; HN dev discourse proxy
S03 [dev_web] Show HN: Mneme HQ – repo-native architectural rules for AI coding agents — 1 pts / 0 comments; Tval; HN dev discourse proxy
S04 [dev_web] Aming Claw – Zero-orchestration multi-agent coding — 1 pts / 0 comments; aming557; HN dev discourse proxy
S05 [dev_web] Show HN: Unspaghettit – executable behavior specs for AI coding agents — 5 pts / 0 comments; D3F; HN dev discourse proxy
S06 [dev_web] Bill Gates AI on AI (one month later) — 3 pts / 0 comments; vbutsomesayw; HN dev discourse proxy
S07 [dev_web] Show HN: Simple Sprite Sheet Generation — 3 pts / 0 comments; armcat; HN dev discourse proxy
S08 [dev_web] Show HN: My first app, artisanally vibe-coded in 4 months — 3 pts / 4 comments; jeroen_stulen; HN dev discourse proxy
S09 [dev_web] Zero – Programming Language for Agents — 3 pts / 0 comments; xendo; HN dev discourse proxy
S10 [dev_web] Show HN: opub, donated compute for open-source — 2 pts / 0 comments; goodroot; HN dev discourse proxy
S11 [dev_web] Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible — 2 pts / 0 comments; ramayac; HN dev discourse proxy
S12 [dev_web] Implicit Knowledge Is a Liability — 1 pts / 0 comments; gruyaume; HN dev discourse proxy
S13 [dev_web] Ask HN: Is agent-driven QA a thing? — 1 pts / 1 comments; straydusk; HN dev discourse proxy
S14 [dev_web] Ask HN: May be a basic question, but how can I use AI well? — 10 pts / 5 comments; jdw64; HN dev discourse proxy
S15 [dev_web] Launch HN: Kampala (YC W26) – Reverse-Engineer Apps into APIs — 100 pts / 83 comments; alexblackwell_; HN dev discourse proxy
S16 [dev_web] Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks? — 1 pts / 0 comments; nicola_alessi; HN dev discourse proxy
S17 [dev_web] Show HN: Repowise – Codebase intelligence for AI coding agents (open source) — 1 pts / 0 comments; raghavchamadiya; HN dev discourse proxy
S18 [dev_web] Show HN: Salacia – The First Runtime OS for Agentic Coding — 1 pts / 1 comments; alfredhua; HN dev discourse proxy
S19 [dev_web] Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks — 1 pts / 0 comments; extra_cookin; HN dev discourse proxy
S20 [dev_web] Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents — 1 pts / 0 comments; jyoung105; HN dev discourse proxy
S21 [dev_web] ForgeCode: Top open source coding agent in Terminal-Bench 2.0 — 4 pts / 0 comments; gk1; HN dev discourse proxy
S22 [dev_web] Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview — 393 pts / 148 comments; GodelNumbering; HN dev discourse proxy
S23 [dev_web] Show HN: Amber, a capability-based runtime/compiler for agent benchmarks — 1 pts / 0 comments; _nhynes; HN dev discourse proxy
S24 [dev_web] Claude Code ranks 39th on terminal bench. The leaked source shows why — 4 pts / 2 comments; joozio; HN dev discourse proxy
S25 [dev_web] Show HN: Wozcode – double Claude Code output — 4 pts / 2 comments; bcollins34; HN dev discourse proxy
S26 [dev_web] Show HN: Hm – a task runner with a Python DSL, growing into a CI/CD system — 1 pts / 0 comments; suis_siva; HN dev discourse proxy
S27 [dev_web] Show HN: Workplane – collaborative filesystem for humans and AI — 4 pts / 0 comments; tweezers0x; HN dev discourse proxy
S28 [dev_web] Codex has dethroned Claude as the king of AI programming — 3 pts / 1 comments; galaxyLogic; HN dev discourse proxy
S29 [dev_web] Show HN: GridPath – Faster and Better Agent for Spreadsheets (Tauri, Rust) — 1 pts / 0 comments; pixelmash13; HN dev discourse proxy
S30 [dev_web] Show HN: Zorilla – vibe-code a 3D game in the browser — 3 pts / 2 comments; algera; HN dev discourse proxy
S31 [github] openai/codex — 86279 stars / 12616 forks / 5237 issues; openai; Repo/adoption/build signal
S32 [github] pproenca/agent-tui — 93 stars / 11 forks / 6 issues; pproenca; Repo/adoption/build signal
S33 [github] rocketride-org/rocketride-server — 3431 stars / 1062 forks / 130 issues; rocketride-org; Repo/adoption/build signal
S34 [github] lightdash/lightdash — 5854 stars / 725 forks / 1687 issues; lightdash; Repo/adoption/build signal
S35 [github] can1357/oh-my-pi — 7791 stars / 629 forks / 217 issues; can1357; Repo/adoption/build signal
S36 [github] stevesolun/ctx — 371 stars / 47 forks / 1 issues; stevesolun; Repo/adoption/build signal
S37 [github] microsoft/skills — 2400 stars / 268 forks / 49 issues; microsoft; Repo/adoption/build signal
S38 [github] gastownhall/gascity — 840 stars / 272 forks / 422 issues; gastownhall; Repo/adoption/build signal
S39 [github] l3gi0nXXXX/Metis-agent — 130 stars / 13 forks / 0 issues; l3gi0nXXXX; Repo/adoption/build signal
S40 [product] Anthropic Claude Code — N/A docs; Anthropic; Product baseline
S41 [product] OpenAI Codex CLI — N/A live via GitHub page; OpenAI; Product/repo baseline
S42 [benchmark] SWE-bench — N/A benchmark site; SWE-bench; Benchmark baseline
S43 [benchmark] Terminal-Bench — N/A benchmark site; Stanford/Terminal-Bench; Benchmark baseline
S44 [product] Cursor Agents — N/A docs; Cursor; IDE agent baseline
S45 [product] GitHub Copilot coding agent — N/A docs; GitHub; Enterprise coding-agent baseline

9 Data Quality / Scan Health

Status: QUALITY_GATE_PARTIAL.
Counts: {'dev_web': 30, 'github': 64, 'papers_product': 10, 'reddit': 25, 'youtube': 20, 'x': 0, 'facebook_public': 0}.
Gates: source_volume=True, social_completeness=False, cited_30_possible=True.
Caveat: papers_product=0 due to arXiv 429/timeout; Facebook public: 0 usable; X uses search fallback, direct unauthenticated access N/A → confidence Medium.
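
One way a partial status could fall out of the counts and gates above is sketched below. The brief does not state its gate rules, so everything beyond the counts and the three gate names is an assumption: the thresholds, the grouping of social channels, and the `QUALITY_GATE_PASS` / `QUALITY_GATE_FAIL` labels are hypothetical.

```python
# Counts copied from the scan-health line of this brief.
COUNTS = {
    "dev_web": 30,
    "github": 64,
    "papers_product": 10,
    "reddit": 25,
    "youtube": 20,
    "x": 0,
    "facebook_public": 0,
}

# Assumptions, not taken from the brief:
SOCIAL_CHANNELS = ("reddit", "youtube", "x", "facebook_public")  # assumed grouping
MIN_TOTAL_CANDIDATES = 100  # assumed source-volume threshold
MIN_CITABLE_SOURCES = 30    # inferred only from the gate name "cited_30_possible"


def evaluate_gates(counts: dict[str, int]) -> dict[str, bool]:
    """Evaluate the three named gates under the assumed rules above."""
    total = sum(counts.values())
    return {
        "source_volume": total >= MIN_TOTAL_CANDIDATES,
        # Assumed rule: every social channel must return at least one item.
        "social_completeness": all(counts.get(ch, 0) > 0 for ch in SOCIAL_CHANNELS),
        "cited_30_possible": total >= MIN_CITABLE_SOURCES,
    }


def overall_status(gates: dict[str, bool]) -> str:
    """Map gate results to a status label (PASS and FAIL labels are hypothetical)."""
    if all(gates.values()):
        return "QUALITY_GATE_PASS"
    if any(gates.values()):
        return "QUALITY_GATE_PARTIAL"
    return "QUALITY_GATE_FAIL"


gates = evaluate_gates(COUNTS)
print(gates, overall_status(gates))
```

Under these assumed rules the zero counts for X and Facebook are what fail the social gate, which is consistent with the caveat above but should not be read as the pipeline's actual logic.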