The open-source AI Agent project OpenSquilla recently released version 0.4.0, which introduces a coding workflow mode built around what the project describes as an industry-first "self-verification" mechanism. Instead of simply claiming that the code is complete, the Agent now runs automated tests and generates a verifiable report demonstrating correctness before final delivery, addressing a critical trust bottleneck in AI coding.
At the core of this self-verification is a closed-loop "Red-Green regression proof chain". The Agent first writes a test that fails on the current code, which isolates the bug. It then implements the fix that turns the test from "red" to "green", and finally runs all existing tests to guard against regressions. Any failure triggers an automatic repair loop, which runs inside a sandboxed environment for safety.
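The red, green, and regression steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenSquilla's implementation: the names `ProofReport`, `red_green_regression`, `apply_fix`, and `max_attempts` are hypothetical, tests are modeled as plain callables that return `True` on success, and the sandbox is not modeled at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Test = Callable[[], bool]  # hypothetical convention: True means the test passed


@dataclass
class ProofReport:
    """Hypothetical record of what the verification loop observed."""
    red_failed_before_fix: bool = False
    green_passed_after_fix: bool = False
    regression_failures: List[str] = field(default_factory=list)
    fix_attempts: int = 0

    @property
    def verified(self) -> bool:
        return (self.red_failed_before_fix
                and self.green_passed_after_fix
                and not self.regression_failures)


def red_green_regression(new_test: Test,
                         apply_fix: Callable[[int], None],
                         existing_tests: Dict[str, Test],
                         max_attempts: int = 3) -> ProofReport:
    report = ProofReport()

    # Red: the new test must fail on the unfixed code, or it proves nothing.
    report.red_failed_before_fix = not new_test()
    if not report.red_failed_before_fix:
        return report

    for attempt in range(1, max_attempts + 1):
        report.fix_attempts = attempt
        apply_fix(attempt)  # in a real agent this step would edit the code

        # Green: the same test must now pass.
        report.green_passed_after_fix = new_test()

        # Regression: every pre-existing test must still pass.
        report.regression_failures = [
            name for name, test in existing_tests.items() if not test()
        ]
        if report.verified:
            break  # otherwise loop again: the "automatic repair" step
    return report


# Toy "codebase": an absolute-value function with a bug for negative inputs.
code = {"abs": lambda x: x}


def new_test() -> bool:
    return code["abs"](-3) == 3


def apply_fix(attempt: int) -> None:
    code["abs"] = lambda x: -x if x < 0 else x


existing = {"positive_input": lambda: code["abs"](2) == 2}
report = red_green_regression(new_test, apply_fix, existing)
print(report)
```

The report only counts as verified when all three observations hold, which is what makes the result checkable by someone who did not watch the run.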
In official demonstrations, OpenSquilla was tested on micrograd (a minimalist autograd engine by Andrej Karpathy), where it successfully fixed a subtle gradient calculation bug. After the fix, the outputs matched those of PyTorch to 10 decimal places. This is the team's latest runtime implementation, following their benchmark claw-swe-bench. Alongside this release, OpenSquilla published signed desktop installers for macOS and Windows.
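To make "matched to 10 decimal places" concrete, here is a hedged sketch of such an agreement check. It is not micrograd's code and does not reproduce the bug from the demonstration, which the source does not describe. The `Value` class is a tiny stand-in written for this example, the helper `agrees_to_10_places` is hypothetical, and a hand-derived analytic gradient stands in for PyTorch so that the block needs only the standard library.

```python
import math


class Value:
    """Tiny scalar autograd node in the spirit of micrograd (illustrative only)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


def agrees_to_10_places(x, y):
    # Assumption: "10 decimal places" is read as an absolute gap below 1e-10.
    return abs(x - y) < 1e-10


# f(a, b) = tanh(a * b + a); `a` is used twice, so its gradient must accumulate.
a, b = Value(0.5), Value(-1.5)
f = (a * b + a).tanh()
f.backward()

# Analytic reference: df/da = (1 - f^2) * (b + 1), df/db = (1 - f^2) * a.
slope = 1.0 - math.tanh(0.5 * -1.5 + 0.5) ** 2
expected_da = slope * (-1.5 + 1.0)
expected_db = slope * 0.5

print(agrees_to_10_places(a.grad, expected_da), agrees_to_10_places(b.grad, expected_db))
```

In a setup with PyTorch installed, the analytic reference would be replaced by gradients computed on the same expression with `torch.tensor(..., requires_grad=True)`.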
Aiming to "maximize Agent intelligence per unit cost", OpenSquilla uses its Learnable Harness to reduce token usage through local smart routing, on-demand skill loading, and tool preprocessing. According to the project, its smart router is 4.4% more accurate than OpenRouter while cutting costs by 75%, and overall operating costs fall by 60%–80% in typical scenarios.
Founded by Wang Yunhe (former head of LLM R&D at a leading tech firm) and CTO Han Kai, the project has amassed thousands of GitHub stars within weeks of launch and completed its seed funding round, emerging as a key player in Harness and Agent-native development.
[AgentUpdate In-Depth Analysis] The evolution of AI coding from autocomplete tools like Cursor to autonomous developers like Devin has repeatedly run into the "trust barrier." OpenSquilla 0.4.0's self-verification mechanism marks a pivotal shift by building Test-Driven Development (TDD) directly into the Agent's cognitive loop. By enforcing a "Red-Green-Regression" verification chain, it moves AI coding from subjective LLM self-evaluation to objective, runtime-level evidence. This approach can significantly reduce the human-in-the-loop overhead required for code review. In the broader AI Agent ecosystem, the design moves us closer to fully autonomous, production-ready DevOps pipelines in which reliability is demonstrated by executable tests rather than asserted, changing how enterprises scale AI-driven software development.