<p>LLMs can solve complex mathematical problems, yet they stumble on simple arithmetic. The team led by Christos Tzamos at Percepta found a way to fix this: they literally built a virtual machine into the model's weights.</p>
<p>Here's how it works: the program is fed to the model as tokens, and the model executes it step by step using only its weights, outputting the result token by token. There are no external tools; every computation happens autoregressively inside the transformer itself.</p>
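<p>To make "execution as next-token generation" concrete, here is a toy sketch. It is not Percepta's architecture: the hypothetical <code>next_token</code> function is ordinary Python standing in for the transformer, and the stack-machine opcodes and the <code>pc|stack</code> token format are invented for illustration. What it does show is the shape of the idea. The program is a token prefix, and each call reads only the tokens produced so far, recovers the machine state from them, and appends exactly one new token.</p>

```python
# Toy illustration (NOT Percepta's model): running a program as autoregressive
# generation. The hypothetical next_token() plays the role of the transformer:
# it sees only the token sequence so far and returns exactly one new token.

PROGRAM = ["PUSH", "2", "PUSH", "3", "ADD", "PUSH", "4", "MUL", "HALT"]


def next_token(tokens: list[str]) -> str:
    """Emit the next trace token. State tokens look like 'pc|v1,v2,...'."""
    sep = tokens.index("RUN")              # program tokens come before RUN
    program, trace = tokens[:sep], tokens[sep + 1:]
    if not trace:
        return "0|"                        # initial state: pc=0, empty stack
    last = trace[-1]
    if last.startswith("OUT="):
        return "EOS"                       # program finished on the previous step
    # Recover the machine state from the most recent token in the context.
    pc_str, stack_str = last.split("|")
    pc = int(pc_str)
    stack = [int(x) for x in stack_str.split(",")] if stack_str else []
    op = program[pc]
    if op == "PUSH":
        stack.append(int(program[pc + 1]))
        pc += 2
    elif op in ("ADD", "MUL"):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b if op == "ADD" else a * b)
        pc += 1
    elif op == "HALT":
        return f"OUT={stack[-1]}"
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return f"{pc}|{','.join(map(str, stack))}"


def run(program: list[str]) -> list[str]:
    """Generate tokens one at a time until EOS; return only the generated trace."""
    tokens = list(program) + ["RUN"]
    while tokens[-1] != "EOS":
        tokens.append(next_token(tokens))
    return tokens[len(program) + 1:]


print(run(PROGRAM))
# ['0|', '2|2', '4|2,3', '5|5', '7|5,4', '8|20', 'OUT=20', 'EOS']
```

<p>Each emitted token is a full state snapshot, so the "model" never needs hidden memory: everything it needs is already in its own context, which is the property an autoregressive transformer relies on.</p>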
<p>The main problem is that standard attention is too slow for real computation: every new token attends over the entire preceding context, so the cost of each step grows with the length of the execution. Percepta circumvented this with a new decoding path that makes attention exponentially faster, requiring almost constant work per token. The result is over 30,000 tokens per second on an ordinary CPU.</p>
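<p>Why per-token cost matters can be seen with a rough analogy. The sketch below does not reproduce Percepta's decoding path; it only contrasts two ways of reading a value back from a growing trace of memory writes. The scan touches every earlier step, as dense attention does, while a Python <code>dict</code> (used here purely as a stand-in for "a fast lookup") answers in one probe. The names <code>read_by_scan</code> and <code>simulate</code> are hypothetical.</p>

```python
# Cost analogy (NOT Percepta's algorithm): reading memory back from a trace.
# A full scan examines every earlier step, like dense attention over the
# context; an index keeps the latest write per address and needs one probe.

import random


def read_by_scan(trace, addr):
    """Return (latest value at addr, number of trace entries examined)."""
    value, examined = None, 0
    for a, v in trace:              # touches every previous step
        examined += 1
        if a == addr:
            value = v               # later writes overwrite earlier ones
    return value, examined


def simulate(steps, num_addrs=16, seed=0):
    """Run `steps` read-then-write steps; return (scan work, indexed work)."""
    rng = random.Random(seed)
    trace = []                      # full history of (addr, value) writes
    index = {}                      # addr -> latest value (the fast path)
    scan_work = index_work = 0
    for step in range(steps):
        addr = rng.randrange(num_addrs)
        scanned, examined = read_by_scan(trace, addr)
        scan_work += examined
        indexed = index.get(addr)
        index_work += 1
        assert scanned == indexed   # both read paths must return the same value
        trace.append((addr, step))
        index[addr] = step
    return scan_work, index_work


print(simulate(1000))  # (499500, 1000)
```

<p>Total scan work is 0 + 1 + ... + (n - 1) = n(n - 1)/2 entries, so it grows quadratically with the number of steps, while the indexed path grows linearly. Over millions of execution steps, that gap is the difference between impractical and fast.</p>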
<p>In practice, the model runs C programs (compiled to WebAssembly) for millions of steps and solves even the hardest Sudoku puzzles with 100% accuracy.</p>
<p><a href="https://www.percepta.ai/blog/can-llms-be-computers">https://www.percepta.ai/blog/can-llms-be-computers</a></p>
<p>#ai #llm #research #percepta</p>