Louisa - The Grid

We ran 21,000 agentic simulations. A blend of open-source models matched the top performers.

Most teams build around one model. Find the best, lock in, optimize around it. We tested the opposite: what happens when you route across multiple cost-efficient open-source models instead? Across more than 21,000 simulations on one of the hardest agentic benchmarks available, that blend scored 90.9%, within 7 points