Browserbase.com has launched the Arena, a place where you can run several Browser Use models in parallel (for example, the new Google Gemini vs Anthropic) and see how each one handles the same task.
Link: https://arena.browserbase.com/
P.S. In general, Browser Use is the feature I’ve been looking forward to the most and, at the same time, the one that has been the most "disappointing" in terms of quality.
