The OpenHands Index: Benchmarking LLMs as Software Engineering Agents - Detailed Analysis & Overview

The OpenHands Index: Benchmarking LLMs as Software Engineering Agents

The OpenHands Index

How important is benchmarking and testing different LLMs?

Is losing 20% accuracy worth paying 20% less on the cost of your

Which AI Model Wins at Real Coding? OpenHands Index Results | Graham Neubig

If you're deploying AI

Using AI models to determine agent quality

Ralph Wiggum is “just enough orchestration.” It's a simple way to coordinate multiple runs of coding

MiniMax Is Now Free on OpenHands + Benchmark Fixes & AI Dev Workstation Demo

MiniMax is now free to use on

Are LLMs good software engineers? - Anthony Shaw - NDC Sydney 2026

This talk was recorded at NDC Sydney in Sydney, Australia. Attend ...

ProgramBench: New Coding Benchmark for LLM Agents

In this AI Research Roundup episode, Alex discusses the paper: 'ProgramBench: Can Language Models Rebuild Programs From ...

OpenHands: Open-source AI Software Development Agents

OpenHands

7 new open source AI tools you need right now…

Build meeting bots and desktop recording apps in hours: https://www.recall.ai/fireship gets you $100 in free credits. In today's we'll ...

OpenHands Community Update: Agent Canvas, GPT-5.5 & LLM Profiles

In this

OpenHands runs natively on Windows! Learn about Agent Canvas!

Learn how to run

AgentBench: NEW Benchmarking Tool CHANGES The LLM LEADERBOARD (Installation Tutorial)

Welcome to an eye-opening exploration of the revolutionary

Automating Large Scale Refactors with Parallel Agents - Robert Brennan, OpenHands

Today's

OpenHands + Devstral = A Fully Local Coding Agent

OpenHands

AcademiClaw: New Academic Benchmark for LLM Agents

In this AI Research Roundup episode, Alex discusses the paper: 'AcademiClaw: When Students Set Challenges for AI

Flexibly Choose your Coding Model with OpenHands LLM Profiles

OpenHands LLM

AIRS-Bench: New Benchmark for LLM Research Agents

In this AI Research Roundup episode, Alex discusses the paper: "AIRS-Bench: a Suite of Tasks for Frontier AI Research Science ...

Software Development Agents: What Works and What Doesn't - Robert Brennan, OpenHands

The adoption of AI into

Using a Local Agentic Coding LLM through Slack or GitHub with OpenHands

See how to use a locally hosted coding

What are Large Language Model (LLM) Benchmarks?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKetJ Learn more about the ...