OpenAI’s ChatGPT 5.5 demonstrates strong performance in coordinating tools for command-line tasks but falls short on long-horizon software engineering challenges, according to two academic benchmarks.