The comparison involved 100 games of Mastermind, a reasoning task requiring the models to deduce a hidden code through logical guesses informed by feedback hints. Key metrics included success rate ...
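For readers unfamiliar with the game, the sketch below illustrates the feedback mechanic the benchmark turns on: after each guess, the code-breaker receives black pegs for colors in the correct position and white pegs for correct colors in the wrong position, and must reason over that signal to converge on the hidden code. This is a minimal illustration of the game's scoring rules only, not the actual benchmark harness; the function name `score_guess` and the letter encoding of colors are assumptions made for the example.

```python
from collections import Counter

def score_guess(secret: str, guess: str) -> tuple[int, int]:
    """Return Mastermind feedback as (black, white) peg counts.

    black: positions where color and placement both match.
    white: correct colors that appear in the wrong position.
    """
    # Exact matches: right color in the right slot.
    black = sum(s == g for s, g in zip(secret, guess))
    # Color overlap regardless of position, counted with multiplicity.
    overlap = sum((Counter(secret) & Counter(guess)).values())
    # Whites are the color matches not already counted as exact.
    white = overlap - black
    return black, white

# Example: secret "RGBY" vs. guess "RYGB" -> one exact match (R)
# and three correct colors in the wrong slots (G, B, Y).
assert score_guess("RGBY", "RYGB") == (1, 3)
```

Each round of feedback prunes the space of codes consistent with all hints so far, which is why the game rewards the step-by-step deduction described above.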
today revealed performance highlights of its flagship product, Genius, winning the code-breaking game Mastermind in a side-by-side comparison with a leading generative AI model, OpenAI’s o1 ...
“Mastermind was the perfect choice for this test because it requires reasoning through each step logically, predicting the cause-and-effect outcomes of its decisions, and dynamically adapting to crack ...