Hugging Face: Open LLM Leaderboard: DROP benchmark scoring issues require new evaluation harness | SignalBreak