Hugging Face: BAAI FlagEval Debate: Multilingual, multi-model LLM evaluation via debates | SignalBreak