HyperCLOVA X Outperforms Global AI Leaders in KMMLU Assessment, Highlighting Sovereign AI’s Competitive Edge
- Consisting of 35,030 expert-level questions across 45 fields, KMMLU provides a thorough assessment of both general and Korea-specific knowledge
- HyperCLOVA X achieves higher average scores than GPT-3.5-Turbo and Gemini-Pro, and even surpasses GPT-4 in Korea-specific knowledge
- NAVER CLOUD is strengthening the domestic AI ecosystem with HyperCLOVA X, a solution known for its security and performance, and plans to accelerate its global expansion by leveraging the competitiveness of sovereign AI proven in Korea
February 27, 2024
NAVER CLOUD (CEO Kim Yu-won) announced on February 27 that HyperCLOVA X outperformed generative AI models from OpenAI and Google on KMMLU* (Measuring Massive Multitask Language Understanding in Korean), demonstrating its strength as a sovereign AI.
* KMMLU: Measuring Massive Multitask Language Understanding in Korean (https://arxiv.org/abs/2402.11548)
KMMLU is an AI performance benchmark developed under the leadership of "HAE-RAE," a prominent Korean open-source language model research team. It includes 35,030 questions that test expert-level knowledge across 45 domains, such as humanities, sociology, science, and technology. Approximately 80% of the questions evaluate universally relevant knowledge, including mathematical reasoning, while the remaining 20% assess Korea-specific knowledge, such as the geography of the Korean Peninsula and Korean law. This design enables a comprehensive and balanced evaluation of AI systems, measuring both their general capabilities and their relevance to Korean users.
Previously, Korean-language performance was typically measured with translated versions of "MMLU," the benchmark that North American tech giants such as OpenAI and Google use to evaluate their AI models. This approach was unreliable because of translation errors and because many questions presuppose a cultural context that does not carry over to Korea. Because KMMLU consists of questions originally written in Korean, it provides a more precise assessment of how well both domestic and international AI systems understand Korean.
According to the KMMLU research, HyperCLOVA X outperforms OpenAI's GPT-3.5-Turbo and Google's Gemini-Pro in both general knowledge and Korea-specific knowledge. It also surpasses OpenAI's GPT-4 in Korea-specific knowledge, indicating HyperCLOVA X's strong applicability in fields that require local expertise, such as education and legal information.
Building on the strengths validated in the KMMLU evaluation, NAVER CLOUD is committed to developing HyperCLOVA X into a secure, high-performing "Sovereign AI" solution. Last October, the company launched "Neurocloud for HyperCLOVA X," a hybrid cloud service that lets clients deploy HyperCLOVA X within their private networks, strengthening data security. NAVER CLOUD also plans to introduce a range of enterprise solutions.
Sung Nako, the Head of Hyperscale AI at NAVER CLOUD, stated, “HyperCLOVA X represents a sovereign AI that melds Korea-specific problem-solving abilities with extensive knowledge. As it offers high-performance and secure solutions, it is becoming increasingly popular in local industries. With the growing global interest in AI that caters to native languages, we are well-positioned to expedite our entrance into the global market, leveraging the proven advantages of sovereign AI in Korea.”
NAVER CLOUD has played a key role in advancing Korea's AI technology, including by participating in the development of KMMLU to enable impartial assessment of AI models' Korean-language proficiency. In 2021, it unveiled the Korean Language Understanding Evaluation (KLUE) benchmark in collaboration with experts from approximately 30 companies and universities. Last year, NAVER CLOUD also released Korean datasets* designed to make hyperscale language models safer and more reliable in the Korean context, developed through interdisciplinary research collaborations spanning the social sciences and law.
* SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration (https://arxiv.org/abs/2305.17696), KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application (https://arxiv.org/abs/2305.17701)