NAVER Cloud Unveils Omnimodal HyperCLOVA X, Introducing a Step-by-Step Expansion Strategy Built from the Ground Up
- Two models released—an omnimodal model and a reasoning model—to accelerate the development of real-world AI agents
- Omnimodal architecture refined from the basics, with a focus on data differentiation, phased scale-up, and specialized model production
- Omnimodal AI highlighted as a next-generation foundation technology for understanding the real world
December 29, 2025
NAVER Cloud (CEO Kim Yu-won) has unveiled the first outcome of its “Omni Foundation Model” development project, which it is leading under the Ministry of Science and ICT’s “Independent AI Foundation Model” initiative. The company announced the open-source release of two models: the “Native Omni Model (HyperCLOVA X SEED 8B Omni),” Korea’s first foundation model to apply a native omnimodal structure, and “HyperCLOVA X SEED 32B Think,” a high-performance reasoning model that enhances conventional reasoning-based AI with capabilities in vision, voice, and tool use. With these releases, the company is accelerating the development of AI agents that can be readily integrated into daily life and industrial settings.
Laying the groundwork for a “future technology” omni model, accelerating the transition to daily and industrial AI through data differentiation, phased scale-up, and specialized model production
The newly released “HyperCLOVA X SEED 8B Omni” fully adopts a native omnimodal structure, in which different forms of data (such as text, images, and audio) are learned together from the outset in a single model. Omnimodal AI can integrate information in context across a shared semantic space, regardless of its form. This makes it highly applicable to real-world environments where speech, text, visual, and audio information interact simultaneously, and it is attracting growing attention as a next-generation AI technology. For this reason, global big tech companies are also positioning omnimodal AI as a core pillar of their next-generation foundation model strategies.
To maximize the potential of omnimodal AI, NAVER Cloud is adopting a strategy that extends beyond conventional training on Internet documents or image-based data, focusing on acquiring data that reflects diverse real-world contexts. Sung Nako, Executive Director of Hyperscale AI at NAVER Cloud, stated, “Even if you scale up a model, if the diversity of the data is limited, the AI’s problem-solving ability will inevitably be confined to specific domains or subjects.” He added, “That’s why the process of securing and refining differentiated real-world data, such as undigitized contextual data from daily life or spatial data reflecting regional geographic features, must come first.”
Having validated its native omnimodal AI development methodology through this release, NAVER Cloud plans to begin a phased scale-up by training the model on differentiated data. Unlike traditional multimodal approaches, which combine separate models for text, images, and speech, omnimodal AI uses a single-model architecture, which makes it easier to scale. Building on this structure, the company aims to efficiently expand its lineup of specialized omnimodal models in various sizes to support services closely integrated into both industry and daily life.
The model also features an omnimodal generation capability, enabling it to generate and edit images from text prompts. Because it understands the context of both text and images, the model produces output that reflects the intended meaning and handles text comprehension and image generation and editing naturally within a single model. This functionality has so far been offered by global frontier AI models, and its inclusion indicates that NAVER Cloud has achieved comparable multimodal generation capabilities.
Combining vision, voice, and tool capabilities with reasoning-based AI to develop omnimodal agents on par with global models
NAVER Cloud has also released “HyperCLOVA X SEED 32B Think” to validate the practical applicability and future potential of omnimodal AI agents. This model combines reasoning-based AI with capabilities in visual understanding, voice interaction, and tool use to deliver an agent experience that can understand complex inputs and requests and solve problems.
According to benchmarking by global AI evaluation agency Artificial Analysis, the model performed in a range comparable to leading global AI models on a composite index covering 10 key benchmarks, including general knowledge, advanced reasoning, coding, and agentic tasks.
In category-specific evaluations, the model showed particular strength in areas closely tied to real-world use. It outperformed global models in key capabilities such as General Knowledge (Korean Text), Vision Understanding, and Agentic Task (tool-based problem-solving as an agent), demonstrating its competence in handling complex tasks.
In addition, when applied to this year’s College Scholastic Ability Test (CSAT), the model achieved Grade 1 (top-tier) scores across all major subjects, including Korean, mathematics, English, and Korean history, and earned perfect scores in English and Korean history. The company noted that, unlike many AI models that require converting exam questions into text before input, this model directly understood and solved problems from image inputs, marking a key point of differentiation.
Sung added, “We confirmed that expanding AI’s sensory capabilities horizontally—across text, vision, and audio—while simultaneously enhancing its reasoning and problem-solving skills significantly improves its ability to address real-world challenges.” He continued, “We plan to continue scaling based on this robust foundational structure, believing that gradual expansion is the way to develop not just a large-scale model, but one that is truly practical and usable.”
NAVER Cloud plans to gradually expand the deployment of AI agents based on this omnimodal HyperCLOVA X in various domains, including search, commerce, content, public services, and industrial applications, accelerating the creation of a technology ecosystem that enables “AI for everyone.” (End)