Naver Cloud has introduced “Omnimodal HyperCLOVA X,” a groundbreaking AI model designed to transcend the limitations of existing large language models (LLMs) by integrating sensory capabilities. Unlike traditional LLMs that rely solely on text comprehension, Omnimodal HyperCLOVA X can process and understand various data types, including audio, images, and video, enhancing its applicability in real-world scenarios. This new model aims to accelerate the development of AI agents usable in everyday life and industrial environments.
Naver Cloud unveiled two open-source models: "HyperCLOVA X Seed 8B Omni," the first domestically developed model with a native omnimodal architecture, and "HyperCLOVA X Seed 32B Sync," which combines visual, audio, and tool-use capabilities. By directly understanding complex inputs such as the graphs and charts prevalent in industrial settings, the models are expected to reduce development and operational costs.
The 32B Sync model posted strong results on Korea's College Scholastic Ability Test, achieving top scores in major subjects, including perfect scores in English and Korean history, by solving problems directly from photographs of the exam rather than from text input. This underscores its cost-efficiency and competitiveness relative to larger models.
Naver Cloud emphasizes that expanding AI's sensory capabilities while strengthening its reasoning significantly improves its problem-solving abilities, marking a step toward AI that is not merely large but practically useful.

