Chinese AI start-up DeepSeek has added multimodal capabilities to its main chatbot, enabling it to process images and videos alongside text and bringing it in line with competitors that already offer similar functions. The feature is available to select users for beta testing and follows the release of DeepSeek's new flagship model, V4, and significant price cuts.

Chen Xiaokang, who leads the company's multimodal team, announced the update, highlighting the addition of an image recognition mode to the chat interface. The enhancement is seen as essential for moving beyond basic text interactions into more complex applications. Although DeepSeek gained international recognition in January 2025 for its model's reasoning abilities and cost efficiency, it had been criticized for lacking a multimodal offering.

