NG Solution Team

How does DeepSeek’s new AI model use visual perception to compress text?

DeepSeek has introduced a multimodal AI model designed to process large, complex documents efficiently by sharply reducing the number of tokens required. The approach uses visual perception as a compression medium: instead of feeding long runs of text tokens to the language model, text is represented visually, letting the model handle far more content without a corresponding rise in computational cost. The open-source model, DeepSeek-OCR, now available on platforms such as Hugging Face and GitHub, grew out of research into using vision encoders for text compression in large language models. DeepSeek claims the approach can cut token usage by a factor of seven to 20, easing the challenge of processing long text contexts in AI models. The work continues DeepSeek’s push to improve AI efficiency and reduce costs, building on its earlier open-source models V3 and R1. DeepSeek-OCR has two primary components: the DeepEncoder and the DeepSeek3B-MoE-A570M decoder.
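As a rough illustration of the claimed savings (the token counts below are assumptions invented for the example, not figures published by DeepSeek), the compression ratio can be sketched as:

```python
# Illustrative sketch of vision-token compression (not DeepSeek's actual API).
# The idea: a page that would cost many text tokens is instead represented by
# a much smaller number of vision tokens produced by an encoder such as
# DeepEncoder, and the decoder reconstructs the text from those.

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens replaced per vision token used."""
    return text_tokens / vision_tokens

# Hypothetical page: ~2,000 text tokens encoded as ~200 vision tokens —
# a 10x reduction, within the seven-to-20x range DeepSeek claims.
print(compression_ratio(2000, 200))
```

The practical consequence is that context-window cost scales with the number of vision tokens per page rather than with the raw length of the text.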

