NG Solution Team
How does DeepSeek’s new AI model use visual perception to compress text?

DeepSeek has introduced a multimodal AI model designed to process large, complex documents while significantly reducing the number of tokens required. The approach uses visual perception as a compression medium: text is rendered as an image and encoded into far fewer vision tokens than a text tokenizer would produce, letting the model handle long contexts without a proportional increase in computational cost. The open-source model, DeepSeek-OCR, now available on Hugging Face and GitHub, grew out of research into using vision encoders for text compression in large language models. DeepSeek claims the approach can cut token usage by 7 to 20 times, addressing the cost of processing very long text contexts. The work continues DeepSeek's focus on AI efficiency and cost reduction, building on its earlier open-source models V3 and R1. DeepSeek-OCR has two primary components: the DeepEncoder and the DeepSeek3B-MoE-A570M decoder.
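To see why rendering text as an image can save tokens, consider a back-of-the-envelope comparison. The sketch below is purely illustrative: the characters-per-token ratio, patch size, and downsampling factor are assumptions chosen for the example, not DeepSeek-OCR's actual configuration.

```python
# Illustrative sketch of optical token compression.
# All numbers here are assumptions, not DeepSeek-OCR's real parameters.

def text_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Estimate text-tokenizer output (~4 chars/token is a common rule of thumb)."""
    return round(num_chars / chars_per_token)

def vision_tokens(image_side: int = 1024, patch: int = 16, downsample: int = 16) -> int:
    """Patch-based vision encoder: (side/patch)^2 patches, then a
    compressor stage that reduces the token count by an assumed factor."""
    patches = (image_side // patch) ** 2   # 1024/16 = 64 -> 64*64 = 4096 patches
    return patches // downsample           # 4096 / 16 = 256 vision tokens

# A dense page of ~10,000 characters rendered onto one 1024x1024 image:
t = text_tokens(10_000)   # ~2,500 text tokens
v = vision_tokens()       # 256 vision tokens
print(f"{t} text tokens vs {v} vision tokens -> {t / v:.1f}x compression")
```

Under these assumed numbers the page compresses roughly 10x, which sits inside the 7x to 20x range DeepSeek claims; the actual ratio depends on text density, image resolution, and the encoder's compression stage.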
