Chinese AI start-up DeepSeek has disclosed the data-filtering processes it uses to train its models, addressing concerns about potential “hallucination” and “abuse” risks. The Hangzhou-based company emphasized its commitment to AI security, in line with Beijing’s tightening oversight of the industry. Pre-training data is drawn from publicly available information and authorized third-party sources, and the company says it does not intentionally collect personal data. DeepSeek uses automated filters to remove content such as hate speech and spam, and combines algorithmic detection with human review to address statistical biases in large datasets. Despite efforts to reduce hallucinations through advanced techniques, the company acknowledges the problem cannot be fully eliminated. Because the models generate predictions rather than retrieve exact answers, users are advised to seek professional guidance when necessary.

