Off-the-Shelf Datasets

Accelerate your time-to-market and reduce development costs with our off-the-shelf datasets. We've handled the heavy lifting of data collection and annotation to provide you with high-quality, AI-ready data, available for immediate deployment. Focus on building your models, not on sourcing your data.

Chinese NUSPEE Exams QAE Data

Catalog no.: KS2025M01

This dataset offers 25,000 Question-Answer-Explanation (QAE) triplets from China's National Unified Subjects of the Postgraduate Entrance Exam (NUSPEE, 研究生考试全国统一命题科目), a standardized test comparable to the GRE. Containing 428 complete exams across 16+ subjects, such as Mathematics, Law, and Computer Science, it is the most comprehensive public archive of its kind. Because most questions include detailed explanations, the dataset is exceptionally valuable for training and assessing advanced AI reasoning capabilities.

Sample data can be downloaded here.

Chinese USSPEE STEM Exams QAE Data

Catalog no.: KS2025M02

This dataset offers thousands of Question-Answer-Explanation (QAE) triplets from the STEM portion of China's University-Specific Postgraduate Entrance Exams (USSPEE, 研究生入学考试招生单位自命题科目). Unlike the national unified tests, these exams are set by individual universities and feature highly specialized questions for their specific graduate programs. The dataset covers over 100 distinct subject areas, such as Signals and Systems, Advanced Mathematics, Computer Science, and Pharmaceutics, and its detailed explanations make it exceptionally valuable for training AI models in deep, domain-specific reasoning.

Sample data can be downloaded here.

Chinese Administrative Aptitude Test QAE Data

Catalog no.: KS2025M03

This dataset contains thousands of Question-Answer-Explanation (QAE) triplets from the Administrative Aptitude Test (AAT, 行政职业能力测验), a core component of China's highly competitive Civil Service Examination (国家公务员考试). Drawn from 700 complete national and provincial exams administered between 2000 and 2025, the collection is a comprehensive resource for aptitude assessment. Its questions test a broad range of practical skills, including verbal, quantitative, logical, and data-interpretation reasoning, and the detailed explanations make it an exceptional tool for training and evaluating versatile AI models.

Sample data can be downloaded here.

Chinese K-12 STEM Exams QAE Data

Catalog no.: KS2025M04

This dataset provides a massive collection of Question-Answer-Explanation (QAE) triplets from China's high-stakes high school (Zhongkao, 中考) and college (Gaokao, 高考) entrance exams. Sourced from over 10,500 official and mock exams administered between 1990 and 2025, it covers core STEM subjects, including Mathematics, Physics, Chemistry, and Biology. Given the Gaokao's reputation as one of the world's most challenging standardized tests, and with detailed explanations that guide the reasoning process, this dataset is an exceptional resource for training and benchmarking AI in advanced scientific problem-solving.

Sample data can be downloaded here.