Free Access for EMNLP 25 Participants:
Gaokao STEM Benchmark Data (GaokaoSTEM)
1
Data source
The data is drawn from recent College Entrance Exams of China, known as the Gaokao (高考), and covers four subjects: mathematics, physics, chemistry, and biology.
2
Data type
Problem, Solution, Explanation (PSE): every problem comes with a solution, and 91% of them also have explanations (reasoning steps).
3
Modality
The data is multimodal: 52% of the problems include figures or diagrams.
4
Volume & Format
There are 5,158 PSEs in total, evenly distributed among the four subject areas.
PSEs are stored in JSON files, which link to the figures and diagrams stored as JPG files.
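As a rough illustration of working with this layout, the sketch below loads one JSON file and reports per-subject counts along with the share of records that have explanations and figures. The record schema is not documented on this page, so the field names `subject`, `explanation`, and `figures`, and the assumption that each file holds a JSON array of records, are hypothetical and should be adjusted to the actual files.

```python
import json
from collections import Counter
from pathlib import Path


def load_pses(json_path):
    """Load PSE records from one JSON file.

    Assumption: the file contains a JSON array of record objects.
    """
    with Path(json_path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(records):
    """Count records per subject and the share with explanations and figures.

    Assumption: the keys "subject", "explanation", and "figures" are
    hypothetical placeholders for the dataset's real field names.
    """
    total = len(records)
    by_subject = Counter(r.get("subject", "unknown") for r in records)
    with_explanation = sum(1 for r in records if r.get("explanation"))
    with_figures = sum(1 for r in records if r.get("figures"))
    return {
        "total": total,
        "by_subject": dict(by_subject),
        "explanation_share": with_explanation / total if total else 0.0,
        "figure_share": with_figures / total if total else 0.0,
    }
```

Running `summarize(load_pses("physics.json"))` on a real file (the file name here is also a placeholder) would be one way to check the figures quoted on this page, such as the 91% explanation coverage and the 52% figure coverage.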
5
License
The dataset is released under a CC BY-NC license and is free to use for research purposes only. For commercial use or information about other STEM datasets, please contact us.