
Free Access for EMNLP 25 Participants:

Gaokao STEM Benchmark Data (GaokaoSTEM)

Data source

The data is drawn from recent college entrance exams of China, known as Gaokao (高考), and covers four subjects: mathematics, physics, chemistry, and biology.


Data type

Problem, Solution, Explanation (PSE): every problem comes with a solution, and 91% of them also include an explanation (reasoning steps).


Modality

The data is multimodal: 52% of the problems include figures or diagrams.


Volume & Format

The dataset contains 5,158 PSEs, evenly distributed across the four subjects.

PSEs are stored in JSON files, which link to the figures and diagrams stored as JPG files.
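As a rough illustration of working with this layout, here is a minimal Python sketch. The actual JSON schema is not described on this page, so every field name below ("problem", "solution", "explanation", "figures") is a hypothetical placeholder, as is the assumption that each file holds a JSON array of records whose figure links are relative JPG file names. Adjust the names to match the files you receive.

```python
import json
from pathlib import Path


def load_pses(json_path):
    """Load PSE records from one JSON file.

    Assumption: the file holds a JSON array of objects.
    The real layout may differ.
    """
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def figure_paths(record, image_root):
    """Resolve a record's figure links to local JPG paths.

    "figures" is a hypothetical field name holding relative file names.
    """
    return [Path(image_root) / name for name in record.get("figures", [])]


def summarize(records):
    """Count how many records carry an explanation or at least one figure.

    "explanation" and "figures" are hypothetical field names.
    """
    return {
        "total": len(records),
        "with_explanation": sum(1 for r in records if r.get("explanation")),
        "with_figures": sum(1 for r in records if r.get("figures")),
    }
```

Running `summarize` over the loaded records gives a quick way to compare a download against the shares quoted above (explanations and figures), once the placeholder field names are replaced with the real ones.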


License

The dataset is released under a CC BY-NC license and is free to use for research purposes only. For commercial use, or for information about other STEM datasets, please contact us.


Request Access
