Archer
Archer is a challenging bilingual text-to-SQL dataset focused on complex reasoning, including arithmetic, commonsense, and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, along with 521 unique SQL queries, covering 20 English databases across 20 domains. This leaderboard uses a different data split from the original paper for better evaluation: 8 databases are moved from the original train set into the test data. As a result, the train set contains 8 databases, the dev set contains 2 databases, and the blind test set contains 10 databases.
Paper (Zheng et al. 24)
Arithmetic Reasoning Example
How much higher is the maximum power of a BMW car than the maximum power of a Fiat car?
宝马汽车的最高功率比飞雅特汽车的最高功率高多少?
SELECT MAX(horsepower) - (SELECT MAX(horsepower) FROM cars_data A JOIN car_names B ON A.id=B.makeid WHERE B.model="fiat") AS diff FROM cars_data A JOIN car_names B ON A.id=B.makeid WHERE B.model="bmw"
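The arithmetic step in this example is the subtraction of one aggregate from another. As a minimal sketch, the query can be tried on a toy in-memory SQLite database. The table contents below are invented for illustration and are not taken from Archer, the schema is reduced to the columns the query touches, `horsepower` is assumed to be stored as an integer, and string literals use single quotes for portability.

```python
import sqlite3

# Toy stand-in for the two tables the query touches; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_names (makeid INTEGER PRIMARY KEY, model TEXT, make TEXT);
CREATE TABLE cars_data (id INTEGER PRIMARY KEY, horsepower INTEGER);
INSERT INTO car_names VALUES (1, 'bmw', 'bmw 2002'), (2, 'bmw', 'bmw 320i'),
                             (3, 'fiat', 'fiat 128'), (4, 'fiat', 'fiat x1.9');
INSERT INTO cars_data VALUES (1, 113), (2, 110), (3, 90), (4, 67);
""")

# Same shape as the example query: outer MAX for BMW minus a scalar
# subquery that computes MAX for Fiat.
query = """
SELECT MAX(horsepower) - (
    SELECT MAX(horsepower)
    FROM cars_data A JOIN car_names B ON A.id = B.makeid
    WHERE B.model = 'fiat'
) AS diff
FROM cars_data A JOIN car_names B ON A.id = B.makeid
WHERE B.model = 'bmw'
"""
diff = conn.execute(query).fetchone()[0]
print(diff)  # 113 - 90 = 23 on the toy rows above
```

On these toy rows the BMW maximum is 113 and the Fiat maximum is 90, so the query returns 23.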
Commonsense Reasoning Example
Which 4-cylinder car needs the most fuel to drive 300 miles? List how many gallons it needs, and its make and model.
开300英里耗油最多的四缸车的品牌和型号分别是什么,它需要多少加仑的油?
Commonsense Knowledge: Fuel used is calculated by dividing distance driven by fuel consumption (here, miles per gallon).
SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon FROM cars_data A JOIN car_names B ON A.Id=B.MakeId WHERE cylinders="4" ORDER BY mpg ASC LIMIT 1
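The commonsense step here is that gallons needed equals miles driven divided by miles per gallon, so the car with the lowest `mpg` needs the most fuel. The sketch below runs the same query shape on a toy SQLite database. The rows are invented, the schema is cut down to the relevant columns, and `cylinders` is assumed to be an integer column, so the filter is written as `cylinders = 4`.

```python
import sqlite3

# Toy data, invented for illustration: two 4-cylinder cars and one
# 8-cylinder car that has the worst mpg overall but must be filtered out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_names (makeid INTEGER PRIMARY KEY, model TEXT, make TEXT);
CREATE TABLE cars_data (id INTEGER PRIMARY KEY, mpg REAL, cylinders INTEGER);
INSERT INTO car_names VALUES (1, 'toyota', 'toyota corona'),
                             (2, 'ford', 'ford pinto'),
                             (3, 'chevrolet', 'chevrolet impala');
INSERT INTO cars_data VALUES (1, 24.0, 4), (2, 20.0, 4), (3, 12.0, 8);
""")

# gallons = distance / mpg; ordering by mpg ascending puts the
# thirstiest 4-cylinder car first.
row = conn.execute("""
SELECT B.make, B.model, 1.0 * 300 / mpg AS n_gallon
FROM cars_data A JOIN car_names B ON A.id = B.makeid
WHERE cylinders = 4
ORDER BY mpg ASC
LIMIT 1
""").fetchone()
print(row)  # ('ford pinto', 'ford', 15.0) on the toy rows above
```

With the toy rows, the 4-cylinder car with the lowest fuel economy gets 20 mpg, so it needs 300 / 20 = 15 gallons.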
Hypothetical Reasoning Example
If all cars produced by the Daimler Benz company had 4 cylinders, then among all 4-cylinder cars, which one would need the most fuel to drive 300 miles? Please list how many gallons it needs, along with its make and model.
假如生产自奔驰公司的车都是四缸,开300英里耗油最多的四缸车的品牌和型号分别是什么,它需要多少加仑的油?
SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon FROM cars_data A JOIN car_names B ON A.id=B.makeid JOIN model_list C ON B.model=C.model JOIN car_makers D ON C.maker=D.id WHERE D.fullname="Daimler Benz" OR A.cylinders="4" ORDER BY mpg ASC LIMIT 1
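The hypothetical premise is encoded by widening the filter: a car qualifies if it really has 4 cylinders or if it is made by Daimler Benz, whatever its actual cylinder count. The sketch below contrasts the factual and hypothetical filters on a toy SQLite database. All rows are invented, the schema keeps only the columns the query uses, and the `BASE` template with its `condition` placeholder is a hypothetical helper for this illustration.

```python
import sqlite3

# Toy data, invented for illustration. The Daimler Benz car has 5 cylinders,
# so it only qualifies under the hypothetical premise.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_makers (id INTEGER PRIMARY KEY, fullname TEXT);
CREATE TABLE model_list (model TEXT PRIMARY KEY, maker INTEGER);
CREATE TABLE car_names (makeid INTEGER PRIMARY KEY, model TEXT, make TEXT);
CREATE TABLE cars_data (id INTEGER PRIMARY KEY, mpg REAL, cylinders INTEGER);
INSERT INTO car_makers VALUES (1, 'Daimler Benz'), (2, 'Toyota');
INSERT INTO model_list VALUES ('mercedes', 1), ('toyota', 2);
INSERT INTO car_names VALUES (1, 'mercedes', 'mercedes 300d'),
                             (2, 'toyota', 'toyota corona'),
                             (3, 'toyota', 'toyota land cruiser');
INSERT INTO cars_data VALUES (1, 15.0, 5), (2, 24.0, 4), (3, 12.0, 8);
""")

# Shared query body; only the WHERE condition differs between the two runs.
BASE = """
SELECT B.make, B.model, 1.0 * 300 / mpg AS n_gallon
FROM cars_data A
JOIN car_names B ON A.id = B.makeid
JOIN model_list C ON B.model = C.model
JOIN car_makers D ON C.maker = D.id
WHERE {condition}
ORDER BY mpg ASC
LIMIT 1
"""

factual = conn.execute(BASE.format(condition="A.cylinders = 4")).fetchone()
hypothetical = conn.execute(
    BASE.format(condition="D.fullname = 'Daimler Benz' OR A.cylinders = 4")
).fetchone()

print(factual)       # ('toyota corona', 'toyota', 12.5)
print(hypothetical)  # ('mercedes 300d', 'mercedes', 20.0)
```

On the toy rows, the 8-cylinder Toyota is excluded under both filters, while the 5-cylinder Daimler Benz car enters the candidate set only under the hypothetical filter and then becomes the answer.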
For submission, please follow the guidance here.
| Rank | Model | Size | Dev | Test |
|---|---|---|---|---|