Archer is a challenging bilingual text-to-SQL dataset focused on complex reasoning, including arithmetic, commonsense, and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, along with 521 unique SQL queries, covering 20 English databases across 20 domains. This leaderboard uses a different data split from the original paper to allow a more thorough evaluation: we moved 8 databases from the original train set into the test set. As a result, the train set now contains 8 databases, the dev set contains 2 databases, and the blind test set contains 10 databases.
Paper (Zheng et al., 2024)
Arithmetic Reasoning Example
How much higher is the maximum power of a BMW car than the maximum power of a Fiat car?
宝马汽车的最高功率比飞雅特汽车的最高功率高多少?
SELECT MAX(horsepower) - (SELECT MAX (horsepower) FROM cars_data A JOIN car_names B ON A.id=B.makeid WHERE B.model="fiat") AS diff FROM cars_data A JOIN car_names B ON A.id=B.makeid WHERE B.model="bmw"
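To make the arithmetic example concrete, the following Python sketch runs the query above against a tiny in-memory SQLite database. The rows are invented for illustration, and the schema (only the columns the query touches) is an assumption, not the actual Archer database. String literals use single quotes, because SQLite's acceptance of double-quoted strings is a legacy behaviour that some builds disable.

```python
import sqlite3

# Hypothetical toy tables: names follow the gold SQL, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_names (MakeId INTEGER PRIMARY KEY, Model TEXT, Make TEXT);
CREATE TABLE cars_data (Id INTEGER PRIMARY KEY, MPG REAL, Cylinders INTEGER,
                        Horsepower INTEGER);
INSERT INTO car_names VALUES (1, 'bmw', 'bmw 2002'), (2, 'bmw', 'bmw 320i'),
                             (3, 'fiat', 'fiat 128'), (4, 'fiat', 'fiat 131');
INSERT INTO cars_data VALUES (1, 26.0, 4, 113), (2, 21.5, 4, 110),
                             (3, 29.0, 4, 49), (4, 28.0, 4, 86);
""")

# Arithmetic reasoning: a scalar subquery computes Fiat's maximum horsepower,
# which is subtracted from BMW's maximum computed by the outer query.
query = """
SELECT MAX(horsepower) - (
    SELECT MAX(horsepower)
    FROM cars_data A JOIN car_names B ON A.id = B.makeid
    WHERE B.model = 'fiat'
) AS diff
FROM cars_data A JOIN car_names B ON A.id = B.makeid
WHERE B.model = 'bmw'
"""
diff = conn.execute(query).fetchone()[0]
print(diff)  # 27 (113 - 86)
```

The subtraction happens inside a single SQL statement, so the model must produce both aggregates and combine them arithmetically rather than return two separate values.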
Commonsense Reasoning Example
Which 4-cylinder car needs the most fuel to drive 300 miles? List how many gallons it needs, and its make and model.
开300英里耗油最多的四缸车的品牌和型号分别是什么,它需要多少加仑的油?
Commonsense Knowledge: Fuel used (in gallons) is calculated by dividing the distance driven by fuel economy (miles per gallon, mpg).
SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon FROM cars_data A JOIN car_names B ON A.Id=B.MakeId WHERE cylinders="4" ORDER BY mpg ASC LIMIT 1
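The commonsense step reduces to gallons = 300 / mpg, so the car that needs the most fuel is the 4-cylinder car with the lowest mpg, which is why the query sorts by mpg ascending. A minimal sketch on an in-memory SQLite database with invented rows and an assumed minimal schema:

```python
import sqlite3

# Hypothetical toy data: two 4-cylinder cars and one 8-cylinder car.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_names (MakeId INTEGER PRIMARY KEY, Model TEXT, Make TEXT);
CREATE TABLE cars_data (Id INTEGER PRIMARY KEY, MPG REAL, Cylinders INTEGER);
INSERT INTO car_names VALUES (1, 'toyota', 'toyota corolla'),
                             (2, 'ford', 'ford pinto'),
                             (3, 'chevrolet', 'chevrolet impala');
INSERT INTO cars_data VALUES (1, 30.0, 4), (2, 20.0, 4), (3, 12.0, 8);
""")

# 1.0 * 300 / mpg forces floating-point division; the lowest mpg among
# 4-cylinder cars gives the largest fuel requirement.
query = """
SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon
FROM cars_data A JOIN car_names B ON A.Id = B.MakeId
WHERE cylinders = 4
ORDER BY mpg ASC
LIMIT 1
"""
make, model, n_gallon = conn.execute(query).fetchone()
print(make, model, n_gallon)  # ford pinto ford 15.0
```

The 8-cylinder car has the lowest mpg overall but is filtered out, so the answer is the 20 mpg car needing 300 / 20 = 15 gallons.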
Hypothetical Reasoning Example
If all cars produced by the Daimler Benz company have 4 cylinders, then among all 4-cylinder cars, which one needs the most fuel to drive 300 miles? Please list how many gallons it needs, along with its make and model.
假如生产自奔驰公司的车都是四缸,开300英里耗油最多的四缸车的品牌和型号分别是什么,它需要多少加仑的油?
SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon FROM cars_data A JOIN car_names B ON A.id=B.makeid JOIN model_list C ON B.model=C.model JOIN car_makers D ON C.maker=D.id WHERE D.fullname="Daimler Benz" OR A.cylinders="4" ORDER BY mpg ASC LIMIT 1
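In the hypothetical query, the WHERE clause uses OR, so the candidate set becomes the union of cars that actually have 4 cylinders and cars made by Daimler Benz. The sketch below uses invented data and an assumed minimal schema, and runs the query with and without the hypothetical condition to show how the assumption can change the answer.

```python
import sqlite3

# Hypothetical toy data: one real 4-cylinder car and one 6-cylinder
# Daimler Benz car that the hypothesis treats as 4-cylinder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_makers (Id INTEGER PRIMARY KEY, Maker TEXT, FullName TEXT);
CREATE TABLE model_list (ModelId INTEGER PRIMARY KEY, Maker INTEGER, Model TEXT);
CREATE TABLE car_names (MakeId INTEGER PRIMARY KEY, Model TEXT, Make TEXT);
CREATE TABLE cars_data (Id INTEGER PRIMARY KEY, MPG REAL, Cylinders INTEGER);
INSERT INTO car_makers VALUES (1, 'toyota', 'Toyota'),
                              (2, 'daimler benz', 'Daimler Benz');
INSERT INTO model_list VALUES (1, 1, 'toyota'), (2, 2, 'mercedes-benz');
INSERT INTO car_names VALUES (1, 'toyota', 'toyota corolla'),
                             (2, 'mercedes-benz', 'mercedes-benz 280s');
INSERT INTO cars_data VALUES (1, 30.0, 4), (2, 15.0, 6);
""")

# Same query shape as the gold SQL; only the WHERE condition varies.
# The conditions are fixed strings here, never user input.
base = """
SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon
FROM cars_data A
JOIN car_names B ON A.Id = B.MakeId
JOIN model_list C ON B.Model = C.Model
JOIN car_makers D ON C.Maker = D.Id
WHERE {condition}
ORDER BY mpg ASC
LIMIT 1
"""
factual = conn.execute(base.format(condition="A.Cylinders = 4")).fetchone()
hypothetical = conn.execute(
    base.format(condition="D.FullName = 'Daimler Benz' OR A.Cylinders = 4")
).fetchone()
print(factual)       # ('toyota corolla', 'toyota', 10.0)
print(hypothetical)  # ('mercedes-benz 280s', 'mercedes-benz', 20.0)
```

Expressing the hypothesis as an extra disjunct in the filter avoids modifying the database, which is one way to encode a counterfactual premise in a read-only query.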
For submission, please follow the guidance here.
Rank | Date | Model | Team | Size | Dev | Test |
---|---|---|---|---|---|---|
1 | Sep 10, 2024 | GPT-4o + zpoint-embedding | KnowDee | UNK | 22.12 | 42.18 |
2 | Sep 10, 2024 | GPT-4o + Deepseek-Coder-33b | Harbin Institute of Technology | UNK | 34.62 | 39.12 |
2 | Sep 10, 2024 | GPT-4o | HITSZ-GDDW Tech | UNK | 31.73 | 39.12 |
4 | Sep 5, 2024 | GPT-4o + deepseek | IDMG (Beijing University of Posts and Telecommunications) | UNK | 31.73 | 31.87 |
5 | Sep 10, 2024 | deepseek-chat | JD-5Star | UNK | 24.04 | 31.11 |
6 | Sep 10, 2024 | GPT-4o | MI&TLab (Harbin Institute of Technology) | UNK | 32.69 | 30.73 |
6 | Sep 10, 2024 | GPT-4o + all-MiniLM-L6-v2 | NUDT | UNK | 38.46 | 30.73 |
8 | Sep 10, 2024 | GPT-4o | Foshan University | UNK | 22.12 | 25.62 |
9 | Mar 15, 2024 | GPT-3.5 + CT-3 | baseline | UNK | 10.57 | 15.84 |
10 | Mar 15, 2024 | GPT-3.5 + CT-3 + COT | baseline | UNK | 13.46 | 15.27 |
11 | Mar 15, 2024 | GPT-3.5 + API Doc | baseline | UNK | 14.42 | 11.83 |
12 | Mar 15, 2024 | T5-3b | baseline | 3B | 0 | 0 |
12 | Mar 15, 2024 | T5-large | baseline | 0.8B | 0 | 0 |
12 | Mar 15, 2024 | T5-base | baseline | 0.2B | 0 | 0 |
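The Rank column appears consistent with ordering entries by Test score alone, letting tied scores share a rank and skipping the following rank (standard competition ranking, "1224"). This is an inference from the numbers in the table above rather than a documented leaderboard rule; the sketch below reproduces its ranks under that assumption, with a hypothetical helper `competition_rank`.

```python
def competition_rank(scores):
    """Standard competition ("1224") ranking: higher is better, tied scores
    share a rank, and the next rank is skipped for each extra tied entry."""
    ordered = sorted(scores, reverse=True)
    # The 1-based position of a score's first occurrence in descending
    # order is its competition rank.
    first_pos = {}
    for i, s in enumerate(ordered):
        first_pos.setdefault(s, i + 1)
    return [first_pos[s] for s in scores]


# Test scores copied from the table above, in table order.
test_scores = [42.18, 39.12, 39.12, 31.87, 31.11, 30.73, 30.73,
               25.62, 15.84, 15.27, 11.83, 0, 0, 0]
print(competition_rank(test_scores))
# [1, 2, 2, 4, 5, 6, 6, 8, 9, 10, 11, 12, 12, 12]
```

Dev scores play no role in this reconstruction: for example, the entry with the highest Dev score (38.46) is still ranked 6 by its Test score.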
Rank | Date | Model | Team | Size | Dev | Test |
---|---|---|---|---|---|---|
1 | Sep 10, 2024 | GPT-4o + zpoint-embedding | KnowDee | UNK | 25.96 | 42.94 |
2 | Sep 10, 2024 | GPT-4o + Deepseek-Coder-33b | Harbin Institute of Technology | UNK | 23.08 | 39.89 |
3 | Sep 10, 2024 | GPT-4o | HITSZ-GDDW Tech | UNK | 24.04 | 37.79 |
4 | Sep 5, 2024 | GPT-4o + deepseek | IDMG (Beijing University of Posts and Telecommunications) | UNK | 24.04 | 29.39 |
5 | Sep 10, 2024 | GPT-4o | MI&TLab (Harbin Institute of Technology) | UNK | 24.04 | 28.63 |
6 | Sep 10, 2024 | GPT-4o + all-MiniLM-L6-v2 | NUDT | UNK | 25.96 | 27.10 |
7 | Sep 10, 2024 | deepseek-chat | JD-5Star | UNK | 23.08 | 25.00 |
8 | Sep 10, 2024 | GPT-4o | Foshan University | UNK | 17.14 | 22.90 |
9 | Mar 15, 2024 | GPT-3.5 + CT-3 + COT | baseline | UNK | 12.50 | 15.49 |
10 | Mar 15, 2024 | GPT-3.5 + CT-3 | baseline | UNK | 10.58 | 12.21 |
11 | Mar 15, 2024 | GPT-3.5 + API Doc | baseline | UNK | 10.58 | 10.31 |
12 | Mar 15, 2024 | mT5-xl | baseline | 3.7B | 0 | 0 |
12 | Mar 15, 2024 | mT5-large | baseline | 1.2B | 0 | 0 |
12 | Mar 15, 2024 | mT5-base | baseline | 0.6B | 0 | 0 |