Archer

A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning

About Archer

Archer is a challenging bilingual text-to-SQL dataset focused on complex reasoning, including arithmetic, commonsense, and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, paired with 521 unique SQL queries, and covers 20 English databases across 20 domains. This leaderboard uses a different data split from the original paper to support better evaluation: 8 databases from the original train set are moved into the test data. As a result, the train set now contains 8 databases, the dev set contains 2 databases, and the blind test set contains 10 databases.

Paper (Zheng et al., 2024)

Data Examples

Arithmetic Reasoning Example

How much higher is the maximum power of a BMW car than the maximum power of a Fiat car?

宝马汽车的最高功率比飞雅特汽车的最高功率高多少?

SELECT MAX(horsepower) - (SELECT MAX(horsepower) FROM cars_data A JOIN car_names B ON A.id=B.makeid WHERE B.model="fiat") AS diff FROM cars_data A JOIN car_names B ON A.id=B.makeid WHERE B.model="bmw"
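
To see how the arithmetic example computes its answer, the query can be run against a small in-memory SQLite database. The sketch below is illustrative only: the table layout is reduced to the columns the query touches, the rows are hypothetical toy values rather than Archer data, and the helper names `build_toy_db` and `max_power_gap` are made up for this example. String literals use single quotes, which is the standard SQL form.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with hypothetical rows (not real Archer data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE car_names (makeid INTEGER PRIMARY KEY, model TEXT, make TEXT);
        CREATE TABLE cars_data (id INTEGER PRIMARY KEY, horsepower INTEGER);
        INSERT INTO car_names VALUES
            (1, 'bmw', 'bmw 2002'),
            (2, 'bmw', 'bmw 320i'),
            (3, 'fiat', 'fiat 128'),
            (4, 'fiat', 'fiat x1.9');
        INSERT INTO cars_data VALUES (1, 113), (2, 110), (3, 49), (4, 67);
        """
    )
    return conn


# Same structure as the example query: the outer MAX covers BMW rows,
# the scalar subquery covers Fiat rows, and the subtraction is done in SQL.
ARITHMETIC_SQL = """
SELECT MAX(horsepower) - (
    SELECT MAX(horsepower)
    FROM cars_data A JOIN car_names B ON A.id = B.makeid
    WHERE B.model = 'fiat'
) AS diff
FROM cars_data A JOIN car_names B ON A.id = B.makeid
WHERE B.model = 'bmw'
"""


def max_power_gap(conn):
    """Return the difference between the BMW and Fiat maximum horsepower."""
    return conn.execute(ARITHMETIC_SQL).fetchone()[0]


if __name__ == "__main__":
    print(max_power_gap(build_toy_db()))
```

With these toy rows the BMW maximum is 113 and the Fiat maximum is 67, so the query returns their difference.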


Commonsense Reasoning Example

Which 4-cylinder car needs the most fuel to drive 300 miles? List how many gallons it needs, and its make and model.

开300英里耗油最多的四缸车的品牌和型号分别是什么,它需要多少加仑的油?

Commonsense Knowledge: Fuel used is calculated by dividing distance driven by fuel consumption.

SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon FROM cars_data A JOIN car_names B ON A.Id=B.MakeId WHERE cylinders="4" ORDER BY mpg ASC LIMIT 1
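
The commonsense step here is that the car with the lowest miles-per-gallon value needs the most fuel for a fixed distance, and that the fuel needed is the distance divided by that value. The following sketch runs the query on a hypothetical toy database to make this concrete; the rows, the reduced schema, and the helper names `build_toy_db` and `most_fuel_four_cylinder` are assumptions for illustration, and `cylinders` is stored and compared as an integer here.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with hypothetical rows (not real Archer data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE car_names (makeid INTEGER PRIMARY KEY, model TEXT, make TEXT);
        CREATE TABLE cars_data (id INTEGER PRIMARY KEY, mpg REAL, cylinders INTEGER);
        INSERT INTO car_names VALUES
            (1, 'chevrolet', 'chevrolet vega'),
            (2, 'toyota', 'toyota corona'),
            (3, 'ford', 'ford torino');
        -- The 8-cylinder car has the worst mpg overall but must be filtered out.
        INSERT INTO cars_data VALUES (1, 20.0, 4), (2, 24.0, 4), (3, 10.0, 8);
        """
    )
    return conn


# Lowest mpg among 4-cylinder cars means the most fuel for 300 miles;
# gallons needed = distance / mpg, computed inline as 1.0 * 300 / mpg.
COMMONSENSE_SQL = """
SELECT B.make, B.model, 1.0 * 300 / A.mpg AS n_gallon
FROM cars_data A JOIN car_names B ON A.id = B.makeid
WHERE A.cylinders = 4
ORDER BY A.mpg ASC
LIMIT 1
"""


def most_fuel_four_cylinder(conn):
    """Return (make, model, gallons) for the thirstiest 4-cylinder car."""
    return conn.execute(COMMONSENSE_SQL).fetchone()


if __name__ == "__main__":
    print(most_fuel_four_cylinder(build_toy_db()))
```

In this toy data the answer is the 20 mpg car, which needs 300 / 20 = 15 gallons; the 10 mpg car is excluded because it has 8 cylinders.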


Hypothetical Reasoning Example

If all cars produced by the Daimler Benz company have 4 cylinders, then among all 4-cylinder cars, which one needs the most fuel to drive 300 miles? Please list how many gallons it needs, along with its make and model.

假如生产自奔驰公司的车都是四缸,开300英里耗油最多的四缸车的品牌和型号分别是什么,它需要多少加仑的油?

SELECT B.Make, B.Model, 1.0 * 300 / mpg AS n_gallon FROM cars_data A JOIN car_names B ON A.id=B.makeid JOIN model_list C ON B.model=C.model JOIN car_makers D ON C.maker=D.id WHERE D.fullname="Daimler Benz" OR A.cylinders="4" ORDER BY mpg ASC LIMIT 1
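
The hypothetical premise is expressed in the WHERE clause: every Daimler Benz car is treated as if it had 4 cylinders, so the filter becomes "made by Daimler Benz OR actually 4-cylinder". The sketch below contrasts the factual and hypothetical filters on a toy database. All rows are hypothetical, the schema keeps only the columns the query needs, and the names `build_toy_db`, `FACTUAL_WHERE`, `HYPOTHETICAL_WHERE`, and `answer` are made up for this illustration.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with hypothetical rows (not real Archer data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE car_makers (id INTEGER PRIMARY KEY, maker TEXT, fullname TEXT);
        CREATE TABLE model_list (modelid INTEGER PRIMARY KEY, maker INTEGER, model TEXT);
        CREATE TABLE car_names (makeid INTEGER PRIMARY KEY, model TEXT, make TEXT);
        CREATE TABLE cars_data (id INTEGER PRIMARY KEY, mpg REAL, cylinders INTEGER);
        INSERT INTO car_makers VALUES
            (1, 'daimler benz', 'Daimler Benz'),
            (2, 'gm', 'General Motors'),
            (3, 'ford', 'Ford Motor Company');
        INSERT INTO model_list VALUES (1, 1, 'mercedes'), (2, 2, 'chevrolet'), (3, 3, 'ford');
        INSERT INTO car_names VALUES
            (1, 'mercedes', 'mercedes-benz 280s'),
            (2, 'chevrolet', 'chevrolet vega'),
            (3, 'ford', 'ford torino');
        INSERT INTO cars_data VALUES (1, 12.0, 8), (2, 20.0, 4), (3, 10.0, 8);
        """
    )
    return conn


QUERY_TEMPLATE = """
SELECT B.make, B.model, 1.0 * 300 / A.mpg AS n_gallon
FROM cars_data A
JOIN car_names B ON A.id = B.makeid
JOIN model_list C ON B.model = C.model
JOIN car_makers D ON C.maker = D.id
WHERE {condition}
ORDER BY A.mpg ASC
LIMIT 1
"""

# Factual filter: only cars that really have 4 cylinders.
FACTUAL_WHERE = "A.cylinders = 4"
# Hypothetical filter: Daimler Benz cars also count as 4-cylinder cars.
HYPOTHETICAL_WHERE = "D.fullname = 'Daimler Benz' OR A.cylinders = 4"


def answer(conn, condition):
    """Run the query with the given WHERE condition and return the top row."""
    return conn.execute(QUERY_TEMPLATE.format(condition=condition)).fetchone()


if __name__ == "__main__":
    conn = build_toy_db()
    print("factual:     ", answer(conn, FACTUAL_WHERE))
    print("hypothetical:", answer(conn, HYPOTHETICAL_WHERE))
```

In this toy data the 8-cylinder Daimler Benz car becomes the answer only under the hypothetical filter, while the 8-cylinder Ford stays excluded in both cases.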

Submission

For submission, please follow the guidance here.

Leaderboard

The Archer leaderboard is shown below. The evaluation metric is the execution accuracy (EX) of the predicted SQL, and rankings are based on EX results on the blind test set.

English

Rank Model Size Dev Test

Chinese

Rank Model Size Dev Test
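
For readers unfamiliar with the EX metric used in the tables above, the idea can be sketched as follows: execute the predicted and the gold SQL on the same database and count a prediction as correct when both return the same rows. This is a minimal sketch under assumptions, not the official Archer evaluation script: it compares results as unordered multisets of rows, treats any SQL error as a miss, and the function names are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute a query and return its rows, or None if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same multiset of rows.

    Assumption: row order is ignored. A real evaluator may treat ORDER BY
    queries differently.
    """
    gold_rows = run_query(conn, gold_sql)
    predicted_rows = run_query(conn, predicted_sql)
    if gold_rows is None or predicted_rows is None:
        return False
    return Counter(gold_rows) == Counter(predicted_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2), (3);")
    pairs = [
        ("SELECT x FROM t ORDER BY x DESC", "SELECT x FROM t"),  # same rows
        ("SELECT x FROM t WHERE x > 1", "SELECT x FROM t"),      # missing a row
        ("SELEC x", "SELECT x FROM t"),                          # invalid SQL
    ]
    print(execution_accuracy(conn, pairs))
```

In the usage example only the first of the three predictions matches its gold query, so the sketch reports one hit out of three.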