ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

The official codebase for our NeurIPS'25 (Datasets and Benchmarks Track) submission, "ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning".

Dataset (Huggingface)

Quick Start

Setup

Create a conda environment and install dependencies:

conda create -n chinatravel python=3.9  
conda activate chinatravel  
pip install -r requirements.txt

Download the database and unzip it into the chinatravel/environment/ directory (download links: Google Drive, NJU Drive).
Download any required models or tokenizers (e.g., the DeepSeek tokenizer) to ./chinatravel/local_llm (create this folder first).
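
The folder preparation described above can be scripted. The following is a minimal sketch, not part of the codebase: the helper name prepare_layout is hypothetical, and it only creates ./chinatravel/local_llm and lists what is currently inside chinatravel/environment/ so you can confirm by eye that the database was unzipped there. It does not download anything.

```python
from pathlib import Path


def prepare_layout(repo_root="."):
    """Create the local_llm folder and list the environment directory.

    Hypothetical helper for illustration; the paths are the ones named
    in the setup steps, relative to the repository root.
    """
    root = Path(repo_root)

    # The setup steps ask for this folder to be created before
    # models or tokenizers are downloaded into it.
    local_llm = root / "chinatravel" / "local_llm"
    local_llm.mkdir(parents=True, exist_ok=True)

    # List the environment directory so the unzipped database can be
    # checked manually. An empty list means the directory is missing or empty.
    env_dir = root / "chinatravel" / "environment"
    entries = sorted(p.name for p in env_dir.iterdir()) if env_dir.is_dir() else []

    return local_llm, entries
```

Calling prepare_layout() from the repository root returns the path of the created folder and the names found under chinatravel/environment/.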

Running

We support deepseek (the official DeepSeek API), gpt-4o (chatgpt-4o-latest), glm4-plus, and local inference with Qwen, Mistral, and Llama.

export OPENAI_API_KEY=""

# Act, ReAct0, or ReAct agent (pure neural agents)
python run_exp.py --splits easy --agent Act --llm gpt-4o # Replace "Act" with "ReAct0" or "ReAct" for other pure neural agents

# LLM-modulo agent with 10 refine_steps
python run_exp.py --splits medium --agent LLM-modulo --llm gpt-4o --refine_steps 10

# LLMNeSy agent with oracle translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation

# LLMNeSy agent
python run_exp.py --splits human1000 --agent LLMNeSy --llm deepseek

Note:

Download the weights of the open-source model to ./chinatravel/open_source_llm and update the corresponding model path in ./chinatravel/agent/llms.py. This step is only needed when using a locally deployed open-source model.
We implemented the following agents:
1. Act: zero-shot Act agent
2. ReAct0: zero-shot ReAct agent
3. ReAct: one-shot ReAct agent
4. LLM-modulo: LLM-modulo agent
5. LLMNeSy: Neuro-Symbolic agent
We keep the DSL annotations of Human1000 private to prevent result manipulation and unfair comparisons. Researchers are encouraged to submit their results to us for evaluation on Human1000.
To skip queries that have already been completed, add the parameter --skip 1.
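
When running several configurations, the command lines shown above can be assembled programmatically. This is a minimal sketch under stated assumptions: build_run_command is a hypothetical helper, not part of the codebase, and it uses only the flags that appear in this README (--splits, --agent, --llm, --refine_steps, --oracle_translation, --skip).

```python
def build_run_command(split, agent, llm, refine_steps=None,
                      oracle_translation=False, skip_completed=False):
    """Return the run_exp.py command as an argument list.

    Hypothetical helper; it only mirrors the flags shown in the README.
    """
    cmd = ["python", "run_exp.py",
           "--splits", split, "--agent", agent, "--llm", llm]

    # The README passes --refine_steps only with the LLM-modulo agent.
    if refine_steps is not None:
        cmd += ["--refine_steps", str(refine_steps)]

    # The README passes --oracle_translation only with the LLMNeSy agent.
    if oracle_translation:
        cmd.append("--oracle_translation")

    # "--skip 1" skips queries that were already completed.
    if skip_completed:
        cmd += ["--skip", "1"]

    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root, for example build_run_command("medium", "LLM-modulo", "gpt-4o", refine_steps=10).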

Evaluation

python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
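
The two --method values above appear to follow the pattern agent_llm, with _oracletranslation appended for runs that used --oracle_translation. The sketch below encodes that inferred pattern. It is an assumption drawn only from these two examples: method_name is a hypothetical helper, and names for other agents or options (such as --refine_steps) may be formed differently, so check the output folder of your run before relying on it.

```python
def method_name(agent, llm, oracle_translation=False):
    """Compose a --method value for eval_exp.py.

    Assumption: the naming pattern is inferred from the two README
    examples and is not confirmed for other agents or flags.
    """
    name = f"{agent}_{llm}"
    if oracle_translation:
        name += "_oracletranslation"
    return name
```

For the runs shown earlier, method_name("LLMNeSy", "deepseek", oracle_translation=True) gives the first --method value and method_name("LLMNeSy", "deepseek") gives the second.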

Docs

Environment Constraints

LAMDASZ-ML/chinatravel_neurips25submission
