The official codebase for our NeurIPS'25 (Datasets and Benchmarks Track) submission "ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning".
- Create a conda environment and install the dependencies:

  ```bash
  conda create -n chinatravel python=3.9
  conda activate chinatravel
  pip install -r requirements.txt
  ```

- Download the database and unzip it into the `chinatravel/environment/` directory (download links: Google Drive, NJU Drive).
- Download the necessary models or tokenizers (e.g., the DeepSeek tokenizer) to `./chinatravel/local_llm` (create the folder first).
We support `deepseek` (the official DeepSeek API), `gpt-4o` (`chatgpt-4o-latest`), `glm4-plus`, and local inference with Qwen, Mistral, and Llama.
```bash
export OPENAI_API_KEY=""  # fill in your OpenAI API key

# Act / ReAct0 / ReAct agents
python run_exp.py --splits easy --agent Act --llm gpt-4o  # Replace "Act" with "ReAct0" or "ReAct" for the other pure neural agents

# LLM-modulo agent with 10 refine steps
python run_exp.py --splits medium --agent LLM-modulo --llm gpt-4o --refine_steps 10

# LLMNeSy agent with oracle translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation

# LLMNeSy agent
python run_exp.py --splits human1000 --agent LLMNeSy --llm deepseek
```

Note:
- Please download the weights of the open-source model to `./chinatravel/open_source_llm` and set the corresponding model path in `./chinatravel/agent/llms.py`. This step is only necessary when using a locally deployed open-source model.
- We implemented the following agents:
  - Act: zero-shot Act agent
  - ReAct0: zero-shot ReAct agent
  - ReAct: one-shot ReAct agent
  - LLM-modulo: LLM-modulo agent
  - LLMNeSy: neuro-symbolic agent
- We keep the DSL annotations of Human-1000 private to prevent performance fraud and unfair comparisons. Researchers are encouraged to submit their results to us for evaluation on Human-1000.
- To skip queries that have already been completed, add the parameter `--skip 1`.
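The three pure neural agents share the same flags, so running all of them on one split can be scripted. Below is a minimal sketch, assuming bash and that it is run from the repository root. `run_neural_agents` and the `DRY_RUN` variable are hypothetical names that are not part of the repository; only the `run_exp.py` flags shown above are used.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the repository: runs the Act, ReAct0
# and ReAct agents on one split with one LLM, using only run_exp.py flags
# documented in this README.
set -euo pipefail

run_neural_agents() {
    local split="$1" llm="$2"
    local agent
    for agent in Act ReAct0 ReAct; do
        # When DRY_RUN is non-empty, "echo" is prepended so the command is
        # printed instead of executed. --skip 1 skips completed queries.
        ${DRY_RUN:+echo} python run_exp.py --splits "$split" --agent "$agent" \
            --llm "$llm" --skip 1
    done
}
```

For example, `DRY_RUN=1 run_neural_agents easy gpt-4o` prints the three commands, and dropping `DRY_RUN` runs them. Passing `--skip 1` every time lets an interrupted batch resume without redoing finished queries.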
To evaluate the results:

```bash
python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
```
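The two evaluation commands differ only in the method name, which appears to follow an `<agent>_<llm>` pattern with an `_oracletranslation` suffix for `--oracle_translation` runs. That pattern is inferred from these two examples only and is an assumption, not a documented convention. Below is a minimal bash sketch; `method_name`, `evaluate_methods`, and `DRY_RUN` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with the repository.
set -euo pipefail

# ASSUMPTION: method names passed to --method look like <agent>_<llm>, plus
# "_oracletranslation" for --oracle_translation runs (inferred from the two
# eval_exp.py examples in this README only).
method_name() {
    local agent="$1" llm="$2" oracle="${3:-0}"
    if [ "$oracle" = "1" ]; then
        printf '%s_%s_oracletranslation\n' "$agent" "$llm"
    else
        printf '%s_%s\n' "$agent" "$llm"
    fi
}

# Evaluates each given method on one split. With DRY_RUN non-empty, the
# eval_exp.py commands are printed instead of executed.
evaluate_methods() {
    local split="$1"; shift
    local method
    for method in "$@"; do
        ${DRY_RUN:+echo} python eval_exp.py --splits "$split" --method "$method"
    done
}
```

For example, `evaluate_methods human "$(method_name LLMNeSy deepseek 1)" "$(method_name LLMNeSy deepseek)"` reproduces the two commands above. If the naming differs for other agents or options, pass the method names directly instead.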