This repository was archived by the owner on May 22, 2025. It is now read-only.

LAMDASZ-ML/chinatravel_neurips25submission


ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

The official codebase for our NeurIPS'25 (Datasets and Benchmarks Track) submission "ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning".

Dataset (Huggingface)

Quick Start

Setup

  1. Create a conda environment and install dependencies:
conda create -n chinatravel python=3.9
conda activate chinatravel
pip install -r requirements.txt
  2. Download the database and unzip it to the chinatravel/environment/ directory (Download Links: Google Drive, NJU Drive).

  3. Download the necessary models or tokenizers (e.g. the deepseek tokenizer) to ./chinatravel/local_llm (you need to create this folder first).
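
The folder preparation in the steps above can be sketched in Python. This is a minimal sketch, not part of the codebase: the function name prepare_dirs and its return format are hypothetical, and it only checks that the chinatravel/environment/ directory exists, not that the database was unzipped correctly.

```python
from pathlib import Path


def prepare_dirs(repo_root="."):
    """Create ./chinatravel/local_llm and report whether the environment dir exists.

    Hypothetical helper for illustration; it does not validate the database contents.
    """
    root = Path(repo_root)
    env_dir = root / "chinatravel" / "environment"
    llm_dir = root / "chinatravel" / "local_llm"
    # The README asks you to create the local_llm folder yourself.
    llm_dir.mkdir(parents=True, exist_ok=True)
    return {"environment_exists": env_dir.is_dir(), "local_llm": llm_dir}
```

Calling prepare_dirs() from the repository root creates the folder if it is missing; if "environment_exists" comes back False, the database has not been unzipped to the expected location yet.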

Running

We support deepseek (the official DeepSeek API), gpt-4o (chatgpt-4o-latest), glm4-plus, and local inference with Qwen, Mistral, and Llama.

export OPENAI_API_KEY=""

# Act, ReAct0, or ReAct agent
python run_exp.py --splits easy --agent Act --llm gpt-4o # Replace "Act" with "ReAct0" or "ReAct" for the other pure neural agents

# LLM-modulo agent with 10 refine_steps
python run_exp.py --splits medium --agent LLM-modulo --llm gpt-4o --refine_steps 10

# LLMNeSy agent with oracle translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation

# LLMNeSy agent
python run_exp.py --splits human1000 --agent LLMNeSy --llm deepseek

Note:

  1. When using a locally deployed open-source model, download its weights to ./chinatravel/open_source_llm and update the corresponding model path in ./chinatravel/agent/llms.py. This step is not needed for API-based models.
  2. We implemented the following agents:
    1. Act: zero-shot Act agent
    2. ReAct0: zero-shot ReAct agent
    3. ReAct: one-shot ReAct agent
    4. LLM-modulo: LLM-modulo agent
    5. LLMNeSy: Neuro-Symbolic agent
  3. We keep the DSL annotations of "Human1000" private to guard against result manipulation and unfair comparisons. Researchers are encouraged to submit their results to us for evaluation on Human1000.
  4. To skip queries that have already been completed, add the parameter --skip 1.
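
When launching several runs, the command lines above can be assembled programmatically. The following is a minimal sketch using only the flags shown in this README; the function build_command is a hypothetical helper, not part of the codebase, and it does not check that a given split, agent, or LLM name is valid.

```python
import shlex


def build_command(split, agent, llm, refine_steps=None,
                  oracle_translation=False, skip=False):
    """Assemble a run_exp.py invocation from the flags documented in the README."""
    cmd = ["python", "run_exp.py", "--splits", split, "--agent", agent, "--llm", llm]
    if refine_steps is not None:
        # Used with the LLM-modulo agent in the README example.
        cmd += ["--refine_steps", str(refine_steps)]
    if oracle_translation:
        cmd.append("--oracle_translation")
    if skip:
        # "--skip 1" skips queries that were already completed.
        cmd += ["--skip", "1"]
    return cmd


# Print the LLM-modulo example from the README as a shell command.
print(shlex.join(build_command("medium", "LLM-modulo", "gpt-4o", refine_steps=10)))
```

The returned list can be passed directly to subprocess.run from the repository root, which avoids shell quoting issues.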

Evaluation

python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
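
The two --method values above appear to follow the pattern agent, then LLM, then an optional oracletranslation suffix. The sketch below encodes that pattern as an assumption inferred only from these two examples; the helper method_name is hypothetical, and other options (for example --refine_steps) may change the naming in ways not shown here.

```python
def method_name(agent, llm, oracle_translation=False):
    """Guess the --method string for eval_exp.py.

    Assumption: names follow "<agent>_<llm>[_oracletranslation]", as in the
    two README examples. This is not confirmed against the codebase.
    """
    name = f"{agent}_{llm}"
    if oracle_translation:
        name += "_oracletranslation"
    return name


print(method_name("LLMNeSy", "deepseek", oracle_translation=True))
```

If the guessed name does not match, check the output folder that run_exp.py created for the run and pass that name instead.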

Docs

Environment Constraints
