This is the repository containing the code for the FlySearch benchmark.
We recommend using the drone.sh script to run the benchmark. It is a Bash script that calls the drone.py script, which performs the entire evaluation. It has several configurable parameters, such as:
- scenario_type -- either forest_random, city_random or mimic. The first two mean that scenarios will be randomly generated, while the last means that scenario configurations will be copied (mimicked) from a known, existing run.
- mimic_run_path -- use only if scenario_type is set to mimic. This is the path to the directory where the run to be mimicked is stored.
- mimic_run_cls_names -- for normal use, should be set to * (as is).
- model -- the model to be used. GPT-4o will call OpenAI's API, while prefixing the model name with anthropic- will assume that Anthropic's library needs to be used. If gemini is present in the model's name, we assume it's an appropriate Gemini model. To use Sonnet, use anthropic-claude-3-5-sonnet-20241022. If the model is not recognized, the script assumes that it's a vLLM model.
- log_directory -- name of the directory (relative to this script) where you wish to keep logs from runs. We recommend keeping it called all_logs.
- run_name -- all trajectories will be saved in log_directory/run_name. We recommend keeping it called run_name.
- dummy_first -- for normal use, should be set to true. Controls whether we discard the first trajectory so that the simulation environment can "warm up" (in case it misses assets and so on). Note that mimic runs assume this behaviour and duplicate the first trajectory config to compensate for it.
- forgiveness -- for normal use, should be set to 5. This is the allowed number of consecutive validation errors in the model's actions.
- glimpses -- number of glimpses (or images) the agent is allowed to see in the trajectory. In our experiments, we used 10 glimpses for FS-1 and FS-Anomaly-1 and 20 glimpses for FS-2.
- number_of_runs -- number of trajectories you wish to have generated.
- agent -- for normal use, should be set to simple_llm.
- line_of_sight_assured -- whether the agent should be able to see the searched object at the start of the trajectory. Should be set to true while generating FS-1-like and FS-A-1-like scenarios, and false for FS-2-like scenarios.
- show_class_image -- whether the agent should receive an additional visual prompt containing an image of the object being searched for. Should be set to true while generating FS-2-like scenarios, and false for FS-1-like and FS-A-1-like scenarios.
- prompt_type -- either fs1 or fs2.
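As a quick reference, the recommended values listed above can be collected into presets and checked for consistency before a run. The following is only a sketch and not part of the repository: the PRESETS dictionary, the check_settings helper, and the plain-dict representation of settings are all hypothetical, and it treats FS-Anomaly-1 and FS-A-1 as the same setting. Pairing prompt_type fs1 with the FS-1 preset in the example is likewise an assumption.

```python
# Illustrative presets only; FlySearch itself is configured through drone.sh.
# Values are taken from the parameter descriptions above.
PRESETS = {
    "FS-1": {"glimpses": 10, "line_of_sight_assured": True, "show_class_image": False},
    "FS-Anomaly-1": {"glimpses": 10, "line_of_sight_assured": True, "show_class_image": False},
    "FS-2": {"glimpses": 20, "line_of_sight_assured": False, "show_class_image": True},
}

SCENARIO_TYPES = {"forest_random", "city_random", "mimic"}


def check_settings(settings):
    """Return a list of problems found in a dict of drone.sh-style settings."""
    problems = []
    scenario_type = settings.get("scenario_type")
    if scenario_type not in SCENARIO_TYPES:
        problems.append(f"unknown scenario_type: {scenario_type!r}")
    has_mimic_path = bool(settings.get("mimic_run_path"))
    if scenario_type == "mimic" and not has_mimic_path:
        problems.append("mimic_run_path is required when scenario_type is mimic")
    if scenario_type != "mimic" and has_mimic_path:
        problems.append("mimic_run_path should only be set when scenario_type is mimic")
    if settings.get("prompt_type") not in {"fs1", "fs2"}:
        problems.append("prompt_type must be fs1 or fs2")
    return problems


# Hypothetical example: a randomly generated city run with FS-1-like values.
settings = {"scenario_type": "city_random", "prompt_type": "fs1", **PRESETS["FS-1"]}
print(check_settings(settings))  # prints []
```

The values still have to be entered in drone.sh by hand; the sketch only makes the dependencies between parameters explicit.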
To be able to run the benchmark, you need to download the appropriate dependencies and configure some variables. This section explains how to do that.
The simulator binaries can be downloaded from https://doi.org/10.5281/zenodo.15428224. city.tar.gz contains the city environment and forest.tar.gz contains the forest environment. Extract them and then modify the drone.sh script by:
- setting the CITY_BINARY_PATH to /your_location/simulator/CitySample/Binaries/Linux/CitySample
- setting the FOREST_BINARY_PATH to your_location/simulator-dreamsenv/Linux/ElectricDreamsEnv/Binaries/Linux/ElectricDreamsSample
That's all you need to do to configure the binaries!
You can also verify manually that these work on your computer by running ./simulator/CitySample/Binaries/Linux/CitySample or ./simulator-dreamsenv/Linux/ElectricDreamsEnv/Binaries/Linux/ElectricDreamsSample. These commands should start the UE5 environment and show it on your screen.
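Before launching the simulator, it can help to confirm that the binaries exist and are executable. The snippet below is a minimal sketch, assuming it is run from the directory where the archives were extracted; the describe_binary helper is hypothetical and not part of the repository.

```python
import os


def describe_binary(path):
    """Return a short status string for a simulator binary path."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Paths relative to the extraction directory, as used in the commands above.
BINARIES = {
    "CITY_BINARY_PATH": "./simulator/CitySample/Binaries/Linux/CitySample",
    "FOREST_BINARY_PATH": "./simulator-dreamsenv/Linux/ElectricDreamsEnv/Binaries/Linux/ElectricDreamsSample",
}

for name, path in BINARIES.items():
    print(f"{name}: {describe_binary(path)}")
```

A status of "ok" only means the file can be executed; whether the UE5 environment actually starts still has to be checked by running the binary.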
The file locations_city.csv is provided with this repository. Set the LOCATIONS_CITY_PATH variable in the drone.sh script to the location of this file in your filesystem. This file is required for running city scenarios, as it contains the permissible safe locations where objects can be spawned.
The benchmark needs a font to overlay images from the engine with a navigation scaffold. Set the FONT_LOCATION variable in the drone.sh script to the location of a font file in your filesystem. The default one is /usr/share/fonts/google-noto/NotoSerif-Bold.ttf, which may or may not be present on your machine.
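If you are unsure whether the default font exists on your machine, a small script can look for a usable file. This is only a sketch: the first_existing helper is hypothetical, and the second candidate path is an assumed location (typical for DejaVu fonts on Debian-like systems) that may not exist on yours.

```python
import os

# The first entry is the default mentioned above; the second is an assumed
# common location and may not be present on your system.
CANDIDATE_FONTS = [
    "/usr/share/fonts/google-noto/NotoSerif-Bold.ttf",
    "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
]


def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


font = first_existing(CANDIDATE_FONTS)
print(font if font else "No candidate font found; set FONT_LOCATION manually.")
```

Whatever path is printed can then be used as the value of FONT_LOCATION in drone.sh.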
To use closed-source VLMs, you need to have an API key for the corresponding provider. To configure the keys, set the appropriate variables in the misc/config.py file.
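To work out which provider's key a given model name will need, the routing rules described for the model parameter can be sketched as follows. This is not the repository's actual code: the guess_backend function, the order of the checks, the case-insensitive matching, and the returned labels are all assumptions made for illustration.

```python
def guess_backend(model):
    """Guess which backend a model name maps to, following the README's rules."""
    name = model.lower()  # case-insensitive matching is an assumption
    if name.startswith("anthropic-"):
        return "anthropic"
    if "gemini" in name:
        return "gemini"
    if name == "gpt-4o":
        return "openai"
    # Unrecognized names are assumed to refer to a vLLM model.
    return "vllm"


def needs_api_key(model):
    """Closed-source backends need a key; a vLLM model is assumed not to."""
    return guess_backend(model) != "vllm"


print(guess_backend("anthropic-claude-3-5-sonnet-20241022"))  # prints anthropic
print(needs_api_key("some-local-model"))  # prints False
```

Here some-local-model is a made-up name used only to show the fallback case.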