This GitHub page is the home of BlipFinder, the analysis pipeline and mock catalog generator described in 2301.00822, which searches for dark compact objects in the Milky Way through astrometric lensing in real or mock Gaia DR4 data.
If this pipeline is used in published work, please cite 2301.00822.
- I-Kai Chen
- Marius Kongsore
- Ken Van Tilburg
The `Pipeline` folder contains the analysis pipeline itself. It has several subfolders:
- `Data` contains mock Gaia DR4 AL scan data, without Gaussian noise added. To illustrate the format, we have included two mock data files, each containing AL coordinate data for 100 source trajectories; of the 200 total trajectories, 199 are entirely free and the 200th is significantly perturbed by lensing. Note that these example files are far smaller and fewer than the mock catalogs used in 2301.00822, which consist of ~3000 separate data files containing ~500,000 source trajectories each.
- `SourceInfo` contains two types of files. The first type ends with `seeds` and contains lists of random seeds corresponding to the mock sources in `Data`, used to consistently generate the same Gaussian noise for each source trajectory (see the sketch below). The other type begins with `gaia_info` and contains information about the sources in `Data`, such as G magnitude, distance, and all other source properties apart from the trajectory itself.
- `Results` contains the results of the analysis. This folder has several subfolders. `FreeScipy` and `FreeMultinest` contain the results of fitting the free model to the data using SciPy and PyMultinest, respectively; these include source IDs, best-fit parameters, and test statistics. `FreePostsamples` contains the nested-sampling posterior samples (covariance information) for each significant fit. The other six folders contain the same information, but for the acceleration and blip models.
- `Analysis` contains the core analysis scripts that are executed to run the analysis. `analysis_fcns.py` contains various helper functions for the data analysis, such as test statistic calculations. `constraint_fcns.py` contains the constraints imposed on SciPy when fitting the free, acceleration, and blip models to the data. `free_fit.py` reads trajectory data and source information from the `Data` and `SourceInfo` folders and fits the free model to the data; the results are saved to `FreeScipy` under the `Results` folder. `free_multinest.py` takes the results from `FreeScipy` and refits the model to the data, imposing a 5 sigma cutoff on the free log-likelihood; the results from the nested-sampling fit are saved to `FreeMultinest` and `FreePostsamples` in the `Results` folder. The four similarly named scripts work the same way, but for the acceleration and blip models. Finally, `significant_plot.ipynb` displays the results of the analysis.
- `mock_generator` contains the files used to generate the null catalog and the lensed catalog. `null_trajectory.py` computes and generates the one-dimensional AL trajectories of the sources. `lens_generator_x1.py` generates astrophysical BHs with the priors in `bh_prior_fcns.py` and saves their parameters. `lens_correction_x1.py` takes the parameters of the BHs generated by `lens_generator_x1.py` and replaces the null trajectories with the perturbed ones.
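As an illustration of how the per-source seeds keep the injected noise reproducible, here is a minimal sketch. The names (`add_noise`, `al_positions`, `sigma_al`) are hypothetical, not the repository's actual interface; the per-source error would come from the G-magnitude mapping in `coordinate_error.csv`:

```python
import numpy as np

# Minimal sketch: regenerate identical Gaussian noise for a source trajectory
# from its stored seed. `al_positions` is a hypothetical array of noiseless AL
# coordinates; `sigma_al` is the Gaussian error mapped from the source's
# G magnitude (cf. coordinate_error.csv).
def add_noise(al_positions, sigma_al, seed):
    rng = np.random.default_rng(seed)  # same seed -> same noise realization
    return al_positions + rng.normal(0.0, sigma_al, size=al_positions.shape)
```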
The `Pipeline` folder also contains several helper scripts and data files used by both the analysis software and the mock catalog generator. These are:
- `dynamics_fcns.py`: functions for modeling astrometric source and lens trajectories, with and without lensing;
- `bh_prior_fcns.py`: astrophysical black hole prior functions based on the most recent observations;
- `dm_prior_fcns.py`: compact dark matter prior functions based on an NFW profile (see the sketch below);
- `rotation.py`: functions for performing astrophysical coordinate transformations;
- `coordinate_error.csv`: data for mapping the G magnitude of each source to a Gaussian error;
- `yellin.py`: the function that computes the exclusion confidence level following the optimal interval method (see 0203002 for a detailed description of the method).
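For reference, the NFW profile underlying priors like those in `dm_prior_fcns.py` is standard. A minimal sketch, where the parameter values shown are illustrative placeholders rather than the ones used in the pipeline:

```python
import numpy as np

# Minimal sketch of the NFW density profile,
#   rho(r) = rho_s / [(r/r_s) * (1 + r/r_s)^2],
# which underlies compact dark matter priors such as those in dm_prior_fcns.py.
# rho_s (Msun/pc^3) and r_s (kpc) are illustrative defaults, not the
# pipeline's actual values.
def nfw_density(r_kpc, rho_s=0.014, r_s_kpc=16.0):
    x = np.asarray(r_kpc) / r_s_kpc
    return rho_s / (x * (1.0 + x) ** 2)
```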
To run the analysis, data and source information must be placed, in the correct format, in the `Data` and `SourceInfo` folders (see the example files in these folders for the correct formatting). Then, the scripts must be run in the following order:
1. `free_fit.py`
2. `free_multinest.py`
3. `accel_fit.py`
4. `accel_multinest.py`
5. `blip_fit.py`
6. `blip_multinest.py`
The only line that needs to be changed in each script is the `job_idx` variable, which indexes over all files in the `Data` folder and is set to 0 by default. To parallelize the pipeline on a Slurm-based computing cluster, one may instead set `job_idx` to `int(sys.argv[1])` and batch-submit a job array, e.g. via the script
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=1GB
#SBATCH --job-name=gaia-free-fit
#SBATCH --mail-type=NONE
#SBATCH --array=0-1

module load python/intel/3.8.6
python free_fit.py ${SLURM_ARRAY_TASK_ID}
```
with the exact formatting depending on the cluster in use.
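For concreteness, the corresponding change inside each fitting script might look as follows (a sketch only; the surrounding file-handling code is an assumption):

```python
import sys

# Default: analyze the first data file in the Data folder.
# job_idx = 0

# Slurm version: take the data-file index from the job array, so that
# array task i processes the i-th file in the Data folder.
job_idx = int(sys.argv[1])
```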
Once scripts 1-6 have been run, the pipeline's output can be viewed using the `significant_plot.ipynb` Jupyter notebook in the `Analysis` folder.
The `PaperPlots` folder contains the Jupyter notebooks used to generate the plots shown in 2301.00822.

