- Our paper has been accepted to SIGIR 2025 🎉
CORONA is a coarse-to-fine recommendation framework that retrieves user neighborhoods on user–item bipartite graphs and uses LLM-augmented user profiles as side information. The coarse stage retrieves candidate users via graph-aware similarity with distance priors; the fine stage constructs compact subgraphs around the retrieved users for downstream training and evaluation.
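To make the coarse stage more concrete, here is a minimal sketch of top-K user retrieval on a bipartite interaction matrix. It is an illustration under stated assumptions, not the retriever in `model.py`: it uses cosine similarity between user embeddings as the "graph-aware similarity" and a linear penalty on hop distance as the "distance prior". The functions `user_hop_distances` and `retrieve_candidates` and the weight `alpha` are hypothetical names introduced here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import shortest_path


def user_hop_distances(train_mat: sp.csr_matrix, query: int) -> np.ndarray:
    """Hop distance from `query` to every user on the user-item bipartite graph."""
    n_users = train_mat.shape[0]
    # Symmetric adjacency over (users + items); item nodes are offset by n_users.
    adj = sp.bmat([[None, train_mat], [train_mat.T, None]], format="csr")
    dist = shortest_path(adj, directed=False, unweighted=True, indices=[query])[0]
    return dist[:n_users]  # user nodes only; unreachable users are inf


def retrieve_candidates(user_emb, train_mat, query, top_k=500, alpha=0.1):
    """Hypothetical coarse retrieval: cosine similarity minus alpha * hop distance."""
    norms = np.maximum(np.linalg.norm(user_emb, axis=1, keepdims=True), 1e-12)
    emb = user_emb / norms
    sim = emb @ emb[query]
    hops = user_hop_distances(train_mat, query)
    # Give unreachable users a finite penalty larger than any observed distance.
    far = hops[np.isfinite(hops)].max() + 2
    hops = np.where(np.isfinite(hops), hops, far)
    score = sim - alpha * hops
    score[query] = -np.inf  # never return the query user itself
    k = min(top_k, len(score) - 1)
    return np.argsort(-score, kind="stable")[:k]


# Toy example: 4 users x 4 items; user 3 has no interactions.
train_mat = sp.csr_matrix(np.array(
    [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 0]], dtype=float))
user_emb = np.random.default_rng(0).normal(size=(4, 8))
print(retrieve_candidates(user_emb, train_mat, query=0, top_k=2))
```

On a bipartite graph, user-to-user hop distances are always even (2, 4, ...), so `alpha` trades off embedding similarity against how many co-interaction steps separate two users. A large `alpha` makes the ranking follow graph distance almost exclusively.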
- `main.py`: training, validation, and testing for the user retriever
- `model.py`: retriever model with distance-aware embedding transformation
- `construct_graph.py`: builds per-user subgraphs from retrieved neighbors
- `chat_api_query.py`: LLM-based user profiling and embedding generation
- `load.py`: utilities for dataset loading and diagnostics
- `netflix_data/`: example dataset placeholder (train/val/test splits and sparse matrices)
- Python 3.9+
- Install the minimal dependencies:

```bash
pip install -r requirements-min.txt
```

For CUDA and PyTorch Geometric GPU wheels, follow the official installation guides.
Create `.env` (or copy `.env.example`) to specify paths and devices:

```bash
cp .env.example .env
```

Key variables:
- `DATA_DIR`: project root for data and outputs (default `.`)
- `DATASET_DIR`: dataset subdirectory (default `/netflix_data`)
- `CUDA_VISIBLE_DEVICES`: GPU id (default `0`)
- `TOP_K`: number of retrieved users per query (default `500`)
- `OPENAI_*`: LLM credentials for profiling
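As a sketch of how these variables and their defaults resolve into a dataset path, the snippet below reads them into a small config object. `Config` and `load_config` are hypothetical names, not the repository's code. The sketch does not parse `.env` itself; it assumes the variables are already in the process environment (for example, exported by your shell or loaded with a tool such as python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    data_dir: str
    dataset_dir: str
    cuda_visible_devices: str
    top_k: int

    @property
    def dataset_path(self) -> str:
        # Same concatenation as ${DATA_DIR}${DATASET_DIR}; DATASET_DIR starts with "/".
        return self.data_dir + self.dataset_dir


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read the key variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Config(
        data_dir=env.get("DATA_DIR", "."),
        dataset_dir=env.get("DATASET_DIR", "/netflix_data"),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES", "0"),
        top_k=int(env.get("TOP_K", "500")),
    )


print(load_config({"TOP_K": "100"}))
```

Because the path is built by plain concatenation, `DATA_DIR` should not end with a slash when `DATASET_DIR` starts with one.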
We experiment on Netflix, MovieLens, and Amazon-Book. For all methods, only textual side information is provided.
- Place the processed files under `${DATA_DIR}${DATASET_DIR}`:
  - `train.json`, `val.json`, `test.json` (uid -> item list)
  - `train_mat` (user–item interaction matrix, pickled SciPy sparse CSR)
  - `augmented_user_init_embedding_final` (pickled NumPy array; dim = user embedding dimension)
- Optional: `netflix_image_text/item_attribute.csv` for profiling
- For Netflix node features, we recommend following the LLMRec instructions.
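The file layout above can be illustrated with a toy dataset that is written and read back. This is a minimal sketch, assuming JSON keys are user ids, items are integer column indices of `train_mat`, and `train_mat` was saved with the standard `pickle` module; `load_dataset` and `write_toy_dataset` are hypothetical helpers, and the repository's `load.py` may work differently.

```python
import json
import os
import pickle
import tempfile

import numpy as np
import scipy.sparse as sp


def load_dataset(dataset_path: str):
    """Hypothetical loader: JSON splits plus the pickled CSR matrix, with a consistency check."""
    splits = {}
    for name in ("train", "val", "test"):
        with open(os.path.join(dataset_path, f"{name}.json")) as f:
            # JSON object keys are always strings, so convert uids back to int.
            splits[name] = {int(uid): items for uid, items in json.load(f).items()}
    with open(os.path.join(dataset_path, "train_mat"), "rb") as f:
        train_mat = sp.csr_matrix(pickle.load(f))
    # Every training interaction in train.json should be a stored entry in train_mat.
    for uid, items in splits["train"].items():
        if not set(items) <= set(train_mat[uid].indices.tolist()):
            raise ValueError(f"user {uid}: train.json and train_mat disagree")
    return splits, train_mat


def write_toy_dataset(dataset_path: str) -> None:
    """Write a 3-user, 3-item toy dataset in the layout described above."""
    os.makedirs(dataset_path, exist_ok=True)
    train = {0: [0, 1], 1: [1, 2], 2: [2]}
    for name, split in (("train", train), ("val", {0: [2]}), ("test", {1: [0]})):
        with open(os.path.join(dataset_path, f"{name}.json"), "w") as f:
            json.dump(split, f)
    rows = [uid for uid, items in train.items() for _ in items]
    cols = [item for items in train.values() for item in items]
    mat = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(3, 3))
    with open(os.path.join(dataset_path, "train_mat"), "wb") as f:
        pickle.dump(mat, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netflix_data")
    write_toy_dataset(path)
    splits, train_mat = load_dataset(path)
print(train_mat.shape, splits["test"])
```

A check like the one in `load_dataset` catches the most common preprocessing mistake: splits and the sparse matrix built from different user or item id mappings.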
If you need to generate `augmented_user_init_embedding_final`:

```bash
make augment
```

This reads `train_mat` and `test.json` and writes `${AUGMENT_FILE_PATH}/augmented_user_init_embedding_final`.
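Whatever produces the profile embeddings, the downstream code relies on row `i` of this file describing user `i` of `train_mat`. The sketch below shows one way to assemble and sanity-check such an array. The helpers `assemble_user_embeddings` and `check_alignment` are hypothetical, the per-user vectors are toy stand-ins for the LLM-profile embeddings from `chat_api_query.py`, and filling missing profiles with zero rows is an assumption, not necessarily what the repository does.

```python
import pickle

import numpy as np
import scipy.sparse as sp


def assemble_user_embeddings(profile_emb: dict, n_users: int) -> np.ndarray:
    """Stack per-user profile vectors into an (n_users, dim) array indexed by uid."""
    dim = len(next(iter(profile_emb.values())))
    # Assumption: users without a generated profile keep an all-zero row.
    out = np.zeros((n_users, dim), dtype=np.float32)
    for uid, vec in profile_emb.items():
        out[uid] = vec
    return out


def check_alignment(emb: np.ndarray, train_mat: sp.spmatrix) -> None:
    """Row i of the embedding array must describe user i of train_mat."""
    if emb.ndim != 2 or emb.shape[0] != train_mat.shape[0]:
        raise ValueError(f"embedding shape {emb.shape} does not match {train_mat.shape[0]} users")


train_mat = sp.csr_matrix(np.eye(3, 5))
profiles = {0: np.ones(4), 1: np.full(4, 2.0)}  # toy vectors; uid 2 has no profile
emb = assemble_user_embeddings(profiles, n_users=train_mat.shape[0])
check_alignment(emb, train_mat)
restored = pickle.loads(pickle.dumps(emb))  # the file is described as a pickled NumPy array
```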
To train the retriever:

```bash
make train
```

After training, the best model and the retrieved nodes are saved to `${DATA_DIR}/Graph_RA_Rec/model_states/`.
For testing independently:
```bash
make test
```

To build the subgraphs:

```bash
make graphs
```

This produces user/item subgraphs under `${DATA_DIR}/Graph_RA_Rec/${basename(DATASET_DIR)}/`.
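One plausible reading of "per-user subgraphs from retrieved neighbors" is the induced bipartite subgraph over the query user, its retrieved users, and all items they interacted with. The sketch below builds that with SciPy slicing. It is an assumption for illustration: `build_user_subgraph` is a hypothetical name, and `construct_graph.py` may prune or sample items differently.

```python
import numpy as np
import scipy.sparse as sp


def build_user_subgraph(train_mat: sp.csr_matrix, query: int, neighbors):
    """Induced user-item subgraph over the query user, its neighbors, and their items."""
    # Query user first (local id 0), then neighbors in retrieval order, deduplicated.
    users = np.array([query] + [u for u in dict.fromkeys(neighbors) if u != query], dtype=np.int64)
    rows = train_mat[users]              # interaction rows of the selected users
    items = np.unique(rows.indices)      # every item touched by those users (sorted)
    sub = rows[:, items].tocsr()         # local block: users x items, relabeled from 0
    return sub, users, items             # users/items map local ids back to global ids


train_mat = sp.csr_matrix(np.array(
    [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 0]], dtype=float))
sub, users, items = build_user_subgraph(train_mat, query=0, neighbors=[1, 1, 3])
print(sub.toarray(), users, items)
```

Keeping the `users` and `items` arrays alongside the local matrix lets predictions on the compact subgraph be mapped back to global ids for evaluation.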
- Netflix (KDD Cup 2007)
- MovieLens-10M (ACM TiiS 2015)
- Amazon-Book (EMNLP 2019)

We follow LLMRec for the Netflix/MovieLens splits and RLMRec for Amazon-Book. Textual information is encoded with Sentence-BERT.
- Determinism: `set_seed(3)` in `main.py`
- GPU selection via `CUDA_VISIBLE_DEVICES`
- Cached tensors: `*_for_RA.pkl` files are stored in `${DATA_DIR}${DATASET_DIR}`
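`main.py` calls `set_seed(3)`, but its body is not shown here. The following is a common pattern for such a helper and should be read as an assumption, not the repository's code. It seeds Python, NumPy, and (if installed) PyTorch. Full GPU determinism may additionally require `torch.use_deterministic_algorithms(True)`, and some scatter-style operations used by graph libraries can remain nondeterministic on CUDA.

```python
import os
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and (if installed) PyTorch RNGs for repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started later
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(3)
a = (random.random(), float(np.random.rand()))
set_seed(3)
b = (random.random(), float(np.random.rand()))
print(a == b)
```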
If you find this repository helpful, please cite:
```bibtex
@inproceedings{corona2025,
  title={CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models},
  booktitle={Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025}
}
```
- He et al., LightGCN, SIGIR 2020
- Wei et al., LLMRec, arXiv 2024
- Ren et al., RLMRec, WWW 2024
- Bennett and Lanning, The Netflix Prize, KDD Cup 2007
- Harper and Konstan, MovieLens, ACM TiiS 2015
- Reimers and Gurevych, Sentence-BERT, EMNLP/IJCNLP 2019
This code is released for research purposes. See the repository license, if provided.