Stan: loo 3.0 -- Entering the next decade of efficient cross-validation #33

n-kall opened this issue May 30, 2025 · 0 comments

Project

Stan

Summary

The loo package implements efficient leave-one-out cross-validation, an integral part of Bayesian modelling, and is now 10 years old. This project aims to prepare the loo package for the future by improving maintainability, extensibility and cohesion in the codebase.

Submitter

Noa Kallioinen

Project lead

@avehtari

Community benefit

The Stan project provides software tools for building complex probabilistic models, and diagnosing and evaluating these models.

Cross-validation is an integral part of model evaluation and allows users to estimate how well a model would predict newly observed data. The loo package, which implements efficient leave-one-out cross-validation in R, is one of the most widely used packages for this purpose and is a core component of the Stan ecosystem. The efficient methods that it implements save large amounts of computation and user time that would otherwise be spent refitting models.
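
To illustrate where the savings come from, here is a minimal sketch of leave-one-out estimation from a single model fit. It uses plain importance sampling, not the Pareto-smoothed variant that loo implements, and it is not taken from the loo codebase. The function names and the `log_lik[s][i]` layout (posterior draw `s`, observation `i`) are assumptions made for this example.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def is_loo_elpd(log_lik):
    """Plain importance-sampling LOO estimate (hypothetical helper).

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and
    observation i, all taken from ONE fit to the full data.
    Returns (total elpd estimate, list of pointwise estimates).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        # Importance ratios are proportional to 1 / p(y_i | theta_s), so the
        # estimate is the log of the harmonic mean of the likelihood values.
        neg = [-log_lik[s][i] for s in range(n_draws)]
        pointwise.append(math.log(n_draws) - logsumexp(neg))
    return sum(pointwise), pointwise


# Toy input: 3 draws, 2 observations (made-up numbers, not real results).
example_log_lik = [[-1.0, -2.5], [-1.2, -2.0], [-0.9, -3.0]]
total_elpd, pointwise_elpd = is_loo_elpd(example_log_lik)
```

Exact leave-one-out would instead refit the model once per observation. Raw importance ratios can have very high variance, which is the problem that smoothing the ratios is meant to address.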

loo has grown substantially since its initial release in 2015, when it implemented a single method. Today, the codebase is 20 times larger and includes implementations of several additional methods and an updated user experience. As its tenth anniversary approaches, there is a need to revisit and refine both the inner workings and user experience of the package to prepare it for the future. The loo 3.0 milestone aims for just that: by preparing the software for extensibility and maintainability, and improving the cohesion between implemented methods, it will give all users and contributors (current and future) an improved codebase and user experience.

Amount requested

10000

Execution plan

The grant will be used to fund Noa Kallioinen (@n-kall), working closely with other members of the development team, to implement the proposed changes. Noa is a member of the Stan R packages team, a contributor to packages including loo and posterior, and the lead developer of the priorsense package.

Goals

The loo 3.0 project can be broken down into three main goals:

Improve maintainability

The first goal is to make the loo package easier to maintain. Specifically, (1) the status of open issues and pull requests will be evaluated and addressed; (2) existing code will be improved to follow the latest recommended coding practices for R code (coding style, comments, documentation, testing, etc.); and (3) the package will be adjusted to import functions from the posterior package (a newer Stan package with overlapping functions) to substantially decrease the size of the codebase and reduce redundancy across the ecosystem.

Enhance extensibility

The second goal is to make it easier to add new functionality to loo. The data structures in loo were originally designed for one specific use case. As new methods were added, each needed to store different data, so separate structures were created for them. These will be unified, with extensibility in mind, to prepare for the addition of even newer methods.

Increase cohesion

The third goal is to create a more cohesive experience for users of existing (and new) functions. All user-facing functions should operate with the same 'grammar'. Each will be evaluated for its relevance: those deemed unneeded will be deprecated, and those that are inconsistent with the expectations set by other functions will be adjusted. Additional messages and guidance will be provided in function output. Design guidelines for adding new functions will be written to ensure cohesion.

Timeline

The grant will enable a single developer (@n-kall) to work full-time for four months, in close collaboration with other members of the development team. In this time, regular milestones will be targeted:

  • Milestone 1 (weeks 1-2): Open issues and pull requests investigated, and appropriate actions taken
  • Milestone 2 (weeks 3-4): Pull request(s) made for changes to the existing codebase based on recommended coding practices for R code
  • Milestone 3 (weeks 5-6): User-facing functions evaluated and tagged for potential deprecation, lack of cohesion, or need for additional user guidance
  • Milestone 4 (weeks 7-8): Pull request(s) made to increase cohesion in user-facing functions
  • Milestone 5 (weeks 9-10): Design document for restructuring objects for extensibility created, and feedback requested
  • Milestone 6 (weeks 11-12): Pull request(s) made for importing posterior functions
  • Milestone 7 (weeks 13-14): Pull request(s) made to restructure data structures for extensibility
  • Milestone 8 (weeks 15-16): Contributor guidelines updated with detailed instructions to ensure the goals (maintainability, extensibility, cohesion) are maintained going forward