add outlier method as additional class for custom plotting methods #846

Closed
wants to merge 3 commits into from

rempsyc (Member) commented Aug 4, 2025

Minor class change for outlier objects to support the new outlier plotting methods in see (easystats/see#407)

rempsyc requested a review from Copilot August 4, 2025 17:01

gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @rempsyc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on improving the extensibility of outlier detection results for plotting purposes by introducing a method-specific class to the outlier objects. This change enables more granular control and differentiation when visualizing outlier analyses, particularly in conjunction with external plotting packages. Additionally, it includes minor code quality enhancements and updates to internal maintenance files.

Highlights

  • Enhanced Outlier Plotting: I've added a new, dynamic class to the outlier object within the check_outliers.default and check_outliers.data.frame functions. This new class, paste0("outlier_", method), allows for more specific plotting methods in the see package by differentiating outlier results based on the detection method used. This directly supports the new plotting methods referenced in the PR description (easystats/see#407).
  • Code Robustness Improvement: In the check_outliers.item_omega function, I've updated the for loops from 1:n to seq_len(n). This is a best practice in R to ensure loops behave correctly when n is zero, preventing potential errors and improving code robustness.
  • Internal Wordlist Updates: I've made several minor adjustments to the inst/WORDLIST file. These changes involve adding, removing, and reordering various terms, likely to refine the package's internal spell-checking or linting dictionary.
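
As a rough illustration of the first highlight, the sketch below shows how a method-specific class lets S3 dispatch select a different method per detection technique. It does not reproduce the package's code: `tag_outliers()`, the `describe()` generic, and the placement of the new class at the front of the class vector are all assumptions made for this example.

```r
# Hypothetical helper standing in for the end of check_outliers();
# it only illustrates the class-tagging idea from the PR.
tag_outliers <- function(flags, method) {
  # Assumption: the method-specific class is placed before "check_outliers".
  class(flags) <- c(paste0("outlier_", method), "check_outliers", class(flags))
  flags
}

# Hypothetical generic used in place of a real plot() method.
describe <- function(x, ...) UseMethod("describe")
describe.check_outliers <- function(x, ...) "generic outlier summary"
describe.outlier_zscore <- function(x, ...) "z-score specific summary"

res_zscore <- tag_outliers(c(FALSE, TRUE, FALSE), "zscore")
res_iqr <- tag_outliers(c(FALSE, FALSE, TRUE), "iqr")

describe(res_zscore) # dispatches to describe.outlier_zscore
describe(res_iqr)    # no describe.outlier_iqr, falls back to describe.check_outliers
```

Under these assumptions, a result from a method without a dedicated method still falls back to the shared `check_outliers` method, so only the techniques that need a custom plot have to define one.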

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds an additional class identifier to outlier detection results to support custom plotting methods in the easystats/see package. The change enables method-specific plotting by adding paste0("outlier_", method) to the class hierarchy.

Key changes:

  • Added method-specific class to outlier objects for enhanced plotting capabilities
  • Replaced 1:n loops with seq_len(n), which behaves correctly when n is zero
  • Updated WORDLIST alphabetical ordering
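
The seq_len() change can be checked with a small, self-contained sketch. The helper name `count_iterations()` is made up for this example and is not part of the package. When n is 0, `1:n` counts down from 1 to 0, so a loop over it runs twice, whereas `seq_len(0)` is empty and the loop body never runs.

```r
# Hypothetical helper: counts how many times a for loop iterates over `index`.
count_iterations <- function(index) {
  n_iter <- 0L
  for (i in index) {
    n_iter <- n_iter + 1L
  }
  n_iter
}

n <- 0L
colon_iterations <- count_iterations(1:n)            # 1:0 is c(1L, 0L)
seq_len_iterations <- count_iterations(seq_len(n))   # seq_len(0) is integer(0)
```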

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                 Description
R/check_outliers.R   Added method-specific class identifier and improved loop iteration patterns
inst/WORDLIST        Reordered entries alphabetically with no functional changes

Comment on lines 886 to 888
      pair_names <- c(
        pair_names,
-       paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/")
+       paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/") # nolint

Copilot AI, Aug 4, 2025

The # nolint comment suggests a linting issue that's being suppressed. Consider addressing the underlying linting concern rather than suppressing it, as this can hide legitimate code quality issues.

Member Author

The lint is:

Line 888 [paste_linter] Construct file paths with file.path(...) instead of paste(..., sep = "/"). If you are using paste(sep = "/") to construct a date, consider using format() or lubridate helpers instead. Note that paste() converts empty inputs to "", whereas file.path() leaves it empty.

However, this is not a file path; we're creating label strings like "Var1/Var2" to name correlation pairs. Using # nolint seems justified.
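
To make the distinction concrete, here is a minimal sketch with made-up variable names (`vars`, `pair_labels`, `empty`). The first part builds pair labels of the kind described above; the second part illustrates the empty-input difference that the lint message mentions.

```r
# Labels for pairs of variables, not file paths, so paste() is the natural tool.
vars <- c("Var1", "Var2", "Var3")
pair_labels <- paste(vars[1:2], vars[2:3], sep = "/") # nolint

# Behaviour on empty input, as noted in the lint message:
empty <- character(0)
pasted <- paste(empty, "x", sep = "/")  # the empty input is treated as ""
pathed <- file.path(empty, "x")         # the result stays empty
```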

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new class, outlier_<method>, to the results of check_outliers(). This will enable custom plotting methods based on the specific outlier detection technique used. The changes are applied consistently across different function implementations.

My review focuses on an opportunity to improve performance in R/check_outliers.R. I've suggested replacing a nested loop that grows a character vector, a pattern that can be inefficient in R, with a more idiomatic and performant vectorized approach.

The changes to inst/WORDLIST appear to be maintenance-related and have not been reviewed for correctness.

Comment on lines 883 to 891
  for (j in seq_len(nrow(model_resid))) {
    for (i in seq_len(ncol(model_resid))) {
      if (!is.na(model_resid[j, i])) {
        pair_names <- c(
          pair_names,
-         paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/")
+         paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/") # nolint
        )
      }
    }

medium

For improved performance and code clarity, this nested loop can be refactored. Growing a vector with c() inside a loop is often inefficient in R because it can require reallocating memory in each iteration.

A more idiomatic and performant approach is to first find the indices of the non-NA values and then construct the names in a single, vectorized operation. This avoids the nested loop and repeated memory allocation, which will be significantly faster for larger matrices. Using colnames() for the column index is also more explicit and robust than using row.names().

indices <- which(!is.na(model_resid), arr.ind = TRUE)
pair_names <- paste(
  row.names(model_resid)[indices[, "row"]],
  colnames(model_resid)[indices[, "col"]],
  sep = "/"
)
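
One point worth checking before swapping the two versions: the nested loop visits cells row by row, while which(..., arr.ind = TRUE) returns indices in column-major order, so the two can yield the same labels in a different order. The sketch below compares both on a toy matrix; `m` and its values are invented for illustration, and the real model_resid may be structured differently.

```r
# Toy 4 x 4 "residual" matrix with values only in the lower triangle.
m <- matrix(NA_real_, nrow = 4, ncol = 4,
            dimnames = list(letters[1:4], letters[1:4]))
m[lower.tri(m)] <- seq_len(6) / 10

# Loop version, mirroring the original code (row names used for both indices).
loop_names <- character(0)
for (j in seq_len(nrow(m))) {
  for (i in seq_len(ncol(m))) {
    if (!is.na(m[j, i])) {
      loop_names <- c(loop_names, paste(rownames(m)[j], rownames(m)[i], sep = "/")) # nolint
    }
  }
}

# Vectorized version, following the suggestion above.
indices <- which(!is.na(m), arr.ind = TRUE)
vec_names <- paste(
  rownames(m)[indices[, "row"]],
  colnames(m)[indices[, "col"]],
  sep = "/"
) # nolint

same_labels <- setequal(loop_names, vec_names)  # same set of pair labels
same_order <- identical(loop_names, vec_names)  # order differs on this matrix
```

Whether the ordering matters depends on how the labels are later matched to the residual values, which this snippet does not show. If the values are extracted in column-major order as well, the vectorized labels would line up with them directly.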

Member Author

Suggestion integrated

codecov bot commented Aug 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.13%. Comparing base (98f2f6a) to head (1410efe).
⚠️ Report is 61 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #846      +/-   ##
==========================================
+ Coverage   61.90%   62.13%   +0.22%     
==========================================
  Files          89       93       +4     
  Lines        6641     7220     +579     
==========================================
+ Hits         4111     4486     +375     
- Misses       2530     2734     +204     


rempsyc requested a review from strengejacke August 4, 2025 23:34
rempsyc closed this Aug 6, 2025