add outlier method as additional class for custom plotting methods #846
Summary of Changes
Hello @rempsyc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on improving the extensibility of outlier detection results for plotting purposes by introducing a method-specific class to the outlier objects. This change enables more granular control and differentiation when visualizing outlier analyses, particularly in conjunction with external plotting packages. Additionally, it includes minor code quality enhancements and updates to internal maintenance files.
Highlights
- Enhanced Outlier Plotting: I've added a new, dynamic class to the `outlier` object within the `check_outliers.default` and `check_outliers.data.frame` functions. This new class, `paste0("outlier_", method)`, allows for more specific plotting methods in the `see` package by differentiating outlier results based on the detection method used. This directly supports the new plotting methods referenced in the PR description (easystats/see#407).
- Code Robustness Improvement: In the `check_outliers.item_omega` function, I've updated the `for` loops from `1:n` to `seq_len(n)`. This is a best practice in R: when `n` is zero, `1:n` evaluates to `c(1, 0)` and the loop body still runs twice, whereas `seq_len(0)` is empty and the loop is skipped. This prevents potential errors and improves code robustness.
- Internal Wordlist Updates: I've made several minor adjustments to the `inst/WORDLIST` file. These changes add, remove, and reorder various terms, likely to refine the package's spell-checking dictionary.
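To illustrate how an extra `outlier_<method>` class lets a plotting package dispatch on the detection method, here is a minimal S3 sketch. The constructor `make_outlier_result()` and the generic `describe_plot()` are hypothetical stand-ins (they are not part of performance or see), and the exact position of the new class within the real class vector may differ.

```r
# Hypothetical object mimicking a check_outliers() result; the class
# vector used by performance may be ordered differently.
make_outlier_result <- function(method) {
  out <- c(FALSE, TRUE, FALSE)
  class(out) <- c("check_outliers", "see_check_outliers",
                  paste0("outlier_", method), class(out))
  out
}

# Hypothetical generic standing in for a plotting method in see
describe_plot <- function(x, ...) UseMethod("describe_plot")
describe_plot.outlier_zscore <- function(x, ...) "z-score specific plot"
describe_plot.default <- function(x, ...) "generic outlier plot"

describe_plot(make_outlier_result("zscore")) # dispatches to outlier_zscore
describe_plot(make_outlier_result("iqr"))    # no outlier_iqr method: default
```

S3 dispatch uses the first class in the vector that has a method, so a method-specific class only takes effect if no earlier class has a method that intercepts the call, or if that earlier method forwards with `NextMethod()`. Because `paste0()` is vectorized, several method names add one `outlier_*` class each.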
Pull Request Overview
This PR adds an additional class identifier to outlier detection results to support custom plotting methods in the easystats/see package. The change enables method-specific plotting by adding `paste0("outlier_", method)` to the class hierarchy.
Key changes:
- Added method-specific class to outlier objects for enhanced plotting capabilities
- Improved code style by replacing `1:n` loops with `seq_len(n)`, following R best practice
- Updated WORDLIST alphabetical ordering
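The difference between `1:n` and `seq_len(n)` only appears when `n` is zero. A short base-R sketch; the helper `count_iterations()` is hypothetical and exists only to count loop passes:

```r
# Count how many times a for loop body runs over an index vector
count_iterations <- function(idx) {
  k <- 0
  for (i in idx) k <- k + 1
  k
}

n <- 0
1:n                          # c(1, 0): a descending sequence, not an empty one
seq_len(n)                   # integer(0)
count_iterations(1:n)        # body runs twice, with i = 1 and then i = 0
count_iterations(seq_len(n)) # body never runs
```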
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
R/check_outliers.R | Added method-specific class identifier and improved loop iteration patterns |
inst/WORDLIST | Reordered entries alphabetically with no functional changes |
R/check_outliers.R (outdated)

```diff
       pair_names <- c(
         pair_names,
-        paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/")
+        paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/") # nolint
```
The `# nolint` comment suggests a linting issue that's being suppressed. Consider addressing the underlying linting concern rather than suppressing it, as this can hide legitimate code quality issues.
The lint is:

Line 888: [paste_linter] Construct file paths with file.path(...) instead of paste(..., sep = "/"). If you are using paste(sep = "/") to construct a date, consider using format() or lubridate helpers instead. Note that paste() converts empty inputs to "", whereas file.path() leaves it empty.

However, this is not a file path: we're creating label strings like "Var1/Var2" to name correlation pairs. Use of `# nolint` seems justified.
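The last sentence of the lint message states the practical difference between the two functions. For non-empty inputs both build the same "Var1/Var2" label (the default separator of `file.path()` is "/" on all platforms), so the choice mainly signals intent. With a zero-length input they diverge. The variable names below are illustrative only:

```r
# Label for a correlation pair, built as in the PR
paste("Var1", "Var2", sep = "/")        # "Var1/Var2"
file.path("Var1", "Var2")               # also "Var1/Var2"

# With a zero-length argument the two functions behave differently
paste("Var1", character(0), sep = "/")  # "Var1/": the empty input becomes ""
file.path("Var1", character(0))         # character(0)
```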
Code Review
This pull request introduces a new class, `outlier_<method>`, to the results of `check_outliers()`. This will enable custom plotting methods based on the specific outlier detection technique used. The changes are applied consistently across different function implementations.
My review focuses on an opportunity to improve performance in `R/check_outliers.R`. I've suggested replacing a nested loop that builds a character vector, which can be inefficient in R, with a more idiomatic and performant vectorized approach.
The changes to `inst/WORDLIST` appear to be maintenance-related and have not been reviewed for correctness.
R/check_outliers.R (outdated)

```diff
 for (j in seq_len(nrow(model_resid))) {
   for (i in seq_len(ncol(model_resid))) {
     if (!is.na(model_resid[j, i])) {
       pair_names <- c(
         pair_names,
-        paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/")
+        paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/") # nolint
       )
     }
   }
```
For improved performance and code clarity, this nested loop can be refactored. Growing a vector with `c()` inside a loop is often inefficient in R because it can require reallocating memory in each iteration.

A more idiomatic and performant approach is to first find the indices of the non-NA values and then construct the names in a single, vectorized operation. This avoids the nested loop and repeated memory allocation, which should be significantly faster for larger matrices. Using `colnames()` for the column index is also more explicit and robust than using `row.names()`, which yields the correct label only when the row and column names of `model_resid` coincide.
```r
indices <- which(!is.na(model_resid), arr.ind = TRUE)
pair_names <- paste(
  row.names(model_resid)[indices[, "row"]],
  colnames(model_resid)[indices[, "col"]],
  sep = "/"
)
```
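One behavioral detail is worth checking before adopting this suggestion: the nested loop visits cells row by row, whereas `which(..., arr.ind = TRUE)` returns indices in R's column-major order. The two versions therefore produce the same labels, but not necessarily in the same order. That matters if `pair_names` must line up with values collected in loop order elsewhere; the surrounding code is not shown here, so this is an assumption to verify. A minimal sketch with a hypothetical 3 x 3 `model_resid`:

```r
# Hypothetical stand-in for model_resid: 3 x 3, NA on the diagonal
vars <- c("a", "b", "c")
model_resid <- matrix(c(NA, 0.1, 0.2,
                        0.1, NA, 0.3,
                        0.2, 0.3, NA),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(vars, vars))

# Original approach: grow the vector inside nested loops (row-major order)
loop_names <- character(0)
for (j in seq_len(nrow(model_resid))) {
  for (i in seq_len(ncol(model_resid))) {
    if (!is.na(model_resid[j, i])) {
      loop_names <- c(
        loop_names,
        paste(row.names(model_resid)[j], row.names(model_resid)[i], sep = "/") # nolint
      )
    }
  }
}

# Suggested approach: which() walks the matrix in column-major order
indices <- which(!is.na(model_resid), arr.ind = TRUE)
vec_names <- paste(
  row.names(model_resid)[indices[, "row"]],
  colnames(model_resid)[indices[, "col"]],
  sep = "/"
)

setequal(loop_names, vec_names)   # TRUE: same labels
identical(loop_names, vec_names)  # FALSE: different order

# Sorting the index matrix by row, then column, restores the loop's order
indices_rm <- indices[order(indices[, "row"], indices[, "col"]), , drop = FALSE]
vec_names_rm <- paste(
  row.names(model_resid)[indices_rm[, "row"]],
  colnames(model_resid)[indices_rm[, "col"]],
  sep = "/"
)
identical(loop_names, vec_names_rm)  # TRUE
```

Reordering with `order()` is one way to keep the vectorized version while preserving the original ordering; if nothing downstream depends on the order, the extra step is unnecessary.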
Suggestion integrated
Codecov Report
✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##             main     #846      +/-   ##
==========================================
+ Coverage   61.90%   62.13%   +0.22%
==========================================
  Files          89       93        +4
  Lines        6641     7220      +579
==========================================
+ Hits         4111     4486      +375
- Misses       2530     2734      +204
```
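As a consistency check on the Codecov figures: every coverable line is either hit or missed, and coverage is hits divided by lines. A small R sketch reproducing both coverage percentages from the hit and miss counts in the report:

```r
# Hit and miss counts copied from the Codecov report
hits   <- c(main = 4111, pr = 4486)
misses <- c(main = 2530, pr = 2734)

# Every coverable line is either hit or missed
total_lines <- hits + misses          # main = 6641, pr = 7220

# Coverage percentage, rounded to two decimals as in the report
coverage <- round(100 * hits / total_lines, 2)
coverage                              # main = 61.90, pr = 62.13
```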
Minor class change for outliers for new see outlier plotting methods (easystats/see#407)