-
Notifications
You must be signed in to change notification settings - Fork 194
Ignore multi-line string contents optionally in line_length_linter #2799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2799 +/- ##
=======================================
Coverage 99.24% 99.24%
=======================================
Files 129 129
Lines 7282 7316 +34
=======================================
+ Hits 7227 7261 +34
Misses 55 55 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the tests I'm wondering whether all you need is supress lints if the lint line satisfies line1 < line < line2
for some //STR_CONST
?
there's also the issue of how to deal with lines like this: x <- 'long string starting on a short
line which also has long strings in
the other parts of the body' Before this rule, there's 3 lints, but under |
Bumping for review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to come back to this several time because the current version of is_in_string_body()
is quite complex. I think part of it is the switch between vectorized and vapply logic and the easily missed changes between str_idx/long_idx as well as str_data/parse_data.
#' @param length Maximum line length allowed. Default is `80L` (Hollerith limit). | ||
#' @param ignore_string_bodies Logical, default `FALSE`. If `TRUE`, the contents | ||
#' of string literals are ignored. The quotes themselves are included, so this | ||
#' mainly affects wide multiline strings, e.g. SQL queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#' mainly affects wide multiline strings, e.g. SQL queries. | |
#' only affects wide multiline strings, e.g. SQL queries. |
?
Looking at the code, this seems clearer & more precise.
#' @param length Maximum line length allowed. Default is `80L` (Hollerith limit). | ||
#' @param ignore_string_bodies Logical, default `FALSE`. If `TRUE`, the contents | ||
#' of string literals are ignored. The quotes themselves are included, so this | ||
#' mainly affects wide multiline strings, e.g. SQL queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some minimal examples for this new argument please?
is_in_string_body <- function(parse_data, max_length, long_idx) { | ||
str_idx <- parse_data$token == "STR_CONST" | ||
if (!any(str_idx)) { | ||
return(rep(FALSE, length(long_idx))) | ||
} | ||
str_data <- parse_data[str_idx, ] | ||
if (all(str_data$line1 == str_data$line2)) { | ||
return(rep(FALSE, length(long_idx))) | ||
} | ||
# right delimiter just ends at 'col2', but 'col1' takes some sleuthing | ||
str_data$line1_width <- nchar(vapply( | ||
strsplit(str_data$text, "\n", fixed = TRUE), | ||
function(x) x[1L], | ||
FUN.VALUE = character(1L), | ||
USE.NAMES = FALSE | ||
)) | ||
str_data$col1_end <- str_data$col1 + str_data$line1_width | ||
vapply( | ||
long_idx, | ||
function(line) { | ||
# strictly inside a multi-line string body | ||
if (any(str_data$line1 < line & str_data$line2 > line)) { | ||
return(TRUE) | ||
} | ||
on_line1_idx <- str_data$line1 == line | ||
if (any(on_line1_idx)) { | ||
return(max(str_data$col1_end[on_line1_idx]) <= max_length) | ||
} | ||
# use parse data to capture possible trailing expressions on this line | ||
on_line2_idx <- parse_data$line2 == line | ||
any(on_line2_idx) && max(parse_data$col2[on_line2_idx]) <= max_length | ||
}, | ||
logical(1L) | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct but I'm really struggling to wrap my head around this function, which concerns me a little bit for maintainability.
In particular, all current tests pass with this much simpler version of the vapply() call
is_in_string_body <- function(parse_data, max_length, long_idx) { | |
str_idx <- parse_data$token == "STR_CONST" | |
if (!any(str_idx)) { | |
return(rep(FALSE, length(long_idx))) | |
} | |
str_data <- parse_data[str_idx, ] | |
if (all(str_data$line1 == str_data$line2)) { | |
return(rep(FALSE, length(long_idx))) | |
} | |
# right delimiter just ends at 'col2', but 'col1' takes some sleuthing | |
str_data$line1_width <- nchar(vapply( | |
strsplit(str_data$text, "\n", fixed = TRUE), | |
function(x) x[1L], | |
FUN.VALUE = character(1L), | |
USE.NAMES = FALSE | |
)) | |
str_data$col1_end <- str_data$col1 + str_data$line1_width | |
vapply( | |
long_idx, | |
function(line) { | |
# strictly inside a multi-line string body | |
if (any(str_data$line1 < line & str_data$line2 > line)) { | |
return(TRUE) | |
} | |
on_line1_idx <- str_data$line1 == line | |
if (any(on_line1_idx)) { | |
return(max(str_data$col1_end[on_line1_idx]) <= max_length) | |
} | |
# use parse data to capture possible trailing expressions on this line | |
on_line2_idx <- parse_data$line2 == line | |
any(on_line2_idx) && max(parse_data$col2[on_line2_idx]) <= max_length | |
}, | |
logical(1L) | |
) | |
} | |
is_in_string_body <- function(parse_data, max_length, long_idx) { | |
str_idx <- parse_data$token == "STR_CONST" | |
if (!any(str_idx)) { | |
return(rep(FALSE, length(long_idx))) | |
} | |
str_data <- parse_data[str_idx, ] | |
if (all(str_data$line1 == str_data$line2)) { | |
return(rep(FALSE, length(long_idx))) | |
} | |
vapply( | |
long_idx, | |
function(line) { | |
# strictly inside a multi-line string body | |
any(str_data$line1 < line & str_data$line2 > line) | |
}, | |
logical(1L) | |
) | |
} |
Are we missing tests?
Closes #856. Progress on #2737