Skip to content

Conversation

MichaelChirico
Copy link
Collaborator

Closes #856. Progress on #2737

Copy link

codecov bot commented Feb 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.24%. Comparing base (2b70b52) to head (a80c6e9).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2799   +/-   ##
=======================================
  Coverage   99.24%   99.24%           
=======================================
  Files         129      129           
  Lines        7282     7316   +34     
=======================================
+ Hits         7227     7261   +34     
  Misses         55       55           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@AshesITR AshesITR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests I'm wondering whether all you need is supress lints if the lint line satisfies line1 < line < line2 for some //STR_CONST?

@MichaelChirico
Copy link
Collaborator Author

Looking at the tests I'm wondering whether all you need is supress lints if the lint line satisfies line1 < line < line2 for some //STR_CONST?

there's also the issue of how to deal with lines like this:

x <- 'long string starting on a short
line which also has long strings in
the other parts of the body'

Before this rule, there's 3 lints, but under ignore_string_bodies=TRUE here, there are 0

@MichaelChirico
Copy link
Collaborator Author

Bumping for review :)

@MichaelChirico MichaelChirico requested review from Bisaloo, IndrajeetPatil and olivroy and removed request for olivroy June 27, 2025 18:55
@MichaelChirico MichaelChirico added this to the 3.3.0 milestone Jul 29, 2025
@MichaelChirico MichaelChirico changed the title Ignore multi-string contents optionally in line_length_linter Ignore multi-line string contents optionally in line_length_linter Jul 29, 2025
Copy link
Collaborator

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to come back to this several time because the current version of is_in_string_body() is quite complex. I think part of it is the switch between vectorized and vapply logic and the easily missed changes between str_idx/long_idx as well as str_data/parse_data.

#' @param length Maximum line length allowed. Default is `80L` (Hollerith limit).
#' @param ignore_string_bodies Logical, default `FALSE`. If `TRUE`, the contents
#' of string literals are ignored. The quotes themselves are included, so this
#' mainly affects wide multiline strings, e.g. SQL queries.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' mainly affects wide multiline strings, e.g. SQL queries.
#' only affects wide multiline strings, e.g. SQL queries.

?

Looking at the code, this seems clearer & more precise.

Comment on lines +5 to +8
#' @param length Maximum line length allowed. Default is `80L` (Hollerith limit).
#' @param ignore_string_bodies Logical, default `FALSE`. If `TRUE`, the contents
#' of string literals are ignored. The quotes themselves are included, so this
#' mainly affects wide multiline strings, e.g. SQL queries.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some minimal examples for this new argument please?

Comment on lines +60 to +94
is_in_string_body <- function(parse_data, max_length, long_idx) {
str_idx <- parse_data$token == "STR_CONST"
if (!any(str_idx)) {
return(rep(FALSE, length(long_idx)))
}
str_data <- parse_data[str_idx, ]
if (all(str_data$line1 == str_data$line2)) {
return(rep(FALSE, length(long_idx)))
}
# right delimiter just ends at 'col2', but 'col1' takes some sleuthing
str_data$line1_width <- nchar(vapply(
strsplit(str_data$text, "\n", fixed = TRUE),
function(x) x[1L],
FUN.VALUE = character(1L),
USE.NAMES = FALSE
))
str_data$col1_end <- str_data$col1 + str_data$line1_width
vapply(
long_idx,
function(line) {
# strictly inside a multi-line string body
if (any(str_data$line1 < line & str_data$line2 > line)) {
return(TRUE)
}
on_line1_idx <- str_data$line1 == line
if (any(on_line1_idx)) {
return(max(str_data$col1_end[on_line1_idx]) <= max_length)
}
# use parse data to capture possible trailing expressions on this line
on_line2_idx <- parse_data$line2 == line
any(on_line2_idx) && max(parse_data$col2[on_line2_idx]) <= max_length
},
logical(1L)
)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct but I'm really struggling to wrap my head around this function, which concerns me a little bit for maintainability.

In particular, all current tests pass with this much simpler version of the vapply() call

Suggested change
is_in_string_body <- function(parse_data, max_length, long_idx) {
str_idx <- parse_data$token == "STR_CONST"
if (!any(str_idx)) {
return(rep(FALSE, length(long_idx)))
}
str_data <- parse_data[str_idx, ]
if (all(str_data$line1 == str_data$line2)) {
return(rep(FALSE, length(long_idx)))
}
# right delimiter just ends at 'col2', but 'col1' takes some sleuthing
str_data$line1_width <- nchar(vapply(
strsplit(str_data$text, "\n", fixed = TRUE),
function(x) x[1L],
FUN.VALUE = character(1L),
USE.NAMES = FALSE
))
str_data$col1_end <- str_data$col1 + str_data$line1_width
vapply(
long_idx,
function(line) {
# strictly inside a multi-line string body
if (any(str_data$line1 < line & str_data$line2 > line)) {
return(TRUE)
}
on_line1_idx <- str_data$line1 == line
if (any(on_line1_idx)) {
return(max(str_data$col1_end[on_line1_idx]) <= max_length)
}
# use parse data to capture possible trailing expressions on this line
on_line2_idx <- parse_data$line2 == line
any(on_line2_idx) && max(parse_data$col2[on_line2_idx]) <= max_length
},
logical(1L)
)
}
is_in_string_body <- function(parse_data, max_length, long_idx) {
str_idx <- parse_data$token == "STR_CONST"
if (!any(str_idx)) {
return(rep(FALSE, length(long_idx)))
}
str_data <- parse_data[str_idx, ]
if (all(str_data$line1 == str_data$line2)) {
return(rep(FALSE, length(long_idx)))
}
vapply(
long_idx,
function(line) {
# strictly inside a multi-line string body
any(str_data$line1 < line & str_data$line2 > line)
},
logical(1L)
)
}

Are we missing tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Turn off line length linter within string literals (optionally?)

3 participants