Skip to content

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Apr 21, 2025

Closes #1793 (in favor of this more holistic approach)

See also r-lib/vctrs#1968, but here's the TLDR below:

R 4.5.0 enables CATCH_ZERO_LENGTH_ACCESS here:

https://github.com/r-devel/r-svn/blob/9976c3d7f08c754593d01ba8380afb6be803dde2/src/main/memory.c#L4137-L4150

That means REAL() and friends now return an invalid pointer of (void *) 1 on 0-length R objects. I am not sure what it was returning before.

This does not prevent us from calling REAL() on 0-length R objects. i.e. code like this is fine and safe:

void fn() {
  R_xlen_t size = Rf_xlength(x);
  const double* p_x = REAL(x);

  for (R_xlen_t i = 0; i < size; ++i) {
    double elt = p_x[i];
  }
}

It's only if we try and use p_x that we may get in trouble on some platforms, like what @MichaelChirico reported. For example, with:

memcpy(p_dest, p_x, size * sizeof(double))
memset(p_x, 0, size * sizeof(double))

In a sane world, these would be safe because C should handle "if the size is 0, do nothing". But on some platforms it can detect that p_x is invalid memory, and we get yelled at. The most ergonomic solution for this is to implement "if the size is 0, do nothing" ourselves through r_memcpy() and r_memset() helpers, which is what I've done here. I've used them everywhere (even on memory not necessarily owned by R) except in the vendored xxhash code (which looks to be safe anyways).

@MichaelChirico
Copy link
Contributor

Makes sense to me. Linking this discussion which also has relevant insights:

mhahsler/arules#85

Maybe also worth noting the other, much subtler segfault we observed (in an old version of Bioconductor's {IRanges}):

SEXP lengths;
int upper_run;
int *lengths_elt = INTEGER(lengths);
upper_run = *lengths_elt;

One might hope this would "just work" for length-0 SEXP unless upper_run has an illegal access later, but no longer.

Copy link
Member

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh

@DavisVaughan DavisVaughan merged commit 311c12b into r-lib:main Apr 23, 2025
13 checks passed
@DavisVaughan DavisVaughan deleted the feature/safe-memcpy-memset branch April 23, 2025 13:22
@MichaelChirico
Copy link
Contributor

Hm, maybe this can be reverted in favor of just using R's own Memcpy() and Memzero():

https://stat.ethz.ch/pipermail/r-devel/2025-April/083995.html

@DavisVaughan
Copy link
Member Author

DavisVaughan commented Apr 23, 2025

@MichaelChirico
Copy link
Contributor

MichaelChirico commented Apr 23, 2025

Hm I see what you mean, however, I confirm that just using

#include <R_ext/RS.h>
Memcpy(p_out, p_x, cpy_size)

resolves my original issue with vctrs::vec_set_union(integer(), 12L), no segfault or other funny business.

@MichaelChirico
Copy link
Contributor

MichaelChirico commented Apr 23, 2025

I think the synthesis is thus: sizeof() is a compile-time operator, hence there's no run-time issue as *p is not actually doing the *.

https://www.geeksforgeeks.org/sizeof-operator-c/

@DavisVaughan
Copy link
Member Author

Per usual, mildly horrifying C behavior

// value of x doesn't change
y = sizeof(x++);

but that does seem to work in our favor here

@DavisVaughan
Copy link
Member Author

I think I'm still slightly happier with r_memcpy() over Memcpy() because ours really is a drop in replacement for memcpy(). i.e. no need to think about the fact that Memcpy() is doing (R_SIZE_T)(n) * sizeof(*p) for you, you just do what you were going to do for memcpy().

And for scalars like r_memcpy(p_dict, p_new_dict, sizeof(*p_dict)); I think I prefer our version over Memcpy(p_dict, p_new_dict, 1);

But in general it does seem like Memcpy() is good for general public usage!

@MichaelChirico
Copy link
Contributor

Per usual, mildly horrifying C behavior

FWIW that does get caught by a stingy compiler -Wunevaluated-expression

I think I'm still slightly happier with r_memcpy() over Memcpy() because ours really is a drop in replacement for memcpy()

Works for me. I had a look at switching to Memcpy() and yea, it requires a bit more concentration than the r_memcpy() approach. There are definitely a few places where we don't need r_memcpy() -- especially when the src is a CHARSXP, which is always \0-terminated & thus safe -- but I agree it's probably too much headache to think about that every time, much easier to just use r_memcpy() and be done with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants