-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[scudo] Add primary option to zero block on dealloc. #142394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-compiler-rt-sanitizer Author: None (piwicode) ChangesWhen all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory. Change-Id: I278055d82057090b0a04d812b49cf93fdf467478 Full diff: https://github.com/llvm/llvm-project/pull/142394.diff 4 Files Affected:
diff --git a/compiler-rt/lib/scudo/standalone/allocator_config.def b/compiler-rt/lib/scudo/standalone/allocator_config.def
index 84fcec0877d40..2d58cb0a4d392 100644
--- a/compiler-rt/lib/scudo/standalone/allocator_config.def
+++ b/compiler-rt/lib/scudo/standalone/allocator_config.def
@@ -111,6 +111,11 @@ PRIMARY_OPTIONAL_TYPE(ConditionVariableT, ConditionVariableDummy)
// to, in increments of a power-of-2 scale. See `CompactPtrScale` also.
PRIMARY_OPTIONAL_TYPE(CompactPtrT, uptr)
+// Clears the memory slot when a allocation is returned to the allocator.
+// Operating systems that detects pages filled with zeroes will decommit
+// memory.
+PRIMARY_OPTIONAL(const bool, ZeroOnDealloc, false)
+
// SECONDARY_REQUIRED_TEMPLATE_TYPE(NAME)
//
// Defines the type of Secondary Cache to use.
diff --git a/compiler-rt/lib/scudo/standalone/primary64.h b/compiler-rt/lib/scudo/standalone/primary64.h
index 25ee999199114..145ee26dd855f 100644
--- a/compiler-rt/lib/scudo/standalone/primary64.h
+++ b/compiler-rt/lib/scudo/standalone/primary64.h
@@ -53,6 +53,7 @@ template <typename Config> class SizeClassAllocator64 {
static const uptr CompactPtrScale = Config::getCompactPtrScale();
static const uptr RegionSizeLog = Config::getRegionSizeLog();
static const uptr GroupSizeLog = Config::getGroupSizeLog();
+ static const bool ZeroOnDealloc = Config::getZeroOnDealloc();
static_assert(RegionSizeLog >= GroupSizeLog,
"Group size shouldn't be greater than the region size");
static const uptr GroupScale = GroupSizeLog - CompactPtrScale;
diff --git a/compiler-rt/lib/scudo/standalone/size_class_allocator.h b/compiler-rt/lib/scudo/standalone/size_class_allocator.h
index 7c7d6307f8f0a..219b08774bcb6 100644
--- a/compiler-rt/lib/scudo/standalone/size_class_allocator.h
+++ b/compiler-rt/lib/scudo/standalone/size_class_allocator.h
@@ -59,6 +59,11 @@ template <class SizeClassAllocator> struct SizeClassAllocatorLocalCache {
bool deallocate(uptr ClassId, void *P) {
CHECK_LT(ClassId, NumClasses);
+
+ if(SizeClassAllocator::ZeroOnDealloc) {
+ memset(P, 0, SizeClassAllocator::getSizeByClassId(ClassId));
+ }
+
PerClass *C = &PerClassArray[ClassId];
// If the cache is full, drain half of blocks back to the main allocator.
@@ -211,6 +216,10 @@ template <class SizeClassAllocator> struct SizeClassAllocatorNoCache {
bool deallocate(uptr ClassId, void *P) {
CHECK_LT(ClassId, NumClasses);
+ if(SizeClassAllocator::ZeroOnDealloc) {
+ memset(P, 0, SizeClassAllocator::getSizeByClassId(ClassId));
+ }
+
if (ClassId == BatchClassId)
return deallocateBatchClassBlock(P);
diff --git a/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp b/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
index 7dc38c2ede8ca..47bfc3a9fc242 100644
--- a/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
+++ b/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
@@ -150,6 +150,27 @@ template <typename SizeClassMapT> struct TestConfig5 {
};
};
+// Enable `ZeroOnDealloc`
+template <typename SizeClassMapT> struct TestConfig6 {
+ static const bool MaySupportMemoryTagging = false;
+ template <typename> using TSDRegistryT = void;
+ template <typename> using PrimaryT = void;
+ template <typename> using SecondaryT = void;
+
+ struct Primary {
+ using SizeClassMap = SizeClassMapT;
+ static const scudo::uptr RegionSizeLog = 18U;
+ static const scudo::uptr GroupSizeLog = 18U;
+ static const scudo::s32 MinReleaseToOsIntervalMs = INT32_MIN;
+ static const scudo::s32 MaxReleaseToOsIntervalMs = INT32_MAX;
+ typedef scudo::uptr CompactPtrT;
+ static const scudo::uptr CompactPtrScale = 0;
+ static const bool EnableRandomOffset = true;
+ static const scudo::uptr MapSizeIncrement = 1UL << 18;
+ static const bool ZeroOnDealloc = true;
+ };
+};
+
template <template <typename> class BaseConfig, typename SizeClassMapT>
struct Config : public BaseConfig<SizeClassMapT> {};
@@ -191,7 +212,8 @@ struct ScudoPrimaryTest : public Test {};
SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig2) \
SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig3) \
SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig4) \
- SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig5)
+ SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig5) \
+ SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig6)
#endif
#define SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TYPE) \
|
6ca806a
to
7e9ce80
Compare
@cferris1000 @fabio-d Please take a look. |
7e9ce80
to
11d45d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is better to be done in deallocate()
path in combined.h.
@@ -34,6 +34,10 @@ SCUDO_FLAG(bool, delete_size_mismatch, true, | |||
|
|||
SCUDO_FLAG(bool, zero_contents, false, "Zero chunk contents on allocation.") | |||
|
|||
SCUDO_FLAG(bool, zero_on_dealloc, false, | |||
"Clears the memory slot when a allocation is returned to the " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
When all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory. Change-Id: I278055d82057090b0a04d812b49cf93fdf467478
11d45d8
to
aa03ee5
Compare
That why I have a some preference for the SizeClassAllocator. Can you double check and let me know if you prefer to move this code to |
Zeroing-on-free is also a security feature ans that's why I think it is good to have. But it seems to me this is introduced for specific platform, then we may want to consider if we can implement it on platform side. For example, the malloc/free hooks. In addition, the size of blocks in the primary allocator can be large (like several pages), it depends on the configuration. So even if you only zeroing blocks in primary allocator, it can still cause some performance issue. We do have a way to implement this only for certain sizes but I would suggest having more discussion before we make it a formal feature in Scudo |
Thanks for taking the time to reply.
I made some conclusive measurements on Fuchsia. I don't have measures on Android but I this is will be positive too, given that linux zram store efficiently pages with a unique value.
I see a few challenges because
Indeed. That should be in the same ballpark as zeroing on allocation. I thought about relaying on a platform specific function later on that would use kernel operation to zero the memory. That feature could be implemented in memset or scudo. On the positive side, this decommits 30 MiB of memory on our system which is significant, I observe a reduction of the number of pages moved to zram, and a decrease in the time spent by the kernel on compression and decompression.
Sure, how can I get the discussion rolling? |
If this is supposed to be generally supported, then it shouldn't be only for primary allocator. Like the zero-on-allocation, it's used for all the blocks. Unless we have a good reason and it's for general cases (not for some specific cases). OTOH, we have the page releasing mechanism which also helps with memory usage. You can do a mandatory clean up as well. One thing I'm not sure is the cost of having the kernel detects the zero page, it seems not a cheap operation like does it check the page status periodically?
Sorry, I forgot you also want to clean the header. Then this is another behavior which is specific to a certain case. Adding zeroing-on-free, I may expect the zeroing cleans everything except the header (because it helps something like double-free-detection)
Not only the micro-benchmark, we should also take real cases into consideration. When turning on zeroing-on-alloc, we see significant performance impact on Android (especially for some processes like camera which is sensitive to fps) Thus I'm not sure what you mean "the effect is small". Even for microbenchmark, I believe you still see performance regression on free because it just introduce additional operations. I would like to see if we have more details about the measurements.
We can do it here. The main concern for me is that I'm not persuaded by the reason why we only support zeroing memory on free for primary and clean everything in the block. Scudo is an userspace memory allocator, if there's any performance/memory optimization relying on the OS/Kernel, then we should be careful about the expectation of new features. For example, it's better not to say the OS will release the zero pages. And the users are not only Fuchsia/Android, there are still few different platforms (yes, very few of them, but it's not only for one or two platforms). It's better to make changes based on more general use cases. |
My proposal aims for performance gains, not security, which is why it differs from the zero-on-allocation standard. Could a new name help clarify this distinction? My understanding is that memset is unnecessary on the secondary allocator because pages are unmapped upon deallocation. The next mapping will automatically provide a zero-initialized page, making a memset here a performance waste. (assuming no quanteene)
This is a process already happening as part of zram compression. This change makes it more efficient and avoids to process data of the deallocated blocks.
Ack. Keeping the headers defeats the purpose, as the memory pages would not be full of zeroes.
So far macro benchmark on the boot sequence show no significant impact on android boot sequence timing.
That is a good discussion to have. This proposal is aiming at performance, and the option should only be used on systems where it is beneficial. I understand that the person enabling this option is making an assumption that the operating system will be able to take part of it to release some pages when needed, and it is a new reclamation code-path. |
The underlying issue is that this change is fuchsia kernel specific. It will not help anything but that one kernel. That's why the proposal was that you use some of the extensions to do this in fuchsia specific code instead of adding a lot of code for an option that isn't very useful to other platforms . Especially the fact that this only applies to the primary, and might cause other issues if you don't want this to apply to larger primary allocations (many configs have primary allocations up to 65KB allocations or larger). On fuchsia, I believe the secondary allocations are unmapped on free, but on any platform that uses the secondary cache and has the decay timer set to a non-zero value, those allocations are not zeroed on free. So this would be a confusing config parameter. As mentioned by Chiahung, this turns off a security feature, which might be okay, but it should be fully understood. Which is why I suspect that if you run the unit tests with this option enabled, there will be failures, the use after free tests. |
When all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory.
Change-Id: I278055d82057090b0a04d812b49cf93fdf467478