[scudo] Add primary option to zero block on dealloc. #142394

piwicode · 2025-06-02T13:52:46Z

When all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory.

Change-Id: I278055d82057090b0a04d812b49cf93fdf467478

github-actions · 2025-06-02T13:53:05Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-06-02T13:53:38Z

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (piwicode)

Changes

When all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory.

Change-Id: I278055d82057090b0a04d812b49cf93fdf467478

Full diff: https://github.com/llvm/llvm-project/pull/142394.diff

4 Files Affected:

(modified) compiler-rt/lib/scudo/standalone/allocator_config.def (+5)
(modified) compiler-rt/lib/scudo/standalone/primary64.h (+1)
(modified) compiler-rt/lib/scudo/standalone/size_class_allocator.h (+9)
(modified) compiler-rt/lib/scudo/standalone/tests/primary_test.cpp (+23-1)

diff --git a/compiler-rt/lib/scudo/standalone/allocator_config.def b/compiler-rt/lib/scudo/standalone/allocator_config.def
index 84fcec0877d40..2d58cb0a4d392 100644
--- a/compiler-rt/lib/scudo/standalone/allocator_config.def
+++ b/compiler-rt/lib/scudo/standalone/allocator_config.def
@@ -111,6 +111,11 @@ PRIMARY_OPTIONAL_TYPE(ConditionVariableT, ConditionVariableDummy)
 // to, in increments of a power-of-2 scale. See `CompactPtrScale` also.
 PRIMARY_OPTIONAL_TYPE(CompactPtrT, uptr)
 
+// Clears the memory slot when a allocation is returned to the allocator.
+// Operating systems that detects pages filled with zeroes will decommit
+// memory.
+PRIMARY_OPTIONAL(const bool, ZeroOnDealloc, false)
+
 // SECONDARY_REQUIRED_TEMPLATE_TYPE(NAME)
 //
 // Defines the type of Secondary Cache to use.
diff --git a/compiler-rt/lib/scudo/standalone/primary64.h b/compiler-rt/lib/scudo/standalone/primary64.h
index 25ee999199114..145ee26dd855f 100644
--- a/compiler-rt/lib/scudo/standalone/primary64.h
+++ b/compiler-rt/lib/scudo/standalone/primary64.h
@@ -53,6 +53,7 @@ template <typename Config> class SizeClassAllocator64 {
   static const uptr CompactPtrScale = Config::getCompactPtrScale();
   static const uptr RegionSizeLog = Config::getRegionSizeLog();
   static const uptr GroupSizeLog = Config::getGroupSizeLog();
+  static const bool ZeroOnDealloc = Config::getZeroOnDealloc();
   static_assert(RegionSizeLog >= GroupSizeLog,
                 "Group size shouldn't be greater than the region size");
   static const uptr GroupScale = GroupSizeLog - CompactPtrScale;
diff --git a/compiler-rt/lib/scudo/standalone/size_class_allocator.h b/compiler-rt/lib/scudo/standalone/size_class_allocator.h
index 7c7d6307f8f0a..219b08774bcb6 100644
--- a/compiler-rt/lib/scudo/standalone/size_class_allocator.h
+++ b/compiler-rt/lib/scudo/standalone/size_class_allocator.h
@@ -59,6 +59,11 @@ template <class SizeClassAllocator> struct SizeClassAllocatorLocalCache {
 
   bool deallocate(uptr ClassId, void *P) {
     CHECK_LT(ClassId, NumClasses);
+
+    if(SizeClassAllocator::ZeroOnDealloc) {
+      memset(P, 0, SizeClassAllocator::getSizeByClassId(ClassId));
+    }
+
     PerClass *C = &PerClassArray[ClassId];
 
     // If the cache is full, drain half of blocks back to the main allocator.
@@ -211,6 +216,10 @@ template <class SizeClassAllocator> struct SizeClassAllocatorNoCache {
   bool deallocate(uptr ClassId, void *P) {
     CHECK_LT(ClassId, NumClasses);
 
+    if(SizeClassAllocator::ZeroOnDealloc) {
+      memset(P, 0, SizeClassAllocator::getSizeByClassId(ClassId));
+    }
+
     if (ClassId == BatchClassId)
       return deallocateBatchClassBlock(P);
 
diff --git a/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp b/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
index 7dc38c2ede8ca..47bfc3a9fc242 100644
--- a/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
+++ b/compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
@@ -150,6 +150,27 @@ template <typename SizeClassMapT> struct TestConfig5 {
   };
 };
 
+// Enable `ZeroOnDealloc`
+template <typename SizeClassMapT> struct TestConfig6 {
+  static const bool MaySupportMemoryTagging = false;
+  template <typename> using TSDRegistryT = void;
+  template <typename> using PrimaryT = void;
+  template <typename> using SecondaryT = void;
+
+  struct Primary {
+    using SizeClassMap = SizeClassMapT;
+    static const scudo::uptr RegionSizeLog = 18U;
+    static const scudo::uptr GroupSizeLog = 18U;
+    static const scudo::s32 MinReleaseToOsIntervalMs = INT32_MIN;
+    static const scudo::s32 MaxReleaseToOsIntervalMs = INT32_MAX;
+    typedef scudo::uptr CompactPtrT;
+    static const scudo::uptr CompactPtrScale = 0;
+    static const bool EnableRandomOffset = true;
+    static const scudo::uptr MapSizeIncrement = 1UL << 18;
+    static const bool ZeroOnDealloc = true;
+  };
+};
+
 template <template <typename> class BaseConfig, typename SizeClassMapT>
 struct Config : public BaseConfig<SizeClassMapT> {};
 
@@ -191,7 +212,8 @@ struct ScudoPrimaryTest : public Test {};
   SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig2)                            \
   SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig3)                            \
   SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig4)                            \
-  SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig5)
+  SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig5)                            \
+  SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TestConfig6)
 #endif
 
 #define SCUDO_TYPED_TEST_TYPE(FIXTURE, NAME, TYPE)                             \

piwicode · 2025-06-13T16:11:02Z

@cferris1000 @fabio-d Please take a look.

ChiaHungDuan

I think this is better to be done in deallocate() path in combined.h.

ChiaHungDuan · 2025-06-13T17:33:51Z

compiler-rt/lib/scudo/standalone/flags.inc

@@ -34,6 +34,10 @@ SCUDO_FLAG(bool, delete_size_mismatch, true,

 SCUDO_FLAG(bool, zero_contents, false, "Zero chunk contents on allocation.")

+SCUDO_FLAG(bool, zero_on_dealloc, false,
+           "Clears the memory slot when a allocation is returned to the "


an allocation

When all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory. Change-Id: I278055d82057090b0a04d812b49cf93fdf467478

piwicode · 2025-06-16T15:13:24Z

I think this is better to be done in deallocate() path in combined.h.

Zeroing is only useful for the primary allocator, where objects are small, and when all objects of a given page are zeroed, the page can be reclaimed by the OS. The secondary allocator can affort to simply releases the memory to the operating system because it is working on large objects.
In combined.h there are also two code path to deallocate depending on whenever the block is quarantined or not.

That why I have a some preference for the SizeClassAllocator. Can you double check and let me know if you prefer to move this code to combined.h?

ChiaHungDuan · 2025-06-18T05:08:27Z

I think this is better to be done in deallocate() path in combined.h.

Zeroing is only useful for the primary allocator, where objects are small, and when all objects of a given page are zeroed, the page can be reclaimed by the OS. The secondary allocator can affort to simply releases the memory to the operating system because it is working on large objects.

In combined.h there are also two code path to deallocate depending on whenever the block is quarantined or not.

That why I have a some preference for the SizeClassAllocator. Can you double check and let me know if you prefer to move this code to combined.h?

Zeroing-on-free is also a security feature ans that's why I think it is good to have. But it seems to me this is introduced for specific platform, then we may want to consider if we can implement it on platform side. For example, the malloc/free hooks. In addition, the size of blocks in the primary allocator can be large (like several pages), it depends on the configuration. So even if you only zeroing blocks in primary allocator, it can still cause some performance issue.

We do have a way to implement this only for certain sizes but I would suggest having more discussion before we make it a formal feature in Scudo

piwicode · 2025-06-19T10:37:45Z

Thanks for taking the time to reply.

Zeroing-on-free is also a security feature ans that's why I think it is good to have. But it seems to me this is introduced for specific platform, then we may want to consider if we can implement it on platform side.

I made some conclusive measurements on Fuchsia. I don't have measures on Android but I this is will be positive too, given that linux zram store efficiently pages with a unique value.

For example, the malloc/free hooks.

I see a few challenges because __scudo_deallocate_hook:

runs outside of the allocator lock, hence a given block could be in use when the hook is called.
does not know about the allocation header which also needs to be cleared.
runs indistinctly on primary and secondary allocator, wheras this change is a primary allocator optimization.
__scudo_deallocate_hook does not know the object size (easily fixed)

In addition, the size of blocks in the primary allocator can be large (like several pages), it depends on the configuration. So even if you only zeroing blocks in primary allocator, it can still cause some performance issue.

Indeed. That should be in the same ballpark as zeroing on allocation. I thought about relaying on a platform specific function later on that would use kernel operation to zero the memory. That feature could be implemented in memset or scudo.
I profiled scudo allocation and deallocation on my system and the effect is small. I observe not effect on macro bencharks.

On the positive side, this decommits 30 MiB of memory on our system which is significant, I observe a reduction of the number of pages moved to zram, and a decrease in the time spent by the kernel on compression and decompression.

We do have a way to implement this only for certain sizes but I would suggest having more discussion before we make it a formal feature in Scudo

Sure, how can I get the discussion rolling?

ChiaHungDuan · 2025-06-19T18:29:47Z

Thanks for taking the time to reply.

Zeroing-on-free is also a security feature ans that's why I think it is good to have. But it seems to me this is introduced for specific platform, then we may want to consider if we can implement it on platform side.

I made some conclusive measurements on Fuchsia. I don't have measures on Android but I this is will be positive too, given that linux zram store efficiently pages with a unique value.

If this is supposed to be generally supported, then it shouldn't be only for primary allocator. Like the zero-on-allocation, it's used for all the blocks. Unless we have a good reason and it's for general cases (not for some specific cases). OTOH, we have the page releasing mechanism which also helps with memory usage. You can do a mandatory clean up as well. One thing I'm not sure is the cost of having the kernel detects the zero page, it seems not a cheap operation like does it check the page status periodically?

For example, the malloc/free hooks.

I see a few challenges because __scudo_deallocate_hook:

runs outside of the allocator lock, hence a given block could be in use when the hook is called.

does not know about the allocation header which also needs to be cleared.

runs indistinctly on primary and secondary allocator, wheras this change is a primary allocator optimization.

__scudo_deallocate_hook does not know the object size (easily fixed)

Sorry, I forgot you also want to clean the header. Then this is another behavior which is specific to a certain case. Adding zeroing-on-free, I may expect the zeroing cleans everything except the header (because it helps something like double-free-detection)

In addition, the size of blocks in the primary allocator can be large (like several pages), it depends on the configuration. So even if you only zeroing blocks in primary allocator, it can still cause some performance issue.

Indeed. That should be in the same ballpark as zeroing on allocation. I thought about relaying on a platform specific function later on that would use kernel operation to zero the memory. That feature could be implemented in memset or scudo. I profiled scudo allocation and deallocation on my system and the effect is small. I observe not effect on macro bencharks.

Not only the micro-benchmark, we should also take real cases into consideration. When turning on zeroing-on-alloc, we see significant performance impact on Android (especially for some processes like camera which is sensitive to fps) Thus I'm not sure what you mean "the effect is small". Even for microbenchmark, I believe you still see performance regression on free because it just introduce additional operations. I would like to see if we have more details about the measurements.

On the positive side, this decommits 30 MiB of memory on our system which is significant, I observe a reduction of the number of pages moved to zram, and a decrease in the time spent by the kernel on compression and decompression.

We do have a way to implement this only for certain sizes but I would suggest having more discussion before we make it a formal feature in Scudo

Sure, how can I get the discussion rolling?

We can do it here. The main concern for me is that I'm not persuaded by the reason why we only support zeroing memory on free for primary and clean everything in the block. Scudo is an userspace memory allocator, if there's any performance/memory optimization relying on the OS/Kernel, then we should be careful about the expectation of new features. For example, it's better not to say the OS will release the zero pages. And the users are not only Fuchsia/Android, there are still few different platforms (yes, very few of them, but it's not only for one or two platforms). It's better to make changes based on more general use cases.

piwicode · 2025-06-20T16:23:45Z

If this is supposed to be generally supported, then it shouldn't be only for primary allocator. Like the zero-on-allocation, it's used for all the blocks. Unless we have a good reason and it's for general cases (not for some specific cases).

My proposal aims for performance gains, not security, which is why it differs from the zero-on-allocation standard. Could a new name help clarify this distinction?

My understanding is that memset is unnecessary on the secondary allocator because pages are unmapped upon deallocation. The next mapping will automatically provide a zero-initialized page, making a memset here a performance waste. (assuming no quanteene)

OTOH, we have the page releasing mechanism which also helps with memory usage. You can do a mandatory clean up as well. One thing I'm not sure is the cost of having the kernel detects the zero page, it seems not a cheap operation like does it check the page status periodically?

This is a process already happening as part of zram compression. This change makes it more efficient and avoids to process data of the deallocated blocks.
When the operating system is under pressure, it scans the pages in least accessed order, with the primary intent to compress them. When pages happens to be full of zeroes, the operating system can avoid compression and decompression cpu time, and save some compression storage. On Fuchsia OS we observe a reduction in time spent compressing / decompressing and a slight increase in compression ratio (I suspect this is because there are more zeros in the average page).

Sorry, I forgot you also want to clean the header. Then this is another behavior which is specific to a certain case. Adding zeroing-on-free, I may expect the zeroing cleans everything except the header (because it helps something like double-free-detection)

Ack. Keeping the headers defeats the purpose, as the memory pages would not be full of zeroes.

Not only the micro-benchmark, we should also take real cases into consideration. When turning on zeroing-on-alloc, we see significant performance impact on Android (especially for some processes like camera which is sensitive to fps) Thus I'm not sure what you mean "the effect is small". Even for microbenchmark, I believe you still see performance regression on free because it just introduce additional operations. I would like to see if we have more details about the measurements.

So far macro benchmark on the boot sequence show no significant impact on android boot sequence timing.
Dedicate performance test suites data is temporarily unavailable available for now. So I cannot tell for specific processes.
Reducing the memory pressure on the system decreases the amount of time spent paging in and out.
Like many performance knob, it is a tradeoff. Here it is positive on system under memory pressure, and will be less interesting when there is plenty memory available.

We can do it here. The main concern for me is that I'm not persuaded by the reason why we only support zeroing memory on free for primary and clean everything in the block. Scudo is an userspace memory allocator, if there's any performance/memory optimization relying on the OS/Kernel, then we should be careful about the expectation of new features. For example, it's better not to say the OS will release the zero pages. And the users are not only Fuchsia/Android, there are still few different platforms (yes, very few of them, but it's not only for one or two platforms). It's better to make changes based on more general use cases.

That is a good discussion to have. This proposal is aiming at performance, and the option should only be used on systems where it is beneficial. I understand that the person enabling this option is making an assumption that the operating system will be able to take part of it to release some pages when needed, and it is a new reclamation code-path.

cferris1000 · 2025-06-23T21:55:38Z

The underlying issue is that this change is fuchsia kernel specific. It will not help anything but that one kernel. That's why the proposal was that you use some of the extensions to do this in fuchsia specific code instead of adding a lot of code for an option that isn't very useful to other platforms . Especially the fact that this only applies to the primary, and might cause other issues if you don't want this to apply to larger primary allocations (many configs have primary allocations up to 65KB allocations or larger).

On fuchsia, I believe the secondary allocations are unmapped on free, but on any platform that uses the secondary cache and has the decay timer set to a non-zero value, those allocations are not zeroed on free. So this would be a confusing config parameter.

As mentioned by Chiahung, this turns off a security feature, which might be okay, but it should be fully understood. Which is why I suspect that if you run the unit tests with this option enabled, there will be failures, the use after free tests.

llvmbot added compiler-rt compiler-rt:scudo Scudo Hardened Allocator compiler-rt:sanitizer labels Jun 2, 2025

piwicode force-pushed the zero_on_dealloc branch 2 times, most recently from 6ca806a to 7e9ce80 Compare June 13, 2025 16:05

piwicode force-pushed the zero_on_dealloc branch from 7e9ce80 to 11d45d8 Compare June 13, 2025 16:16

ChiaHungDuan reviewed Jun 13, 2025

View reviewed changes

[scudo] Add primary option to zero block on dealloc.

aa03ee5

When all the blocks of a page are unused, the page will be full of zero and decommitted on operating systems that scan the memory. Change-Id: I278055d82057090b0a04d812b49cf93fdf467478

piwicode force-pushed the zero_on_dealloc branch from 11d45d8 to aa03ee5 Compare June 16, 2025 14:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[scudo] Add primary option to zero block on dealloc. #142394

[scudo] Add primary option to zero block on dealloc. #142394

Uh oh!

piwicode commented Jun 2, 2025

Uh oh!

github-actions bot commented Jun 2, 2025

Uh oh!

llvmbot commented Jun 2, 2025

Uh oh!

piwicode commented Jun 13, 2025

Uh oh!

ChiaHungDuan left a comment

Uh oh!

ChiaHungDuan Jun 13, 2025

Uh oh!

piwicode Jun 16, 2025

Uh oh!

piwicode commented Jun 16, 2025 •

edited

Loading

Uh oh!

ChiaHungDuan commented Jun 18, 2025

Uh oh!

piwicode commented Jun 19, 2025

Uh oh!

ChiaHungDuan commented Jun 19, 2025 •

edited

Loading

Uh oh!

piwicode commented Jun 20, 2025

Uh oh!

cferris1000 commented Jun 23, 2025

Uh oh!

Uh oh!

[scudo] Add primary option to zero block on dealloc. #142394

Are you sure you want to change the base?

[scudo] Add primary option to zero block on dealloc. #142394

Uh oh!

Conversation

piwicode commented Jun 2, 2025

Uh oh!

github-actions bot commented Jun 2, 2025

Uh oh!

llvmbot commented Jun 2, 2025

Uh oh!

piwicode commented Jun 13, 2025

Uh oh!

ChiaHungDuan left a comment

Choose a reason for hiding this comment

Uh oh!

ChiaHungDuan Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

piwicode Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

piwicode commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChiaHungDuan commented Jun 18, 2025

Uh oh!

piwicode commented Jun 19, 2025

Uh oh!

ChiaHungDuan commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

piwicode commented Jun 20, 2025

Uh oh!

cferris1000 commented Jun 23, 2025

Uh oh!

Uh oh!

piwicode commented Jun 16, 2025 •

edited

Loading

ChiaHungDuan commented Jun 19, 2025 •

edited

Loading