Skip to content

race condition in native cpu adaptor allocation functions #2070

Open
@ldrumm

Description

@ldrumm

There's a subtle race condition in get_alloc_info_entry.
get_alloc_info_entry protects the set of known allocations by grabbing the mutex protecting the same, but then returns a reference to the allocation by which point the mutex has been released.

It's plausible that by the time the caller dereferences the returned reference another thread may have freed the given allocation.

At first I thought that this might be enough:

diff --git a/source/adapters/native_cpu/context.hpp b/source/adapters/native_cpu/context.hpp
index c59ab4ea..d3f54b3d 100644
--- a/source/adapters/native_cpu/context.hpp
+++ b/source/adapters/native_cpu/context.hpp
@@ -110,7 +110,7 @@ struct ur_context_handle_t_ : RefCounted {
   }

   // Note this is made non-const to access the mutex
-  const native_cpu::usm_alloc_info &get_alloc_info_entry(const void *ptr) {
+  const native_cpu::usm_alloc_info get_alloc_info_entry(const void *ptr) {
     std::lock_guard<std::mutex> lock(alloc_mutex);
     auto it = allocations.find(ptr);
     if (it == allocations.end()) {

However, returning a value type here doesn't actually help since although it'll prevent a segfault or similar UB, there's no way to determine whether the information returned is still valid.

I'm not sure whether it's something we see in the wild, but it's a real possibility so might be worth a different approach

Metadata

Metadata

Assignees

No one assigned

    Labels

    native-cpuNative CPU adapter specific issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions