Skip to content

Adapter initialization order fiasco #1425

Open
@pbalcer

Description

@pbalcer

Currently, all adapters suffer from a problem where the statically constructed Adapter object is being destructed before the last urAdapterRelease is called, and potentially before all other UR objects are destroyed. In particular, an application can hold on to e.g., a ur_device_handle_t and call urDeviceRelease on that object after the entire static UR state has been already destroyed.
In most scenarios, this is benign, but we are relaying on a compiler-defined behavior. For example, in CUDA and OpenCL adapters we are relaying on std::mutex being trivially-destructible so that its safe to lock it after it has been destroyed (which is iffy in the first place). The standard does not allow this (even though gcc might).

This has had the most severe impact for the L0 adapter, where the Adapter class contained all the platform state, including all the allocated objects. The Adapter destructor invalidated all the UR handles coming from that adapter. Given the urgent nature of that problem, this necessitated merging a workaround into main: #1419

This workaround uses two different approaches depending on the platform with two different lifetime cycles for the new GlobalAdapter pointer, which might lead to more difficult to debug issues.

Here's a simple SYCL program that demonstrates this problem:

#include <sycl/sycl.hpp>
#include <vector>

using namespace sycl;

struct {
  std::vector<device> devices;
} foo;

int main(int argc, char* argv[]) {
  auto platform_list = sycl::platform::get_platforms();

  std::vector<device> root_devices;
  for (const auto& platform : platform_list) {
    if (platform.get_backend() != sycl::backend::opencl) {
      continue;
    }
    auto device_list = platform.get_devices();
    for (const auto& device : device_list) {
      if (device.is_gpu()) {
        foo.devices.push_back((device));
      }
    }
  }

  return 0;
}

The static UR state will get cleaned up before SYCL runtime calls urAdapterRelease.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingcudaCUDA adapter specific issueslevel-zeroL0 adapter specific issuesopenclOpenCL adapter specific issuespiDPC++ PI requirement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions