Description
Currently, all adapters suffer from a problem where the statically constructed Adapter object is being destructed before the last urAdapterRelease
is called, and potentially before all other UR objects are destroyed. In particular, an application can hold on to e.g., a ur_device_handle_t
and call urDeviceRelease
on that object after the entire static UR state has been already destroyed.
In most scenarios, this is benign, but we are relaying on a compiler-defined behavior. For example, in CUDA and OpenCL adapters we are relaying on std::mutex
being trivially-destructible so that its safe to lock it after it has been destroyed (which is iffy in the first place). The standard does not allow this (even though gcc might).
This has had the most severe impact for the L0 adapter, where the Adapter class contained all the platform state, including all the allocated objects. The Adapter destructor invalidated all the UR handles coming from that adapter. Given the urgent nature of that problem, this necessitated merging a workaround into main: #1419
This workaround uses two different approaches depending on the platform with two different lifetime cycles for the new GlobalAdapter pointer, which might lead to more difficult to debug issues.
Here's a simple SYCL program that demonstrates this problem:
#include <sycl/sycl.hpp>
#include <vector>
using namespace sycl;
struct {
std::vector<device> devices;
} foo;
int main(int argc, char* argv[]) {
auto platform_list = sycl::platform::get_platforms();
std::vector<device> root_devices;
for (const auto& platform : platform_list) {
if (platform.get_backend() != sycl::backend::opencl) {
continue;
}
auto device_list = platform.get_devices();
for (const auto& device : device_list) {
if (device.is_gpu()) {
foo.devices.push_back((device));
}
}
}
return 0;
}
The static UR state will get cleaned up before SYCL runtime calls urAdapterRelease
.