[BUG] not enough memory, while I have plenty #1


Open · BrickDesignerNL opened this issue Dec 8, 2024 · 12 comments


@BrickDesignerNL

BrickDesignerNL commented Dec 8, 2024

I have the same issue as mentioned here:
https://www.reddit.com/r/webgpu/comments/1bzzul0/binding_size_141557760_of_buffer_is_larger_than/

Uncaught (in promise) OperationError: Failed to execute 'requestDevice' on 'GPUAdapter': Required limit (1073741824) is greater than the supported limit (134217728).

  • While validating maxStorageBufferBindingSize
  • While validating required limits

But code like this

const k1Gig = 1024 * 1024 * 1024 * 2; // note: this is actually 2 GiB
const adapter = await navigator.gpu?.requestAdapter();
const device = await adapter?.requestDevice({
  requiredLimits: { maxBufferSize: k1Gig },
  requiredFeatures: [ 'float32-filterable' ],
});

Works without a problem, resulting in a maxBufferSize of 2147483648 ≈ 2048MB ≈ 2GB available (2x the memory needed by the model), instead of the 128MB the error claims is the maximum.
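For reference, a minimal sketch (console-only, not this project's code) that reads the adapter's supported limits before requesting a device; it shows that maxBufferSize and maxStorageBufferBindingSize are two separate limits on this machine:

// Sketch: read the adapter's supported limits before deciding what to request.
// maxBufferSize and maxStorageBufferBindingSize are independent limits.
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) throw new Error("WebGPU not available");

console.log(adapter.limits.maxBufferSize);               // 2147483648 on my machine
console.log(adapter.limits.maxStorageBufferBindingSize); // 134217728 on my machine

// Only request values the adapter actually reports as supported.
const device = await adapter.requestDevice({
  requiredLimits: {
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  },
});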

Can you fix the memory reservation?
I would love to use your tool.

By the way, shader-f16 is available in my Chrome 131.0.6778.109 (Official build) (arm64) when WebGPU is enabled.

Tested on:
Qualcomm X Elite, 32GB (with 16GB allocated to the NPU/GPU) => architecture: "adreno-8xx"
Chrome 131.0.6778.109 (Official build) (arm64), with WebGPU enabled.

@wpmed92
Contributor

wpmed92 commented Jan 3, 2025

Can you try my new f16 deployment? https://github.com/wpmed92/stable-diffusion-tinygrad-f16
Let me know if it works.

@BrickDesignerNL
Author

BrickDesignerNL commented Jan 4, 2025

Hi @wpmed92,

Chrome Version 131.0.6778.205 (Official build) (arm64) gives me the following:

Uncaught (in promise) OperationError: Failed to execute 'requestDevice' on 'GPUAdapter': Required limit (1073741824) is greater than the supported limit (134217728).

  • While validating maxStorageBufferBindingSize
  • While validating required limits

at

stable-diffusion-tinygrad-f16/:409

The AI feature of Chrome Developer Console suggests:

This error, "Uncaught (in promise) OperationError: Failed to execute 'requestDevice' on 'GPUAdapter': Required limit (1073741824) is greater than the supported limit (134217728)," means your web application is trying to request a GPU device with capabilities that your current hardware or browser configuration doesn't support. Specifically, it's hitting a limit on the maximum size of a storage buffer binding.

Let's break down the problem and potential solutions:

  • requestDevice: This WebGPU API call attempts to get a handle to your graphics card (GPU) for performing computations.
  • maxStorageBufferBindingSize: This refers to the maximum size of a buffer that can be used for storage within the GPU's memory. Your application needs 1073741824 bytes (1GB), but your system only supports 134217728 bytes (128MB). This is the core of the issue.
  • Supported Limit (128MB): This is significantly lower than the requested 1GB, suggesting either:
    • Weak GPU: Your graphics card may be relatively old or low-powered, and therefore incapable of handling large storage buffers. Integrated graphics (iGPUs) often have more constrained resources.
    • Driver Issues: Outdated or corrupted graphics drivers can limit the capabilities reported to the browser.
    • Browser Limitations: Some browsers might have internal limits or configurations that restrict the accessible GPU memory.
    • Hardware Limitations (Rare): While less likely, there might be a hardware limitation on the system itself, beyond the GPU.

Troubleshooting and Solutions:
  • Update Graphics Drivers: This is the first and often most effective step. Go to the website of your graphics card manufacturer (Nvidia, AMD, or Intel) and download the latest drivers for your specific card and operating system.
  • Check Browser Compatibility and Version: Ensure you're using a relatively up-to-date browser that properly supports WebGPU. Chrome and Firefox are the primary browsers with good WebGPU support. Try a different browser to see if the issue persists.
  • Reduce Buffer Size: If possible, modify your WebGPU code to reduce the size of the storage buffers you're requesting. This is the most direct solution, but might require significant changes to your application's algorithms. Can you process the data in smaller chunks?
  • Check System Requirements: Review the system requirements for the web application. If the application was designed for high-end hardware, it might not run correctly on a less powerful machine.
  • Inspect Hardware Capabilities: Use the WebGPU API to programmatically query your device's limits before attempting to request the device. This allows you to gracefully degrade the application or inform the user of the limitations. Example:
navigator.gpu.requestAdapter().then(adapter => {
  if (adapter) {
    adapter.requestDevice().then(device => {
      const limits = device.limits;
      console.log("maxStorageBufferBindingSize:", limits.maxStorageBufferBindingSize);
      // ... Your code to use the device ...
    }).catch(error => {
      console.error("Error requesting device:", error);
    });
  } else {
    console.error("No suitable GPU adapter found.");
  }
}).catch(error => {
  console.error("Error requesting adapter:", error);
});

By systematically checking these points, you should be able to identify the root cause of the error and find an appropriate solution. Remember to provide more context about your specific application if you need further tailored assistance. For example, what library are you using with WebGPU? What is the code doing that requires such a large buffer?

Executing the suggested code above gives:

maxStorageBufferBindingSize: 134217728
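Note that device.limits only shows what the created device was granted (the 128MB default here); the adapter's ceiling can be read before requesting a device at all, for example:

// Sketch: adapter.limits is the hardware/driver ceiling,
// device.limits is what was actually granted to the created device.
const adapter = await navigator.gpu?.requestAdapter();
console.log(adapter?.limits.maxStorageBufferBindingSize);

On this machine that is also 134217728, which matches the "supported limit" in the requestDevice error above.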

The code I posted earlier

const k1Gig = 1024 * 1024 * 1024 * 2; // 2 GiB
const adapter = await navigator.gpu?.requestAdapter();
const device = await adapter?.requestDevice({
  requiredLimits: { maxBufferSize: k1Gig },
  requiredFeatures: [ 'float32-filterable' ],
});

gives this information:

GPUSupportedLimits
  maxBindGroups: 4
  maxBindGroupsPlusVertexBuffers: 24
  maxBindingsPerBindGroup: 1000
  maxBufferSize: 2147483648
  maxColorAttachmentBytesPerSample: 32
  maxColorAttachments: 8
  maxComputeInvocationsPerWorkgroup: 256
  maxComputeWorkgroupSizeX: 256
  maxComputeWorkgroupSizeY: 256
  maxComputeWorkgroupSizeZ: 64
  maxComputeWorkgroupStorageSize: 16384
  maxComputeWorkgroupsPerDimension: 65535
  maxDynamicStorageBuffersPerPipelineLayout: 4
  maxDynamicUniformBuffersPerPipelineLayout: 8
  maxInterStageShaderComponents: 64
  maxInterStageShaderVariables: 16
  maxSampledTexturesPerShaderStage: 16
  maxSamplersPerShaderStage: 16
  maxStorageBufferBindingSize: 134217728
  maxStorageBuffersPerShaderStage: 8
  maxStorageTexturesPerShaderStage: 4
  maxSubgroupSize: 4294967295
  maxTextureArrayLayers: 256
  maxTextureDimension1D: 8192
  maxTextureDimension2D: 8192
  maxTextureDimension3D: 2048
  maxUniformBufferBindingSize: 65536
  maxUniformBuffersPerShaderStage: 12
  maxVertexAttributes: 16
  maxVertexBufferArrayStride: 2048
  maxVertexBuffers: 8
  minStorageBufferOffsetAlignment: 256
  minSubgroupSize: 4294967295
  minUniformBufferOffsetAlignment: 256

Does this help?

@wpmed92
Contributor

wpmed92 commented Jan 4, 2025

Can you try commenting out the max storage binding size?

@BrickDesignerNL
Author

@wpmed92 do you mean
requiredLimits.maxStorageBufferBindingSize = maxBufferSizeInSDModel;
on
https://wpmed92.github.io/stable-diffusion-tinygrad-f16/

@wpmed92
Contributor

wpmed92 commented Jan 4, 2025

Yes, curious if removing that solves your problem

@BrickDesignerNL
Author

BrickDesignerNL commented Jan 4, 2025

@wpmed92
Can you have a look at maxSubgroupSize in relation to maxStorageBufferBindingSize?
Since 128MB is the default for maxStorageBufferBindingSize, it might work on my machine (and probably on others as well) if you can design the logic to use at most 128MB per storage buffer binding and utilize multiple subgroups (rough sketch below).

Is that possible?
https://developer.chrome.com/blog/new-in-webgpu-128?hl=en
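Roughly what I mean, as a sketch of the general chunking pattern only (not how this project or tinygrad actually lays out its buffers; device and bindGroupLayout are assumed to already exist, and subgroups are a separate shader-level feature, so this only shows the binding side):

// Split one oversized tensor across several storage buffers, each no larger than
// maxStorageBufferBindingSize; the kernel then has to index into the right chunk,
// which is the part that would need codegen changes.
const totalBytes = 1073741824;                               // e.g. the 1GB the site originally required
const maxChunk = device.limits.maxStorageBufferBindingSize;  // 134217728 here
const chunkCount = Math.ceil(totalBytes / maxChunk);         // 8 chunks in this example

const chunks = [];
for (let i = 0; i < chunkCount; i++) {
  const size = Math.min(maxChunk, totalBytes - i * maxChunk);
  chunks.push(device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  }));
}

// One binding slot per chunk; note maxStorageBuffersPerShaderStage is only 8 here,
// so a single kernel cannot see arbitrarily many chunks.
const bindGroup = device.createBindGroup({
  layout: bindGroupLayout, // assumed: a layout with one storage entry per chunk
  entries: chunks.map((buffer, i) => ({ binding: i, resource: { buffer } })),
});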

@BrickDesignerNL
Author

BrickDesignerNL commented Jan 4, 2025

@wpmed92 I've now installed a local web server and deleted the line of code:

requiredLimits.maxStorageBufferBindingSize = maxBufferSizeInSDModel;

It downloaded the 2.5GB safetensor file.
And the console.log says "File processing completed."

It executes fast:
31.8 ms / step
with the default settings and text.
But returns a black image.

And gives lots of errors in the console.

First error:

Binding size (151781376) of [Buffer (unlabeled)] is larger than the maximum binding size (134217728).

  • While validating entries[4] as a Buffer.
    Expected entry layout: {type: BufferBindingType::Storage, minBindingSize: 0, hasDynamicOffset: 0}
  • While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
  • While calling [Device].CreateBindGroup([BindGroupDescriptor]).

Last error:

Binding size (268435456) of [Buffer (unlabeled)] is larger than the maximum binding size (134217728).

  • While validating entries[2] as a Buffer.
    Expected entry layout: {type: BufferBindingType::Storage, minBindingSize: 0, hasDynamicOffset: 0}
  • While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
  • While calling [Device].CreateBindGroup([BindGroupDescriptor]).

@wpmed92
Contributor

wpmed92 commented Jan 4, 2025

Yeah, that step time is not relevant since you have validation errors. Unfortunately the buffers used by some kernels are larger than supported on your system. This requires a refactor in tinygrad to handle it somehow, I think.

@BrickDesignerNL
Author

BrickDesignerNL commented Jan 4, 2025

@wpmed92
I've tried to change the size:

const maxBufferSizeInSDModel = 134217728;
requiredLimits.maxStorageBufferBindingSize = maxBufferSizeInSDModel;
requiredLimits.maxBufferSize = maxBufferSizeInSDModel;

It re-downloads the safetensor file (so it's not in a PWA cache?)
and I got the following two errors:

Buffer size (1073741824) exceeds the max buffer size limit (268435456).

  • While calling [Device].CreateBuffer([BufferDescriptor]).

Buffer size (536870912) exceeds the max buffer size limit (268435456).

  • While calling [Device].CreateBuffer([BufferDescriptor]).

@BrickDesignerNL
Author

BrickDesignerNL commented Jan 4, 2025

@wpmed92

Yeah, that step time is not relevant since you have validation errors. Unfortunately the buffers used by some kernels are larger than supported on your system. This requires a refactor in tinygrad to handle it somehow, I think.

Hmm... 128MB is actually the default setting of WebGPU.

Looking at https://chat.webllm.ai/, this works.
It seems to have some logic to handle this; it gives a warning in the console:

Requested maxStorageBufferBindingSize exceeds limit.
requested=1024MB,
limit=128MB.
WARNING: Falling back to 128MB..

which refers to these lines in https://chat.webllm.ai/sw.js:

if (1073741824 > A.limits.maxStorageBufferBindingSize && (console.log(`Requested maxStorageBufferBindingSize exceeds limit. 
requested=${computeMB(g)}, 
limit=${computeMB(A.limits.maxStorageBufferBindingSize)}. 
WARNING: Falling back to ${computeMB(134217728)}...`),
g = 134217728,
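
Written out in readable form, the pattern is roughly this (my paraphrase, not the exact minified sw.js code; `adapter` is assumed to come from navigator.gpu.requestAdapter()):

// Clamp the requested binding size to what the adapter supports instead of failing.
const requested = 1073741824;                                  // the 1GB the model asks for
const supported = adapter.limits.maxStorageBufferBindingSize;  // 134217728 here

let maxStorageBufferBindingSize = requested;
if (requested > supported) {
  console.warn(`Requested maxStorageBufferBindingSize exceeds limit. requested=${requested / 1048576}MB, limit=${supported / 1048576}MB. Falling back to ${supported / 1048576}MB.`);
  maxStorageBufferBindingSize = supported;
}

const device = await adapter.requestDevice({
  requiredLimits: { maxStorageBufferBindingSize },
});

Of course that only makes requestDevice succeed; every individual binding still has to stay under the clamped limit, which is what the validation errors above are about.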

Does this help?

@BrickDesignerNL
Author

BrickDesignerNL commented Jan 4, 2025

@wpmed92 that project, https://github.com/mlc-ai/web-llm-chat, even works on mobile phones.

Is it possible to do something similar with this project?
Or is that a whole different beast?
If that were possible and you could add an int8 version, the project would run quickly and smoothly on all new Copilot+ PCs (regardless of the CPU/GPU brand): those have at least 8GB VRAM and 45 TOPS. Mine has 16GB VRAM.

@BrickDesignerNL
Author

@wpmed92 So if I understand your remark correctly, the code in the .js files is automatically generated using tinygrad and is very specific to the model. If you want another SD model, even if it's f16, the code needs to be regenerated, because lots of things are hardcoded rather than derived dynamically (like the file size of the safetensors file and some derivatives of it).

And thus, updating the code to support a 128MB maxStorageBufferBindingSize needs to be a change in tinygrad. Thanks!
