
[webgpu] Fix atomic shared mem load inside loop #10530


Merged · 4 commits · May 31, 2025

Conversation

wpmed92
Contributor

@wpmed92 wpmed92 commented May 27, 2025

test_arange::TestIndexing::test_index_mnist_opt with GROUPTOP causes a shared mem load inside a loop, which didn't get rewritten to a packed load because of a missing allow_any_len=True in the WGSL pattern matcher.

The load-inside-loop part of the problematic kernel before the change:

if (((bool(lidx0))!=true)) {
    var acc0 = 0u;
    for (var ridx1001 = 0; ridx1001 < 16; ridx1001++) {
      var val4 = atomicLoad(&temp0[ridx1001]);
      acc0 = (acc0+val4);
    }
}

After the change, with the proper packed load:

if (((bool(lidx0))!=true)) {
    var acc0 = 0u;
    for (var ridx1001 = 0; ridx1001 < 16; ridx1001++) {
      var val4 = atomicLoad(&temp0[(ridx1001>>2)]);
      acc0 = (acc0+(u32(((val4>>(((u32(ridx1001))&3u)<<3u))&255u))));
    }
}

Although the load was atomic, we can see that the uchar is never shifted out of the u32 word in the first version.
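The byte-extraction arithmetic in the fixed kernel can be sketched in Python (function and variable names here are illustrative, not from the PR): each u8 lives inside one of four u32 words, so a load at index `i` reads word `i >> 2` and shifts the byte out by `(i & 3) * 8` bits, exactly as the rewritten WGSL does.

```python
def packed_load_u8(words: list[int], i: int) -> int:
    """Read byte i from u32 words, mirroring the fixed WGSL:
    (temp0[i >> 2] >> ((i & 3) << 3)) & 0xFF."""
    word = words[i >> 2]           # which u32 word holds byte i
    shift = (i & 3) << 3           # byte offset within the word, in bits
    return (word >> shift) & 0xFF  # shift the uchar out and mask

# Round-trip check: pack 16 bytes little-endian into four u32 words.
vals = list(range(16))
words = [sum(vals[w * 4 + b] << (b * 8) for b in range(4)) for w in range(4)]
assert all(packed_load_u8(words, i) == vals[i] for i in range(16))
```

The buggy version effectively returned `words[i]` directly (the whole u32, and at the wrong index), which is why the accumulated sum was wrong even though every load was atomic.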

@chenyuxyz
Collaborator

do you know why atomic is wrong for webgpu shared memory?

@wpmed92
Contributor Author

wpmed92 commented May 27, 2025

I haven't found out yet. I know that the packed load/store is the same as for "normal" memory from a WebGPU codegen point of view, so there's no reason it should give bad results.
I'll check with wgpu-py to see if it's a Dawn issue, and in the meantime I'll write some WGSL kernels that trigger this.

@wpmed92 wpmed92 changed the title Disable shared mem atomics on webgpu [webgpu] Fix atomic shared mem load inside loop May 31, 2025
@wpmed92 wpmed92 force-pushed the shared-mem-atomics-wgpu branch from ab9e5b3 to 3aded3b Compare May 31, 2025 12:30
Contributor

Changes

Name                         Lines    Diff    Tokens/Line    Diff
-------------------------  -------  ------  -------------  ------
tinygrad/renderer/wgsl.py       82      -1           23.0    +0.2


total line changes: -1

@wpmed92 wpmed92 marked this pull request as ready for review May 31, 2025 12:43
@wpmed92
Contributor Author

wpmed92 commented May 31, 2025

@chenyuxyz I did my tests and it turned out I was wrong about shared mem atomics being broken in WebGPU. It was a missing allow_any_len=True that caused the missed packed-load rewrite. I updated the PR description.

@chenyuxyz
Collaborator

cool yea this looks more like the true fix

@chenyuxyz chenyuxyz merged commit 35eb4d3 into tinygrad:master May 31, 2025
35 checks passed
@wpmed92 wpmed92 deleted the shared-mem-atomics-wgpu branch May 31, 2025 13:31