
[webgpu] Fix atomic shared mem load inside loop #10530


Merged · 4 commits · May 31, 2025

Conversation

wpmed92
Contributor

@wpmed92 wpmed92 commented May 27, 2025

test_arange::TestIndexing::test_index_mnist_opt with GROUPTOP causes a shared mem load inside a loop, which didn't get rewritten to a packed load because of a missing allow_any_len=True in the WGSL pattern matcher.

The load-inside-loop part of the problematic kernel before the change:

if (((bool(lidx0))!=true)) {
    var acc0 = 0u;
    for (var ridx1001 = 0; ridx1001 < 16; ridx1001++) {
      var val4 = atomicLoad(&temp0[ridx1001]);
      acc0 = (acc0+val4);
    }
}

After the change, with the proper packed load:

if (((bool(lidx0))!=true)) {
    var acc0 = 0u;
    for (var ridx1001 = 0; ridx1001 < 16; ridx1001++) {
      var val4 = atomicLoad(&temp0[(ridx1001>>2)]);
      acc0 = (acc0+(u32(((val4>>(((u32(ridx1001))&3u)<<3u))&255u))));
    }
}

Although the load was atomic, we can see that the uchar is never shifted out of the u32 word in the first version.
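The byte-extraction arithmetic in the fixed kernel can be sketched in Python (function and variable names here are illustrative, not from the PR): each u8 lives inside one of four u32 words, so a load at index `i` reads word `i >> 2` and shifts the byte out by `(i & 3) * 8` bits, exactly as the rewritten WGSL does.

```python
def packed_load_u8(words: list[int], i: int) -> int:
    """Read byte i from u32 words, mirroring the fixed WGSL:
    (temp0[i >> 2] >> ((i & 3) << 3)) & 0xFF."""
    word = words[i >> 2]           # which u32 word holds byte i
    shift = (i & 3) << 3           # byte offset within the word, in bits
    return (word >> shift) & 0xFF  # shift the uchar out and mask

# Round-trip check: pack 16 bytes little-endian into four u32 words.
vals = list(range(16))
words = [sum(vals[w * 4 + b] << (b * 8) for b in range(4)) for w in range(4)]
assert all(packed_load_u8(words, i) == vals[i] for i in range(16))
```

The buggy version effectively returned `words[i]` directly (the whole u32, and at the wrong index), which is why the accumulated sum was wrong even though every load was atomic.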

@chenyuxyz
Collaborator

do you know why atomic is wrong for webgpu shared memory?

@wpmed92
Contributor Author

wpmed92 commented May 27, 2025

I haven't found out yet. I know that the packed load/store is the same as for "normal" memory from a WebGPU codegen point of view, so there's no reason it should give bad results.
I'll check with wgpu-py to see if it's a Dawn issue, and in the meantime I'll write some WGSL kernels that trigger this.

@wpmed92 wpmed92 changed the title Disable shared mem atomics on webgpu [webgpu] Fix atomic shared mem load inside loop May 31, 2025
@wpmed92 wpmed92 force-pushed the shared-mem-atomics-wgpu branch from ab9e5b3 to 3aded3b Compare May 31, 2025 12:30
Contributor

Changes

Name                         Lines    Diff    Tokens/Line    Diff
-------------------------  -------  ------  -------------  ------
tinygrad/renderer/wgsl.py       82      -1           23.0    +0.2


total line changes: -1

@wpmed92 wpmed92 marked this pull request as ready for review May 31, 2025 12:43
@wpmed92
Contributor Author

wpmed92 commented May 31, 2025

@chenyuxyz I did my tests and it turned out I was wrong about shared mem atomics being broken in WebGPU. It was a missing allow_any_len=True that caused the missed packed-load rewrite. I updated the PR description.

@chenyuxyz
Collaborator

cool yea this looks more like the true fix

@chenyuxyz chenyuxyz merged commit 35eb4d3 into tinygrad:master May 31, 2025
35 checks passed
@wpmed92 wpmed92 deleted the shared-mem-atomics-wgpu branch May 31, 2025 13:31