
[Tutorial] Fix 06-fused-attention.py of FP8 provider #7043


Open. whitneywhtsang wants to merge 4 commits into main from the 06-fused-attention branch.

Conversation

@whitneywhtsang (Collaborator) commented Jun 4, 2025

When the provider is fp8, v is permuted as shown below, and its new stride is (H*N_CTX*HEAD_DIM, N_CTX*HEAD_DIM, 1, N_CTX).

        if mode == "fwd" and "fp8" in provider:
            v = v.permute(0, 1, 3, 2).contiguous()  # materialize the transposed (Z, H, HEAD_DIM, N_CTX) layout
            v = v.permute(0, 1, 3, 2)  # view back as (Z, H, N_CTX, HEAD_DIM); memory stays transposed
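
The double permute keeps v's logical shape while transposing its memory layout. A minimal standalone check (with hypothetical shapes, not the benchmark's actual sizes) confirms the stride quoted above:

import torch

# Hypothetical shapes, chosen only to illustrate the layout.
Z, H, N_CTX, HEAD_DIM = 1, 2, 1024, 64
v = torch.randn(Z, H, N_CTX, HEAD_DIM)
v = v.permute(0, 1, 3, 2).contiguous()  # materialize (Z, H, HEAD_DIM, N_CTX)
v = v.permute(0, 1, 3, 2)               # logical shape back to (Z, H, N_CTX, HEAD_DIM)

assert v.shape == (Z, H, N_CTX, HEAD_DIM)
assert v.stride() == (H * N_CTX * HEAD_DIM, N_CTX * HEAD_DIM, 1, N_CTX)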

This PR fixes the FP8 dtype handling in the fused-attention kernel by separating k and v offset calculations and updating related configuration details. Key changes include:

  • Renaming and separating offset variables for k and v computations.
  • Adjusting offset calculation for FP8 dtype and updating the tensor descriptor creation.
  • Expanding configuration options for BLOCK_N and refining device-specific configuration conditions.

@Copilot (Copilot AI) left a comment

Comments suppressed due to low confidence (4)

python/tutorials/06-fused-attention.py:55

  • [nitpick] The variable 'offsetk_y' now clearly denotes the key tensor offset. Consider updating any adjacent comments or documentation to clarify the separation between key and value offsets to enhance readability.
offsetk_y = offset_y + lo

python/tutorials/06-fused-attention.py:56

  • [nitpick] It would be beneficial to add a comment explaining why offsetv_y is computed as 'offset_y * HEAD_DIM + lo' for the FP8 dtype, especially in light of the new stride requirements.
if dtype == tl.float8e5:

python/tutorials/06-fused-attention.py:171

  • [nitpick] Consider adding an inline comment to explain the purpose and expected behavior of FP8_OUTPUT in controlling the descriptor configuration, to aid future maintenance.
if FP8_OUTPUT:

python/tutorials/06-fused-attention.py:130

  • [nitpick] Consider including a brief comment explaining the significance of checking for device capability 9 and the BLOCK_M * BLOCK_N threshold to improve clarity for future maintainers.
and torch.cuda.get_device_capability()[0] == 9 and BLOCK_M * BLOCK_N < 128 * 128
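
Read together, the suppressed comments outline the shape of the fix: k keeps the old row offset, while for fp8 the v offset is scaled by HEAD_DIM to address the transposed layout. A rough sketch reconstructed from the quoted lines (the non-fp8 branch and the surrounding kernel code are assumptions, not the exact diff):

# Sketch only, reconstructed from the review comments above; not the exact diff.
offsetk_y = offset_y + lo
if dtype == tl.float8e5:
    # fp8: v has stride (..., 1, N_CTX), so its row offset is scaled by HEAD_DIM
    offsetv_y = offset_y * HEAD_DIM + lo
else:
    # assumed non-fp8 branch: same row offset as k
    offsetv_y = offset_y + lo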

@whitneywhtsang changed the title from "Fix 06-fused-attention.py of FP8 dtype" to "Fix 06-fused-attention.py of FP8 provider" on Jun 4, 2025
@whitneywhtsang marked this pull request as ready for review on June 4, 2025 02:02
@whitneywhtsang requested a review from ptillet as a code owner on June 4, 2025 02:02
@whitneywhtsang changed the title from "Fix 06-fused-attention.py of FP8 provider" to "[Tutorial] Fix 06-fused-attention.py of FP8 provider" on Jun 4, 2025
@whitneywhtsang force-pushed the 06-fused-attention branch 2 times, most recently from 1ccc3f4 to 5450875 on June 4, 2025 19:20
@peterbell10 (Contributor) left a comment

Thanks, can you add an fp8 case to test_op?

@whitneywhtsang (Collaborator, Author)

> Thanks, can you add an fp8 case to test_op?

It appears that test_op is written for bwd, and from looking at bench_flash_attention, there is no difference between the triton-fp16 and triton-fp8 providers for bwd.

@peterbell10 (Contributor)

It tests both forward and backward:

# triton implementation
tri_out = attention(q, k, v, causal, sm_scale, warp_specialize).half()
tri_out.backward(dout)
tri_dv, v.grad = v.grad.clone(), None
tri_dk, k.grad = k.grad.clone(), None
tri_dq, q.grad = q.grad.clone(), None
# compare
torch.testing.assert_close(ref_out, tri_out, atol=1e-2, rtol=0)

@whitneywhtsang force-pushed the 06-fused-attention branch 2 times, most recently from 5ec5364 to 00c9235 on June 5, 2025 16:24
@whitneywhtsang (Collaborator, Author)

@peterbell10 I get RuntimeError: "baddbmm_cuda" not implemented for 'Float8_e5m2'. Any suggestions on how to get a reference output for fp8?
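
One possible workaround (a suggestion, not necessarily what the PR adopts): upcast the fp8 inputs and compute the reference in fp16, since PyTorch provides no Float8_e5m2 matmul/baddbmm kernels. A minimal sketch, where reference_attention is a hypothetical helper rather than code from the tutorial:

import torch

def reference_attention(q, k, v, sm_scale, causal=False):
    # Upcast: matmul/baddbmm are not implemented for Float8_e5m2.
    q, k, v = (t.to(torch.float16) for t in (q, k, v))
    p = torch.matmul(q, k.transpose(-2, -1)) * sm_scale
    if causal:
        n = q.shape[-2]
        mask = torch.tril(torch.ones(n, n, dtype=torch.bool, device=q.device))
        p = p.masked_fill(~mask, float("-inf"))
    p = torch.softmax(p.float(), dim=-1).to(torch.float16)
    return torch.matmul(p, v)

The fp8 forward test could then compare the Triton output against this upcast reference with a relaxed tolerance.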
