Skip to content

chore: Minor QSPI code deduplication #2095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aapoalas
Copy link
Contributor

@aapoalas aapoalas commented Jun 13, 2025

Since this caught my eye in #2089 and was merged in #2071, I thought I'd put my hands where my mouth is and do the few deduplications that I thought might be considered useful or "pretty".

Adjacent code error?

In drv/auxflash-api/src/lib.rs:191 the call to self.get_blob_by_tag(tag) seems to be invalid; that function takes two parameters, (slot, tag). I wonder if this is dead code or what's going on with that?

Copy link
Collaborator

@labbott labbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff stat on this is +136 −93 i.e. a net increase of code. That's a hard sell for code deduplication.

@@ -148,7 +148,7 @@ impl ServerImpl {
sleep: Option<u64>,
) -> Result<(), AuxFlashError> {
loop {
let status = self.qspi.read_status().map_err(qspi_to_auxflash)?;
let status = self.read_qspi_status()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't love this pattern of having an inner helper function that much because I find it harder to follow the code to find the location where it's actually doing things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, after some assembly analysis it turned out these sorts of helper functions don't really compile well at all: I was hoping that the compiler would either choose to inline them or would choose to inline the self.qspi.read_status code into the qspi_read_status function for an in-total beneficial tradeoff.

This is not what happened: the helpers were not being inlined and the concrete calls were not being inlined into the helpers, so that's extra calls right there and extra code. Furthermore, the helpers were somehow really expensive? They were something like 30 or 60 bytes a pop, which seems ridiculous but that's what it was. Even further, for some reason the result of the original calls was being handled with the uxtb instruction but the result of the helpers was ending up using uxtb.w for an extra 2 bytes per instruction. So it was all around a bad job. Adding #[inline(always)] got it all back to the original state of course, but that's hardly a consolation since the only "benefit" at that point is an aesthetic one, which you admit to not loving.

(There's actually one singular read_qspi_status helper method of this sort in gimlet-hf-server/src/main.rs: should I remove it while here to align the overall coding style?)

@aapoalas
Copy link
Contributor Author

The diff stat on this is +136 −93 i.e. a net increase of code. That's a hard sell for code deduplication.

A good bit of that is coming from the added and/or copied comments, but that's absolutely true.

My thinking was that getting rid of superfluous/"not functionality wise critical" error conversions would make the code a bit less "noisy", and perhaps also improve generated code quality / icache usage. I didn't check the output though; I'll take a look at that to see if this actually has other upsides than just personal code esthetics :)

The changes in the STM32 QSPI driver code I do think pulls its own weight, regardless.

@aapoalas aapoalas force-pushed the chore/qspi-method-dedup branch from cc981a4 to c92a2eb Compare June 16, 2025 23:31
@aapoalas
Copy link
Contributor Author

After deeper analysis, the helper methods were a total dud. The STM32 QSPI driver changes are less clear but a possible dud as well. Here're the numbers up front:

  • write_impl changes from 306 bytes to 250 bytes. (measured on cosmo-a auxflash and gimlet-f hf)
  • read_impl changes from 302 bytes to 246 bytes. (same sources; auxflash had a slightly smaller size both before and after, probably +/-0 difference)
  • get_fifo_level is an added 60 bytes.
  • wait_for_transfer_complete is an added 72 bytes.

In total, that's a 20 byte increase in assembly size. Binary size of cosmo-a auxflash increases by a whopping 2868 bytes. This is a loss in total, but perhaps it is acceptable given the actually negative number of code lines? Applying #[inline(always)] can of course be done to (mostly) return to status quo ante bellum (though there's still a binary size increase of 276 bytes for cosmo-a auxflash after this).

Final side note: disable_all_interrupts is actually being done unconditionally no matter how read/write_impl_inner ends; the source code could optionally be simplified to reflect this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants