As I’ve mentioned in a couple posts, I’m somewhat of a spaghetti expert. In my opinion, the longer and more plentiful the spaghetti is, the better.
So let’s take a look at one of my favorite spaghetti recipes.
Step 1: Read The Label
Today’s spaghetti comes from my new favorite brand of spaghetti feed, vkoverhead. It’s a simple brand, but it really gets the job done when it comes to growing great spaghetti. This particular spaghetti feed is vkoverhead -test 0, which is the simplest type. It grows the kind of spaghetti that everyone notices because it’s a staple of all graphics diets.
If I check out the state of this spaghetti feed with RADV now, I see the following:
$ ./vkoverhead -test 0 -output-only -duration 3
28345
Thus, I can see that I’m getting 28.3 million draws/second. Not too bad. Let’s check AMDPRO to get a little competition going.
$ VK_ICD_FILENAMES=/home/zmike/amd_pro.json ./vkoverhead -test 0 -output-only -duration 3
32889
It’s Totally Cool
…that AMDPRO is 16% faster than RADV. Yup, it’s totally fine. No anger problems here, no sir, not with me, not even a little furious.
Cool as a cucumber.
But if, and this is obviously just a hypothetical, I were enraged and just recovering from a lengthy tantrum after seeing these results, I’d be looking at growing some artisanal spaghetti. To do that, I’d be running perf on the vkoverhead case and then checking out a flamegraph, which might even happen to look something like this
and you know it’s weird that the graph would look like that since in a graph like that the actual emission of draw packets is only 18% of the CPU time, which means it’s just throwing away CPU cycles, and no wonder the performance is worse, and I hate Wednesdays.
But again, don’t ask if I’m okay, I’m completely fine, this isn’t bothering me.
But if, and this is obviously just another hypothetical, I’d just come back from a counseling session that was supposed to help me cope with these inferior performance results and wasn’t feeling any better at all, then I’d definitely be craving some spaghetti. And so I’d be looking at radv_emit_all_graphics_states() and radv_upload_graphics_shader_descriptors() to see what the actual farfalle was going on with these fat pieces of stortini.
And in the first of those functions, I’d see there were all kinds of null checks and branch chain disasters that were annihilating performance, so I’d probably rip and tear those right out, and then, just while I happened to be in the area, I’d simplify some cache-killing indirect access, and, well, it’s not like I’d leave without clearing up those branches, right? Hah, of course not, though this is all just hypothetical anyway.
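For a taste of what that kind of cleanup can look like, here’s a minimal sketch. To be loud about it: none of these structs, flags, or function names are RADV’s. They’re invented for illustration, and the real functions are far bigger. The “before” version re-walks a pointer chain and null-checks it for every state; the “after” version loads the pointer once, demotes the null check to a debug-only assert, and bails out with a single branch when nothing is dirty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dirty bits; RADV's real flags and structs look different. */
enum {
    DIRTY_VIEWPORT = 1u << 0,
    DIRTY_SCISSOR  = 1u << 1,
    DIRTY_BLEND    = 1u << 2,
    DIRTY_ALL      = DIRTY_VIEWPORT | DIRTY_SCISSOR | DIRTY_BLEND
};

struct pipeline   { uint32_t needed_states; };
struct cmd_state  { struct pipeline *pipeline; uint32_t dirty; };
struct cmd_buffer { struct cmd_state *state; unsigned packets_emitted; };

/* Stand-in for writing one state packet into the command stream. */
static void emit_packet(struct cmd_buffer *cmd) { cmd->packets_emitted++; }

/* Before: every check re-walks cmd->state->pipeline and re-tests pointers
 * that a valid draw can never leave unset. */
static void emit_states_branchy(struct cmd_buffer *cmd)
{
    if (cmd->state && cmd->state->pipeline &&
        (cmd->state->dirty & DIRTY_VIEWPORT) &&
        (cmd->state->pipeline->needed_states & DIRTY_VIEWPORT))
        emit_packet(cmd);
    if (cmd->state && cmd->state->pipeline &&
        (cmd->state->dirty & DIRTY_SCISSOR) &&
        (cmd->state->pipeline->needed_states & DIRTY_SCISSOR))
        emit_packet(cmd);
    if (cmd->state && cmd->state->pipeline &&
        (cmd->state->dirty & DIRTY_BLEND) &&
        (cmd->state->pipeline->needed_states & DIRTY_BLEND))
        emit_packet(cmd);
    if (cmd->state)
        cmd->state->dirty &= ~(uint32_t)DIRTY_ALL;
}

/* After: one pointer load, one combined mask, one early-out branch.
 * Assumes the caller guarantees a bound state and pipeline. */
static void emit_states_flat(struct cmd_buffer *cmd)
{
    struct cmd_state *state = cmd->state;  /* load the pointer once */
    assert(state && state->pipeline);      /* debug-only; gone with NDEBUG */

    uint32_t pending = state->dirty & state->pipeline->needed_states;
    state->dirty &= ~(uint32_t)DIRTY_ALL;
    if (!pending)
        return;                            /* nothing to emit for this draw */

    if (pending & DIRTY_VIEWPORT) emit_packet(cmd);
    if (pending & DIRTY_SCISSOR)  emit_packet(cmd);
    if (pending & DIRTY_BLEND)    emit_packet(cmd);
}
```

Both versions emit the same packets for valid input. Whether the flat one is actually faster depends on the compiler and the workload, which is exactly why the next step is running the benchmark again.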
I’m Not Being Defensive
Stop asking. I’m fine.
If I wasn’t fine, I’d probably be running vkoverhead again at this point and seeing the following results
$ ./vkoverhead -test 0 -output-only -duration 3
36006
and then I’d be fine anyway since now RADV is up by 10%. Which is okay. It’s not bad. Nothing to brag about, you know, just being up by such a tiny little amount over the competition, but it’ll do.
Is what a responsible person would say.
But here at SGC, responsibility flies out the window when performance is involved, and I don’t have enough spaghetti yet, so buckle in because this pasta machine is just getting started.
perf time again, and I’ve got another totally hypothetical flamegraph
which is less consumed by the stupidity of those fat pieces of stortini I insalted above, but I’m not in the mood for stortini at all today. They gotta go.
radv_upload_graphics_shader_descriptors() I got my eye on you and your little radv_flush_constants() too. Why is radv_flush_constants() even showing up here? What’s the deal with that? There’s no constants to flush. I’m taking ‘em out.
Get the rolling pin, flatten out the dough, and what happens?
$ ./vkoverhead -test 0 -output-only -duration 3
38629
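One generic way to keep a do-nothing flush off the draw path is to decide at bind time whether there’s anything to flush at all, then guard the call with a single cached flag. The sketch below assumes exactly that approach. It is not what the RADV change looks like, and every name in it (`bind_pipeline`, `needs_push_constants`, `MAX_STAGES`, and so on) is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_STAGES 6 /* hypothetical stage count for this sketch */

struct pipeline {
    uint32_t push_constant_size[MAX_STAGES]; /* bytes used per shader stage */
    bool     needs_push_constants;           /* cached once at bind time */
};

struct cmd_buffer {
    const struct pipeline *pipeline;
    unsigned flushes; /* counts how often the expensive path ran */
};

/* Stand-in for the expensive upload path. */
static void flush_constants(struct cmd_buffer *cmd) { cmd->flushes++; }

/* Runs once per pipeline bind: scan the stages here, not on every draw. */
static void bind_pipeline(struct cmd_buffer *cmd, struct pipeline *p)
{
    p->needs_push_constants = false;
    for (int i = 0; i < MAX_STAGES; i++) {
        if (p->push_constant_size[i] > 0)
            p->needs_push_constants = true;
    }
    cmd->pipeline = p;
}

/* Runs once per draw: a single predictable branch guards the call. */
static void draw(struct cmd_buffer *cmd)
{
    if (cmd->pipeline->needs_push_constants)
        flush_constants(cmd);
    /* ... the draw packet itself would be emitted here ... */
}
```

With a pipeline that uses no push constants, `draw()` never reaches `flush_constants()`, so the function can’t show up in a profile of that case.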
Now We’re Cooking
perf, and I’m getting out another flamegraph, and it’s better
because of course it is. That draw packet emission is getting more time, the fat stortini is slimming down, and everything is great.
But does anyone out there actually think I’m about to stop now? When I’m only up by a tenuous 36% from where I started, and my lead over AMDPRO is a barely-noticeable 17%?
Take off your jacket, because I’m turning the heat of the burners up to high.
Look at this eyesore
I’m about to end this function’s whole career. By inlining it.
$ ./vkoverhead -test 0 -output-only -duration 3
41878
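The function that got this treatment isn’t reproduced here, so this is only a generic sketch of the technique: a tiny per-draw helper marked for forced inlining, so the hot loop pays no call overhead and the compiler can optimize across the former call boundary. The `ALWAYS_INLINE` macro, the structs, and the opcode are all assumptions for the example; `__attribute__((always_inline))` is a GCC/Clang extension, hence the fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Force inlining on GCC/Clang; fall back to a plain hint elsewhere. */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

#define CS_CAPACITY 64 /* hypothetical command stream size, in dwords */

struct cmd_stream {
    uint32_t buf[CS_CAPACITY];
    unsigned count;
};

/* Tiny helper called several times per draw: a prime inlining candidate. */
static ALWAYS_INLINE void emit_dword(struct cmd_stream *cs, uint32_t value)
{
    assert(cs->count < CS_CAPACITY); /* debug-only overflow check */
    cs->buf[cs->count++] = value;
}

/* After inlining, this compiles down to three stores and a counter update
 * with no calls in between. */
static void emit_draw(struct cmd_stream *cs, uint32_t vertex_count,
                      uint32_t first_vertex)
{
    emit_dword(cs, 0xD0u); /* made-up opcode, not a real packet header */
    emit_dword(cs, vertex_count);
    emit_dword(cs, first_vertex);
}
```

Forced inlining trades code size for speed, so it only pays off on genuinely hot paths; a profile is what tells the two apart.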
When serving any sort of dish, it’s important to add a garnish. And you know what isn’t a fucking garnish?
This thing in my debugoptimized build.
So now it’s gone, and what’s the performance now?
$ ./vkoverhead -test 0 -output-only -duration 3
44073
Incredible. The flavor (of winning), the atmosphere (of being a winner), the experience (of being #1), are all unparalleled.
This makes for a 55% increase in RADV’s draw throughput as well as a much more reasonable 34% lead over AMDPRO.
All from growing just the right amount of spaghetti.
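For anyone who wants to check the scoreboard at home, here’s a small sketch that recomputes the relative gains from the vkoverhead numbers in this post (the tool prints thousands of draws per second). The helper names and step labels are mine, paraphrasing the steps above; only the numbers come from the runs themselves.

```c
#include <assert.h>
#include <stdio.h>

/* One vkoverhead result, in thousands of draws per second. */
struct result {
    const char *label; /* paraphrased step name, not tool output */
    double kdraws;
};

/* Values copied from the RADV runs in this post, in order. */
static const struct result radv_runs[] = {
    { "starting point",           28345 },
    { "after branch cleanup",     36006 },
    { "after dropping the flush", 38629 },
    { "after inlining",           41878 },
    { "last run",                 44073 },
};

/* The single AMDPRO run from this post. */
static const double amdpro_kdraws = 32889;

/* How much faster `candidate` is than `baseline`, in percent. */
static double percent_faster(double candidate, double baseline)
{
    assert(baseline > 0.0); /* a zero baseline makes the ratio meaningless */
    return (candidate / baseline - 1.0) * 100.0;
}

/* Prints each RADV result with its gain over the first run and over AMDPRO. */
static void print_progress(void)
{
    size_t n = sizeof(radv_runs) / sizeof(radv_runs[0]);
    for (size_t i = 0; i < n; i++) {
        printf("%-26s %6.0f  %+6.1f%% vs start  %+6.1f%% vs AMDPRO\n",
               radv_runs[i].label, radv_runs[i].kdraws,
               percent_faster(radv_runs[i].kdraws, radv_runs[0].kdraws),
               percent_faster(radv_runs[i].kdraws, amdpro_kdraws));
    }
}
```

Calling `print_progress()` from a `main()` prints one line per step. Single three-second runs are noisy, so treat the last digit of any percentage as decoration.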