Pruning
Time Constraints
As many of you have seen, I’ve been deleting a lot of code lately. There’s a reason for this, aside from it being a really great feeling to just obliterate some entire subsystem, and that reason is time.
There are 24 hours in a day. You sleep for 6. You work for 8. Spend an hour eating, and then you’re down to only 9 hours at the gym minus a few minutes to manage those pesky social and romantic obligations. That doesn’t leave a lot of time for mucking around in random codebases.
For example, suppose I maintain a Gallium driver. This likely means I know my way around that driver, various related infrastructure, the GL state tracker, NIR, maybe enough GLSL to rubber stamp some MRs from @tarceri, and I know which IRC channel to scream in when my MRs get blocked by something that is definitely not me failing to test-compile the patches before merging them. Everything outside of these areas is out of scope for this hypothetical version of me, which means it may as well be a black box.
Now imagine I am all the maintainers of all the Gallium drivers. My collective scope has expanded. I am the master of all things src/gallium/drivers. I wave my hand and src/mesa obeys my whim. CI is always green, except when matters beyond the control of mere mortals conspire against me. I have a blog. News sites cover my MRs as though OpenGL is still relevant.
But there are still black boxes. Vulkan drivers, for example, are a mystery. CI is an artifact from a distant civilization which, though alien, ensures everything functions as I know it does. And then there are the esoteric parts of the tree in src/gallium/frontends. People I’ve never met may file bug reports against my drivers with tags for one of these components. Who is sexypixel420, what is a teflon, and why is that my problem?
Maintenance
A key aspect of any good Open Source project is maintenance. This is, relatively speaking, how well it is expected to function if Joe Randomguy installs and runs it. Maintenance of projects requires people to work on them and fix bugs. These are maintainers. When a project has a maintainer, we say that it is maintained. A project which does not have a maintainer is unmaintained. Simple enough.
Mesa is a project comprised of many subprojects. We call this an ecosystem. An ecosystem functions when all its projects work together in harmony towards a common goal, in this case blasting out those pixels into as many green triangles per second as possible.
What happens when a maintained project has an issue? Well, that’s when the maintainer steps in to fix it (assuming some other random contributor doesn’t, but we’re assuming a very low bus factor here). Tickets are filed, maintainers analyze and fix, and end users are happy because the software they randomly installed happens to work as they expect.
But what happens when a project with no maintainer has an issue? In short, nothing. That issue is filed away into the void, never to be resolved ever in a million years (unless some kind soul happens to pitch an unreviewed #TrustMeBuddy patch into the repo, but this is rare). These issues accumulate, and nobody even notices because nobody is subscribed to that label on the issue tracker. The project is derelict. If the project accumulates enough of these issues, distributions may even stop packaging it; packaging a defective piece of software creates downstream tickets for packagers, and much of the time they are not looking to drag their editor upstream and solve all the problems because they have more than enough problems already with packaging.
Now here’s where things start to get messy: what happens when an unmaintained subproject in an ecosystem has an issue? Some might be tempted to say this is the same as the above scenario, but it’s subtly different because the issue might not be directly user-facing. It might be “what happens in this codebase if I change this thing over here?” And if a codebase is unmaintained, then nobody knows what happens. The code can be read, but without a maintainer who possesses deep knowledge about the intent of the machinery, such shallow readings can only do so much.
This Is Why We Prune
Like trees with dead limbs, dead parts of Open Source projects must be periodically pruned to keep the rest of the project healthy. Having all these dead limbs around creates a larger surface area for the ecosystem, which creates the potential for unintended side effects (and bizarro bugs from unknown components) to manifest. It also has a hidden cost, which is burnout. When a maintainer must step outside their area in an attempt to triage something in a codebase that they do not know, instinctual fear and distaste of Other Code kicks in: this code is terrible because I didn’t write it. Also what the fuck is with this formatting? Is that a same-line brace with no space after the closing parens?! That’s it, I’m clocking it for today.
We’ve all been out in the jungle with some code that may as well be written in dirt. It sucks. And any time you’re stuck out in the dirt for more than a couple minutes, you want to be able to call in an expert to bail you out. Those experts are called maintainers. When you enter territory which is unmaintained, you’re effectively stranded unless you can cut your way out. If you can’t cut your way out, you’re stuck, and being stuck is frustrating, and being frustrated makes you not want to work on your thing anymore, which is how you end up losing maintainers. One of the ways, that is, because we’re all just one sarcastic winky-face away from a ranty ragequit mail.
Now is when I reveal that this long-winded, circuitous explanation is not actually about everyone’s favorite D3D9 state tracker (pour one out for a legend) or whatever the hell XA was. I’m talking about last week when I deleted legacy renderpass support from Zink. It’s been a long time coming, and realistically I should have done this sooner.
Zink Struggles
Like Mesa, Zink is an ecosystem supporting a wide variety of projects, but it’s also a single project with a single maintainer. A bug in RadeonSI code will not affect me, but a bug in Zink code affects me even if it is not code which has been tested or even used in the past 5 years. While it’s likely true that any code in Zink is code that I have written, there’s a big difference between code written in the past year and code written back in like 2020: in the former case I probably know what’s happening and why, and in the latter case it’s more likely that I’m confused how the code still exists.
Vulkan is a moving target. Every month brings changes and improvements, fun new extensions to misuse, and long-lost validation errors to tell us that nobody actually knows how to use SPIR-V. Over time, these new features and changes become more widely adopted, which makes them reliable, but historically Zink has been very lax in requiring “new” features.
There is this idea that Zink should be able to provide high-level OpenGL support to any device which provides any amount of conformant Vulkan support. It’s a neat idea: provide Vulkan 1.0, and you get GL 4.6 + ES 3.2 for free. There are, however, a number of issues with this pie-in-the-sky thinking:
- Generally speaking, you can have broad GL support, or you can have performant GL support. This is the difference between your apps running at near-native speed and running much, much slower. Without relying on newer Vulkan features, it is impossible to achieve good across-the-board performance.
- You can have broad GL support, or you can have reliable GL support. This is the difference between your apps running as expected vs crashing randomly on some bizarre assertion. Old codepaths are not tested or exercised, and often even keeping them around requires concessions to CPU-based performance.
- There is only one me, unfortunately, and I do not spend a majority of my time working on Zink anymore. I fix the hard bugs, I refactor the incredibly gross old code after I rewrite half the tree, and I still do cool perf enhancements every now and then. Zink is huge though (5th largest Gallium driver by volume), and there are a crazy number of feature combinations which can hit bizarre codepaths (think tiler renderpass tracking + GPL + descriptor templates vs desktop rendering + shader objects + descriptor buffer, now tell me which one gets better perf on RADV. It’s the first one, because Zink does not ever create linked/optimized shader objects).
I’m not saying all this as a cry for help, though help is always appreciated, encouraged, and welcomed. This is a notice that I’m going to be pruning some old and unused codepaths to keep things manageable. Zink isn’t going to work on Vulkan 1.0; that goal is a nice idea but not achievable, especially when there is fierce competition like ANGLE gunning for every fraction of a perf percent they can get. I don’t foresee requiring any new extensions/features the day they ship, but I also don’t foresee keeping legacy fallbacks for codepaths which should be standard by now.
TL;DR: If you want Zink on old drivers/hardware, try Mesa 25.1. Everyone else, business as usual.