Fold shaders doing "a * b + c" on integers from the pattern generated by
Nvidia's GL compiler.
On a somewhat complex compute shader it reduces the code size by 16
instructions from 2 matches on Turing GPUs.
On Intel as extracted from KHR_pipeline_executable_properties:
Before the optimization:
```
Instruction Count: 2057
Basic Block Count: 45
Scratch Memory Size: 14752
Spill Count: 232
Fill Count: 261
SEND Count: 610
Cycle Count: 11325
```
After the optimization:
```
Instruction Count: 2046
Basic Block Count: 44
Scratch Memory Size: 13728
Spill Count: 219
Fill Count: 268
SEND Count: 604
Cycle Count: 11367
```
Mostly fixing unused *, implicit conversion, braced scalar init,
fpermissive, and some others.
Some Clang errors likely remain in video_core, and std::ranges is still
a pertinent issue in shader_recompiler
shader_recompiler: cmake: Force bracket depth to 1024 on Clang
Increases the maximum fold expression depth
thread_worker: Include condition_variable
Don't use list initializers in control flow
Co-authored-by: ReinUsesLisp <reinuseslisp@airmail.cc>