GetMacroAddress only reads a couple of indices per macro, but ProcessMacro was
building a full std::vector<GPUVAddr> with one push_back per parameter word on
every submission. macro_segments already holds (base, count) per chunk, so
GetMacroAddress can walk it directly instead. This drops the per-word loop, a
.clear(), and a vector member.
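
A rough sketch of the segment walk, for illustration only. It assumes each
entry in macro_segments is a (base, count) pair and that parameter words are
4 bytes wide; ResolveMacroAddress and MacroSegment are hypothetical names, not
the actual code in this change.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using GPUVAddr = std::uint64_t;

// Assumed layout: (base address, number of 32-bit parameter words).
using MacroSegment = std::pair<GPUVAddr, std::size_t>;

// Hypothetical helper: finds the address of parameter word `index` by walking
// the segments, without storing one address per word.
std::optional<GPUVAddr> ResolveMacroAddress(const std::vector<MacroSegment>& segments,
                                            std::size_t index) {
    for (const auto& [base, count] : segments) {
        if (index < count) {
            // The word lives in this segment; offset from its base.
            return base + index * sizeof(std::uint32_t);
        }
        // Skip this segment and keep the index relative to the next one.
        index -= count;
    }
    // Index is past the last parameter word.
    return std::nullopt;
}
```

The walk is linear in the number of segments rather than the number of words,
which should be cheap when only a few indices are read per macro.
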
The macro dispatch now also returns on the first match instead of running every
std::get_if check.
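
The early-return pattern looks roughly like this. The variant alternatives and
the Dispatch function below are hypothetical stand-ins, since the real macro
types are not described here.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the macro kinds handled by the dispatch.
struct DrawMacro {};
struct ClearMacro {};
struct CopyMacro {};
using Macro = std::variant<DrawMacro, ClearMacro, CopyMacro>;

// Returns as soon as one alternative matches, so the remaining
// std::get_if checks are never evaluated.
std::string Dispatch(const Macro& macro) {
    if (std::get_if<DrawMacro>(&macro)) {
        return "draw";
    }
    if (std::get_if<ClearMacro>(&macro)) {
        return "clear";
    }
    if (std::get_if<CopyMacro>(&macro)) {
        return "copy";
    }
    // Only reachable if the variant is valueless.
    return "unknown";
}
```
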
No behaviour change, just removes redundant per-submission work.
Signed-off-by: simply0001 <nicolas.tobago@icloud.com>