Here is how the split between prefill and generation exposes structural GPU inefficiencies in AI processor designs.
My self-hosted setup holds up pretty well for my coding tasks ...