Google Research recently revealed TurboQuant, a compression algorithm that reduces the memory footprint of large language ...
Researchers at Tsinghua University and Z.ai built IndexCache to eliminate redundant computation in sparse attention models ...