PQCache: Product Quantization-based KVCache for Long Context LLM Inference

Published in Proceedings of the ACM on Management of Data (SIGMOD), 2025

Hailin Zhang, Xiaodong Ji, Yilin Chen, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Weipeng Chen, Bin Cui