- Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion. arXiv:2505.21467, published May 27, 2025.
- Performance Prediction for Large Systems via Text-to-Text Regression. arXiv:2506.21718, published Jun 26, 2025.
- SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs. arXiv:2502.12444, published Feb 18, 2025.
- ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models. arXiv:2406.16635, published Jun 24, 2024.