HyperCLOVA X Technical Report
Paper • 2404.01954
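Sequences of similar token length are grouped into the same batch to make better use of the GPU/hardware, since less compute is spent on padding. Mini-batches differ in sequence length and in number of sequences, but each is capped at the same maximum number of tokens.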
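A minimal sketch of this kind of length-grouped batching, not the report's actual implementation. The function name `batch_by_token_budget`, the `max_tokens` parameter, and the choice to count padded tokens (number of samples times the longest sample) against the budget are all assumptions made for illustration.

```python
from typing import List, Sequence


def batch_by_token_budget(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Group sample indices into mini-batches of similar token length.

    Hypothetical helper: the padded size of each batch
    (number of samples * longest sample) never exceeds max_tokens.
    """
    # Sort indices by length so neighbouring samples have similar token counts.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        n = lengths[idx]
        if n > max_tokens:
            raise ValueError(f"sample {idx} has {n} tokens, budget is {max_tokens}")
        # Lengths are ascending, so the incoming sample would be the longest
        # in the batch and determines the padded width.
        if current and (len(current) + 1) * n > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    token_lengths = [5, 120, 7, 110, 6, 64, 60]
    for batch in batch_by_token_budget(token_lengths, max_tokens=128):
        print(batch, [token_lengths[i] for i in batch])
```

Short samples end up in batches with many sequences and long samples in batches with few, so the batch shape varies while the token budget per batch stays fixed. In practice the resulting batches are usually shuffled before training so that the model does not see lengths in sorted order.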