Is it possible to run inference on an A100 GPU?
#23 · opened by Tony664
"Sincerely seeking help.
Since the A100 does not support FP8, the Linear layers use BF16, which leads to Out of Memory errors. We want to try more model sharding, but convert.py seems to allow a maximum of 16 MP."
"Sincerely seeking help.
Since the A100 does not support FP8, the Linear layers use BF16, which leads to Out of Memory errors. We want to try more model sharding, but convert.py seems to allow a maximum of 16 MP."
"Sincerely seeking help.
Since the A100 does not support FP8, the Linear layers use BF16, which leads to Out of Memory errors. We want to try more model sharding, but convert.py seems to allow a maximum of 16 MP."
Did you solve it?
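For anyone hitting the same wall: since the A100 has no native FP8 kernels, a common workaround is to dequantize the FP8 checkpoint to BF16 offline and then load it as a plain BF16 model (about 2 bytes per parameter, so total GPU memory is still the limiting factor). Below is a minimal sketch of the dequantization step, assuming the checkpoint stores each FP8 weight together with a per-block inverse scale tensor; the `weight_scale_inv` naming, the 128x128 block size, and the file names are illustrative assumptions, not taken from this thread.

```python
import torch
from safetensors.torch import load_file, save_file

BLOCK = 128  # assumed quantization block size

def dequant_fp8_to_bf16(w_fp8: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Multiply per-block inverse scales back into an FP8 weight, producing BF16."""
    w = w_fp8.to(torch.float32)
    rows, cols = w.shape
    # Expand each (row_block, col_block) scale over its BLOCK x BLOCK tile.
    s = scale_inv.to(torch.float32)
    s = s.repeat_interleave(BLOCK, dim=0)[:rows]
    s = s.repeat_interleave(BLOCK, dim=1)[:, :cols]
    return (w * s).to(torch.bfloat16)

# Example over one safetensors shard (file and tensor names are illustrative).
state = load_file("model-00001.safetensors")
out = {}
for name, tensor in state.items():
    if tensor.dtype == torch.float8_e4m3fn:
        out[name] = dequant_fp8_to_bf16(tensor, state[name + "_scale_inv"])
    elif not name.endswith("_scale_inv"):
        out[name] = tensor  # copy non-FP8 tensors through unchanged
save_file(out, "model-00001-bf16.safetensors")
```

Once everything is BF16, FP8 support is no longer needed at all; the remaining constraint is aggregate memory, which is why going beyond 16-way MP (or offloading some layers) comes up in the first place.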
