Is it possible to run inference on an A100 GPU?
#23 · opened by Tony664
"Sincerely seeking help.
Since the A100 does not support FP8, the Linear layers use BF16, which leads to Out of Memory errors. We want to try more model sharding, but convert.py seems to allow a maximum of 16 MP."
"Sincerely seeking help.
Since the A100 does not support FP8, the Linear layers use BF16, which leads to Out of Memory errors. We want to try more model sharding, but convert.py seems to allow a maximum of 16 MP."
"Sincerely seeking help.
Since the A100 does not support FP8, the Linear layers use BF16, which leads to Out of Memory errors. We want to try more model sharding, but convert.py seems to allow a maximum of 16 MP."
Did you solve it?
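For anyone hitting the same wall: since the A100 has no native FP8 kernels, a common workaround is to dequantize the FP8 checkpoint to BF16 offline and then load it as a plain BF16 model (about 2 bytes per parameter, so total GPU memory is still the limiting factor). Below is a minimal sketch of the dequantization step, assuming the checkpoint stores each FP8 weight together with a per-block inverse scale tensor; the `weight_scale_inv` naming, the 128x128 block size, and the file names are illustrative assumptions, not taken from this thread.

```python
import torch
from safetensors.torch import load_file, save_file

BLOCK = 128  # assumed quantization block size

def dequant_fp8_to_bf16(w_fp8: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Multiply per-block inverse scales back into an FP8 weight, producing BF16."""
    w = w_fp8.to(torch.float32)
    rows, cols = w.shape
    # Expand each (row_block, col_block) scale over its BLOCK x BLOCK tile.
    s = scale_inv.to(torch.float32)
    s = s.repeat_interleave(BLOCK, dim=0)[:rows]
    s = s.repeat_interleave(BLOCK, dim=1)[:, :cols]
    return (w * s).to(torch.bfloat16)

# Example over one safetensors shard (file and tensor names are illustrative).
state = load_file("model-00001.safetensors")
out = {}
for name, tensor in state.items():
    if tensor.dtype == torch.float8_e4m3fn:
        out[name] = dequant_fp8_to_bf16(tensor, state[name + "_scale_inv"])
    elif not name.endswith("_scale_inv"):
        out[name] = tensor  # copy non-FP8 tensors through unchanged
save_file(out, "model-00001-bf16.safetensors")
```

Once everything is BF16, FP8 support is no longer needed at all; the remaining constraint is aggregate memory, which is why going beyond 16-way MP (or offloading some layers) comes up in the first place.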
