Anyone have used Llama 3 70B-I in huggingface Inference API ?

#24

by TikaToka - opened Apr 23, 2024

Apr 23, 2024

I thought they would not support as it is 70B, but looks they do as appear in the model page.
Anyone have tried using it with api calls? If possible, I am trying to pay for PRO subscription

LJunius

Apr 23, 2024

You can use it in poe.

maxikq

Apr 23, 2024

I'm just testing it, but it doesn't seem to be working well.

It looks like there is a max output token limit to ~250 which is very low
I'm not sure if it correctly recognizes the format with roles - answers are weird
I get duplicated response sometimes
"return_full_text" = false doesn't work. I get in the response my initial prompt.

TikaToka

Apr 24, 2024

@LJunius @maxikq Thanks, I'll just try at my local with quantized model for now :(

PhillipWhillier

Jun 17, 2024

•

edited Jun 17, 2024

It never returns a full response for me.

Does anyone know how to get a full response that is not truncated?
Or can someone suggest a model that returns a full response?

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment