Anyone have used Llama 3 70B-I in huggingface Inference API ?
#24
by
						
TikaToka
	
							
						- opened
							
					
I thought they would not support as it is 70B, but looks they do as appear in the model page.
Anyone have tried using it with api calls? If possible, I am trying to pay for PRO subscription
You can use it in poe.
I'm just testing it, but it doesn't seem to be working well.
- It looks like there is a max output token limit to ~250 which is very low
 - I'm not sure if it correctly recognizes the format with roles - answers are weird
 - I get duplicated response sometimes
 - "return_full_text" = false doesn't work. I get in the response my initial prompt.
 
It never returns a full response for me.
Does anyone know how to get a full response that is not truncated?
Or can someone suggest a model that returns a full response?