Doron Adler (Norod78)
AI & ML interests: Fooling around with generative machine learning models.
Posted an update (2 days ago)
Multilingual Tokenization Showdown
Analyzing 12 LLM Tokenizers Across 204 Languages.
First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages:
https://huggingface.co/datasets/Norod78/WikiCat-Multilingual
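As a rough sketch of how a dataset like this could be loaded from the Hub (the split and column names below are assumptions for illustration, not taken from the dataset card):

```python
# Minimal sketch: load the multilingual "Cat" article dataset and peek at a few rows.
# The split name and the "language"/"text" column names are assumptions.
from datasets import load_dataset

ds = load_dataset("Norod78/WikiCat-Multilingual", split="train")
for row in ds.select(range(3)):
    print(row["language"], len(row["text"].split()), "words")
```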
For each language entry with at least 100 words, I tokenized the text using 12 tokenizers and calculated the "characters per token" and "words per token" ratios. The higher the ratio, the more information each token represents on average for that language (which might also let an LLM learn more per parameter when trained on data in that language).
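A minimal sketch of how these per-language ratios could be computed for one tokenizer; the 100-word threshold follows the description above, while the example checkpoint and function name are illustrative assumptions:

```python
# Sketch: compute "characters per token" and "words per token" for a single text.
# The "gpt2" checkpoint is only an example; any of the 12 tokenizers could be used.
from transformers import AutoTokenizer

def token_ratios(text: str, tokenizer):
    words = text.split()
    if len(words) < 100:  # skip entries with fewer than 100 words
        return None
    tokens = tokenizer.encode(text, add_special_tokens=False)
    chars_per_token = len(text) / len(tokens)
    words_per_token = len(words) / len(tokens)
    return chars_per_token, words_per_token

tok = AutoTokenizer.from_pretrained("gpt2")
print(token_ratios("your article text here " * 50, tok))
```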
You can see a slideshow summary of the results here:
https://norod.github.io/wikicat-tokenizer-eval/tokenizer-slideshow.html
I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL with this repo:
https://github.com/Norod/wikicat-tokenizer-eval
Post on X:
https://x.com/Norod78/status/1984366900550266999