MaskBit: Embedding-free Image Generation via Bit Tokens
Paper: arXiv:2409.16211
This model is the VQGAN+ tokenizer with a 12-bit vocabulary, i.e. 2^12 = 4096 codebook entries. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.
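As a rough illustration of what these numbers imply, the sketch below derives the token grid and vocabulary size from the resolution, downsampling factor, and bit count stated above. The function `tokenizer_shape` and its parameter names are hypothetical helpers written for this example. They are not part of the MaskBit codebase, and the sketch assumes square inputs and one token per spatial position of the latent grid.

```python
def tokenizer_shape(image_size: int = 256, downsample: int = 16, bits: int = 12) -> dict:
    """Derive latent grid and vocabulary figures (hypothetical helper, not the MaskBit API)."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample          # latent grid side length
    num_tokens = side * side                 # one token per latent position (assumption)
    return {
        "grid": (side, side),
        "num_tokens": num_tokens,
        "vocab_size": 2 ** bits,             # a 12-bit vocabulary has 4096 entries
        "bits_per_image": num_tokens * bits, # raw size of the token sequence in bits
    }


if __name__ == "__main__":
    print(tokenizer_shape())
```

Under these assumptions, a 256×256 image maps to a 16×16 grid of 256 tokens, each drawn from 4096 possible codes.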
You can find more details on the project page and in the paper.