Aisha Halder
Ahalder
AI & ML interests
AI & ML, Networking, P2P
Organizations
None yet
Embedding
Multimodal
Image generation
- UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
  Paper • 2401.13388 • Published • 13
- BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
  Paper • 2401.13974 • Published • 14
- Real ESRGAN
  Runtime error • 420
- Vchitect/Vchitect-2.0-2B
  Text-to-Video • Updated • 10 • 39
NLP LLM
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21
- distilbert/distilbert-base-uncased-finetuned-sst-2-english
  Text Classification • 67M • Updated • 3.41M • 860
- FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
  Paper • 2401.14112 • Published • 20
- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
  Paper • 2401.04092 • Published • 21
Games
Video generation
Recognition
Time series
SLM
Image Processing
Dataset
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 50
- Nfiniteai/product-masks-sample
  Viewer • Updated • 2.71k • 536 • 14
- HuggingFaceFV/finevideo
  Viewer • Updated • 39.5k • 9.45k • 339
- rulins/MassiveDS-140B
  Viewer • Updated • 3.08M • 1.96k • 7
Speech and Audio
- facebook/wav2vec2-base-960h
  Automatic Speech Recognition • 94.4M • Updated • 2.17M • 384
- ChatMusician: Understanding and Generating Music Intrinsically with LLM
  Paper • 2402.16153 • Published • 59
- EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
  Paper • 2409.10819 • Published • 18
- jadechoghari/openmusic
  Text-to-Audio • Updated • 214 • 72
Segmentation
RAG & Querying
papers
- Dailypapershackernews
  Runtime error • 80
- Prithvi WxC: Foundation Model for Weather and Climate
  Paper • 2409.13598 • Published • 45
- TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles
  Paper • 2410.05262 • Published • 11
- Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
  Paper • 2410.15316 • Published • 12
Agent