Great work! I have a question out of curiosity: what is the advantage of splitting the model across two GPUs during training? (You mentioned using Tensor Parallelism (TP) of 2 and Data Parallelism (DP) of 4 on each 8-GPU node.) I'm guessing the model can fit on a single GPU given that it is small? In that case I would have thought the most efficient setup would be plain DP=8. For concreteness, the two layouts I'm comparing are sketched below.
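
Just to make sure I'm reading the setup right, here is a minimal sketch of the two process-group layouts I have in mind, using PyTorch's `init_device_mesh` (the mesh shapes are my assumption from the TP=2 / DP=4 description in the post, not your actual code):

```python
# Minimal sketch, assuming PyTorch >= 2.2 and launch via torchrun with 8 processes
# on a single 8-GPU node (e.g. `torchrun --nproc_per_node=8 this_script.py`).
from torch.distributed.device_mesh import init_device_mesh

# Layout described in the post: 2-way tensor parallel x 4-way data parallel (4 * 2 = 8 GPUs).
mesh_tp2_dp4 = init_device_mesh("cuda", (4, 2), mesh_dim_names=("dp", "tp"))

# Layout I would have expected if the model fits on one GPU: pure 8-way data parallel.
mesh_dp8 = init_device_mesh("cuda", (8,), mesh_dim_names=("dp",))
```

Is the TP=2 split mainly about activation/optimizer-state memory headroom, or does it actually come out faster than DP=8 in your measurements?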