Model Overview
Description:
GR00T N1 for ultrasound robotics is a vision-language-action (VLA) model fine-tuned to mimic a simple liver ultrasound sweep in the Isaac for Healthcare ultrasound environment. It uses the weights and architecture of NVIDIA Isaac GR00T N1 and is fine-tuned on simulation data from Isaac for Healthcare.
This model is ready for commercial/non-commercial use.
License/Terms of Use
NSCL V1 License
Deployment Geography:
Global
Use Case:
This model is intended only for use within Isaac for Healthcare, as a demonstration that a policy can learn the simulated liver ultrasound scan task.
Release Date: Hugging Face 07/121/25 via [URL]
Reference(s):
NVIDIA Isaac GR00T N1
Isaac For Healthcare
Model Architecture:
Architecture Type: Vision Language Action model
Network Architecture: [GR00T N1](https://github.com/NVIDIA/Isaac-GR00T)
This model was developed based on GR00T N1.
This model has 2.2 billion parameters.
Computational Load (Internal)
Cumulative Compute:
4 Hours × 4 GPUs × 1500 TFLOPS × 0.7 = 16800
Estimated Energy and Emissions for Model Training:
Total kWh = 1 × 4 × 350 W × 0.001 × 80% × 4 Hours × 1.4 = 6.27 kWh
Total Emissions = 410.5 gCO2e/kWh × 6.27 kWh × 0.000001 ≈ 0.00257 tCO2e
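The arithmetic above can be reproduced with a short script. This is a minimal sketch rather than an official calculator: the variable and function names are illustrative, the reading of 0.7 as a utilization factor and of 1.4 as a data-center overhead (PUE-style) factor is an assumption, and the tonnes unit is inferred from the 0.000001 grams-to-tonnes conversion.

```python
# Figures are copied from the model card; names and interpretations are assumptions.
HOURS = 4.0
NUM_NODES = 1
GPUS_PER_NODE = 4
PEAK_TFLOPS_PER_GPU = 1500.0
COMPUTE_FACTOR = 0.7            # assumed to be a utilization factor

GPU_POWER_W = 350.0
POWER_DRAW_FRACTION = 0.80      # the "80%" term
OVERHEAD_FACTOR = 1.4           # assumed to be a PUE-style overhead
GRID_G_CO2E_PER_KWH = 410.5


def cumulative_compute():
    """Hours x GPUs x peak TFLOPS x factor, as written in the card."""
    return HOURS * GPUS_PER_NODE * PEAK_TFLOPS_PER_GPU * COMPUTE_FACTOR


def energy_kwh():
    """Watts -> kW (x 0.001), scaled by draw fraction, hours, and overhead."""
    kw = NUM_NODES * GPUS_PER_NODE * GPU_POWER_W * 0.001 * POWER_DRAW_FRACTION
    return kw * HOURS * OVERHEAD_FACTOR


def emissions_tonnes(kwh):
    """gCO2e/kWh x kWh gives grams; x 1e-6 converts grams to tonnes."""
    return GRID_G_CO2E_PER_KWH * kwh * 1e-6


if __name__ == "__main__":
    kwh = energy_kwh()
    print(f"compute: {cumulative_compute():.0f}")
    print(f"energy:  {kwh:.2f} kWh")
    print(f"CO2e:    {emissions_tonnes(round(kwh, 2)):.6f} t")
```

The emissions figure is computed from the energy value rounded to 6.27 kWh, which is how the card's own line is written; using the unrounded 6.272 kWh changes only the last reported digit.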
Input:
Input Type(s): Images, Text, and Joint kinematics
Input Format: Red, Green, Blue (RGB) Image Tensors, String, FP16 Tensor
Input Parameters: 224x224x3 Images, 250 token max Text String, 1x7 Joint State Tensor
Other Properties Related to Input:
Input Images:
- Room Camera: 224x224x3 RGB image
 - Wrist Camera: 224x224x3 RGB image
 
Input Prompt:
- Text String (250 tokens max)
 
Input Kinematics:
- 7D tensor representing articulation radians for each Franka arm joint
 
Output:
Output Type(s): Kinematic Tensor
Output Format: 16x6 Tensor [x, y, z, rx, ry, rz]
Output Parameters: 16x6 Tensor [x, y, z, rx, ry, rz]
Other Properties Related to Output: The model predicts the next 16 6D relative actions. The first 3 indices are the relative translation, and the last 3 indices are the relative axis angle rotation.
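To make the output layout concrete, the sketch below applies a 16x6 chunk of relative actions to a starting end-effector pose. It is illustrative only: the card does not state the reference frame or the composition order, so the code assumes each action is relative to the pose reached by the previous one, that translation is added in the base frame, and that the rotation is left-multiplied. The function names are hypothetical and not part of any GR00T or Isaac for Healthcare API.

```python
import math


def axis_angle_to_matrix(rx, ry, rz):
    """Rodrigues' formula: axis-angle vector (radians) -> 3x3 rotation matrix."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def matmul3(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_relative_action(position, rotation, action):
    """Assumed convention: add translation, left-multiply the rotation delta."""
    dx, dy, dz, rx, ry, rz = action
    new_position = [position[0] + dx, position[1] + dy, position[2] + dz]
    new_rotation = matmul3(axis_angle_to_matrix(rx, ry, rz), rotation)
    return new_position, new_rotation


def rollout(position, rotation, action_chunk):
    """Apply each [x, y, z, rx, ry, rz] row in order and collect the poses."""
    poses = []
    for action in action_chunk:
        if len(action) != 6:
            raise ValueError("each action must have 6 values")
        position, rotation = apply_relative_action(position, rotation, action)
        poses.append((position, rotation))
    return poses
```

For example, a chunk of 16 identical actions `[0.001, 0, 0, 0, 0, math.pi / 32]` moves the pose 16 mm along x and rotates it a quarter turn about z under these assumptions. A real controller would follow whatever convention the simulation environment defines.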
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- PyTorch - 2.5.1
 
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Blackwell
NVIDIA Hopper
Preferred/Supported Operating System(s):
- Linux
 
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s):
- GR00T N1 for ultrasound robotics
 
Training Datasets:
Training Dataset:
Data Modality: Video, Text, Actions
Training Data Size:
- Video: Less than 10,000 Hours
- Text: Less than a Billion Tokens
- Actions: 400 action trajectories, each with an average of 210 time steps
Data Collection Method by Dataset:
- Automated
 
Labeling Method by Dataset:
- Automated
 
Properties: 400 simulated liver ultrasound sweeps collected at 30Hz with an average length of 210 time steps.
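As a quick sanity check on the dataset size, the figures above imply roughly seven seconds per sweep and well under an hour of simulated motion in total. The snippet below is a minimal sketch of that arithmetic; the function name and returned keys are illustrative.

```python
NUM_TRAJECTORIES = 400   # simulated sweeps
AVG_STEPS = 210          # average time steps per sweep
RATE_HZ = 30             # collection rate


def dataset_summary(n=NUM_TRAJECTORIES, steps=AVG_STEPS, hz=RATE_HZ):
    """Derive total steps and durations from the card's dataset properties."""
    total_steps = n * steps
    return {
        "total_steps": total_steps,
        "seconds_per_sweep": steps / hz,
        "total_minutes": total_steps / hz / 60,
    }


if __name__ == "__main__":
    for key, value in dataset_summary().items():
        print(key, value)
```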
Evaluation Results:
| Category | Average Success Rate (radius=0.01m) | 
|---|---|
| Precision | 83.8% | 
To measure model accuracy, we reserve an evaluation set of 50 ground-truth training examples. Accuracy is calculated by counting the number of predicted steps that fall within a 0.01 m radius of the corresponding ground-truth example. This evaluation is repeated three times, and we report the average success rate.
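The metric described above can be sketched as follows. This is an assumed reading of the procedure, not the evaluation script itself: it treats each step as a 3D position, uses Euclidean distance, counts a step as a success when it lies within the radius of its ground-truth counterpart, and averages the per-run rates with equal weight. All function names are hypothetical.

```python
import math


def step_success_rate(predicted, ground_truth, radius=0.01):
    """Fraction of predicted positions within `radius` metres of ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one step is required")
    hits = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= radius
    )
    return hits / len(predicted)


def average_success_rate(runs, radius=0.01):
    """Mean of per-run success rates; `runs` is a list of (predicted, truth)."""
    rates = [step_success_rate(p, g, radius) for p, g in runs]
    return sum(rates) / len(rates)
```

With three repetitions, `runs` would hold three `(predicted, ground_truth)` pairs, and the reported number would be the returned mean expressed as a percentage.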
Inference:
Engine: PyTorch
Test Hardware:
Ada RTX 6000
Ampere RTX A6000
RTX 4090
| Hardware | Average Latency | Memory Usage | 
|---|---|---|
| Ampere RTX A6000 | 350 ms | 9.45 GB | 
Limitations:
This model was trained on data from the Isaac for Healthcare ultrasound workflow state machine, so it will only perform well in that single environment. It is not expected to generalize to different robot platforms, ultrasound probes, or ultrasound phantoms.
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
Bias
| Field | Response | 
|---|---|
| Participation considerations from adversely impacted groups, protected classes, in model design and testing: | Not Applicable | 
| Measures taken to mitigate against unwanted bias: | Not Applicable | 
Explainability
| Field | Response | 
|---|---|
| Intended Domain: | Ultrasound robotics | 
| Model Type: | Vision Language Action Model | 
| Intended Users: | Isaac For Healthcare users testing the ultrasound environment. | 
| Output: | Kinematic tensor (outputs the next 16 relative inverse kinematic actions to complete a simple liver ultrasound sweep) | 
| Describe how the model works: | The input images and text prompt are encoded using the NVIDIA Eagle-2 VLM backbone. The Diffusion Transformer ingests the current joint states and the output from the VLM backbone to generate the next 16 action tensors with a denoising process. | 
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable | 
| Technical Limitations & Mitigation: | This model was trained on data from the Isaac for Healthcare ultrasound workflow state machine, so it will only perform well in that single environment. It is not expected to generalize to different robot platforms, ultrasound probes, or ultrasound phantoms. | 
| Verified to have met prescribed NVIDIA quality standards: | Yes | 
| Performance Metrics: | Latency, Accuracy | 
| Potential Known Risks: | The model may not perfectly follow the canonical liver ultrasound sweep path. This may happen due to: unexpected torso positions, inconsistent camera positioning, and different deployment environments outside of the Isaac for Healthcare simulation environment. | 
| Licensing: | NSCL V1 License | 
Privacy
| Field | Response | 
|---|---|
| Generatable or reverse engineerable personal data? | None | 
| Personal data used to create this model? | None | 
| How often is dataset reviewed? | Before Release | 
| Is there provenance for all datasets used in training? | Yes | 
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes | 
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ | 
Safety & Security
| Field | Response | 
|---|---|
| Model Application(s): | Ultrasound robotics | 
| Model Application Field(s): | Machinery and Robotics, Medical Devices | 
| Describe the life critical impact (if present). | This model could pose a risk if deployed on a robotic system in the real world. It has only been trained and tested on Isaac for Healthcare simulation data and may make unexpected movements if deployed in a new environment. This model is not expected to generalize to different environments, robot platforms, ultrasound probes, or ultrasound phantoms. | 
| Use Case Restrictions: | Abide by NSCL V1 License | 
| Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to. | 