NVIDIA Jetson platforms provide industry-leading edge AI performance through integrated GPU acceleration and mature TensorRT optimization. This post delivers comprehensive implementation guidance for deploying quantized YOLOv8 models on Jetson hardware, covering JetPack environment configuration, TensorRT engine compilation from ONNX models, INT8 calibration procedures on target hardware, performance tuning strategies, thermal management techniques, and platform-specific optimization patterns across Jetson Nano, Xavier NX, and Orin families.
Part 2 covered training and quantizing YOLOv8 models with PTQ and QAT approaches. This post focuses on production Jetson deployment: preparing JetPack development environments, compiling optimized TensorRT engines with layer fusion and precision calibration, implementing efficient inference pipelines, managing thermal constraints for sustained performance, and benchmarking actual inference rates across Jetson hardware variants.
NVIDIA Jetson Platform Overview
Understanding Jetson hardware capabilities and constraints guides deployment decisions and optimization strategies.
Jetson Hardware Specifications:
Jetson Nano (discontinued but widely deployed) features 128 CUDA cores, 472 GFLOPS FP16 performance, 4GB LPDDR4 memory, 5-10W power consumption, and passive/active cooling options. Suitable for lightweight YOLOv8n models at 15-25 FPS with 640×640 input.
Jetson Xavier NX provides 384 CUDA cores, 21 TOPS INT8 performance, 8GB LPDDR4x memory, 10-15W power consumption, and requires active cooling. Handles YOLOv8s/m models at 25-40 FPS with excellent quantization support.
Jetson Orin Nano delivers 1024 CUDA cores, 40 TOPS INT8 performance, 8GB LPDDR5 memory, and 7-15W configurable power modes. Handles YOLOv8s/m models at roughly 30-58 FPS (see the benchmarks below), with headroom for YOLOv8l and for architectural extras such as attention mechanisms.
Jetson AGX Orin offers 2048 CUDA cores, 275 TOPS INT8 performance, 32GB/64GB LPDDR5 memory, 15-60W configurable TDP, and enables multi-model concurrent inference, high-resolution processing (1920×1080), and complex post-processing pipelines.
Platform Selection Considerations: Choose platforms based on model complexity (nano models on Jetson Nano, medium/large on Orin), throughput requirements (target FPS and concurrent streams), power budget (battery- vs AC-powered deployments), and thermal environment (enclosed spaces require more capable cooling). For production deployments, Jetson Orin Nano represents the optimal balance of performance, power efficiency, and cost as of early 2025.
JetPack Environment Setup
JetPack SDK provides complete development environment including operating system, CUDA toolkit, cuDNN, TensorRT, and multimedia libraries optimized for Jetson hardware.
JetPack Installation: Flash JetPack to the Jetson device using NVIDIA SDK Manager on an Ubuntu host machine or using pre-configured SD card images. As of early 2025, JetPack 6.0 provides the latest optimizations for Orin platforms, while JetPack 5.1.x remains the stable choice for Xavier platforms.
# Verify JetPack installation
sudo apt-cache show nvidia-jetpack
# Expected output shows JetPack version and components
# JetPack 6.0 includes:
# - CUDA 12.2
# - cuDNN 8.9
# - TensorRT 8.6
# - OpenCV 4.8 with CUDA support
Python Development Environment: Configure Python environment with required dependencies for model deployment and development:
# Update system packages
sudo apt-get update
sudo apt-get upgrade
# Install Python development tools
sudo apt-get install python3-pip python3-dev python3-venv
# Create virtual environment
python3 -m venv ~/jetson-env
source ~/jetson-env/bin/activate
# Install PyTorch for Jetson (specific builds for JetPack)
# Download from https://forums.developer.nvidia.com/t/pytorch-for-jetson/
# Example for JetPack 6.0 (illustrative filename; use the exact wheel URL listed in the forum thread above):
wget https://nvidia.box.com/shared/static/pytorch-2.1.0-jp60.whl
pip3 install pytorch-2.1.0-jp60.whl
# TensorRT Python bindings ship with JetPack; install PyCUDA for device memory management
pip3 install pycuda
# Install Ultralytics and dependencies
pip3 install ultralytics opencv-python-headless pillow
# Verify installations
python3 -c "import torch; print(f'PyTorch: {torch.__version__}')"
python3 -c "import tensorrt; print(f'TensorRT: {tensorrt.__version__}')"
python3 -c "import ultralytics; print(f'Ultralytics: {ultralytics.__version__}')"
TensorRT Verification: Confirm TensorRT installation and CUDA availability:
#!/usr/bin/env python3
"""Verify TensorRT and CUDA setup"""
import tensorrt as trt
import torch
import pycuda.driver as cuda
import pycuda.autoinit
# TensorRT version
print(f"TensorRT version: {trt.__version__}")
# CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"cuDNN version: {torch.backends.cudnn.version()}")
# GPU information
if torch.cuda.is_available():
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
print(f"CUDA Cores: {torch.cuda.get_device_properties(0).multi_processor_count}")
# PyCUDA verification
print(f"\nPyCUDA initialized successfully")
print(f"CUDA Device: {cuda.Device(0).name()}")
TensorRT Engine Compilation
Compiling ONNX models to TensorRT engines enables hardware-specific optimizations including layer fusion, kernel auto-tuning, and precision calibration.
flowchart TD
A[ONNX Model] --> B[TensorRT Builder]
B --> C[Network Definition]
C --> D[Optimization Profile]
D --> E[Builder Config]
E --> F{Precision Mode}
F -->|FP32| G[FP32 Engine]
F -->|FP16| H[FP16 Engine]
F -->|INT8| I[INT8 Calibration]
I --> J[Calibration Cache]
J --> K[INT8 Engine]
G --> L[Serialized Engine]
H --> L
K --> L
L --> M[Deploy to Jetson]
Basic TensorRT Engine Compilation (Python):
#!/usr/bin/env python3
"""
Compile ONNX model to TensorRT engine with FP16 precision
"""
import tensorrt as trt
import os
def build_engine_fp16(onnx_file, engine_file):
"""Build TensorRT FP16 engine from ONNX"""
# Create builder and network
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
# Parse ONNX model
with open(onnx_file, 'rb') as model:
if not parser.parse(model.read()):
print('ERROR: Failed to parse ONNX file')
for error in range(parser.num_errors):
print(parser.get_error(error))
return None
# Configure builder
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30) # 4GB
# Enable FP16 precision
if builder.platform_has_fast_fp16:
config.set_flag(trt.BuilderFlag.FP16)
print("FP16 mode enabled")
# Build engine
print("Building TensorRT engine... This may take a few minutes")
serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
print('ERROR: Failed to build engine')
return None
# Save engine
with open(engine_file, 'wb') as f:
f.write(serialized_engine)
print(f"Engine saved to {engine_file}")
return engine_file
# Example usage
if __name__ == '__main__':
onnx_path = 'yolov8n.onnx'
engine_path = 'yolov8n_fp16.engine'
build_engine_fp16(onnx_path, engine_path)
INT8 Engine Compilation with Calibration: INT8 precision requires a calibration step to determine optimal quantization parameters for the target hardware:
#!/usr/bin/env python3
"""
Compile ONNX to TensorRT INT8 engine with calibration
"""
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2
import os
from glob import glob
class Int8Calibrator(trt.IInt8EntropyCalibrator2):
"""INT8 calibrator for YOLOv8 models"""
def __init__(self, calibration_images, cache_file, batch_size=8):
trt.IInt8EntropyCalibrator2.__init__(self)
self.cache_file = cache_file
self.batch_size = batch_size
self.current_index = 0
# Load and preprocess calibration images
self.images = []
for img_path in calibration_images[:1000]:  # Use up to 1000 images (~5 GB of float32 data at 640x640; reduce on memory-constrained devices)
img = cv2.imread(img_path)
img = cv2.resize(img, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.transpose(2, 0, 1).astype(np.float32) / 255.0
self.images.append(img)
self.images = np.array(self.images)
self.num_batches = len(self.images) // batch_size
# Allocate device memory
self.device_input = cuda.mem_alloc(
batch_size * 3 * 640 * 640 * np.dtype(np.float32).itemsize
)
print(f"Calibration: {len(self.images)} images, {self.num_batches} batches")
def get_batch_size(self):
return self.batch_size
def get_batch(self, names):
if self.current_index >= self.num_batches:
return None
# Get batch
batch_start = self.current_index * self.batch_size
batch_end = batch_start + self.batch_size
batch = self.images[batch_start:batch_end]
# Copy to device
cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
self.current_index += 1
return [int(self.device_input)]
def read_calibration_cache(self):
if os.path.exists(self.cache_file):
with open(self.cache_file, 'rb') as f:
return f.read()
return None
def write_calibration_cache(self, cache):
with open(self.cache_file, 'wb') as f:
f.write(cache)
def build_engine_int8(onnx_file, engine_file, calibration_images,
cache_file='calibration.cache'):
"""Build TensorRT INT8 engine with calibration"""
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
# Parse ONNX
with open(onnx_file, 'rb') as model:
if not parser.parse(model.read()):
for error in range(parser.num_errors):
print(parser.get_error(error))
return None
# Configure builder
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)
# Enable INT8 precision
config.set_flag(trt.BuilderFlag.INT8)
# Create calibrator
calibrator = Int8Calibrator(calibration_images, cache_file, batch_size=8)
config.int8_calibrator = calibrator
print("Building INT8 engine with calibration...")
print("This may take 10-20 minutes...")
# Build engine
serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
print('ERROR: Failed to build INT8 engine')
return None
# Save engine
with open(engine_file, 'wb') as f:
f.write(serialized_engine)
print(f"INT8 engine saved to {engine_file}")
return engine_file
# Example usage
if __name__ == '__main__':
onnx_path = 'yolov8n.onnx'
engine_path = 'yolov8n_int8.engine'
# Get calibration images
calib_images = glob('/path/to/calibration/images/*.jpg')
build_engine_int8(onnx_path, engine_path, calib_images)
Optimization Profile Configuration: TensorRT optimization profiles define input dimensions and batch sizes. For edge deployment with fixed input sizes, static optimization provides maximum performance:
# Static optimization profile for 640x640 input
profile = builder.create_optimization_profile()
profile.set_shape(
"images", # Input name
min=(1, 3, 640, 640),
opt=(1, 3, 640, 640),
max=(1, 3, 640, 640)
)
config.add_optimization_profile(profile)
Static profiles enable aggressive layer fusion and kernel tuning. Dynamic profiles support variable input sizes but typically sacrifice roughly 10-20% of throughput.
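When variable batch sizes are genuinely needed (for example, batching frames from several cameras), a dynamic profile can be declared instead. The snippet below is a minimal sketch, assuming the model was exported with a dynamic batch dimension and keeps the input name "images" used above:
# Dynamic-batch profile (assumes the ONNX model was exported with a dynamic
# batch dimension and that the input tensor is named "images")
profile = builder.create_optimization_profile()
profile.set_shape(
    "images",
    min=(1, 3, 640, 640),   # smallest batch the engine must accept
    opt=(4, 3, 640, 640),   # shape TensorRT tunes its kernels for
    max=(8, 3, 640, 640)    # largest batch the engine must accept
)
config.add_optimization_profile(profile)
# At inference time, set the actual input shape on the execution context
# before running, e.g. context.set_binding_shape(0, (4, 3, 640, 640))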
TensorRT Inference Implementation
Efficient inference requires proper engine loading, memory management, and asynchronous execution patterns.
TensorRT Inference Class (Python):
#!/usr/bin/env python3
"""
TensorRT inference wrapper for YOLOv8
"""
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2
import time
class TensorRTInference:
"""Efficient TensorRT inference for YOLOv8"""
def __init__(self, engine_path):
"""Initialize TensorRT engine"""
# Load engine
self.logger = trt.Logger(trt.Logger.WARNING)
with open(engine_path, 'rb') as f:
self.runtime = trt.Runtime(self.logger)
self.engine = self.runtime.deserialize_cuda_engine(f.read())
self.context = self.engine.create_execution_context()
# Allocate buffers
self.inputs = []
self.outputs = []
self.bindings = []
self.stream = cuda.Stream()
for binding in self.engine:
size = trt.volume(self.engine.get_binding_shape(binding))
dtype = trt.nptype(self.engine.get_binding_dtype(binding))
# Allocate host and device buffers
host_mem = cuda.pagelocked_empty(size, dtype)
device_mem = cuda.mem_alloc(host_mem.nbytes)
self.bindings.append(int(device_mem))
if self.engine.binding_is_input(binding):
self.inputs.append({'host': host_mem, 'device': device_mem})
else:
self.outputs.append({'host': host_mem, 'device': device_mem})
print(f"TensorRT engine loaded: {engine_path}")
print(f"Input shape: {self.engine.get_binding_shape(0)}")
print(f"Output shape: {self.engine.get_binding_shape(1)}")
def preprocess(self, image):
"""Preprocess image for inference"""
# Resize to model input size
img = cv2.resize(image, (640, 640))
# Convert BGR to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Normalize to 0-1
img = img.astype(np.float32) / 255.0
# Transpose to CHW format
img = img.transpose(2, 0, 1)
# Add batch dimension
img = np.expand_dims(img, axis=0)
return np.ascontiguousarray(img)
def infer(self, image):
"""Run inference on image"""
# Preprocess
input_data = self.preprocess(image)
# Copy input to device
np.copyto(self.inputs[0]['host'], input_data.ravel())
cuda.memcpy_htod_async(
self.inputs[0]['device'],
self.inputs[0]['host'],
self.stream
)
# Run inference
self.context.execute_async_v2(
bindings=self.bindings,
stream_handle=self.stream.handle
)
# Copy output from device
cuda.memcpy_dtoh_async(
self.outputs[0]['host'],
self.outputs[0]['device'],
self.stream
)
# Synchronize
self.stream.synchronize()
# Reshape output
output = self.outputs[0]['host'].reshape(
self.engine.get_binding_shape(1)
)
return output
def postprocess(self, output, conf_threshold=0.25, iou_threshold=0.45):
"""Post-process model output"""
# YOLOv8 output format: [batch, 84, 8400]
# 84 = 4 bbox coords + 80 class scores
output = output[0] # Remove batch dimension
output = output.T # Transpose to [8400, 84]
# Extract boxes and scores
boxes = output[:, :4]
scores = output[:, 4:].max(axis=1)
class_ids = output[:, 4:].argmax(axis=1)
# Filter by confidence
mask = scores > conf_threshold
boxes = boxes[mask]
scores = scores[mask]
class_ids = class_ids[mask]
# Convert from center format to corner format
x_center, y_center, width, height = boxes.T
x1 = x_center - width / 2
y1 = y_center - height / 2
x2 = x_center + width / 2
y2 = y_center + height / 2
boxes = np.stack([x1, y1, x2, y2], axis=1)
# NMS
indices = self.nms(boxes, scores, iou_threshold)
return boxes[indices], scores[indices], class_ids[indices]
def nms(self, boxes, scores, iou_threshold):
"""Non-maximum suppression"""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
areas = (x2 - x1) * (y2 - y1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0, xx2 - xx1)
h = np.maximum(0, yy2 - yy1)
inter = w * h
iou = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(iou <= iou_threshold)[0]
order = order[inds + 1]
return np.array(keep)
def __del__(self):
"""Cleanup resources"""
del self.context
del self.engine
del self.runtime
# Example usage
if __name__ == '__main__':
# Load engine
inference = TensorRTInference('yolov8n_int8.engine')
# Load test image
image = cv2.imread('test.jpg')
# Warmup
for _ in range(10):
_ = inference.infer(image)
# Benchmark
times = []
for _ in range(100):
start = time.perf_counter()
output = inference.infer(image)
times.append(time.perf_counter() - start)
# Post-process
boxes, scores, class_ids = inference.postprocess(output)
print(f"\nInference latency: {np.mean(times)*1000:.2f}ms ± {np.std(times)*1000:.2f}ms")
print(f"Throughput: {1/np.mean(times):.1f} FPS")
print(f"Detections: {len(boxes)}")
Performance Tuning Strategies
Optimizing Jetson inference performance requires understanding platform capabilities and applying appropriate tuning techniques.
CUDA Stream Management: Asynchronous execution with CUDA streams enables overlapping computation and data transfer:
# Create multiple streams for pipelined execution (sketch -- assumes the
# execution context, bindings, and per-stream host/device buffers are already
# allocated; true overlap requires a separate buffer set per stream)
stream1 = cuda.Stream()
stream2 = cuda.Stream()
# Alternate between streams so one frame's transfers can overlap with
# another frame's inference
for i, frame in enumerate(video_frames):
    stream = stream1 if i % 2 == 0 else stream2
    # Async host-to-device input transfer
    cuda.memcpy_htod_async(input_device, input_host, stream)
    # Async inference enqueued on the same stream
    context.execute_async_v2(bindings, stream.handle)
    # Async device-to-host output transfer
    cuda.memcpy_dtoh_async(output_host, output_device, stream)
Memory Pool Configuration: Configure TensorRT memory pools for optimal performance:
# Adjust workspace size based on available memory
# Larger workspace enables more optimization
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30) # 4GB
# For memory-constrained devices (Jetson Nano)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30) # 1GB
Jetson Power Mode Configuration: Jetson platforms support multiple power modes affecting performance and thermal behavior:
# View available power modes
sudo nvpmodel -q
# Example Jetson Orin Nano power modes (mode IDs, names, and wattages vary
# by module and JetPack version; confirm with nvpmodel -q):
# Mode 0: 15W (MAXN - maximum performance)
# Mode 1: 10W (balanced)
# Mode 2: 7W (power efficient)
# Set maximum performance mode
sudo nvpmodel -m 0
# Set CPU and GPU clocks to maximum
sudo jetson_clocks
# Verify clock frequencies
sudo jetson_clocks --show
Maximum performance mode (MAXN) provides the highest throughput but requires active cooling. Power-efficient modes are suitable for battery operation or passive-cooling scenarios.
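Because throughput numbers are only meaningful for a known power mode, it helps to record the active mode alongside benchmark results. A minimal sketch, assuming nvpmodel is on the PATH (querying may require sudo depending on configuration):
#!/usr/bin/env python3
"""Log the active nvpmodel power mode before benchmarking (sketch)"""
import subprocess

def current_power_mode():
    """Return the raw output of 'nvpmodel -q' (active mode name and ID)"""
    result = subprocess.run(
        ['nvpmodel', '-q'],
        capture_output=True,
        text=True,
        check=True
    )
    return result.stdout.strip()

if __name__ == '__main__':
    print("Active power mode reported by nvpmodel:")
    print(current_power_mode())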
Thermal Management
Sustained inference workloads generate significant heat. Thermal throttling can reduce performance by 30-50% if not managed properly.
Thermal Monitoring Script (Python):
#!/usr/bin/env python3
"""
Monitor Jetson thermal zones and clock frequencies
"""
import time
from glob import glob
def read_thermal_zones():
    """Read all thermal zone temperatures via sysfs"""
    temps = {}
    # Zone names vary across Jetson generations (e.g. CPU-therm, GPU-therm,
    # SOC-therm, Tboard_tegra, Tdiode_tegra on Xavier; cpu-thermal,
    # gpu-thermal, soc0-thermal on Orin), so report every zone we can read
    for zone_dir in sorted(glob('/sys/class/thermal/thermal_zone*')):
        try:
            with open(f'{zone_dir}/type') as f:
                zone_type = f.read().strip()
            with open(f'{zone_dir}/temp') as f:
                temps[zone_type] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees C
        except (OSError, ValueError):
            continue
    return temps
def read_clock_frequencies():
    """Read current GPU and CPU clock frequencies via sysfs"""
    clocks = {}
    # GPU frequency -- the devfreq node name is platform-specific
    # (e.g. 17000000.gv11b on Xavier); adjust the path for your module
    try:
        with open('/sys/devices/gpu.0/devfreq/17000000.gv11b/cur_freq') as f:
            clocks['GPU'] = int(f.read().strip()) / 1e6  # Hz -> MHz
    except (OSError, ValueError):
        pass
    # CPU frequencies -- core count varies by module (6 on Orin Nano and
    # Xavier NX, up to 12 on AGX Orin), so enumerate the cores via glob
    cpu_paths = glob('/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq')
    for path in sorted(cpu_paths):
        cpu_name = path.split('/')[5]  # e.g. 'cpu0'
        try:
            with open(path) as f:
                clocks[cpu_name.upper()] = int(f.read().strip()) / 1000  # kHz -> MHz
        except (OSError, ValueError):
            pass
    return clocks
def monitor_thermal(duration=60, interval=1):
"""Monitor thermal and clock behavior"""
print("Monitoring thermal behavior...")
print("Press Ctrl+C to stop\n")
start_time = time.time()
try:
while time.time() - start_time < duration:
temps = read_thermal_zones()
clocks = read_clock_frequencies()
print(f"\r[{time.time()-start_time:.1f}s] ", end='')
# Print temperatures
for zone, temp in temps.items():
print(f"{zone}: {temp:.1f}°C ", end='')
# Print GPU clock
if 'GPU' in clocks:
print(f"| GPU: {clocks['GPU']:.0f}MHz ", end='')
time.sleep(interval)
except KeyboardInterrupt:
print("\nMonitoring stopped")
if __name__ == '__main__':
monitor_thermal(duration=300, interval=1) # 5 minutes
Thermal Throttling Mitigation: Strategies for maintaining performance under thermal constraints:
Active cooling requirement: Jetson Xavier NX and Orin platforms require active cooling (fan) for sustained workloads above 10W. Passive cooling is sufficient only for intermittent inference or power modes below 10W. Ensure adequate airflow with a minimum 25mm clearance around the heatsink/fan assembly.
Thermal interface material: A quality thermal interface between the die and heatsink is critical for heat transfer. Reapply thermal paste if temperatures exceed 75°C under moderate load. Use high-quality thermal pads (>5 W/mK thermal conductivity) for optimal performance.
Workload scheduling: For battery-powered deployments, implement duty cycling with inference bursts followed by idle periods for thermal recovery. Monitor GPU temperature and reduce inference frequency if it exceeds 80°C to prevent throttling; a minimal sketch follows.
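The sketch below illustrates the duty-cycling idea, reusing read_thermal_zones() from the monitoring script above and a placeholder run_inference() callable standing in for the TensorRT pipeline (the module name and callable are assumptions for illustration, not part of any Jetson API):
import time
# read_thermal_zones() is the sysfs helper from the monitoring script above;
# 'thermal_monitor' is a placeholder module name for wherever it was saved
from thermal_monitor import read_thermal_zones

TEMP_CEILING_C = 80.0   # pause inference above this GPU temperature
COOLDOWN_S = 5.0        # idle period granted for thermal recovery

def thermally_aware_loop(frames, run_inference):
    """Duty-cycle inference: burst while cool, idle when a GPU zone runs hot."""
    for frame in frames:
        # Collect GPU-related zone temperatures (zone names vary by platform)
        gpu_temps = [t for name, t in read_thermal_zones().items()
                     if 'gpu' in name.lower()]
        # Idle until the hottest GPU zone drops back below the ceiling
        while gpu_temps and max(gpu_temps) > TEMP_CEILING_C:
            time.sleep(COOLDOWN_S)
            gpu_temps = [t for name, t in read_thermal_zones().items()
                         if 'gpu' in name.lower()]
        yield run_inference(frame)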
Platform-Specific Benchmarks
Performance characteristics vary significantly across Jetson platforms. Understanding actual throughput guides deployment decisions; a quick way to re-measure on your own device is shown after the platform summaries below.
Jetson Nano (Discontinued) Benchmarks: YOLOv8n FP16 achieves 18-22 FPS, YOLOv8n INT8 achieves 25-30 FPS. YOLOv8s models drop to 8-12 FPS even with INT8. Thermal throttling common after 5-10 minutes sustained inference without active cooling. Suitable for lightweight monitoring applications with intermittent inference.
Jetson Xavier NX Benchmarks: YOLOv8n INT8 achieves 45-55 FPS, YOLOv8s INT8 achieves 30-38 FPS, YOLOv8m INT8 achieves 18-22 FPS. Maintains performance with active cooling under MAXN (15W) mode. Excellent balance for production deployments requiring real-time performance with moderate model complexity.
Jetson Orin Nano Benchmarks: YOLOv8n INT8 achieves 65-75 FPS, YOLOv8s INT8 achieves 48-58 FPS, YOLOv8m INT8 achieves 30-38 FPS, YOLOv8l INT8 achieves 18-24 FPS. Consistent performance under 15W MAXN mode with proper cooling. Recommended platform for new deployments as of early 2025.
Jetson AGX Orin Benchmarks: YOLOv8n INT8 achieves 120-140 FPS, YOLOv8s INT8 achieves 90-110 FPS, YOLOv8m INT8 achieves 60-75 FPS, YOLOv8l INT8 achieves 35-45 FPS. Supports concurrent multi-model inference and high-resolution inputs. Suitable for demanding edge applications requiring maximum performance.
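These figures depend heavily on JetPack version, power mode, and cooling, so it is worth re-measuring on your own device. One quick cross-check is TensorRT's bundled trtexec tool; the wrapper below is a sketch that assumes the default JetPack location for trtexec and an already-built engine file:
#!/usr/bin/env python3
"""Rough throughput check for a serialized TensorRT engine via trtexec (sketch)"""
import subprocess

TRTEXEC = '/usr/src/tensorrt/bin/trtexec'  # default location on JetPack installs

def benchmark_engine(engine_path, duration_s=30, warmup_ms=1000):
    """Run trtexec against a prebuilt engine and print its timing summary lines"""
    cmd = [
        TRTEXEC,
        f'--loadEngine={engine_path}',
        f'--duration={duration_s}',   # seconds of timed inference
        f'--warmUp={warmup_ms}',      # warmup period in milliseconds
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # trtexec prints throughput (qps) and latency percentiles in its summary
    for line in result.stdout.splitlines():
        if 'Throughput' in line or 'Latency' in line:
            print(line.strip())

if __name__ == '__main__':
    benchmark_engine('yolov8n_int8.engine')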
Key Takeaways
NVIDIA Jetson platforms provide mature edge AI infrastructure with comprehensive tooling and excellent TensorRT integration. JetPack SDK bundles complete development environment enabling rapid deployment of optimized models. TensorRT engine compilation applies hardware-specific optimizations including layer fusion, kernel auto-tuning, and precision calibration delivering 2-5x performance improvement over generic frameworks.
INT8 calibration on target hardware ensures optimal quantization parameters for deployed models. Proper calibration dataset selection (500-1000 representative images) is critical for maintaining accuracy while maximizing performance. Asynchronous inference with CUDA streams enables efficient pipeline execution, overlapping data transfer and computation.
Thermal management is essential for sustained performance. Active cooling is required for workloads above 10W, with a proper thermal interface ensuring effective heat transfer. Power mode configuration balances performance against thermal constraints and power budget. Jetson Orin Nano is the optimal platform for new deployments, offering 40 TOPS of INT8 performance in a 7-15W configurable power envelope.
Part 4 continues with multi-language inference server implementation, covering Node.js/Express and C#/ASP.NET Core server architectures, camera integration patterns, asynchronous request handling, error recovery mechanisms, and achieving 15-22ms end-to-end latency supporting 30+ concurrent inference requests.
References
- NVIDIA Jetson Orin Platform Documentation (https://developer.nvidia.com/embedded/jetson-orin)
- NVIDIA JetPack SDK Documentation (https://docs.nvidia.com/jetson/jetpack/index.html)
- NVIDIA TensorRT Developer Guide (https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- TensorRT Python API Documentation (https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/)
- NVIDIA Jetson Developer Forums (https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70)
- Jetson Inference Library (https://github.com/dusty-nv/jetson-inference)
- TensorRT Integration Best Practices (https://developer.nvidia.com/blog/tensorrt-integration-speeds-tensorflow-inference/)
- NVIDIA Jetson Tutorials (https://developer.nvidia.com/embedded/learn/tutorials)
