When you’re scaling computer vision projects, format incompatibility blocks your pipeline. Your team annotates in CVAT. Your training expects YOLO. Someone sends you COCO segmentation data. Now what?
Re-annotate everything? Waste three days debugging coordinate systems? Hope your intern doesn’t silently break the dataset?
I’ve converted 50,000+ annotated images at AI and ML Network. Here’s the production-grade process that prevents data loss, silent errors, and training failures.
Every annotation tool picks its own coordinate system. CVAT outputs XML with absolute pixels. YOLO wants normalized .txt files. COCO uses complex JSON with polygon arrays.
Your options:
- Re-annotate from scratch (expensive, slow, demoralizing)
- Modify training code for multiple formats (maintenance nightmare)
- Convert formats correctly (fast, scales, works)
Option three wins. But only if your conversion doesn’t introduce silent coordinate errors that destroy model accuracy.
CVAT XML Structure:
- XML-based with nested <image> and <box> tags
- Absolute pixel coordinates: xtl, ytl, xbr, ybr
- Supports bounding boxes, polygons, keypoints, attributes
- One XML file per annotation task
YOLO Format:
- One .txt file per image
- Space-separated: class_id x_center y_center width height
- All coordinates normalized 0-1 range
- Requires data.yaml for class mapping
COCO JSON Format:
- Single JSON for entire dataset
- Arrays: images, annotations, categories
- Segmentation as polygon coordinates or RLE
- Bounding box format: [x, y, width, height] in absolute pixels
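The critical difference? Coordinate normalization and box anchor points. CVAT stores two corners in pixels, COCO stores the top-left corner plus width and height in pixels, and YOLO stores a normalized box center plus width and height. Get this wrong once, and every bounding box in your dataset is misaligned.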
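To make the difference concrete, here is a minimal sketch that expresses one box in all three conventions. The function names and the sample numbers are illustrative assumptions, not part of any library.

```python
def cvat_to_coco(xtl, ytl, xbr, ybr):
    """CVAT corner box -> COCO [x, y, width, height], still in pixels."""
    return [xtl, ytl, xbr - xtl, ybr - ytl]


def coco_to_yolo(bbox, img_width, img_height):
    """COCO [x, y, w, h] in pixels -> YOLO (x_center, y_center, w, h), normalized."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_width,
        (y + h / 2) / img_height,
        w / img_width,
        h / img_height,
    )


# A 200x100 px box whose top-left corner sits at (100, 50) in a 1000x500 image
coco_box = cvat_to_coco(100, 50, 300, 150)
yolo_box = coco_to_yolo(coco_box, 1000, 500)
print(coco_box)  # [100, 50, 200, 100]
print(yolo_box)  # (0.2, 0.2, 0.2, 0.2)
```

The same physical box yields three different sets of numbers, which is why feeding one format into code that expects another fails without raising an error.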
Environment Setup (Ubuntu 24.04)
pip install opencv-python lxml --break-system-packages
The Conversion Script
Create cvat_to_yolo.py:
import xml.etree.ElementTree as ET
import os
from pathlib import Path

def parse_cvat_xml(xml_path):
    """Extract annotations from CVAT XML and convert coordinate system"""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    annotations = []
    for image in root.findall('image'):
        img_name = image.get('name')
        img_width = int(image.get('width'))
        img_height = int(image.get('height'))
        image_annotations = []
        for box in image.findall('box'):
            label = box.get('label')
            xtl = float(box.get('xtl'))
            ytl = float(box.get('ytl'))
            xbr = float(box.get('xbr'))
            ybr = float(box.get('ybr'))
            # CRITICAL: Convert absolute pixels to normalized YOLO format
            x_center = ((xtl + xbr) / 2) / img_width
            y_center = ((ytl + ybr) / 2) / img_height
            width = (xbr - xtl) / img_width
            height = (ybr - ytl) / img_height
            image_annotations.append({
                'label': label,
                'x_center': x_center,
                'y_center': y_center,
                'width': width,
                'height': height
            })
        annotations.append({
            'image_name': img_name,
            'image_width': img_width,
            'image_height': img_height,
            'boxes': image_annotations
        })
    return annotations

def write_yolo_labels(annotations, output_dir, class_mapping):
    """Write YOLO format label files with proper formatting"""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for img_data in annotations:
        img_name = Path(img_data['image_name']).stem
        label_path = os.path.join(output_dir, f"{img_name}.txt")
        with open(label_path, 'w') as f:
            for box in img_data['boxes']:
                # Fail loudly on unknown labels instead of silently writing class 0
                if box['label'] not in class_mapping:
                    raise KeyError(
                        f"Label '{box['label']}' in {img_data['image_name']} "
                        "is missing from class_mapping"
                    )
                class_id = class_mapping[box['label']]
                # 6 decimals keeps rounding error far below one pixel
                line = f"{class_id} {box['x_center']:.6f} {box['y_center']:.6f} {box['width']:.6f} {box['height']:.6f}\n"
                f.write(line)

# Production usage
xml_path = "/path/to/annotations.xml"
output_dir = "/path/to/labels"
class_mapping = {'person': 0, 'vehicle': 1, 'bicycle': 2}
annotations = parse_cvat_xml(xml_path)
write_yolo_labels(annotations, output_dir, class_mapping)
Create data.yaml (Required for YOLO Training)
# data.yaml
path: /path/to/dataset
train: images/train
val: images/val
test: images/test
nc: 3 # number of classes
names: ['person', 'vehicle', 'bicycle']
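One way to keep the class mapping and data.yaml from drifting apart is to derive both from a single list. The sketch below assumes that approach; `build_class_mapping` and `render_data_yaml` are hypothetical helper names, and the dataset path is the same placeholder used above.

```python
CLASS_NAMES = ['person', 'vehicle', 'bicycle']  # single source of truth for class order


def build_class_mapping(names):
    """Return {label: class_id} in the same order data.yaml lists the names."""
    if len(set(names)) != len(names):
        raise ValueError("Duplicate class names would produce ambiguous IDs")
    return {name: idx for idx, name in enumerate(names)}


def render_data_yaml(names, dataset_path="/path/to/dataset"):
    """Build the data.yaml text from the same list used for the mapping."""
    lines = [
        f"path: {dataset_path}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        f"nc: {len(names)}",
        f"names: {names}",
    ]
    return "\n".join(lines) + "\n"


class_mapping = build_class_mapping(CLASS_NAMES)
print(class_mapping)  # {'person': 0, 'vehicle': 1, 'bicycle': 2}
print(render_data_yaml(CLASS_NAMES))
```

Because both outputs come from `CLASS_NAMES`, reordering or adding a class changes the label IDs and the YAML together.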
Common Conversion Errors That Break Training
1. Coordinate Overflow (Values > 1.0)
- Root cause: Annotation boxes extend outside image bounds
- Fix: Audit CVAT project for edge-case annotations
- Detection: Run validation before training
2. Missing Label Files
- Root cause: Images without annotations don’t get .txt files
- Impact: Depending on the trainer, YOLO training crashes, skips the image, or treats it as background, so the problem can go unnoticed
- Fix: Create empty .txt files for annotation-free images
3. Class Mapping Mismatch
- Root cause: CVAT label names don’t match data.yaml class order
- Impact: Model trains on wrong classes, accuracy tanks
- Fix: Explicit class mapping dictionary, verify before batch processing
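The three errors above can each be guarded with a few lines. This is a sketch under assumptions: the helper names are hypothetical, `annotations` is the list returned by `parse_cvat_xml`, and the image extensions are a guess you should adjust to your dataset.

```python
from pathlib import Path


def clamp_box(xtl, ytl, xbr, ybr, img_width, img_height):
    """Clip corner coordinates to the image so normalized values stay in [0, 1]."""
    xtl, xbr = (max(0.0, min(float(img_width), v)) for v in (xtl, xbr))
    ytl, ybr = (max(0.0, min(float(img_height), v)) for v in (ytl, ybr))
    return xtl, ytl, xbr, ybr


def find_unmapped_labels(annotations, class_mapping):
    """Return label names used in the annotations but absent from class_mapping."""
    used = {box['label'] for img in annotations for box in img['boxes']}
    return used - set(class_mapping)


def write_missing_empty_labels(image_dir, label_dir, extensions=('.jpg', '.jpeg', '.png')):
    """Create an empty .txt for every image that has no label file yet."""
    label_dir = Path(label_dir)
    label_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for img in sorted(Path(image_dir).iterdir()):
        if img.suffix.lower() not in extensions:
            continue
        label_path = label_dir / f"{img.stem}.txt"
        if not label_path.exists():
            label_path.touch()
            created.append(label_path.name)
    return created
```

Clamping hides annotation mistakes as well as fixing them, so it is worth logging how many boxes were clipped and reviewing the worst offenders in CVAT.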
YOLO segmentation extends bounding boxes with normalized polygon coordinates:
Format: class_id x1 y1 x2 y2 x3 y3 ... xn yn
All coordinates normalized 0-1, representing polygon vertices.
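As a small illustration of that line format, the sketch below turns a pixel-space triangle into one YOLO segmentation line. The function name and the sample polygon are made up for the example.

```python
def polygon_to_yolo_line(class_id, pixel_points, img_width, img_height):
    """Format (x, y) pixel vertices as a normalized YOLO segmentation line."""
    if len(pixel_points) < 3:
        raise ValueError("A polygon needs at least 3 points")
    coords = []
    for x, y in pixel_points:
        coords.append(f"{x / img_width:.6f}")
        coords.append(f"{y / img_height:.6f}")
    return f"{class_id} " + " ".join(coords)


line = polygon_to_yolo_line(0, [(100, 50), (300, 50), (200, 250)], 1000, 500)
print(line)
# 0 0.100000 0.100000 0.300000 0.100000 0.200000 0.500000
```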
Setup Requirements
pip install pycocotools opencv-python numpy scikit-image --break-system-packages
COCO to YOLO Segmentation Script
Create coco_to_yolo_seg.py:
import json
import os
from pathlib import Path
import numpy as np

def load_coco_json(json_path):
    """Load and validate COCO format JSON"""
    with open(json_path, 'r') as f:
        coco_data = json.load(f)
    # Validate required fields
    required = ['images', 'annotations', 'categories']
    if not all(key in coco_data for key in required):
        raise ValueError(f"Invalid COCO JSON. Missing: {[k for k in required if k not in coco_data]}")
    return coco_data

def coco_to_yolo_segmentation(coco_data, output_dir):
    """Convert COCO polygon segmentation to YOLO normalized format"""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # Build lookup tables
    images = {img['id']: img for img in coco_data['images']}
    # Sort by category id so class indices match the order create_yaml writes
    sorted_cats = sorted(coco_data['categories'], key=lambda c: c['id'])
    categories = {cat['id']: idx for idx, cat in enumerate(sorted_cats)}
    # Group annotations by image
    image_annotations = {}
    for ann in coco_data['annotations']:
        img_id = ann['image_id']
        if img_id not in image_annotations:
            image_annotations[img_id] = []
        image_annotations[img_id].append(ann)
    # Process each image's annotations
    conversion_stats = {'total': 0, 'polygons': 0, 'rle': 0, 'errors': 0}
    for img_id, annotations in image_annotations.items():
        img_info = images[img_id]
        img_name = Path(img_info['file_name']).stem
        img_width = img_info['width']
        img_height = img_info['height']
        label_path = os.path.join(output_dir, f"{img_name}.txt")
        with open(label_path, 'w') as f:
            for ann in annotations:
                if 'segmentation' not in ann or not ann['segmentation']:
                    continue
                class_id = categories[ann['category_id']]
                conversion_stats['total'] += 1
                # Handle polygon segmentation (list format)
                if isinstance(ann['segmentation'], list):
                    conversion_stats['polygons'] += 1
                    for seg in ann['segmentation']:
                        # seg format: [x1, y1, x2, y2, ..., xn, yn]
                        # Need at least 3 points and an even number of values
                        if len(seg) < 6 or len(seg) % 2 != 0:
                            conversion_stats['errors'] += 1
                            continue
                        # Normalize all coordinates
                        normalized_points = []
                        for i in range(0, len(seg), 2):
                            x_norm = seg[i] / img_width
                            y_norm = seg[i + 1] / img_height
                            # Clamp to valid range
                            x_norm = max(0.0, min(1.0, x_norm))
                            y_norm = max(0.0, min(1.0, y_norm))
                            normalized_points.extend([x_norm, y_norm])
                        # Write YOLO segmentation line
                        points_str = ' '.join([f"{p:.6f}" for p in normalized_points])
                        f.write(f"{class_id} {points_str}\n")
                # Handle RLE format (dict format)
                elif isinstance(ann['segmentation'], dict):
                    conversion_stats['rle'] += 1
                    print(f"Warning: RLE format detected for {img_name}. Requires conversion to polygon.")
    print("Conversion complete:")
    print(f" Total annotations: {conversion_stats['total']}")
    print(f" Polygons converted: {conversion_stats['polygons']}")
    print(f" RLE format (needs handling): {conversion_stats['rle']}")
    print(f" Errors: {conversion_stats['errors']}")

def create_yaml(coco_data, output_path):
    """Generate data.yaml from COCO categories"""
    categories = [cat['name'] for cat in sorted(coco_data['categories'], key=lambda x: x['id'])]
    yaml_content = f"""# Generated from COCO dataset
path: /path/to/dataset
train: images/train
val: images/val
nc: {len(categories)}
names: {categories}
"""
    with open(output_path, 'w') as f:
        f.write(yaml_content)
    print(f"Created {output_path} with {len(categories)} classes")

# Production usage
coco_json = "/path/to/annotations.json"
output_labels = "/path/to/labels"
coco_data = load_coco_json(coco_json)
coco_to_yolo_segmentation(coco_data, output_labels)
create_yaml(coco_data, "data.yaml")
Handling RLE (Run-Length Encoding) in COCO
Some COCO datasets encode masks as RLE instead of polygons. Here’s the conversion:
from pycocotools import mask as maskUtils
from skimage import measure  # provided by the scikit-image package
import numpy as np

def rle_to_polygon(rle, img_height, img_width):
    """Convert COCO RLE mask to polygon coordinates for YOLO"""
    # Wrap bare counts in the dict layout pycocotools expects
    if not isinstance(rle, dict):
        rle = {'size': [img_height, img_width], 'counts': rle}
    # Uncompressed RLE stores counts as a list of ints; compress it before decoding
    if isinstance(rle['counts'], list):
        rle = maskUtils.frPyObjects(rle, img_height, img_width)
    binary_mask = maskUtils.decode(rle)
    # Extract contours from binary mask
    contours = measure.find_contours(binary_mask, 0.5)
    polygons = []
    for contour in contours:
        # Simplify contour (reduce points while preserving shape)
        contour = measure.approximate_polygon(contour, tolerance=1.0)
        if len(contour) < 3:  # Need minimum 3 points
            continue
        # Convert from (row, col) to (x, y) and flatten
        polygon = []
        for point in contour:
            x = point[1]  # column = x
            y = point[0]  # row = y
            polygon.extend([x, y])
        polygons.append(polygon)
    return polygons

# Integrate into main conversion
def handle_rle_annotation(ann, img_width, img_height, categories):
    """Process RLE annotation within COCO to YOLO pipeline"""
    rle = ann['segmentation']
    polygons = rle_to_polygon(rle, img_height, img_width)
    yolo_lines = []
    # Map the COCO category id to the contiguous YOLO index using the same
    # {category_id: index} table that coco_to_yolo_segmentation builds
    class_id = categories[ann['category_id']]
    for polygon in polygons:
        # Normalize coordinates and clamp to the valid range
        normalized = []
        for i in range(0, len(polygon), 2):
            x_norm = max(0.0, min(1.0, polygon[i] / img_width))
            y_norm = max(0.0, min(1.0, polygon[i + 1] / img_height))
            normalized.extend([x_norm, y_norm])
        points_str = ' '.join([f"{p:.6f}" for p in normalized])
        yolo_lines.append(f"{class_id} {points_str}")
    return yolo_lines
Validation: Catch Errors Before Training
import cv2
import numpy as np

def validate_yolo_format(label_path, img_width, img_height):
    """Validate YOLO label file format and coordinate ranges"""
    errors = []
    with open(label_path, 'r') as f:
        for line_num, line in enumerate(f, 1):
            parts = line.strip().split()
            if not parts:  # Ignore blank lines
                continue
            if len(parts) < 5:
                errors.append(f"Line {line_num}: Insufficient values")
                continue
            try:
                class_id = int(parts[0])
                coords = [float(x) for x in parts[1:]]
            except ValueError as e:
                errors.append(f"Line {line_num}: Parse error - {e}")
                continue
            # Validate bounding box (5 values)
            if len(parts) == 5:
                x_center, y_center, width, height = coords
                # Check normalization
                if not (0 <= x_center <= 1 and 0 <= y_center <= 1):
                    errors.append(f"Line {line_num}: Center out of bounds")
                if not (0 < width <= 1 and 0 < height <= 1):
                    errors.append(f"Line {line_num}: Size out of bounds")
            # Validate segmentation (even number of coordinates after class_id)
            else:
                if len(coords) % 2 != 0:
                    errors.append(f"Line {line_num}: Odd coordinate count")
                if len(coords) < 6:
                    errors.append(f"Line {line_num}: Polygon needs ≥3 points")
                # Check all points normalized
                for i, coord in enumerate(coords):
                    if coord < 0 or coord > 1:
                        errors.append(f"Line {line_num}: Coord {i} = {coord:.4f} out of range")
    return errors

def visualize_yolo_annotations(img_path, label_path, class_names):
    """Draw YOLO annotations on image for visual verification"""
    img = cv2.imread(img_path)
    if img is None:
        return None
    h, w = img.shape[:2]
    with open(label_path, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if not parts:  # Skip blank lines instead of crashing on parts[0]
                continue
            class_id = int(parts[0])
            class_name = class_names[class_id] if class_id < len(class_names) else f"Class{class_id}"
            # Bounding box
            if len(parts) == 5:
                x_center, y_center, width, height = map(float, parts[1:])
                # Convert to pixel coordinates
                x1 = int((x_center - width/2) * w)
                y1 = int((y_center - height/2) * h)
                x2 = int((x_center + width/2) * w)
                y2 = int((y_center + height/2) * h)
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(img, class_name, (x1, y1-10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            # Segmentation polygon
            else:
                coords = list(map(float, parts[1:]))
                points = []
                for i in range(0, len(coords), 2):
                    x = int(coords[i] * w)
                    y = int(coords[i+1] * h)
                    points.append([x, y])
                points = np.array(points, dtype=np.int32)
                cv2.polylines(img, [points], True, (0, 255, 0), 2)
                # Add label at polygon centroid
                M = cv2.moments(points)
                if M["m00"] != 0:
                    cx = int(M["m10"] / M["m00"])
                    cy = int(M["m01"] / M["m00"])
                    cv2.putText(img, class_name, (cx, cy),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return img

# Run validation pipeline
errors = validate_yolo_format("label.txt", 1920, 1080)
if errors:
    print("Validation errors found:")
    for error in errors:
        print(f" - {error}")
else:
    print("✓ Validation passed")

# Visual spot check
img = visualize_yolo_annotations("image.jpg", "label.txt", ['person', 'vehicle'])
if img is not None:
    cv2.imwrite("verification.jpg", img)
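One gap worth knowing about: checking each value against the 0-1 range does not catch a box whose center is valid but whose edge spills past the image, such as a center at 0.95 with a width of 0.2. The sketch below adds that edge check; the function names and the tolerance are assumptions for illustration.

```python
def box_edges(x_center, y_center, width, height):
    """Return (left, top, right, bottom) of a normalized YOLO box."""
    return (
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )


def box_exceeds_image(x_center, y_center, width, height, tol=1e-6):
    """True if any edge of the box falls outside the [0, 1] image area."""
    left, top, right, bottom = box_edges(x_center, y_center, width, height)
    return left < -tol or top < -tol or right > 1 + tol or bottom > 1 + tol


print(box_exceeds_image(0.95, 0.5, 0.2, 0.2))  # True: right edge is at 1.05
print(box_exceeds_image(0.5, 0.5, 1.0, 1.0))   # False: box covers the full image
```

The small tolerance allows for the rounding introduced by writing labels with six decimals.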
Batch Processing for Production Pipelines
import glob
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

def batch_convert_cvat_to_yolo(xml_files, output_dir, class_mapping, workers=4):
    """Multi-threaded batch conversion with progress tracking"""
    def process_file(xml_file):
        try:
            annotations = parse_cvat_xml(xml_file)
            write_yolo_labels(annotations, output_dir, class_mapping)
            return len(annotations), None
        except Exception as e:
            return 0, str(e)

    total_images = 0
    errors = []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(tqdm(
            executor.map(process_file, xml_files),
            total=len(xml_files),
            desc="Converting CVAT to YOLO"
        ))
    for result, error in results:
        if error:
            errors.append(error)
        else:
            total_images += result
    print(f"\nConversion complete:")
    print(f" Files processed: {len(xml_files)}")
    print(f" Images converted: {total_images}")
    print(f" Errors: {len(errors)}")
    if errors:
        print("\nError log:")
        for err in errors[:10]:  # Show first 10
            print(f" - {err}")
    return total_images, errors

# Usage for entire dataset
xml_files = glob.glob("/path/to/annotations/*.xml")
class_mapping = {'person': 0, 'vehicle': 1, 'bicycle': 2}
total, errors = batch_convert_cvat_to_yolo(
    xml_files,
    "/path/to/output/labels",
    class_mapping,
    workers=8
)
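Because every XML file writes into the same output directory, two images that share a filename stem (for example a.jpg in two different tasks) would overwrite each other's label file. A pre-check along these lines can flag that before the batch runs; `find_stem_collisions` is a hypothetical helper, and it assumes the same <image name="..."> layout the parser above reads.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def find_stem_collisions(xml_files):
    """Map each label stem to the XML files that would write it; keep only clashes."""
    sources = defaultdict(list)
    for xml_file in xml_files:
        root = ET.parse(xml_file).getroot()
        for image in root.findall('image'):
            stem = Path(image.get('name')).stem
            sources[stem].append(str(xml_file))
    # A stem listed more than once means one label file would be overwritten
    return {stem: files for stem, files in sources.items() if len(files) > 1}
```

If the result is non-empty, one option is to write each task's labels into its own subdirectory, or to rename the images before conversion.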
Pre-Training Verification Checklist
Run this before training to prevent silent failures:
1. File Count Verification
# Count images vs labels
image_count=$(find images/ -type f | wc -l)
label_count=$(find labels/ -type f -name "*.txt" | wc -l)
echo "Images: $image_count | Labels: $label_count"
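Matching counts do not prove that every image has its own label, since one missing label and one orphan label cancel out. A stem-by-stem comparison is more reliable; the sketch below assumes a flat directory layout and a hypothetical set of image extensions.

```python
from pathlib import Path

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}  # assumption: adjust to your dataset


def compare_images_and_labels(images_dir, labels_dir):
    """Return (images without a label, labels without an image), both as sorted stems."""
    image_stems = {
        p.stem for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTS
    }
    label_stems = {p.stem for p in Path(labels_dir).glob('*.txt')}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling `compare_images_and_labels("images/train", "labels/train")` returns two lists; both should be empty before training.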
2. Format Validation
import glob
import os
from pathlib import Path
import cv2
from tqdm import tqdm

def verify_dataset(labels_dir, images_dir, class_names):
    """Comprehensive dataset verification"""
    label_files = glob.glob(f"{labels_dir}/*.txt")
    issues = {
        'missing_images': [],
        'format_errors': [],
        'coordinate_errors': [],
        'empty_files': []
    }
    for label_file in tqdm(label_files, desc="Verifying"):
        # Check corresponding image exists
        img_name = Path(label_file).stem
        img_path = f"{images_dir}/{img_name}.jpg"  # Adjust extension
        if not os.path.exists(img_path):
            issues['missing_images'].append(img_name)
            continue
        # Get image dimensions
        img = cv2.imread(img_path)
        if img is None:
            continue
        h, w = img.shape[:2]
        # Validate format
        errors = validate_yolo_format(label_file, w, h)
        if errors:
            issues['format_errors'].append((img_name, errors))
        # Check for empty files
        if os.path.getsize(label_file) == 0:
            issues['empty_files'].append(img_name)
    # Print summary
    print("\n=== Dataset Verification Report ===")
    print(f"Total labels checked: {len(label_files)}")
    print(f"Missing images: {len(issues['missing_images'])}")
    print(f"Format errors: {len(issues['format_errors'])}")
    print(f"Empty label files: {len(issues['empty_files'])}")
    return issues

# Run verification
issues = verify_dataset("labels/train", "images/train", ['person', 'vehicle'])
3. Visual Spot Checks
import random

def random_spot_check(labels_dir, images_dir, class_names, samples=10):
    """Randomly verify visual alignment"""
    label_files = glob.glob(f"{labels_dir}/*.txt")
    samples = random.sample(label_files, min(samples, len(label_files)))
    for label_file in samples:
        img_name = Path(label_file).stem
        img_path = f"{images_dir}/{img_name}.jpg"
        if os.path.exists(img_path):
            vis = visualize_yolo_annotations(img_path, label_file, class_names)
            if vis is not None:
                output = f"spot_check_{img_name}.jpg"
                cv2.imwrite(output, vis)
                print(f"Generated: {output}")

random_spot_check("labels/train", "images/train", ['person', 'vehicle'])
What Happens When You Get This Wrong
Symptom 1: Model trains but accuracy stays at 0%
- Root cause: Class IDs misaligned between labels and data.yaml
- Fix: Verify class mapping, rebuild labels
Symptom 2: Boxes appear in wrong locations
- Root cause: Coordinate normalization error or origin mismatch
- Fix: Verify conversion math, check for absolute vs. normalized coordinates
Symptom 3: Training crashes with “invalid bbox” errors
- Root cause: Coordinates outside 0-1 range
- Fix: Run validation, clamp coordinates during conversion
Symptom 4: Some classes never get detected
- Root cause: Few or no labeled instances for those classes, often because their label files are missing or empty
- Fix: Verify all classes present in training data, check label distribution
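For Symptom 4 in particular, a per-class instance count over the label directory makes gaps visible quickly. This is a minimal sketch; `count_class_instances` is a hypothetical helper, and it assumes the class ID is the first token on each label line, as in the formats described earlier.

```python
from collections import Counter
from pathlib import Path


def count_class_instances(labels_dir, class_names):
    """Count labeled instances per class name across all .txt files in labels_dir."""
    counts = Counter()
    for label_file in Path(labels_dir).glob('*.txt'):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    # Classes that never appear are reported with a count of 0
    return {name: counts.get(idx, 0) for idx, name in enumerate(class_names)}
```

A class that shows 0 here will never be detected, no matter how long the model trains.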
We’ve converted 50,000+ annotated images for computer vision teams. Our conversion pipeline includes:
- Automated validation that catches coordinate errors before they reach training
- Visual spot-checks on random samples to verify alignment
- Format-specific QA for CVAT, Label Studio, COCO, YOLO, Pascal VOC
- Same-day turnaround for urgent project needs
We guarantee 99%+ conversion accuracy. Your model trains on correct data, or we fix it free.
Need a free 50-image sample conversion? Send us your dataset. We’ll convert it and return a validation report so you can verify quality before committing to the full batch.
Final Production Checklist
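Before deploying your converted dataset, rerun the three checks from the Pre-Training Verification Checklist: file counts, format validation, and visual spot checks.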
Format conversion isn’t glamorous work. But done right, it saves your team days of debugging and prevents silent training failures that waste GPU hours.
Ready to scale your annotation workflow? We handle the tedious conversion work so your ML team can focus on model development. Contact us at aiandml.net for a free sample batch.