Project information

  • Company: ORamaVR
  • Categories: Data Engineering, MLOps, 3D Deep Learning
  • Main technologies: Python, PyTorch, HuggingFace Datasets, Kaolin, Metaflow

Summary

Training generative 3D models requires massive, well-structured datasets. OVR-Datasets was built as the foundational data engineering pipeline to handle 3D data at scale.

I developed an automated processing pipeline that standardizes and pre-processes millions of 3D meshes for deep learning workflows. A core deliverable was a streamlined codebase for efficiently storing, versioning, and retrieving these large-scale 3D datasets directly from HuggingFace.
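One typical standardization step in such a pipeline is normalizing every mesh into a canonical pose and scale before training. The helper below is a minimal sketch of that idea in plain Python (the function name and the tuple-based vertex representation are illustrative assumptions, not the project's actual code, which would operate on tensors):

```python
import math

def normalize_mesh(vertices):
    """Sketch: center a mesh at the origin and scale it into the unit sphere.

    `vertices` is a list of (x, y, z) tuples. Real pipelines would do this
    on GPU tensors (e.g. via Kaolin), but the math is the same.
    """
    n = len(vertices)
    # Centroid of the vertex cloud.
    cx = sum(v[0] for v in vertices) / n
    cy = sum(v[1] for v in vertices) / n
    cz = sum(v[2] for v in vertices) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in vertices]
    # Largest distance from the origin after centering.
    radius = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if radius == 0:
        return centered  # degenerate mesh: all vertices coincide
    return [(x / radius, y / radius, z / radius) for x, y, z in centered]
```

Applying the same normalization to every mesh means downstream models never have to learn invariance to arbitrary source scales.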

By orchestrating the workflow with Metaflow and using NVIDIA's Kaolin library for 3D operations, I ensured that these massive datasets could be streamed efficiently into compute-heavy PyTorch training pipelines.
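The key to feeding a dataset that does not fit in memory into a training loop is consuming it as a stream and grouping samples into fixed-size batches on the fly. The generator below sketches that pattern in dependency-free Python; in the actual project this role would be played by HuggingFace Datasets' streaming mode feeding a PyTorch data loader (the function name here is a hypothetical placeholder):

```python
from itertools import islice

def batched_stream(sample_iter, batch_size):
    """Sketch: yield fixed-size batches from a (possibly huge) sample stream.

    Only `batch_size` samples are materialized at a time, so a training
    loop never holds the full dataset in memory. The final batch may be
    smaller than `batch_size`.
    """
    it = iter(sample_iter)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

A training loop would then iterate `for batch in batched_stream(dataset, 32): ...`, with the stream backed by remote shards rather than an in-memory list.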