AI Data Engineer
LEGO Digital Play will bring the LEGO brand into digital entertainment in new, innovative, and wholly-owned ways. Our mission is to ensure the LEGO brand remains as powerful a part of children’s lives in the coming decades as it has ever been. We aim to reach every kid on the planet, their parents, and adult fans of LEGO, and to provide them with meaningful, magical, and playful new experiences.
We are at the earliest phases of this new company, offering a unique opportunity to build a new entity for the world's most beloved and trusted brand. Our culture is open, collaborative, intellectually rigorous, and creatively vibrant.
Job Summary:
As our AI Data Engineer, you will be the architect of LDP’s data foundation—turning raw LEGO brand geometry, scans, images, and video into the structured, high-quality datasets that power our AI breakthroughs. You’ll take ownership of data pipelines end-to-end, from ingestion and transformation to quality control and compliance, ensuring our models are trained on safe, brand-aligned, and future-proof data.
This is a hands-on, impact-driven role at the heart of LEGO Digital Play’s AI journey. You’ll collaborate with model engineers, product managers, and external partners to deliver world-class data pipelines that unlock new play experiences. If you’re excited about scaling data for 3D, computer vision, and hybrid play, this is your chance to build the foundations of AI innovation at LEGO Digital Play.
Key Responsibilities
- Design and operate ingestion pipelines for LEGO brand bricks and sets, including geometry, semantics, images, video, and 3D scans.
- Build 3D asset preparation flows, such as mesh clean-up, normalization, UVs/LODs, and voxel/point-cloud conversions, ensuring data is optimized for model training.
- Implement labelling and QA workflows, manage external annotation partners, and automate quality checks to deliver reliable datasets at scale.
- Define schemas and metadata for digital LEGO brand assets; maintain catalogues, lineage, and versioning to guarantee reproducibility and governance.
- Enforce data quality, drift detection, and compliance (GDPR/COPPA), including watermarking and traceability, to protect LEGO Digital Play’s IP and child privacy.
- Collaborate with infrastructure engineers to optimize storage and compute efficiency, ensuring large-scale datasets are cost-effective and performant.
- Partner with model engineers to define data specifications, support experiments, and implement active learning loops that continuously improve LEGO Digital Play’s AI models.
- Work closely with designers and gameplay engineers to align data requirements with real-world play experiences, ensuring outputs fuel authentic LEGO brand-specific creativity.
Qualifications
- Experience in data engineering for ML/AI, with strong skills in Python, SQL, and distributed data tools (e.g., PySpark).
- Hands-on expertise with cloud data platforms (GCP, AWS, or Azure), orchestration frameworks (e.g., Airflow), and ELT/ETL tools.
- Familiarity with 2D image formats and 3D data formats (e.g., OBJ, FBX, glTF), point clouds, and large-scale computer vision datasets — or willingness to quickly upskill in 3D pipelines.
- Experience with data quality, lineage, and versioning tools (e.g., Git/Perforce, DVC), and data storage (e.g., XML, JSON, databases).
- Solid grounding in privacy-by-design, security, and compliance, including GDPR/COPPA, with awareness of IP protection and brand safety.
- Builder’s mindset: able to prototype quickly, iterate fast, and document clearly.
- Detail-oriented and safety-conscious: committed to child privacy, brand integrity, and IP protection.
- Collaborative communicator who can translate between creative, engineering, and legal teams.
Preferred Qualifications:
- Experience with Blender scripting, photogrammetry, or NeRF dataset preparation.
- Familiarity with Unity/Unreal pipelines and integration of 3D assets into interactive environments.
- Knowledge of feature stores, active learning workflows, or dataset augmentation for ML/AI.
- Passion for LEGO play and curiosity about how data fuels creativity and hybrid play experiences.
Locations: LEGO Campus, LEGO Digital Play London Office
Remote status: Hybrid