AI Infrastructure Engineer
LEGO Digital Play will bring the LEGO brand into digital entertainment in new, innovative, and wholly owned ways. Our mission is to ensure the LEGO brand remains as powerful a part of children’s lives in the coming decades as it has ever been. We aim to reach every kid on the planet, their parents, and adult fans of LEGO, and to provide them with meaningful, magical, and playful new experiences.
We are at the earliest phases of this new company, offering a unique opportunity to build a new entity for the world's most beloved and trusted brand. Our culture is open, collaborative, intellectually rigorous, and creatively vibrant.
Job Summary
As an AI Infrastructure Engineer, you will build the backbone that powers LEGO Digital Play’s AI products. From scalable inference APIs and model deployment pipelines to observability, security, and developer tooling, you’ll make sure our AI systems run reliably, safely, and at scale.
This is a platform-first role where impact is measured in stability and speed. You’ll design cloud-native services, optimize GPU orchestration, and create the “paved roads” that let model and product teams move fast without breaking things. By enabling SDK externalization with monitoring, authentication, and billing, you’ll ensure LEGO Digital Play’s AI foundations are not only world-class but ready to power a new generation of experiences with the LEGO brand.
Key Responsibilities
- Design MLOps platform: CI/CD for models, model registry, feature store, GPU orchestration.
- Implement low-latency REST/gRPC APIs for model inference used by engines and SDK clients.
- Engineer content services for digital LEGO brand assets: storage, CDN, security, metadata.
- Set up observability: logs, traces, metrics, error budgets, availability, latency, saturation and model monitoring (drift, performance, cost).
- Harden security & compliance (e.g., OAuth2/OIDC, RBAC, GDPR/COPPA); manage secrets and VPCs, encryption in transit/at rest, data residency controls, least privilege by default.
- Integrate with partner clouds; implement monitoring & usage metering for external SDK use.
- Continuously optimize cost-to-serve and reliability; define and meet SLOs.
- Adopt a phased approach to reaching consumer scale: start with managed services, then graduate to K8s/GPU orchestration as scale and latency requirements demand.
Qualifications
- Experience in back-end/platform engineering (e.g., Go/TypeScript/Python); microservices and gRPC/REST.
- Cloud-native (GCP/AWS/Azure), containers; GPU scheduling (e.g., NVIDIA stack) or managed ML platforms (e.g., Vertex AI).
- Proficiency with core datastores (e.g., Postgres, Redis, and object storage) for structured, cached, and large-scale asset/model data.
- Experience with streaming and messaging systems (e.g., Kafka, Pub/Sub, or Kinesis) to support real-time data and event-driven pipelines.
- MLOps tooling (e.g., MLflow/Kubeflow/Triton/Airflow) and CI/CD (e.g., GitHub Actions/Jenkins).
- Security/IAM, IaC (e.g., Terraform), and performance tuning for high-QPS low-latency services.
- Reliability-obsessed; proactive on incident prevention and postmortems.
- Platform thinker: invests in great developer experience.
- Ethical and privacy-minded; designs for safety and resilience.
- Nice to have: game backend/real-time networking and Unity/Unreal integration experience.
Locations
- LEGO Digital Play London Office, LEGO Campus
Remote status
- Hybrid