Director, ML/Dev Ops (Tip.AI)
Company: Marriott Hotels Resorts
Location: Bethesda
Posted on: April 2, 2026
Job Description:

Position Overview

Lead the transformation of how
applications and AI systems are delivered, operated, and evolved at
enterprise scale. This role owns the design and execution of
AI-powered DevOps in Marriott’s AI Platform. The goal is to
enable teams to ship production-grade, observable, self-healing
services with minimal human toil. You will partner deeply with the
Kubernetes platform team, the DevOps platform team, and other
organizational leaders to help produce safe, scalable solutions that
protect the core AI platform services so we can provide high
business value interactions to the org.

Key Responsibilities

• Build resilient CI/CD pipelines for platform services that include
testing, monitoring, and auto-rollback.
• Deploy models and workloads via Kubernetes, SageMaker, KFServing,
Ray Serve, etc.; sustain latency/error budgets.
• Work with other platform teams to advance their innovation
roadmaps as an early adopter.
• Embed OpenTelemetry traces, vector-metrics, and cost monitors into
unified dashboards.
• Implement MCP-compliant gateways for safe human-and-agent
invocations.
• Champion the use of internal autonomous agents to eliminate
repetitive DevOps and SRE toil across build, deploy, and runtime
operations.
• Serve as a thought leader for AI-based operations, influencing
architecture standards, platform roadmaps, and engineering culture.
• Coach senior engineers and platform teams on modern DevOps, SRE,
and AI-Ops patterns.
• Delivery and reliability of the platform: Lead
post-incident learning and drive systemic improvements through
blameless retrospectives and automation.

Qualifications

• Extensive software engineering experience working on highly
scalable and available systems.
• Deep knowledge of standard DevOps practices and cloud
infrastructure, including identity management and networking.
• Experience in ML Ops working with live models.
• IaC mastery (CDK/Terraform) and secrets management
(Vault, AWS Secrets Manager).
• Proven record of hitting SLOs for containerized ML services at
fleet scale.
• Deep experience working with cloud
• Strong servant-leader with a passion for work-automation and
incident retros.
• Extreme desire to be part of a committed team that is building for
global scale to change the way the world does travel.
• Excellent verbal communication skills, with the ability to
articulate complex architectural decisions clearly.
• Ability to produce/review extremely clean software documentation.
• Ability to effectively communicate async with
remote team members across the globe.

Preferred Skills

• Experience moving legacy CI to agent-augmented pipelines.
• Cost-aware autoscaling and GPU quota governance.
• Experience building with Harness.io.
• Certifications in AWS or GCP.

Why This Role Matters

This role exists to set the bar for how software systems are
delivered and operated at enterprise scale, moving from manual
DevOps to AI-driven, self-healing platforms. You will embed
intelligence into CI/CD pipelines and Kubernetes runtimes so teams
can ship faster, safer, and with far less operational toil. Working
closely with platform and Kubernetes teams, you’ll introduce
AI-based improvements that materially raise reliability,
scalability, and efficiency. If you want to lead the shift from
reactive operations to systems that learn, adapt, and run
themselves, this role gives you the scope and influence to do it.

At Marriott International, we are dedicated to being an equal
opportunity employer, welcoming all and providing access to
opportunity. We actively foster an environment where the unique
backgrounds of our associates are valued and celebrated. Our
greatest strength lies in the rich blend of culture, talent, and
experiences of our associates. We are committed to
non-discrimination on any protected basis, including disability,
veteran status, or other basis protected by applicable law.
Keywords: Marriott Hotels Resorts, Wheaton-Glenmont, Director, ML/Dev Ops (Tip.AI), IT / Software / Systems, Bethesda, Maryland