A woman smiling in front of a fountain with a sculpture of an eagle on top, surrounded by trees and other people sitting on the fountain's edge.

👋
Hello, I’m Libby Hotson

I’m a Senior Technical Program Manager (and sometimes Product Manager) with 10 years of experience, focusing on large-scale distributed systems and site reliability. I’m also a proud dog mom and undisputed Mario Kart champion.

My story

I’ve been shaping reliability and operational excellence in engineering organizations since 2015, helping teams move faster without compromising quality or resilience. Over the years, I’ve partnered with product, platform, and SRE groups to design processes, standards, and systems that turn chaos into clarity.

At Redfin, I led the company-wide migration to Datadog, coordinating with 50+ teams on 100+ use cases to deliver millions in annual savings and enhanced observability. I also authored and rolled out Redfin’s Service Ownership strategy, building a standards library, scorecards, and dashboards that gave every engineering leader real-time visibility into cost, reliability, and test coverage. Through my work on Incident Management, I reduced P1 acknowledgement time from 30 minutes to under 5 minutes and established a repeatable RCA program that turned incidents into learnings.

Throughout my career, I’ve stepped beyond program management, acting as an interim team lead, mentoring engineers and TPMs, and creating durable frameworks that outlast any single project. My passion and focus have always been the same: enabling teams to deliver at scale, with confidence, and with the right balance of speed and stability.

Where I’ve worked

  • Senior Technical Program Manager
    August 2020 - August 2025

    • Led full lifecycle management of Datadog platform migration, from cross-org requirements gathering to usage forecasting, budgeting, and contract scheduling. Partnered with procurement to align licensing with anticipated adoption growth across 50+ teams and 100+ use cases, optimizing observability ROI and cutting alert noise.

    • Owned the service reliability roadmap for 50+ teams, aligning observability, incident response, and operational health goals with business priorities.

    • Authored product requirements and success criteria for reliability initiatives, including Datadog migration, service health dashboards, and incident tooling.
      • Boosted best practice compliance from >20% to 96% for Tier 1 and 2 services

    • Partnered with senior engineers on distributed system design and incident tooling, enabling in-depth technical discussions and credible roadmap tradeoffs.

    • Cut P1 incident MTTA from 30 min to 4 min by implementing incident management processes and in-house orchestration tooling.

    • Served as interim team lead, sustaining delivery of multiple high-impact programs and mentoring peers on reliability best practices.

    • Pioneered RCA Program and utilized AI-powered analysis of incident data, identifying systemic gaps and informing pre-deployment safeguards and post-deployment checks, resulting in an 86% reduction in repeat incidents.

    • Spearheaded the engineering-wide RCA Program, establishing a standardized post-incident analysis process that surfaced root causes, contributing factors, and actionable recommendations across 50+ teams.

    • Automated RCA intake by developing a bot that filed tickets, notified owners, and tracked follow-ups, reducing manual overhead and ensuring accountability.

    • Elevated incident analysis into a core operational excellence practice, recognized by leadership as a critical enabler of system reliability and service health improvements.

    • Regularly synthesized incident trends and metrics into executive-level reports and cross-functional reviews, driving continuous improvement and embedding reliability best practices across engineering.

  • Technical Program Manager II
    July 2019 - May 2020

    • Owned the reliability roadmap for Nordstrom’s Tier 1 Payments Platform, balancing product requirements with engineering constraints to translate business vision into scalable technical design.

    • Led a cross-functional team of 10 engineers in assessing and rebuilding the entire platform following the company’s largest-ever site outage, delivering a new AWS-based architecture in under 6 months.

    • Achieved 99.999% availability in the first six months post-migration and reduced operating costs by 97% through architectural redesign, capacity optimization, and dependency orchestration.
      • Established Payments as a model of system reliability and cost efficiency, strengthening Nordstrom.com’s resilience during peak load events.

    • Partnered with product and engineering stakeholders to align business requirements with technical design, ensuring scalability and resilience in the new architecture.

  • Systems Analyst I
    November 2015 - March 2018

    • Managed work intake and demand for the enterprise infrastructure team, supporting several critical business applications and services.

    • Analyzed operational data from the team's ticketing system and recommended operational process improvements to leadership.

    • Established KPIs based on available historical data, and built and supported reports and live dashboards, both in and outside of the ticketing system, to communicate operational performance metrics.
      • Worked with the team driving the adoption of DevOps.

    • Analyzed the software deployment process to automate a self-service integration between the service management platform and the system configuration management system.

    • Worked with an application development team supporting a critical business-facing application.

    • Performed requirements gathering and analysis, and implemented changes to the application through the UI.

    • Completed a conceptual redesign of the application's UI in a development environment, tested the design with users, and implemented it with CSS and JavaScript.

    • Led the team's Scrum development practice, including planning and retrospective meetings.

Resume

Blog posts

Portfolio

Testimonials

“Libby’s knack for organization is incredible. She resourcefully found a way to leverage the tools we have to report on where a project is in a way no one else has quite figured out.”

— Stakeholder at Redfin

“Her rally and fire alone are a force to be reckoned with… Libby has created processes that save my team hundreds of hours a year.”

— Manager at Redfin

“Hands down one of the best Program Managers I have ever worked with! In a nutshell, she ROCKS! When teaming with Libby I truly felt like we were on the same team and never encountered the “us vs. them” mentality that is frequently exhibited towards security.”

— Peer at Nordstrom

“Libby sets the example which other PMs should strive to achieve. She makes all of us more effective & productive because of her performance.”

— Partner at Nordstrom

“Libby is curious and asks great questions. She is a driver, whether that be herself or for the sake of the team. She brings clarity in how she listens and communicates with the team. She has strong technical aptitude.”

— Manager at Nordstrom

“Libby can create a program or process out of nothing, with not much guidance, and land it well. You give her something she’s not familiar with, she figures it out, you tell her to write a strategy, she knocks it out; she can do anything.”

— VP at Redfin

Contact

Want to connect or chat?
Contact me via LinkedIn.

Want to race? 🏎️ 🏁
Add libbykarts on Switch
Friend Code: SW-2192-5595-1423