Kubernetes Monitoring at Sumo Logic
From Zero to $7.1M — Designing Kubernetes Observability at Sumo Logic
In 2018, enterprises were adopting Kubernetes fast. No single tool gave them a clear view of what was happening. We saw the opportunity and built one.
My Role: Lead Designer
Team: 3 Designers (including me) · 1 Shared Researcher · 15 FE & BE Engineers · 2 Product Managers · Cross-geo
My Contribution: User research · Navigation design · Auto-generated dashboards · Query experience · Cross-functional facilitation
Timeline: MVP in 2.5 quarters · GA in 10 months · Launched at Illuminate, Sumo Logic User Conference, September 2019
Context
Sumo Logic was known for log management. But in 2018, the industry was shifting fast — enterprises adopting Kubernetes needed more than logs. They needed metrics, topology views, and a unified way to monitor complex, distributed systems.
Our competitive landscape analysis showed no single market leader in Kubernetes monitoring. That was the opportunity.
The Challenge
For users: Engineers monitoring Kubernetes were juggling three or more tools at once (Datadog for metrics, Sumo Logic for logs, Prometheus for Kubernetes data). There was no holistic view of system health and no way to move fluidly between infrastructure and service views. Writing a Sumo metrics query was notoriously difficult and time-consuming.
The stakes were real. A site going down could cost a company $6,000 per minute.
Three goals guided the project:
Make it easy for any user — SREs, DevOps, Developers — to monitor a complex system
Provide value on the day data gets into Sumo
Increase adoption of the Metrics product
How might we make Kubernetes monitoring so intuitive that engineers could get value from day one — without learning a new tool from scratch?
Research — Going to the Field
We started where the work actually happens — with customer visits and field interviews. Rather than asking users to remember their workflow, we observed it directly.
Key research questions:
How do users monitor their Kubernetes deployment today?
What are the pain points with current tools and workflows?
How many tools do users need to successfully monitor and troubleshoot?

What we found:
Users were context-switching between three or more tools just to complete one monitoring workflow — creating dangerous gaps in visibility
No existing tool provided a holistic view of system health
Two distinct user types emerged: the Dashboard Creator who builds during peacetime, and the Dashboard Consumer who relies on those dashboards during wartime incidents
Writing metrics queries in Sumo was a significant barrier to adoption
Design Framework — Wartime and Peacetime
Sumo Logic used two internal contexts to think about user needs:
Wartime — active incidents, high pressure, every second counts. Engineers need to detect anomalies instantly, navigate between dashboards quickly, and collaborate with teammates in real time.
Peacetime — stable systems, time to build, configure, and maintain. Engineers need to ingest and enrich data, create visualizations optimized for wartime reading, and maintain dashboards over time.

These two contexts shaped every design decision I made. For each choice, the filter was simple: does this serve an engineer in wartime, or in peacetime?
Key Design Decisions
1. What vs. Why Navigation: a new mental model for Sumo Logic
I introduced a new navigation pattern built around two questions engineers actually ask during an incident: What is happening? (metrics) and Why is it happening? (logs). This mental model guided users naturally between the two data types, teaching them how to use metrics and logs together without requiring training.

2. Auto-generated Dashboards: value on day one
One of the biggest barriers to adoption was the time required to build dashboards from scratch. I designed auto-generated, out-of-the-box dashboards based on users' actual data, giving engineers immediate value the moment their data landed in Sumo, without writing a single query.
3. Metrics Query with Seamless Log Switching
Rather than forcing metrics users into Sumo's log-centric query experience, I adapted the query language for metrics while making it easy to switch fluidly between logs and metrics. This respected how engineers already thought about their data and removed the biggest friction point identified in research.

How We Got There — Process
I partnered with the two PMs who knew Kubernetes deeply to whiteboard the customer journey — mapping every touchpoint from data ingestion to incident resolution.
I then translated that into paper storyboards in a couple of days — fast, low-fidelity, focused on flow.
Next, I brought engineering leads into Figma for real-time wireframing. We created 12+ screens in half a day.

From there I worked with our prototyper to build rapid clickable prototypes — testing in multiple rounds, each focused on a different part of the journey:
Round 1: Topology navigation — how users discover contextual information as they move through the hierarchy
Round 2: Dashboard layout — information needed for monitoring Kubernetes in wartime
Round 3: End-to-end flow — connecting the new Kubernetes experience with existing Sumo workflows
Constraints
Technical: Kubernetes data is ephemeral — pods and nodes appear and disappear constantly. This created unique UI challenges around topology visualization, performance, and scalability that had no existing design patterns to reference.
Cross-geo collaboration: Working across time zones with distributed scrum teams required design to be the connective tissue — clear, visual, and always accessible in Figma so every team could stay aligned regardless of location.
The Launch
Explore, the new Kubernetes monitoring experience, reached General Availability on September 11, 2019 at Illuminate, Sumo Logic's annual user conference. From vision prototype to MVP in 2.5 quarters. From MVP to GA in 10 months.
We exceeded the goals set during our customer preview program.

Business impact

70% increase in adoption rate within the first month of GA
Customer count grew from 83 to 186 in the 3 months after launch, more than doubling the customer base
$2.4M directly attributed to Kubernetes by January 2020
$7.1M attributed to Kubernetes by January 2021 — nearly 3x growth in one year
Contributed to Sumo Logic's IPO in 2020
Customer Testimonials
"It's more intuitive than Grafana." — Lior M., Informatica
"I could not have made the transition to production on K8s as quickly as I did without the visibility you provided." — Alex P., Quizlet
"This puts Sumo leaps and bounds ahead of competitors." — Graham T., Quizlet
Reflection
Shipping in a market with no leader taught me that when there's no benchmark to copy, user insight is the only compass you have. Going to the field — watching engineers work under real pressure — was what made the difference between guessing and knowing.
I'd invest more time in the peacetime persona in the next iteration. The wartime use case was so compelling that it naturally dominated our design focus. But the peacetime experience of building and maintaining dashboards is what determines whether engineers adopt the product long-term. A stronger peacetime design would have driven deeper, more sustained engagement post-launch.
The business impact reinforced something I believe about design's role in a company: when design solves the right problem at the right time, it doesn't just improve the product — it changes the trajectory of the business.
