Edwin Fernandez

Product Design · Enterprise · AI/ML

Improving the Log Analyzer

Redesigning Panacea.ai, an AI-powered log analysis platform for Nutanix Site Reliability Engineers, to unify fragmented workflows and cut root cause analysis time by 87.5%.

COMPANY

Nutanix

DURATION

8 Months

ROLE

Product Designer

TOOLS

Figma · FigJam

PHASE

Phase 1 · Shipped

87.5%

Reduction in root cause analysis time

45

UAT users in pilot testing

405

Log bundles analyzed in testing

8min

Average processing time per bundle

00 — BUSINESS CONTEXT

The business problem behind the tool problem.

Before designing anything, I needed to understand what slow RCA was actually costing Nutanix — not just inconveniencing engineers. The real design opportunity only became visible when I zoomed out from the interface to the business layer.

What I found was that the fragmented toolchain wasn't a UX problem. It was a business risk with three faces.

SLA EXPOSURE

Every minute of delay is contractual risk

Slow root cause analysis directly threatened Nutanix's enterprise SLA commitments. At the enterprise tier, time-to-resolution isn't a metric — it's a legal obligation. Delayed diagnosis meant SLA breach risk on every open incident.

ESCALATION COST

Junior SREs couldn't close cases independently

Junior SREs couldn't resolve complex cases without pulling in senior engineers. Escalation rates were burning senior talent on work that shouldn't require them — a hidden headcount cost compounding across every shift.

RETENTION RISK

Tool fatigue was eroding morale

SREs reported context-switching fatigue as a core frustration — not a minor annoyance. Tool fragmentation was eroding confidence and morale in a role already under pressure. This wasn't showing up in support metrics, but it was showing up in people.

Design reframe: A faster, unified RCA tool wasn't just better UX — it was a direct lever on SLA compliance, support org costs, and SRE satisfaction. Every design decision needed to connect to at least one of these three levers to earn its place in the product.

01 — THE PROBLEM

Fragmented tools.
Delayed diagnoses.

When a customer opened a support case, SREs had to juggle NCC, Panacea, Insights, Remote Commands, Logs, and Service pages simultaneously, none of which spoke to each other. Every minute spent context-switching was a minute a customer waited.

sre_workflow.log — Before

[09:14:02] ALERT: Customer VM down — cluster xyz-prod-01

[09:14:05] → Open Panacea.ai ... scanning log bundle

[09:22:18] → Switch to Insights ... correlating raw metrics

[09:35:44] → Open NCC ... cross-checking health checks

[09:48:01] → Remote Commands ... pulling CVM service status

[10:03:17] → Back to Panacea ... re-reading log signatures

[10:14:52] RCA still incomplete. 1hr+ elapsed.

sre_workflow.log — After

[09:14:02] ALERT: Customer VM down — cluster xyz-prod-01

[09:14:05] → Panacea.ai Upload bundle → AI Summary generated

[09:22:05] ✓ Root cause identified. RCA drafted. 8 minutes.

PROBLEM 01

Tool Fragmentation

SREs had to navigate 5+ disconnected tools to complete a single RCA. Context-switching caused critical information to slip through the gaps, especially for junior engineers still building domain knowledge.

PROBLEM 02

Manual Interpretation Burden

Even with AI analysis from Panacea, engineers still had to manually cross-reference Insights data and interpret raw metrics. The AI "helped" but didn't eliminate the cognitive load.

PROBLEM 03

No Unified View

Log data, cluster health, CVM metrics, and historical patterns existed in silos. There was no single pane of glass where an SRE could see the full picture of a cluster incident at once.

PROBLEM 04

The Junior SRE Gap

Senior engineers could intuit correlations across tools. Junior SREs could not — leading to inconsistent troubleshooting quality, longer case resolution times, and excessive escalations.

PROBLEM 05 — THE HIDDEN ONE

Design, engineering, and product weren't speaking the same language

Behind the tool fragmentation was a team fragmentation: design decisions made in Figma weren't translating cleanly into engineering, and product couldn't communicate changes to leadership without a designer in the room. The communication breakdown was as real a blocker as the UX one. This became one of the explicit design problems to solve — not just a team dynamics issue.

02 — THE RESEARCH & DISCOVERY

2 months of
structured learning.

Designing for an unfamiliar domain required deep immersion. I embedded myself in the product, the team, and the workflow before drawing a single frame.

STAKEHOLDER 01

SRE Team

Real-time cluster monitoring without tab-hopping

Quick incident resolution with AI-guided triage

Consistent process regardless of experience level

Script Wizard for common cluster operations

STAKEHOLDER 02

PM

User experience as a priority metric

Rapid feature prioritization and delivery

Clear visibility into design decisions

Roadmap alignment across all teams

STAKEHOLDER 03

Panacea Dev Team

Richer AI-powered summaries and conclusions

Strict adherence to design system guidelines

Detailed API contracts per feature

Technical documentation for each component

"Panacea AI provides a great foundation for building an enterprise-level troubleshooting tool for SREs. The tool is able to quickly parse large amounts of log data, identify known issues, and provide a view into key events. The RCA summary is a first step into providing an automated Root Cause Analysis tool."


— Pilot SRE Tester, Nutanix UAT Program


OOUX Exercise with Senior SRE

Object-oriented UX sessions helped map out every data entity in the system — bundles, signatures, CVM nodes, alerts, layers — and how they related to each other. This became the backbone of the information architecture.

PRD & Grooming Sessions

Deep dives into the Product Requirements Document alongside recurring PM grooming sessions ensured design decisions were anchored in real requirements — not assumptions.

Glean AI for Domain Knowledge

Used Glean AI to rapidly build understanding of Nutanix-specific infrastructure concepts — AOS, hypervisors, CVM services, NCC — enabling me to speak the same language as the SREs I was designing for.

Deep Exploration of Existing Tools

Hands-on use of both Panacea.ai and Insights revealed the specific moments where engineers lost context, made errors, or gave up and escalated. These friction points became our design targets.

03 — DESIGN PROCESS

How might we unify
without overwhelming?

The HMW question guided every design decision: create a platform that serves diverse technical skill levels, blends AI capabilities naturally, and stays scalable without a fixed end-vision.

PHASE 01

Discovery

2 months

Stakeholder interviews, PRD analysis, domain learning, OOUX exercise

PHASE 02

Define

10 days

Report architecture, data mapping, requirements with Senior SRE

PHASE 03

Ideate

10 days

FigJam flows, sketches, first-draft UI explorations

PHASE 04

Design

3 months

High-fidelity UI, iterative sessions with SREs, design system alignment

PHASE 05

Test & Ship

~2 months

45 UAT testers, 405 bundles, feedback loops, iteration

THE PIVOT THAT CHANGED EVERYTHING

We got our primary user wrong.

We assumed senior SREs were the primary users — the ones who'd set the benchmark. We optimised for power-user density: deep data access, maximum configurability, raw log depth surfaced by default.

UAT showed junior SREs were handling 60%+ of triage. This completely reframed the information hierarchy — from power-user density to guided clarity with progressive disclosure.

What we changed: Promoted AI RCA Summary to the hero interaction. Buried raw log depth behind a reveal. Redesigned IA around guided → expert, not expert-first.

DESIGN AS PRODUCT DISCOVERY

Three features were never in the PRD.

They were found through structured design exercises with stakeholders — not roadmap planning. This was the clearest validation that design is a discovery method, not just a delivery method.

Script Wizard — surfaced through SRE workflow observation sessions

CVM Configuration panel — emerged from the OOUX data-mapping exercise

Insights integration — identified through workflow fragmentation analysis

Structured exploration with stakeholders produces roadmap items that PM sessions miss. Defend that time.

04 — KEY DESIGN DECISIONS

What we built,
and why it matters.

Every feature was born from a real pain point uncovered during research. Each addresses a specific failure mode in the original workflow.

NEW FEATURE

Panacea Reports

A completely new module enabling SREs to create and combine log bundles for cross-bundle correlation.

Turns isolated snapshots into a continuous cluster health narrative, which is critical for multi-incident pattern detection.


Panacea Reports — cross-bundle correlation and cluster health narrative


AI-POWERED

AI RCA Summary

The most critical deliverable. Natural-language summaries of anomalies, ranked by severity, with affected CVM IPs highlighted.

Multi-bundle analysis is presented in an event timeline, with an option to view only the curated events, which sharpens the SRE's area of focus.

Transforms raw log data into an actionable diagnosis in seconds — not hours.

Before

After

AI RCA Summary — before/after. Note the shift from raw log output to structured, ranked diagnosis.

VISUALIZATION

Heatmap & Event Timeline

A temporal visualization of log anomaly density and cluster events.

Lets SREs immediately identify when an issue started and what preceded it, replacing manual log scanning with pattern recognition at a glance.

Before

After

Heatmap & Timeline — pattern recognition replaces manual log scanning

AI FEEDBACK

Add Rule Mechanism

Lets SREs flag AI misclassifications inline, grounding the model in real operational knowledge.

Each correction feeds back into the system, so accuracy improves with use.

CONVERSATIONAL AI

Ask AI

Contextual AI chat enabling SREs to query log data in plain language and surface related KB articles.

Reduces the gap between junior and senior SRE troubleshooting capability significantly.

INTEGRATION - COMPLETELY NEW

Insights Data & Metrics Explorer

Raw cluster metrics and CVM configuration data from Insights are surfaced directly within Panacea, eliminating the #1 context switch that previously broke SRE focus during triage.

Before

After

Insights data embedded in Panacea — the #1 context switch eliminated at the interface level

05 — WHAT THE PROCESS FORCED US TO DECIDE

Designing the AI experience, not just the AI features.

Adding AI features is the easy part. Designing the experience around AI uncertainty, fallback states, and feedback loops is where the real design decisions live. These are the three questions the process forced us to answer — and how we answered them.

DECISION 01

What does the AI say vs. what does the SRE see?

AI output is presented as a structured summary with severity ranking — not raw model output. We designed the information hierarchy deliberately: SREs see conclusions first, supporting evidence on demand. The AI's job is to surface the answer, not create a new format for the SRE to re-interpret.

DECISION 02

How do we handle AI uncertainty without losing trust?

When model confidence is low or signatures are ambiguous, the UI surfaces a "Review required" state — clearly distinguishing "AI found a root cause" from "AI found possible causes." False certainty in a high-stakes ops environment is worse than no AI at all. We designed for the failure mode first.
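The split between "AI found a root cause" and "AI found possible causes" can be sketched as a small decision function. This is only an illustration under stated assumptions: the `RcaFinding` shape, the `presentFindings` name, and both numeric thresholds are hypothetical and do not come from Panacea itself.

```typescript
// Hypothetical finding shape; the field names are assumptions, not Panacea's API.
interface RcaFinding {
  cause: string;
  confidence: number; // assumed to be a 0..1 score reported by the model
}

// What the SRE sees: either one conclusion or a list to review.
type RcaPresentation =
  | { kind: "root-cause-found"; cause: string }
  | { kind: "review-required"; possibleCauses: string[] };

// Assumed values; the case study does not state real thresholds.
const CONFIDENCE_THRESHOLD = 0.8;
const AMBIGUITY_MARGIN = 0.1;

function presentFindings(findings: RcaFinding[]): RcaPresentation {
  // Rank candidates from most to least confident without mutating the input.
  const ranked = findings.slice().sort((a, b) => b.confidence - a.confidence);
  const top = ranked[0];
  const runnerUp = ranked[1];

  const confident = top !== undefined && top.confidence >= CONFIDENCE_THRESHOLD;
  // Two near-equal candidates count as ambiguous even if both score high.
  const ambiguous =
    top !== undefined &&
    runnerUp !== undefined &&
    top.confidence - runnerUp.confidence < AMBIGUITY_MARGIN;

  if (top !== undefined && confident && !ambiguous) {
    return { kind: "root-cause-found", cause: top.cause };
  }
  // Low confidence, ambiguity, or no findings all fall back to human review.
  return { kind: "review-required", possibleCauses: ranked.map((f) => f.cause) };
}
```

The point of the sketch is that the fallback branch is the default: the confident state has to be earned, which mirrors designing for the failure mode first.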

DECISION 03

How do we close the feedback loop so the AI improves?

The Add Rule Mechanism lets SREs flag misclassifications inline, grounding the model in real operational knowledge. The feedback loop is a first-class design feature. Every correction makes the system more accurate — the design's value compounds over time, not just at launch.

Scalability principle: Every component was designed as a modular pattern that could accommodate AI capability growth — the design system was the infrastructure for future AI features, not just a style guide for current ones. With no fixed AI roadmap, pattern-first was the only responsible approach.

06 — RESOLVING PROBLEM 05

Design as the connective tissue across three teams.

Problem 05 — the team communication breakdown — wasn't going to fix itself. I made it an explicit design problem, with the same rigour as the UX problems.

PROBLEM 05 → RESOLVED

Design decisions made in Figma weren't translating cleanly to engineering, and product couldn't communicate changes to leadership without a designer in the room. The fix required design to own the communication layer — not just the pixels. Living specs, API-aligned functional documentation, and PM-ready update decks became as much a deliverable as the screens themselves.

Composable design system

Built for a roadmap without a fixed end-state

Components designed for extensibility — not current requirements but AI capability growth over the roadmap horizon

Every pattern documented with edge states: empty, loading, error, AI uncertain, AI confident — the full state machine, not just the happy path

Modular architecture: Insights data, AI summary, and log views are independently composable — new AI features slot in without a redesign
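One way to picture "the full state machine, not just the happy path" is as a typed set of states with explicit transitions. The sketch below is illustrative only: the five state names come from the list above, while the transition table, the function names, and every label except "Review required" are assumptions made for the example.

```typescript
// The five documented edge states of a panel.
type PanelState = "empty" | "loading" | "error" | "ai-uncertain" | "ai-confident";

// Assumed transition table; the real component rules are not in the case study.
const TRANSITIONS: Record<PanelState, readonly PanelState[]> = {
  empty: ["loading"],
  loading: ["error", "ai-uncertain", "ai-confident"],
  error: ["loading"], // retry
  "ai-uncertain": ["loading", "ai-confident"], // re-run, or SRE confirms a cause
  "ai-confident": ["loading"], // re-run on a new bundle
};

function canTransition(from: PanelState, to: PanelState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Hypothetical labels, except "Review required", which the design uses.
function stateLabel(state: PanelState): string {
  switch (state) {
    case "empty":
      return "No bundle uploaded";
    case "loading":
      return "Analyzing bundle";
    case "error":
      return "Analysis failed";
    case "ai-uncertain":
      return "Review required";
    case "ai-confident":
      return "Root cause identified";
    default: {
      // Compile-time check: adding a state without a label fails to type-check.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Encoding the states as a union means a new AI state cannot be added without the compiler pointing at every pattern that has not yet accounted for it.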

Cross-team communication design

Making design legible to three different audiences

Living design specs per page — detailed enough for API contracts, legible for PM stakeholder updates to leadership

Functional specs aligned with dev team's API contracts, significantly reducing revision cycles and back-and-forth

Update decks for PM to communicate changes to wider audiences — design owned the cross-team communication layer, not just the screens

07 — IMPACT & RESULT

From 1 hour
to 8 minutes.

87.5%

Reduction in root cause
analysis time

Simplified log analysis via automatic signature detection — eliminating manual pattern matching

Reduced need for manual log investigation through AI-curated log views

Accelerated case triage time with unified bundle + Insights view

Standardized troubleshooting quality across all SRE experience levels

Multi-bundle analysis enabled previously impossible cross-incident correlation

ClickHouse migration delivered performance and cost savings across engineering

45

UAT Testers

405

Bundles Analyzed

55

Combo Bundles

8min

Avg Process Time

COMMUNICATION CHALLENGE SOLVED

Prepared detailed design documentation for all pages, functional specs aligned with Panacea dev team API contracts, and update decks for PM to communicate changes to wider audiences — resolving the three-way communication breakdown between design, engineering, and product.

08 — LEARNINGS

What this project
taught me.

01

Documentation as Design

Writing clear, accessible design documentation wasn't overhead — it was the primary communication channel that unified three teams with conflicting priorities and vocabularies. Every page had a living spec.

02

Consistency Over Intensity

Recurring, methodical sessions with the end user (Senior SRE) produced more reliable design outputs than intensive sporadic sprints. Cadence created trust. Trust created candor. Candor created better design.

03

Use the Product You Design For

Extensively using Panacea.ai and Insights as a quasi-user revealed UX issues that no brief or PRD would have surfaced. Lived experience in the product generated the most valuable design hypotheses.

04

Multi-Perspective Reviews

Presenting designs within the design team before SRE reviews caught assumptions early and diversified solutions. Peer critique reduced the number of revision cycles with engineering significantly.

05

Design for Scalability First

With no fixed end-vision, every component had to be extensible. This meant advocating for a design system approach and creating modular patterns that could accommodate AI capability growth over time.

06

Design-Driven Feature Discovery

Script Wizard, CVM Configuration, and Insights data integration were all discovered through design exercises — not the PRD. Structured exploration with stakeholders is a legitimate product discovery method.

Enterprise AI design isn't about features. It's about restoring cognitive bandwidth at the moment of highest stress.

Panacea taught me that the best AI experiences are the ones where the engineer stops thinking about the tool and starts thinking about the problem. That's the design goal — not just better UI, but cognitive bandwidth restored for what actually matters.

I care about enterprise design because the stakes are real. A slow RCA isn't an inconvenience — it's a customer's infrastructure down, an SLA at risk, a junior engineer's confidence eroded. Design that solves those problems is worth doing carefully.

Designed & built with care · Edwin Fernandes © 2026

Available for full-time roles
