Edwin Fernandez

Product Design · Enterprise · AI/ML

Improving the Log Analyzer

Redesigning Panacea.ai, an AI-powered log analysis platform for Nutanix Site Reliability Engineers, to unify fragmented workflows and cut root cause analysis time by 87.5%.

COMPANY

Nutanix

DURATION

8 Months

ROLE

Product Designer

TOOLS

Figma · FigJam

PHASE

Phase 1 · Shipped

87.5%

Reduction in root cause analysis time

45

UAT users in pilot testing

405

Log bundles analyzed in testing

8min

Average processing time per bundle

01 — THE PROBLEM

Fragmented tools.
Delayed diagnoses.

When a customer opened a support case, SREs had to juggle NCC, Panacea, Insights, Remote Commands, Logs, and Service pages simultaneously, none of which spoke to each other. Every minute spent context-switching was a minute a customer waited.

sre_workflow_before.log — The painful reality

[09:14:02] ALERT: Customer VM down — cluster xyz-prod-01

[09:14:05] → Open Panacea.ai ... scanning log bundle

[09:22:18] → Switch to Insights ... correlating raw metrics

[09:35:44] → Open NCC ... cross-checking health checks

[09:48:01] → Remote Commands ... pulling CVM service status

[10:03:17] → Back to Panacea ... re-reading log signatures

[10:14:52] RCA still incomplete. 1hr+ elapsed.

// AFTER REDESIGN

[09:14:02] ALERT: Customer VM down — cluster xyz-prod-01

[09:14:05] → Panacea.ai Upload bundle → AI Summary generated

[09:22:05] ✓ Root cause identified. RCA drafted. 8 minutes.
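The elapsed times in the two traces above can be checked by parsing their timestamps. This is a minimal sketch, assuming the bracketed HH:MM:SS format shown in the traces and that both timestamps fall on the same day; `elapsed_minutes` is a hypothetical helper for illustration, not part of Panacea.

```python
from datetime import datetime

def elapsed_minutes(start: str, end: str) -> float:
    """Return the minutes between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# Before: from the alert to the "RCA still incomplete" entry.
before = elapsed_minutes("09:14:02", "10:14:52")

# After: from the bundle upload to the "Root cause identified" entry.
after = elapsed_minutes("09:14:05", "09:22:05")

print(f"before: {before:.1f} min, after: {after:.1f} min")
```

The traces are illustrative, so the numbers only confirm the "1hr+" and "8 minutes" annotations written in the log lines themselves.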

Tool Fragmentation

SREs had to navigate 5+ disconnected tools to complete a single RCA. Context-switching caused critical information to slip through the gaps, especially for junior engineers still building domain knowledge.

Manual Interpretation Burden

Even with AI analysis from Panacea, engineers still had to manually cross-reference Insights data and interpret raw metrics. The AI "helped" but didn't eliminate the cognitive load.

No Unified View

Log data, cluster health, CVM metrics, and historical patterns existed in silos. There was no single pane of glass where an SRE could see the full picture of a cluster incident at once.

Junior SRE Gap

Senior engineers could intuit correlations across tools. Junior SREs could not — leading to inconsistent troubleshooting quality, longer case resolution times, and excessive escalations.

02 — THE RESEARCH & DISCOVERY

2 months of
structured learning.

Designing for an unfamiliar domain required deep immersion. I embedded myself in the product, the team, and the workflow before drawing a single frame.

STAKEHOLDER 01

SRE Team

Real-time cluster monitoring without tab-hopping

Quick incident resolution with AI-guided triage

Consistent process regardless of experience level

Script Wizard for common cluster operations

STAKEHOLDER 02

PM

User experience as a priority metric

Rapid feature prioritization and delivery

Clear visibility into design decisions

Roadmap alignment across all teams

STAKEHOLDER 03

Panacea Dev Team

Richer AI-powered summaries and conclusions

Strict adherence to design system guidelines

Detailed API contracts per feature

Technical documentation for each component

"Panacea AI provides a great foundation for building an enterprise-level troubleshooting tool for SREs. The tool is able to quickly parse large amounts of log data, identify known issues, and provide a view into key events. The RCA summary is a first step into providing an automated Root Cause Analysis tool."


— Pilot SRE Tester, Nutanix UAT Program


OOUX Exercise with Senior SRE

Object-oriented UX sessions helped map out every data entity in the system — bundles, signatures, CVM nodes, alerts, layers — and how they related to each other. This became the backbone of the information architecture.

PRD & Grooming Sessions

Deep dives into the Product Requirements Document alongside recurring PM grooming sessions ensured design decisions were anchored in real requirements — not assumptions.

Glean AI for Domain Knowledge

Used Glean AI to rapidly build understanding of Nutanix-specific infrastructure concepts — AOS, hypervisors, CVM services, NCC — enabling me to speak the same language as the SREs I was designing for.

Deep Exploration of Existing Tools

Hands-on use of both Panacea.ai and Insights revealed the specific moments where engineers lost context, made errors, or gave up and escalated. These friction points became our design targets.

03 — DESIGN PROCESS

How might we unify
without overwhelming?

The HMW question guided every design decision: create a platform that serves diverse technical skill levels, blends AI capabilities naturally, and stays scalable without a fixed end-vision.

PHASE 01

Discovery

2 months

Stakeholder interviews, PRD analysis, domain learning, OOUX exercise

PHASE 02

Define

10 days

Report architecture, data mapping, requirements with Senior SRE

PHASE 03

Ideate

10 days

FigJam flows, sketches, first-draft UI explorations

PHASE 04

Design

3 months

High-fidelity UI, iterative sessions with SREs, design system alignment

PHASE 05

Test & Ship

~2 months

45 UAT testers, 405 bundles, feedback loops, iteration

04 — KEY DESIGN DECISIONS

What we built,
and why it matters.

Every feature was born from a real pain point uncovered during research. Each addresses a specific failure mode in the original workflow.

NEW FEATURE

Panacea Reports

A completely new module enabling SREs to create and combine log bundles for cross-bundle correlation. Turns isolated snapshots into a continuous cluster health narrative, which is critical for multi-incident pattern detection.

AI-POWERED

AI RCA Summary

The most critical deliverable. Natural-language summaries of anomalies, ranked by severity, with affected CVM IPs highlighted. Transforms raw log data into an actionable diagnosis in seconds, not hours.

VISUALIZATION

Heatmap & Event Timeline

A temporal visualization of log anomaly density and cluster events. Lets SREs immediately identify when an issue started and what preceded it, replacing manual log scanning with pattern recognition at a glance.
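To make the idea of anomaly density concrete, one simple approach is to bucket timestamped anomalies into fixed time windows and count how many land in each. This is a minimal sketch under stated assumptions: the sample timestamps, the 10-minute window, and the `anomaly_density` function are all hypothetical illustrations, not how Panacea computes its heatmap.

```python
from collections import Counter
from datetime import datetime

def anomaly_density(timestamps, window_minutes=10):
    """Count anomalies per fixed window, keyed by window start (HH:MM).

    Assumes same-day HH:MM:SS timestamps and a window that divides 60.
    """
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%H:%M:%S")
        # Round the minute down to the start of its window.
        bucket_minute = (t.minute // window_minutes) * window_minutes
        counts[f"{t.hour:02d}:{bucket_minute:02d}"] += 1
    return dict(sorted(counts.items()))

# Hypothetical anomaly timestamps, for illustration only.
sample = ["09:01:10", "09:03:45", "09:14:02", "09:14:30", "09:15:59", "09:27:00"]
density = anomaly_density(sample)
print(density)
```

Each count would map to one heatmap cell, so a sudden jump between adjacent windows marks the point where an SRE would start looking.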

INTEGRATION

Insights Data in Panacea

Raw cluster metrics and CVM configuration data from Insights are surfaced directly within Panacea, eliminating the #1 context switch that previously broke SRE focus during triage.

CONVERSATIONAL AI

Ask AI

Contextual AI chat enabling SREs to query log data in plain language and surface related KB articles. Reduces the gap between junior and senior SRE troubleshooting capability significantly.

AI FEEDBACK

Add Rule Mechanism

05 — IMPACT & RESULT

From 1 hour
to 8 minutes.

87.5%

Reduction in root cause
analysis time

Simplified log analysis via automatic signature detection — eliminating manual pattern matching

Reduced need for manual log investigation through AI-curated log views

Accelerated case triage time with unified bundle + Insights view

Standardized troubleshooting quality across all SRE experience levels

Multi-bundle analysis enabled previously impossible cross-incident correlation

ClickHouse migration delivered performance and cost savings across engineering

45

UAT Testers

405

Bundles Analyzed

55

Combo Bundles

8min

Avg Process Time

COMMUNICATION CHALLENGE SOLVED

Prepared detailed design documentation for all pages, functional specs aligned with Panacea dev team API contracts, and update decks for PM to communicate changes to wider audiences — resolving the three-way communication breakdown between design, engineering, and product.

06 — LEARNINGS

What this project
taught me.

01

Documentation as Design

Writing clear, accessible design documentation wasn't overhead — it was the primary communication channel that unified three teams with conflicting priorities and vocabularies. Every page had a living spec.

02

Consistency Over Intensity

Recurring, methodical sessions with the end user (Senior SRE) produced more reliable design outputs than intensive sporadic sprints. Cadence created trust. Trust created candor. Candor created better design.

03

Use the Product You Design For

Extensively using Panacea.ai and Insights as a quasi-user revealed UX issues that no brief or PRD would have surfaced. Lived experience in the product generated the most valuable design hypotheses.

04

Multi-Perspective Reviews

Presenting designs within the design team before SRE reviews caught assumptions early and diversified solutions. Peer critique reduced the number of revision cycles with engineering significantly.

05

Design for Scalability First

With no fixed end-vision, every component had to be extensible. This meant advocating for a design system approach and creating modular patterns that could accommodate AI capability growth over time.

06

Design-Driven Feature Discovery

Script Wizard, CVM Configuration, and Insights data integration were all discovered through design exercises — not the PRD. Structured exploration with stakeholders is a legitimate product discovery method.