System Health | Keva Docs

The System Health dashboard shows the operational status of Keva's infrastructure. Use it to monitor uptime, API performance, and service availability in real time.

Health Overview

The health dashboard displays:

Overall Status - System-wide health indicator
Service Status - Individual component health
Performance Metrics - Response times and throughput
Recent Incidents - Past issues and resolutions

Status Indicators

Status	Meaning
Operational	All systems normal
Degraded	Partial issues, reduced performance
Partial Outage	Some features unavailable
Major Outage	Critical services down

Service Components

Monitor each service independently:

Core Services

Service	Description
Web App	Main application interface
API	REST API endpoints
Worker	Background job processing
Database	PostgreSQL cluster

AI Services

Service	Description
AI Engine	Claude/Anthropic API
Embeddings	Vector search service
Brain	Learning and memory

Integrations

Service	Description
Email	IMAP/SMTP connections
Connectors	Platform integrations
Webhooks	Inbound events
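The relationship between the per-service statuses and the Overall Status is not spelled out here. One plausible reading, offered only as an assumption, is a "worst status wins" roll-up using the four status values from the Status Indicators table. The function name and the example component states below are hypothetical.

```python
# Status values from the Status Indicators table, ordered least to most severe.
SEVERITY = ["Operational", "Degraded", "Partial Outage", "Major Outage"]


def overall_status(component_statuses):
    """Return the most severe status among components (assumed roll-up rule)."""
    if not component_statuses:
        return "Operational"
    # list.index raises ValueError for a status that is not in the table.
    return max(component_statuses.values(), key=SEVERITY.index)


# Hypothetical snapshot of a few components.
components = {
    "Web App": "Operational",
    "API": "Degraded",
    "Worker": "Operational",
    "Database": "Operational",
}
print(overall_status(components))
```

With this rule, a single degraded component is enough to move the system-wide indicator away from Operational.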

Performance Metrics

API Response Times

Track latency across endpoints:

p50 - Median response time
p95 - 95th percentile
p99 - 99th percentile (near-worst-case latency)

Target: p95 under 200ms
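To make the percentile definitions concrete, here is a minimal sketch that computes p50, p95, and p99 from a list of response times using the nearest-rank method and compares p95 with the 200 ms target. The sample latencies and function names are made up for illustration; the dashboard may use a different percentile method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(samples_ms, p95_target_ms=200):
    """Summarize latencies and check p95 against the target (200 ms in this document)."""
    p95 = percentile(samples_ms, 95)
    return {
        "p50": percentile(samples_ms, 50),
        "p95": p95,
        "p99": percentile(samples_ms, 99),
        "meets_target": p95 < p95_target_ms,
    }


# Hypothetical response times in milliseconds.
latencies_ms = [42, 51, 48, 55, 60, 47, 180, 52, 49, 320]
summary = latency_summary(latencies_ms)
print(summary)
```

In this sample the median stays low while two slow requests push p95 past the target. This is why tail percentiles are tracked alongside the median.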

Throughput

Requests processed per minute:

Current rate
Peak rate (24h)
Rate limit remaining

Error Rate

Percentage of failed requests:

4xx errors (client issues)
5xx errors (server issues)

Target: under 0.1%
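As a rough illustration of how the error rate splits into 4xx and 5xx shares, the sketch below counts status codes in a batch of requests. It assumes the 0.1% target applies to the combined rate, which the text does not confirm. The sample data is hypothetical.

```python
def error_rates(status_codes):
    """Return client (4xx), server (5xx), and combined error rates as percentages."""
    total = len(status_codes)
    if total == 0:
        return {"client": 0.0, "server": 0.0, "total": 0.0}
    client = sum(1 for code in status_codes if 400 <= code < 500)
    server = sum(1 for code in status_codes if 500 <= code < 600)
    return {
        "client": 100 * client / total,
        "server": 100 * server / total,
        "total": 100 * (client + server) / total,
    }


# Hypothetical batch: 2000 requests, three 404s and one 503.
codes = [200] * 1996 + [404] * 3 + [503]
rates = error_rates(codes)
TARGET_PCT = 0.1  # assumed to apply to the combined error rate
print(rates, "within target:", rates["total"] < TARGET_PCT)
```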

Uptime Tracking

View historical availability:

Last 24 hours:  ████████████████████████  100%
Last 7 days:    ███████████████████████░   99.8%
Last 30 days:   ███████████████████████░   99.9%
Last 90 days:   ███████████████████████░   99.95%

Click any period for a detailed breakdown.
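For a sense of scale, an uptime percentage can be converted to and from minutes of downtime. The sketch below shows the arithmetic only. The 20 minutes of downtime is a hypothetical figure, and how the dashboard counts degraded periods is not specified here.

```python
MINUTES_PER_DAY = 24 * 60


def uptime_percent(period_minutes, downtime_minutes):
    """Share of the period during which the service was available, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100 * (period_minutes - downtime_minutes) / period_minutes


def downtime_budget_minutes(period_minutes, target_percent):
    """Minutes of downtime allowed in the period while still meeting the target."""
    return period_minutes * (100 - target_percent) / 100


thirty_days = 30 * MINUTES_PER_DAY  # 43,200 minutes
print(round(uptime_percent(thirty_days, 20), 3))            # hypothetical 20 minutes down
print(round(downtime_budget_minutes(thirty_days, 99.9), 1))  # budget at 99.9%
```

At 99.9% over 30 days the budget is roughly 43 minutes, so even a short incident uses a visible share of it.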

Incident History

Review past issues:

Date	Duration	Impact	Root Cause
Mar 20	5 min	Degraded	Database failover
Mar 15	2 min	None	Deployment
Mar 10	15 min	Partial	Third-party API

Click an incident to view its full post-mortem.

Alerts Configuration

Set up health alerts:

Threshold Alerts

Go to Health > Alerts
Click New Alert
Choose metric (response time, error rate)
Set threshold value
Configure notification channel

Alert Channels

Email notification
Slack channel message
Webhook to external system
SMS for critical alerts

Example Alerts

Alert	Condition	Severity
High Error Rate	> 1% errors	Critical
Slow Response	p95 > 500ms	Warning
Service Down	Health check fails	Critical
High Queue	> 1000 pending jobs	Warning
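The example alerts above can be read as simple threshold rules over a metrics snapshot. The sketch below encodes the four rows that way. The metric key names (error_rate_pct, p95_ms, health_check_ok, pending_jobs) and the rule structure are assumptions for illustration, not Keva's alert format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    fires: Callable[[dict], bool]  # returns True when the condition is met


# Conditions and severities mirror the Example Alerts table; key names are hypothetical.
RULES = [
    AlertRule("High Error Rate", "Critical", lambda m: m["error_rate_pct"] > 1),
    AlertRule("Slow Response", "Warning", lambda m: m["p95_ms"] > 500),
    AlertRule("Service Down", "Critical", lambda m: not m["health_check_ok"]),
    AlertRule("High Queue", "Warning", lambda m: m["pending_jobs"] > 1000),
]


def firing_alerts(metrics):
    """Return (name, severity) for every rule whose condition holds."""
    return [(rule.name, rule.severity) for rule in RULES if rule.fires(metrics)]


# Hypothetical snapshot: slow responses and a backed-up job queue.
snapshot = {
    "error_rate_pct": 0.4,
    "p95_ms": 620,
    "health_check_ok": True,
    "pending_jobs": 1500,
}
print(firing_alerts(snapshot))
```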

Maintenance Windows

Schedule planned maintenance:

Go to Health > Maintenance
Click Schedule Window
Set start time and duration
Add description
Alerts are suppressed for the duration of the window
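Alert suppression during a window amounts to a time-range check. The sketch below models a window by its start time and duration, as in the steps above, and decides whether an alert raised at a given moment should notify. The class and function names are hypothetical, and treating the end of the window as exclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MaintenanceWindow:
    start: datetime
    duration: timedelta
    description: str = ""

    def contains(self, moment: datetime) -> bool:
        # Start is inclusive, end is exclusive (an assumption).
        return self.start <= moment < self.start + self.duration


def should_notify(alert_time, windows):
    """An alert notifies only if it falls outside every maintenance window."""
    return not any(window.contains(alert_time) for window in windows)


# Hypothetical 30-minute window starting at 02:00 UTC.
window = MaintenanceWindow(
    start=datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc),
    duration=timedelta(minutes=30),
    description="Database upgrade",
)
during = datetime(2025, 1, 10, 2, 15, tzinfo=timezone.utc)
after = datetime(2025, 1, 10, 2, 45, tzinfo=timezone.utc)
print(should_notify(during, [window]), should_notify(after, [window]))
```

Using timezone-aware datetimes throughout avoids comparing a window defined in one zone against alert timestamps recorded in another.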

SOC 2 Health Checks

For compliance, automated checks run every 5 minutes:

Service availability
Security controls active
Backup verification
Access control status

Results feed into compliance evidence collection.

External Monitoring

Keva's status is also available at:

Status Page - Public availability dashboard
API Endpoint - /api/health returns JSON status
RSS Feed - Subscribe to incident updates
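A script can poll the /api/health endpoint and act on the JSON it returns. The sketch below is illustrative only: the host name is a placeholder, and the response schema is not documented here, so the "status" field and its "operational" value are assumptions that should be checked against a real response.

```python
import json
import urllib.request

BASE_URL = "https://keva.example.com"  # placeholder host, replace with your own


def parse_health(body: bytes) -> dict:
    """Decode the response body and make sure it is a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object from the health endpoint")
    return payload


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET {base_url}/api/health and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as response:
        return parse_health(response.read())


def is_operational(payload: dict) -> bool:
    # Assumed field name and value; the real schema may differ.
    return str(payload.get("status", "")).lower() == "operational"


# Offline example with a hypothetical response body.
sample = parse_health(b'{"status": "Operational"}')
print(is_operational(sample))
```

Keeping parsing separate from the network call makes the status check easy to test without reaching the service.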

Troubleshooting

If the dashboard shows a Degraded status:

Check the affected component
Review recent changes or deployments
Check third-party service status
Contact support if the issue persists