Performance Troubleshooting
This guide helps you identify and resolve performance issues with Bifrost Proxy.
Performance Diagnostics
Before troubleshooting, gather baseline performance data.
Measure Response Times
    # Single request timing
    time curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s

    # Detailed timing breakdown (each value is cumulative from the start of the request)
    curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s -w \
    "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nStart: %{time_starttransfer}s\nTotal: %{time_total}s\n"

    # Average over multiple requests
    for i in {1..10}; do
      curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s -w "%{time_total}\n"
    done | awk '{sum+=$1} END {print "Average:", sum/NR, "seconds"}'

Check Server Statistics
    # Overall statistics
    curl -s http://localhost:7082/api/v1/stats | jq

    # Backend latency
    curl -s http://localhost:7082/api/v1/backends | jq '.[].stats'

    # Active connections
    curl -s http://localhost:7082/api/v1/stats | jq '.active_connections'

Monitor Resource Usage
    # CPU and memory usage (pgrep -d, joins multiple PIDs with commas, as top expects)
    top -p "$(pgrep -d, bifrost-server)"

    # Detailed process stats (the [b] keeps grep from matching its own process)
    ps aux | grep '[b]ifrost'

    # Memory usage over time
    watch -n 5 'ps aux | grep "[b]ifrost" | awk "{print \$6/1024\" MB\"}"'

    # Goroutine count (if metrics enabled)
    curl -s http://localhost:7090/metrics | grep bifrost_goroutines

High Latency
Symptoms: Requests take longer than expected to complete.
Cause 1: Backend Latency
The upstream backend is slow to respond.
Diagnosis:
    # Compare proxy vs direct timing
    echo "Via proxy:"
    time curl -x http://localhost:7080 https://example.com -o /dev/null -s

    echo "Direct:"
    time curl https://example.com -o /dev/null -s

    # Check backend health and latency
    curl -s http://localhost:7082/api/v1/backends | jq '.[] | {name, healthy, latency: .stats.avg_latency_ms}'

Solution:
- Switch to a faster backend:

      routes:
        - domains: ["*"]
          backends:
            - fast-backend

- Use load balancing with health checks:

      routes:
        - domains: ["*"]
          backends:
            - backend1
            - backend2
          load_balance: least_conn  # Route to the backend with the fewest active connections

- Enable caching for repeated requests:

      cache:
        enabled: true
        memory:
          max_size: "256MB"
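The proxy-versus-direct comparison from the diagnosis step is easier to track over time as a single figure. The following is a minimal sketch of one way to do that; `latency_overhead` is a hypothetical helper name (not part of Bifrost), and it assumes you pass it the two `time_total` values reported by curl.

```shell
#!/usr/bin/env bash
# latency_overhead PROXY_SECONDS DIRECT_SECONDS
# Hypothetical helper: prints the extra latency of the proxy path in
# milliseconds and the proxy/direct ratio.
latency_overhead() {
  awk -v p="$1" -v d="$2" 'BEGIN {
    if (d + 0 <= 0) { print "direct time must be positive"; exit 1 }
    printf "overhead_ms=%.0f ratio=%.2f\n", (p - d) * 1000, p / d
  }'
}

# Example (needs network access and a running proxy):
#   proxy=$(curl -x http://localhost:7080 https://example.com -o /dev/null -s -w '%{time_total}')
#   direct=$(curl https://example.com -o /dev/null -s -w '%{time_total}')
#   latency_overhead "$proxy" "$direct"
```

A ratio close to 1 suggests the backend path is not the main source of latency; a single pair of requests is noisy, so averaging several runs before comparing is usually more reliable.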
Cause 2: DNS Resolution Slow
DNS lookups are adding latency.
Diagnosis:
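    # Measure DNS lookup time
    time nslookup example.com

    # Check DNS timing in curl
    # Note: with -x, curl resolves only the proxy host; the proxy resolves example.com itself,
    # so this value does not include the proxy's upstream lookup
    curl -x http://localhost:7080 https://example.com -o /dev/null -s -w "DNS: %{time_namelookup}s\n"

Solution: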
- Use faster DNS servers:

      backends:
        - name: wg-vpn
          type: wireguard
          config:
            dns:
              - "1.1.1.1"  # Cloudflare (typically fast)
              - "8.8.8.8"  # Google

- Enable DNS caching:

      vpn:
        dns:
          enabled: true
          cache_ttl: "5m"
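Before switching resolvers, it can help to measure which one actually answers faster from the host running Bifrost. The sketch below assumes `dig` is installed; `dig_query_ms` and `compare_resolvers` are hypothetical helper names introduced here for illustration.

```shell
#!/usr/bin/env bash
# dig_query_ms: extract the query time (milliseconds) from `dig +stats`
# output read on stdin. The line looks like ";; Query time: 23 msec".
dig_query_ms() {
  awk '/Query time:/ { print $4 }'
}

# compare_resolvers NAME RESOLVER...
# Prints "<resolver> <ms>" for each resolver. Requires dig and network access.
compare_resolvers() {
  local name="$1"
  shift
  local r
  for r in "$@"; do
    printf '%s %s\n' "$r" "$(dig "@$r" "$name" +noall +stats | dig_query_ms)"
  done
}

# Example:
#   compare_resolvers example.com 1.1.1.1 8.8.8.8
```

A single query mostly reflects whether the answer was already cached at the resolver, so repeating the comparison with a few different names gives a fairer picture.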
Cause 3: Connection Setup Overhead
Each request opens a new connection instead of reusing an existing one.
Diagnosis:
    # Check if keep-alive is working
    curl -x http://localhost:7080 https://example.com -v 2>&1 | grep -i keep-alive

    # Check connection reuse
    curl -x http://localhost:7080 https://example.com https://example.com/path -v 2>&1 | grep "Re-using"

Solution:
Enable and tune keep-alive settings:
    server:
      http:
        idle_timeout: "120s"  # Keep connections alive longer
        max_idle_conns_per_host: 100

Cause 4: TLS Handshake Overhead
TLS negotiation adds latency for each new connection.
Diagnosis:
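    # Measure TLS handshake time
    # time_appconnect is cumulative from the start of the request; subtract time_connect
    # to approximate the handshake (through a proxy the difference also includes tunnel setup)
    curl -x http://localhost:7080 https://example.com -o /dev/null -s -w "Connect: %{time_connect}s\nTLS handshake done: %{time_appconnect}s\n"

Solution: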
- Enable connection keep-alive (reduces handshakes)
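- Use TLS session resumption (Go TLS servers issue session tickets by default; Go TLS clients resume sessions only when a ClientSessionCache is configured)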
- Consider HTTP/2 for multiplexed connections
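The curl timing variables used throughout this section are cumulative, so the DNS, connect, and TLS causes above are easier to tell apart once the values are converted into per-phase durations. A minimal sketch, assuming the five values are passed in the order shown; `curl_phases` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# curl_phases NAMELOOKUP CONNECT APPCONNECT STARTTRANSFER TOTAL
# Converts curl's cumulative timings (seconds) into per-phase durations (ms).
curl_phases() {
  awk -v dns="$1" -v con="$2" -v tls="$3" -v start="$4" -v total="$5" 'BEGIN {
    # time_appconnect is 0 for plain HTTP; treat the TLS phase as empty then.
    if (tls + 0 == 0) tls = con
    printf "dns=%.0fms connect=%.0fms tls=%.0fms wait=%.0fms transfer=%.0fms\n",
      dns * 1000, (con - dns) * 1000, (tls - con) * 1000,
      (start - tls) * 1000, (total - start) * 1000
  }'
}

# Example (needs network access and a running proxy):
#   curl_phases $(curl -x http://localhost:7080 https://example.com -o /dev/null -s \
#     -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}')
```

When the request goes through the proxy, the `dns` and `connect` phases describe the connection to the proxy itself, and the `tls` phase also includes the time the proxy needs to set up the tunnel to the destination.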
High CPU Usage
Symptoms: The Bifrost process consumes excessive CPU.
Cause 1: Too Many Connections
A high connection count requires more processing.
Diagnosis:
    # Check connection count
    curl -s http://localhost:7082/api/v1/stats | jq '.active_connections'

    # Check the request counter (sample it twice and divide the difference
    # by the interval to get requests per second)
    curl -s http://localhost:7090/metrics | grep bifrost_requests_total

Solution:
- Implement rate limiting:

      rate_limit:
        enabled: true
        requests_per_second: 100
        burst: 200

- Use connection limits:

      server:
        http:
          max_connections: 10000
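Choosing a sensible `requests_per_second` limit starts with knowing the current request rate, and a counter such as `bifrost_requests_total` only yields a rate when sampled twice. The sketch below shows that calculation; `sum_metric` and `counter_rate` are hypothetical helper names, and the label layout in the example data is an assumption about how the metric might be exported.

```shell
#!/usr/bin/env bash
# sum_metric NAME
# Sums every series of a metric from Prometheus text format on stdin.
# Assumes no trailing timestamps and no spaces inside label values.
sum_metric() {
  awk -v m="$1" '
    $1 == m || index($1, m "{") == 1 { s += $NF }
    END { printf "%.0f\n", s }'
}

# counter_rate FIRST SECOND INTERVAL_SECONDS
# Average per-second rate between two counter samples.
counter_rate() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.1f\n", (b - a) / dt }'
}

# Example (needs the metrics endpoint to be enabled):
#   first=$(curl -s http://localhost:7090/metrics | sum_metric bifrost_requests_total)
#   sleep 10
#   second=$(curl -s http://localhost:7090/metrics | sum_metric bifrost_requests_total)
#   counter_rate "$first" "$second" 10
```

If the process restarts between the two samples the counter resets and the result is negative; discard that measurement and sample again.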
Cause 2: Heavy Logging
Debug logging can be CPU-intensive.
Diagnosis:
    # Check current log level
    grep -i "level" /etc/bifrost/config.yaml

    # Monitor log output rate
    tail -f /var/log/bifrost/server.log | pv -l > /dev/null

Solution:
Reduce log level in production:
    logging:
      level: warn   # Or 'error' for minimal logging
      format: json  # More efficient than text

Cause 3: Expensive Routing Rules
Routing rules contain complex regex patterns that are expensive to evaluate.
Diagnosis:
Check for regex patterns in routes:
    # Potentially expensive
    routes:
      - domains: ["*.complex-pattern-.*\\.example\\.com"]

Solution:
Simplify routing patterns:
    # More efficient
    routes:
      - domains: ["*.example.com"]

Cause 4: Encryption Overhead
WireGuard or other encryption is consuming CPU.
Diagnosis:
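    # Check backend-specific CPU usage
    # Compare latency with encryption vs direct

    # Test direct backend
    # Note: for HTTPS URLs, curl sends -H headers to the destination inside the tunnel;
    # use --proxy-header instead if the proxy itself must see the header
    curl -x http://localhost:7080 -H "X-Backend: direct" https://example.com -o /dev/null -s -w "%{time_total}\n"

Solution: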
- Use hardware-accelerated encryption if available
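- Consider a CPU with AES-NI for AES-based ciphers such as TLS AES-GCM; WireGuard uses ChaCha20-Poly1305, which benefits from SIMD instructions (for example AVX2 or NEON) rather than AES-NI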
- For high-throughput scenarios, consider direct backend for non-sensitive traffic
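Whether hardware acceleration is available can be checked from the CPU feature flags on Linux. This is a minimal sketch that reads `/proc/cpuinfo`-formatted text; `has_cpu_flag` is a hypothetical helper name, and the flag names shown (`aes`, `avx2`) are the ones Linux reports on x86.

```shell
#!/usr/bin/env bash
# has_cpu_flag FLAG
# Reads /proc/cpuinfo-formatted text on stdin and succeeds if FLAG is listed
# on a "flags" (x86) or "Features" (ARM) line.
has_cpu_flag() {
  awk -v f="$1" '
    $1 == "flags" || $1 == "Features" {
      for (i = 2; i <= NF; i++) if ($i == f) found = 1
    }
    END { exit !found }'
}

# Example (Linux only):
#   has_cpu_flag aes  < /proc/cpuinfo && echo "AES instructions available"
#   has_cpu_flag avx2 < /proc/cpuinfo && echo "AVX2 available"
```

A present flag only shows that the CPU supports the instructions; whether a given cipher implementation uses them depends on how the software was built.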
High Memory Usage
Symptoms: Memory consumption grows over time or is excessive.
Cause 1: Request Log Buffer Too Large
A large request log consumes memory.
Diagnosis:
    # Check request log size
    curl -s http://localhost:7082/api/v1/requests | jq 'length'

Solution:

    api:
      request_log_size: 500  # Reduce from default 1000
      # Or disable entirely
      enable_request_log: false

Cause 2: Cache Memory
The in-memory cache is consuming too much RAM.
Diagnosis:
    # Check cache statistics
    curl -s http://localhost:7082/api/v1/cache/stats | jq

    # Check via metrics
    curl -s http://localhost:7090/metrics | grep bifrost_cache

Solution:
Limit cache memory usage:
    cache:
      memory:
        max_size: "128MB"  # Limit memory cache
        max_items: 10000

      # Use disk cache for larger storage
      disk:
        enabled: true
        path: "/var/cache/bifrost"
        max_size: "1GB"

Cause 3: Connection Pool Growth
Connection pools are growing without bound.
Solution:
Configure connection pool limits:
    server:
      http:
        max_idle_conns: 100
        max_idle_conns_per_host: 10
        idle_conn_timeout: "90s"

Cause 4: Memory Leak (Rare)
Memory grows gradually and is not released.
Diagnosis:
    # Monitor memory over time (strftime is not available in every awk; gawk provides it)
    while true; do
      ps aux | grep '[b]ifrost' | awk '{print strftime("%H:%M:%S"), $6/1024, "MB"}'
      sleep 60
    done

    # Check goroutine count for leaks
    curl -s http://localhost:7090/metrics | grep bifrost_goroutines

Solution:
- Restart the service as a temporary fix
- Check whether a newer release includes a fix, and upgrade if so
- Report the issue with memory profiles
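To tell a leak apart from normal fluctuation, it helps to reduce a series of memory samples to a start value, an end value, and the growth between them. The sketch below does this for resident set size values in KB, one per line; `rss_trend` is a hypothetical helper name, and the sampling loop in the comment assumes the process is called `bifrost-server`.

```shell
#!/usr/bin/env bash
# rss_trend
# Reads RSS samples in KB (one per line) on stdin and prints the first and
# last value in MB together with the growth between them.
rss_trend() {
  awk '
    NF == 0 { next }                 # skip empty samples
    n == 0  { first = $1 }
    { last = $1; n++ }
    END {
      if (n == 0) exit 1
      printf "samples=%d first_mb=%.1f last_mb=%.1f growth_mb=%.1f\n",
        n, first / 1024, last / 1024, (last - first) / 1024
    }'
}

# Example: one sample per minute for an hour
#   for i in $(seq 1 60); do
#     ps -o rss= -p "$(pgrep -o bifrost-server)"
#     sleep 60
#   done | rss_trend
```

Steady growth under constant load, combined with a rising goroutine count, is a stronger indication of a leak than growth alone, because caches and connection pools also grow until they reach their configured limits.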
High Bandwidth/Throughput Issues
Symptoms: Network throughput is lower than expected.
Cause 1: MTU Fragmentation
Packets are being fragmented, reducing throughput.
Diagnosis:
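    # Test with different packet sizes (Linux ping; -M do forbids fragmentation)
    # -s sets the ICMP payload size; the IPv4 packet is 28 bytes larger
    ping -M do -s 1400 example.com
    ping -M do -s 1200 example.com

    # Check for fragmentation in netstat
    netstat -s | grep -i fragment

Solution: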
    # For WireGuard
    backends:
      - name: wg-vpn
        type: wireguard
        config:
          mtu: 1280  # Conservative value

    # For VPN mode
    vpn:
      mtu: 1280

Cause 2: Buffer Sizes Too Small
Network buffers are limiting throughput.
Solution:
Increase system buffer sizes:
    # Linux
    sudo sysctl -w net.core.rmem_max=26214400
    sudo sysctl -w net.core.wmem_max=26214400
    sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 26214400"
    sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 26214400"

    # Make permanent in /etc/sysctl.conf

Cause 3: Backend Bandwidth Limit
The backend connection has limited bandwidth.
Diagnosis:
    # Speed test through proxy (the URL is quoted so the shell does not interpret the ?)
    curl -x http://localhost:7080 "https://speed.cloudflare.com/__down?bytes=100000000" -o /dev/null -s -w "Speed: %{speed_download} bytes/sec\n"

    # Compare direct
    curl "https://speed.cloudflare.com/__down?bytes=100000000" -o /dev/null -s -w "Speed: %{speed_download} bytes/sec\n"

Solution:
Use multiple backends with load balancing for higher aggregate throughput.
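Two small calculations come up repeatedly in this section: curl reports speed in bytes per second while link capacity is usually quoted in Mbit/s, and `ping -s` takes a payload size rather than an MTU. The sketch below covers both; `to_mbps` and `icmp_payload_for_mtu` are hypothetical helper names, and the 28-byte overhead assumes IPv4 without options (20-byte IP header plus 8-byte ICMP header).

```shell
#!/usr/bin/env bash
# to_mbps BYTES_PER_SECOND
# Converts curl's %{speed_download} value to megabits per second.
to_mbps() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 8 / 1000000 }'
}

# icmp_payload_for_mtu MTU
# Largest `ping -s` payload that fits in one IPv4 packet of the given MTU.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Examples:
#   to_mbps 12500000            # speed reported by curl, in bytes/sec
#   ping -M do -s "$(icmp_payload_for_mtu 1280)" example.com
```

Comparing the converted proxy and direct speeds against the nominal bandwidth of the backend link shows whether the backend or the proxy host is the limiting factor.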
Performance Tuning Checklist
System Level
    # Increase file descriptor limits (applies to the current shell and its children)
    ulimit -n 65536

    # Optimize TCP settings
    sudo sysctl -w net.core.somaxconn=65535
    sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535
    sudo sysctl -w net.ipv4.tcp_fin_timeout=30
    sudo sysctl -w net.ipv4.tcp_tw_reuse=1

    # Increase buffer sizes
    sudo sysctl -w net.core.rmem_max=26214400
    sudo sysctl -w net.core.wmem_max=26214400

Bifrost Configuration
    # Optimized production configuration
    logging:
      level: warn
      format: json

    server:
      http:
        listen: ":7080"
        read_timeout: "30s"
        write_timeout: "30s"
        idle_timeout: "120s"
        max_connections: 50000

    api:
      enable_request_log: false  # Disable for performance

    cache:
      enabled: true
      memory:
        max_size: "256MB"
      disk:
        enabled: true
        path: "/var/cache/bifrost"
        max_size: "2GB"

Monitoring Configuration
    metrics:
      enabled: true
      listen: ":7090"

    # Key metrics to watch:
    # - bifrost_request_duration_seconds
    # - bifrost_connections_active
    # - bifrost_backend_latency_seconds
    # - bifrost_goroutines
    # - bifrost_memory_bytes

Performance Metrics Reference

| Metric | Healthy Value | Warning Threshold |
|---|---|---|
| Request latency (p95) | < 100ms | > 500ms |
| Active connections | < 10,000 | > 50,000 |
| Error rate | < 0.1% | > 1% |
| CPU usage | < 50% | > 80% |
| Memory usage | < 512MB | > 2GB |
| Goroutine count | < 10,000 | > 50,000 |
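
The latency row in the table refers to the 95th percentile rather than the average, which the earlier averaging loop does not provide. A minimal sketch of a nearest-rank percentile over curl timings follows; `percentile` is a hypothetical helper name, and it assumes an integer percentile and one numeric sample per line.

```shell
#!/usr/bin/env bash
# percentile P
# Reads one number per line on stdin and prints the P-th percentile
# using the nearest-rank method (P is an integer from 1 to 100).
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Example: p95 of 50 requests through the proxy, in seconds
#   for i in $(seq 1 50); do
#     curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s -w "%{time_total}\n"
#   done | percentile 95
```

With only a handful of samples the 95th percentile is effectively the maximum, so a few dozen requests or more are needed before comparing the result against the thresholds in the table.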
    # Quick performance check
    curl -s http://localhost:7090/metrics | grep -E "(bifrost_request_duration|bifrost_connections_active|bifrost_memory)"
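The quick check prints raw metric lines; comparing them against the reference table can also be scripted. The sketch below is one possible approach, not a Bifrost feature: `metric_value` and `classify` are hypothetical helper names, the metric is assumed to be exported without labels, and "elevated" is a label chosen here for values that fall between the healthy and warning columns.

```shell
#!/usr/bin/env bash
# metric_value NAME
# Prints the value of an unlabelled metric from Prometheus text on stdin.
metric_value() {
  awk -v m="$1" '$1 == m { print $2; exit }'
}

# classify VALUE HEALTHY_BELOW WARNING_ABOVE
# Maps a value onto the two thresholds from the reference table.
classify() {
  awk -v v="$1" -v h="$2" -v w="$3" 'BEGIN {
    if (v + 0 > w + 0)      print "warning"
    else if (v + 0 < h + 0) print "healthy"
    else                    print "elevated"
  }'
}

# Example: goroutine count against the table (healthy < 10,000, warning > 50,000)
#   g=$(curl -s http://localhost:7090/metrics | metric_value bifrost_goroutines)
#   echo "goroutines: $g ($(classify "$g" 10000 50000))"
```

The same pattern applies to `bifrost_connections_active` with the thresholds from the active connections row.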