Performance Troubleshooting

This guide helps identify and resolve performance issues with Bifrost Proxy.

Performance Diagnostics

Before troubleshooting, gather baseline performance data.

Measure Response Times

# Single request timing
time curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s

# Detailed timing breakdown (each value is cumulative from the start of the request)
curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s -w \
"DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nStart: %{time_starttransfer}s\nTotal: %{time_total}s\n"

# Multiple requests average
for i in {1..10}; do
  curl -x http://localhost:7080 https://httpbin.org/ip -o /dev/null -s -w "%{time_total}\n"
done | awk '{sum+=$1} END {print "Average:", sum/NR, "seconds"}'
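
An average hides tail latency, and the reference table at the end of this guide is expressed as p95. The sketch below is one way to compute a percentile from the same per-request timings using the nearest-rank method. The `percentile` function name is hypothetical and not part of Bifrost.

```shell
# percentile P: read one number per line on stdin and print the P-th
# percentile (P is an integer from 1 to 100) using the nearest-rank method.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # integer ceil(p/100 * NR)
      print v[rank]
    }'
}
```

For example, piping the loop above into `percentile 95` instead of the `awk` average prints the slowest request that still falls within the 95th percentile. Ten samples are too few for a stable p95, so raise the loop count for real measurements.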

Check Server Statistics

# Overall statistics
curl -s http://localhost:7082/api/v1/stats | jq

# Backend latency
curl -s http://localhost:7082/api/v1/backends | jq '.[].stats'

# Active connections
curl -s http://localhost:7082/api/v1/stats | jq '.active_connections'

Monitor Resource Usage

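# CPU and memory usage (-d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, bifrost-server)"

# Detailed process stats ([b] keeps grep from matching itself)
ps aux | grep '[b]ifrost'

# Memory usage over time
watch -n 5 'ps aux | grep "[b]ifrost" | awk "{print \$6/1024\" MB\"}"'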

# Goroutine count (if metrics enabled)
curl -s http://localhost:7090/metrics | grep bifrost_goroutines

High Latency

Symptoms: Requests take longer than expected to complete.

Cause 1: Backend Latency

The upstream backend is slow to respond.

Diagnosis:

# Compare proxy vs direct timing
echo "Via proxy:"
time curl -x http://localhost:7080 https://example.com -o /dev/null -s

echo "Direct:"
time curl https://example.com -o /dev/null -s

# Check backend health and latency
curl -s http://localhost:7082/api/v1/backends | jq '.[] | {name, healthy, latency: .stats.avg_latency_ms}'

Solution:

Switch to a faster backend:

routes:
  - domains: ["*"]
    backends:
      - fast-backend

Use load balancing with health checks:

routes:
  - domains: ["*"]
    backends:
      - backend1
      - backend2
    load_balance: least_conn  # Route to the backend with the fewest active connections
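
To make the `least_conn` policy concrete, here is a minimal sketch of the selection rule: pick the backend with the fewest active connections. It illustrates the general technique only and is not Bifrost's implementation. The `pick_least_conn` name and the two-column input format are assumptions.

```shell
# pick_least_conn: read "NAME ACTIVE_CONNECTIONS" lines on stdin and print the
# name with the fewest active connections (the first one wins a tie).
pick_least_conn() {
  awk '
    NR == 1 || $2 + 0 < min { min = $2 + 0; best = $1 }
    END { if (NR > 0) print best }'
}
```

Because a slow backend tends to hold connections open longer, this rule usually shifts traffic away from it even though latency is never measured directly.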

Enable caching for repeated requests:

cache:
  enabled: true
  memory:
    max_size: "256MB"

Cause 2: DNS Resolution Slow

DNS lookups are adding latency.

Diagnosis:

# Measure DNS lookup time
time nslookup example.com

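# Check DNS timing in curl. Run this without -x: through an HTTP proxy, curl
# only resolves the proxy host, and the target lookup happens on the proxy side.
curl https://example.com -o /dev/null -s -w "DNS: %{time_namelookup}s\n"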

Solution:

Use faster DNS servers:

backends:
  - name: wg-vpn
    type: wireguard
    config:
      dns:
        - "1.1.1.1"  # Cloudflare (typically fast)
        - "8.8.8.8"  # Google

Enable DNS caching:

vpn:
  dns:
    enabled: true
    cache_ttl: "5m"

Cause 3: Connection Setup Overhead

Each request creates a new connection.

Diagnosis:

# Check if keep-alive is working
curl -x http://localhost:7080 https://example.com -v 2>&1 | grep -i keep-alive

# Check connection reuse
curl -x http://localhost:7080 https://example.com https://example.com/path -v 2>&1 | grep "Re-using"

Solution:

Enable and tune keep-alive settings:

server:
  http:
    idle_timeout: "120s"  # Keep connections alive longer
    max_idle_conns_per_host: 100

Cause 4: TLS Handshake Overhead

TLS negotiation adds latency for each new connection.

Diagnosis:

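# Measure TLS handshake time. time_appconnect is cumulative, so the handshake
# itself is roughly time_appconnect minus time_connect (through the proxy this
# also includes the CONNECT round trip).
curl -x http://localhost:7080 https://example.com -o /dev/null -s -w "TCP connected: %{time_connect}s\nTLS done: %{time_appconnect}s\n"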
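
curl's `time_*` variables are cumulative, so per-phase durations come from subtracting adjacent values. A small sketch of that arithmetic follows. `phase_breakdown` is a hypothetical helper, and it assumes an HTTPS target (for plain HTTP, `time_appconnect` is 0 and the TLS figure is meaningless).

```shell
# phase_breakdown DNS CONNECT APPCONNECT STARTTRANSFER TOTAL
# Arguments are curl's cumulative timings in seconds; the output is the time
# spent in each phase, in milliseconds.
phase_breakdown() {
  awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
    printf "dns=%.0fms tcp=%.0fms tls=%.0fms wait=%.0fms transfer=%.0fms\n",
      dns * 1000, (tcp - dns) * 1000, (tls - tcp) * 1000,
      (ttfb - tls) * 1000, (total - ttfb) * 1000
  }'
}
```

The five arguments map to `%{time_namelookup}`, `%{time_connect}`, `%{time_appconnect}`, `%{time_starttransfer}` and `%{time_total}`, printed space-separated by a single `curl -w` call.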

Solution:

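Enable connection keep-alive so that fewer handshakes are needed
Use TLS session resumption (Go servers issue session tickets by default; Go clients resume sessions only when a ClientSessionCache is configured)
Consider HTTP/2, which multiplexes requests over a single connection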

High CPU Usage

Symptoms: Bifrost process consuming excessive CPU.

Cause 1: Too Many Connections

High connection count requires more processing.

Diagnosis:

# Check connection count
curl -s http://localhost:7082/api/v1/stats | jq '.active_connections'

# Check the request counter (sample it twice and divide the difference by the interval to get requests per second)
curl -s http://localhost:7090/metrics | grep bifrost_requests_total
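
`bifrost_requests_total` is a counter, so a rate needs two samples. The sketch below shows one way to derive it from the Prometheus text output. Both helper names are hypothetical, and it assumes each sample line ends with the value (no trailing timestamp).

```shell
# sum_metric NAME: sum every sample of a Prometheus counter (all label sets)
# from exposition-format text on stdin.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE lines
    {
      metric = $1
      sub(/[{].*/, "", metric)          # strip the label set, if any
      if (metric == name) total += $NF
    }
    END { printf "%.0f\n", total }'
}

# per_second BEFORE AFTER INTERVAL_SECONDS: average rate between two samples.
per_second() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.1f\n", (b - a) / dt }'
}
```

Capture `curl -s http://localhost:7090/metrics | sum_metric bifrost_requests_total` twice, a known number of seconds apart, and pass both values plus the interval to `per_second`.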

Solution:

Implement rate limiting:

rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 200
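
How `requests_per_second` and `burst` interact depends on the limiter. The sketch below assumes a token bucket, a common design that this guide does not confirm for Bifrost: the bucket holds at most `burst` tokens, refills at `requests_per_second`, and each request spends one token. The `token_bucket` function is purely illustrative.

```shell
# token_bucket RATE BURST: read request arrival times (seconds, ascending) on
# stdin and print "ALLOWED DENIED" counts under a token-bucket policy.
token_bucket() {
  awk -v rate="$1" -v burst="$2" '
    BEGIN { tokens = burst; last = 0 }
    {
      tokens += ($1 - last) * rate          # refill for the elapsed time
      if (tokens > burst) tokens = burst    # the bucket never exceeds burst
      last = $1
      if (tokens >= 1) { tokens--; allowed++ } else { denied++ }
    }
    END { print allowed + 0, denied + 0 }'
}
```

Under this model, with the values above, 250 simultaneous requests would see 200 pass and 50 rejected, and one second later another 100 would be admitted.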

Use connection limits:

server:
  http:
    max_connections: 10000

Cause 2: Heavy Logging

Debug logging can be CPU-intensive.

Diagnosis:

# Check current log level
grep -i "level" /etc/bifrost/config.yaml

# Monitor log output rate
tail -f /var/log/bifrost/server.log | pv -l > /dev/null

Solution:

Reduce log level in production:

logging:
  level: warn  # Or 'error' for minimal logging
  format: json  # More efficient than text

Cause 3: Expensive Routing Rules

Complex regex patterns in routing rules can be expensive to evaluate.

Diagnosis:

Check for regex patterns in routes:

# Potentially expensive
routes:
  - domains: ["*.complex-pattern-.*\\.example\\.com"]

Solution:

Simplify routing patterns:

# More efficient
routes:
  - domains: ["*.example.com"]

Cause 4: Encryption Overhead

WireGuard or other encryption consuming CPU.

Diagnosis:

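# Check backend-specific CPU usage
# Compare latency through the encrypted backend vs. the direct backend

# Test direct backend (--proxy-header sends the header to the proxy; with an
# HTTPS target, -H headers travel inside the tunnel and the proxy never sees them)
curl -x http://localhost:7080 --proxy-header "X-Backend: direct" https://example.com -o /dev/null -s -w "%{time_total}\n"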

Solution:

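Use hardware-accelerated encryption if available
Prefer CPUs with AES-NI for AES-based TLS ciphers; WireGuard uses ChaCha20-Poly1305, which gains from SIMD support (AVX2, NEON) rather than AES-NI
For high-throughput scenarios, consider the direct backend for non-sensitive traffic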

High Memory Usage

Symptoms: Memory consumption grows over time or is excessive.

Cause 1: Request Log Buffer Too Large

Large request log consumes memory.

Diagnosis:

# Check request log size
curl -s http://localhost:7082/api/v1/requests | jq 'length'

Solution:

api:
  request_log_size: 500  # Reduce from default 1000
  # Or disable entirely
  enable_request_log: false

Cause 2: Cache Memory

In-memory cache consuming too much RAM.

Diagnosis:

# Check cache statistics
curl -s http://localhost:7082/api/v1/cache/stats | jq

# Check via metrics
curl -s http://localhost:7090/metrics | grep bifrost_cache

Solution:

Limit cache memory usage:

cache:
  memory:
    max_size: "128MB"  # Limit memory cache
    max_items: 10000

  # Use disk cache for larger storage
  disk:
    enabled: true
    path: "/var/cache/bifrost"
    max_size: "1GB"

Cause 3: Connection Pool Growth

Connection pools growing unbounded.

Solution:

Configure connection pool limits:

server:
  http:
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: "90s"

Cause 4: Memory Leak (Rare)

Gradual memory growth without release.

Diagnosis:

# Monitor memory over time ([b] keeps grep from matching itself)
while true; do
  ps aux | grep '[b]ifrost' | awk -v ts="$(date +%H:%M:%S)" '{print ts, $6/1024, "MB"}'
  sleep 60
done

# Check goroutine count for leaks
curl -s http://localhost:7090/metrics | grep bifrost_goroutines
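
Eyeballing a stream of samples makes slow growth easy to miss. As a rough sketch, the helper below turns logged samples into a growth rate. `rss_trend` and the `EPOCH_SECONDS RSS_KB` log format are assumptions made for this example.

```shell
# rss_trend: read "EPOCH_SECONDS RSS_KB" lines on stdin (oldest first) and
# print the average growth in MB per hour between the first and last sample.
rss_trend() {
  awk '
    NR == 1 { t0 = $1; r0 = $2 }
    { t1 = $1; r1 = $2 }
    END {
      if (NR < 2 || t1 == t0) { print "need at least two samples"; exit 1 }
      printf "%.1f MB/hour\n", ((r1 - r0) / 1024) / ((t1 - t0) / 3600)
    }'
}
```

A line in that format can be produced with `date +%s` and `ps -o rss= -p <pid>`. A rate that stays positive across many hours of steady traffic points to a leak more strongly than any single reading.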

Solution:

Restart the service as a temporary fix
Check for latest version with bug fixes
Report the issue with memory profiles

Low Throughput

Symptoms: Network throughput is lower than expected.

Cause 1: MTU Fragmentation

Packets are being fragmented, reducing throughput.

Diagnosis:

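# Test with different packet sizes (Linux ping; -M do forbids fragmentation,
# and -s sets the ICMP payload, so the IPv4 packet is 28 bytes larger)
ping -M do -s 1400 example.com
ping -M do -s 1200 example.com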

# Check for fragmentation in netstat
netstat -s | grep -i fragment

Solution:

# For WireGuard
backends:
  - name: wg-vpn
    type: wireguard
    config:
      mtu: 1280  # Conservative value

# For VPN mode
vpn:
  mtu: 1280
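
If the path MTU is known from the ping tests, a less conservative value can be worked out by subtracting header overhead. The arithmetic is sketched below with two hypothetical helpers. The overhead figures are WireGuard's standard 60 bytes over IPv4 and 80 bytes over IPv6.

```shell
# ping_payload PATH_MTU: largest "ping -s" payload that fits an IPv4 packet of
# PATH_MTU bytes (20-byte IPv4 header + 8-byte ICMP header).
ping_payload() {
  echo $(( $1 - 28 ))
}

# wg_mtu PATH_MTU FAMILY: tunnel MTU that avoids fragmentation when WireGuard
# runs over IPv4 (FAMILY=4, 60 bytes of overhead) or IPv6 (FAMILY=6, 80 bytes).
wg_mtu() {
  case "$2" in
    4) echo $(( $1 - 60 )) ;;
    6) echo $(( $1 - 80 )) ;;
    *) echo "usage: wg_mtu PATH_MTU 4|6" >&2; return 1 ;;
  esac
}
```

For a standard 1500-byte path, `wg_mtu 1500 6` gives 1420. The value 1280 is the minimum MTU that IPv6 requires every link to support, which is why it works as a conservative fallback.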

Cause 2: Buffer Sizes Too Small

Network buffers limiting throughput.

Solution:

Increase system buffer sizes:

# Linux
sudo sysctl -w net.core.rmem_max=26214400
sudo sysctl -w net.core.wmem_max=26214400
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 26214400"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 26214400"

# To persist across reboots, add the same keys to /etc/sysctl.conf and run: sudo sysctl -p
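
A useful sanity check on buffer sizes is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full. The helper below is a sketch of that calculation, and `bdp_bytes` is a hypothetical name.

```shell
# bdp_bytes MBIT_PER_SEC RTT_MS: bandwidth-delay product in bytes.
# (Mbit/s * 1,000,000 / 8) * (ms / 1000) simplifies to Mbit/s * 125 * ms.
bdp_bytes() {
  echo $(( $1 * 125 * $2 ))
}
```

`bdp_bytes 1000 200` prints 25000000, so the 26214400-byte maximum above is enough for roughly 1 Gbit/s at a 200 ms round trip. Slower or shorter paths need proportionally less.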

Cause 3: Backend Bandwidth Limit

The backend connection has limited bandwidth.

Diagnosis:

# Speed test through proxy (URL quoted so the shell does not treat ? as a glob)
curl -x http://localhost:7080 "https://speed.cloudflare.com/__down?bytes=100000000" -o /dev/null -s -w "Speed: %{speed_download} bytes/sec\n"

# Compare direct
curl "https://speed.cloudflare.com/__down?bytes=100000000" -o /dev/null -s -w "Speed: %{speed_download} bytes/sec\n"

Solution:

Use multiple backends with load balancing for higher aggregate throughput.

Performance Tuning Checklist

System Level

# Increase file descriptor limits (applies to the current shell and its children only;
# a systemd-managed service needs LimitNOFILE= in its unit file instead)
ulimit -n 65536

# Optimize TCP settings
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535
sudo sysctl -w net.ipv4.tcp_fin_timeout=30
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# Increase buffer sizes
sudo sysctl -w net.core.rmem_max=26214400
sudo sysctl -w net.core.wmem_max=26214400
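
Because `ulimit` only affects the shell it runs in, it is worth confirming the limit the running process actually has. On Linux this is exposed in `/proc/<pid>/limits`. The sketch below extracts the soft open-files limit from that file, and `soft_nofile` is a hypothetical helper name.

```shell
# soft_nofile: read the contents of /proc/<pid>/limits on stdin and print the
# soft "Max open files" limit.
soft_nofile() {
  awk '/^Max open files/ { print $4 }'
}
```

Feed it the limits file of the Bifrost process, for example by redirecting `/proc/<pid>/limits` into the function with the PID reported by `pgrep bifrost-server`.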

Bifrost Configuration

# Optimized production configuration
logging:
  level: warn
  format: json

server:
  http:
    listen: ":7080"
    read_timeout: "30s"
    write_timeout: "30s"
    idle_timeout: "120s"
    max_connections: 50000

api:
  enable_request_log: false  # Disable for performance

cache:
  enabled: true
  memory:
    max_size: "256MB"
  disk:
    enabled: true
    path: "/var/cache/bifrost"
    max_size: "2GB"

Monitoring Configuration

metrics:
  enabled: true
  listen: ":7090"

# Key metrics to watch:
# - bifrost_request_duration_seconds
# - bifrost_connections_active
# - bifrost_backend_latency_seconds
# - bifrost_goroutines
# - process_resident_memory_bytes  (from the process collector)

Performance Metrics Reference

Metric	Healthy Value	Warning Threshold
Request latency (p95)	< 100ms	> 500ms
Active connections	< 10,000	> 50,000
Error rate	< 0.1%	> 1%
CPU usage	< 50%	> 80%
Memory usage	< 512MB	> 2GB
Goroutine count	< 10,000	> 50,000

# Quick performance check
curl -s http://localhost:7090/metrics | grep -E "(bifrost_request_duration|bifrost_connections_active|bifrost_goroutines|bifrost_memory|process_resident_memory_bytes)"
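
The thresholds in the table can be applied mechanically in a script or cron check. The helper below is a minimal sketch of that comparison. The `classify` name is hypothetical, and "elevated" is a label made up here for the band the table leaves unnamed between the healthy and warning values.

```shell
# classify VALUE HEALTHY_BELOW WARNING_ABOVE: compare a measurement against
# one row of the table and print healthy, elevated, or warning.
classify() {
  awk -v v="$1" -v ok="$2" -v warn="$3" 'BEGIN {
    if (v + 0 < ok + 0) print "healthy"
    else if (v + 0 > warn + 0) print "warning"
    else print "elevated"
  }'
}
```

For the p95 latency row, expressed in milliseconds, `classify 80 100 500` prints `healthy` and `classify 600 100 500` prints `warning`.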