Performance Tuning Guide
Practical optimization strategies for scaling Lynq to thousands of nodes.
Understanding Performance
Lynq uses three reconciliation layers:
- Event-Driven (Immediate): Reacts to resource changes instantly
- Periodic (30 seconds): Fast status updates and drift detection
- Database Sync (Configurable): Syncs node data at defined intervals
This architecture ensures:
- ✅ Immediate drift correction
- ✅ Fast status reflection (30s)
- ✅ Configurable database sync frequency
Configuration Tuning
1. Database Sync Interval
Adjust how frequently the operator checks your database:
apiVersion: operator.lynq.sh/v1
kind: LynqHub
metadata:
  name: my-hub
spec:
  source:
    syncInterval: 1m # Default: 1 minute
Recommendations:
- High-frequency changes: 30s - Faster node provisioning, higher DB load
- Normal usage: 1m (default) - Balanced performance
- Stable nodes: 5m - Lower DB load, slower updates
2. Resource Wait Timeouts
Control how long to wait for resources to become ready:
deployments:
  - id: app
    waitForReady: true
    timeoutSeconds: 300 # Default: 5 minutes (max: 3600)
Recommendations:
- Fast services: 60s - Quick deployments (< 1 min)
- Normal apps: 300s (default) - Standard deployments
- Heavy apps: 600s - Database migrations, complex initialization
- Skip waiting: Set waitForReady: false for non-critical resources
3. Creation Policy Optimization
Avoid unnecessary re-applies:
configMaps:
  - id: init-config
    creationPolicy: Once # Create once, never reapply
Use Cases:
- Once: Init scripts, immutable configs, security resources
- WhenNeeded (default): Normal resources that may need updates
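The difference between the two policies can be expressed as a small decision function. This Go sketch is a hypothetical model of the documented behavior (`Once` creates only when the object is missing; `WhenNeeded` also re-applies after drift); the `shouldApply` name and the `drifted` flag are assumptions, not Lynq's API.

```go
package main

import "fmt"

// CreationPolicy mirrors the two documented values.
type CreationPolicy string

const (
	Once       CreationPolicy = "Once"
	WhenNeeded CreationPolicy = "WhenNeeded"
)

// shouldApply decides whether a reconcile pass sends the rendered object to
// the API server. Hypothetical model: Once applies only when the object does
// not exist yet; WhenNeeded (also the default when unset) re-applies on drift.
func shouldApply(p CreationPolicy, exists, drifted bool) bool {
	switch p {
	case Once:
		return !exists
	case WhenNeeded, "":
		return !exists || drifted
	default:
		return false // unknown policy: do nothing
	}
}

func main() {
	for _, p := range []CreationPolicy{Once, WhenNeeded} {
		for _, exists := range []bool{false, true} {
			fmt.Printf("%-10s exists=%-5v drifted=true -> apply=%v\n", p, exists, shouldApply(p, exists, true))
		}
	}
}
```

With `Once`, an existing object is never touched again, so even a drifted init ConfigMap costs no further API writes.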
Template Optimization
1. Keep Templates Simple
✅ Good - Efficient template:
nameTemplate: "{{ .uid }}-app"
❌ Bad - Complex template:
nameTemplate: "{{ .uid }}-{{ .region }}-{{ .planId }}-{{ now | date \"20060102\" }}"
# Avoid: timestamps, random values, complex logic
Tips:
- Keep templates simple and predictable
- Avoid now, randAlphaNum, or other non-deterministic functions
- Use consistent naming patterns
- Deterministic templates render the same output on every reconciliation, so resource names stay stable instead of changing between runs
2. Dependency Graph Optimization
✅ Good - Shallow dependency tree:
resources:
  - id: namespace   # No dependencies
  - id: deployment  # Depends on: namespace
  - id: service     # Depends on: deployment
# Depth: 3 - Fewer stages; resources that share a level can be created in parallel
❌ Bad - Deep dependency tree:
resources:
  - id: a  # No dependencies
  - id: b  # Depends on: a
  - id: c  # Depends on: b
  - id: d  # Depends on: c
  - id: e  # Depends on: d
# Depth: 5 - Fully sequential, slow
Impact:
- Shallow trees enable parallel execution
- Deep trees force sequential execution
- Each level adds wait time
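Depth and parallel groups can be computed with Kahn's topological sort, processed level by level. This Go sketch is an illustration of the idea, not Lynq's scheduler; the `levels` function and the map format (resource id to the ids it depends on) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// levels groups resources into stages with Kahn's algorithm: every resource in
// a stage depends only on earlier stages, so one stage could be applied in
// parallel. deps maps a resource id to the ids it depends on.
func levels(deps map[string][]string) ([][]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for id, ds := range deps {
		if _, ok := indegree[id]; !ok {
			indegree[id] = 0
		}
		for _, d := range ds {
			indegree[id]++
			dependents[d] = append(dependents[d], id)
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
		}
	}
	var current []string
	for id, n := range indegree {
		if n == 0 {
			current = append(current, id) // no dependencies: first stage
		}
	}
	var stages [][]string
	seen := 0
	for len(current) > 0 {
		sort.Strings(current) // deterministic ordering within a stage
		stages = append(stages, current)
		seen += len(current)
		var next []string
		for _, id := range current {
			for _, child := range dependents[id] {
				indegree[child]--
				if indegree[child] == 0 {
					next = append(next, child)
				}
			}
		}
		current = next
	}
	if seen != len(indegree) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return stages, nil
}

func main() {
	deep := map[string][]string{"a": nil, "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}
	wide := map[string][]string{"namespace": nil, "deployment": {"namespace"}, "service": {"namespace"}, "configmap": {"namespace"}}
	s1, _ := levels(deep)
	s2, _ := levels(wide)
	fmt.Printf("deep: %d stages %v\n", len(s1), s1)
	fmt.Printf("wide: %d stages %v\n", len(s2), s2)
}
```

The number of stages is the number of sequential waits: five resources in a chain need five, while the same namespace with three independent children needs only two.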
3. Minimize Resource Count
For example, create 5 essential resources per node instead of 15:
# Essential only ([1] = one entry in each list)
spec:
  namespaces: [1]
  deployments: [1]
  services: [1]
  configMaps: [1]
  ingresses: [1]
# Total: 5 resources
Impact:
- Fewer resources = Faster reconciliation
- Less API server load
- Lower memory usage
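The savings scale with node count. This Go sketch multiplies it out for 1000 nodes and also computes an upper bound on apply requests if every object were re-applied once per 30s periodic pass. That bound is an assumption for illustration; the operator may skip unchanged objects, so real API traffic is typically lower. `footprint` is a hypothetical helper.

```go
package main

import "fmt"

// footprint returns the total managed objects and an upper bound on apply
// requests per second, assuming every object is re-applied once per periodic
// pass (an assumption; unchanged objects may be skipped in practice).
func footprint(nodes, perNode int, periodicSeconds float64) (objects int, maxApplyPerSec float64) {
	objects = nodes * perNode
	return objects, float64(objects) / periodicSeconds
}

func main() {
	for _, perNode := range []int{15, 5} {
		objs, rps := footprint(1000, perNode, 30)
		fmt.Printf("%2d resources/node x 1000 nodes = %5d objects, <= %.0f applies/s\n", perNode, objs, rps)
	}
}
```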
Scaling Considerations
Resource Limits
Adjust operator resource limits based on node count:
# values.yaml for Helm
resources:
  limits:
    cpu: 2000m     # For 1000+ nodes
    memory: 2Gi    # For 1000+ nodes
  requests:
    cpu: 500m      # Minimum for stable operation
    memory: 512Mi  # Minimum for stable operation
Guidelines:
- < 100 nodes: Default limits (500m CPU, 512Mi RAM)
- 100-500 nodes: 1 CPU, 1Gi RAM
- 500-1000 nodes: 2 CPU, 2Gi RAM
- 1000+ nodes: Consider horizontal scaling (coming in v1.3)
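The guideline table can be encoded as a lookup, for example in tooling that generates Helm values. In this Go sketch the range boundaries are an assumption (the table's ranges touch at 100, 500 and 1000; here they are read as under 100, 100-500, 501-1000, and above 1000), and the 1000+ tier reuses the 2 CPU / 2Gi values from the values.yaml example. `recommend` is a hypothetical helper.

```go
package main

import "fmt"

// Sizing holds suggested operator resources for a node count.
type Sizing struct {
	CPU, Memory string
	Note        string
}

// recommend encodes the guideline table. Assumed boundaries:
// <100, 100-500, 501-1000, >1000.
func recommend(nodes int) Sizing {
	switch {
	case nodes < 100:
		return Sizing{CPU: "500m", Memory: "512Mi", Note: "default limits"}
	case nodes <= 500:
		return Sizing{CPU: "1", Memory: "1Gi"}
	case nodes <= 1000:
		return Sizing{CPU: "2", Memory: "2Gi"}
	default:
		return Sizing{CPU: "2", Memory: "2Gi", Note: "consider horizontal scaling (coming in v1.3)"}
	}
}

func main() {
	for _, n := range []int{50, 300, 800, 1500} {
		s := recommend(n)
		fmt.Printf("%4d nodes: cpu=%s memory=%s %s\n", n, s.CPU, s.Memory, s.Note)
	}
}
```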
Database Optimization
- Add indexes to the node table:
  CREATE INDEX idx_is_active ON node_configs(is_active);
  CREATE INDEX idx_node_id ON node_configs(node_id);
- Use read replicas for high-frequency syncs
- Connection pooling: the operator uses persistent connections
Monitoring Performance
Key Metrics to Watch
Monitor these Prometheus metrics:
# Reconciliation duration (target: < 5s P95)
histogram_quantile(0.95,
  sum(rate(lynqnode_reconcile_duration_seconds_bucket[5m])) by (le)
)
# Resource readiness ratio across all nodes (target: > 95%)
sum(lynqnode_resources_ready) / sum(lynqnode_resources_desired)
# Reconciliation error rate (target: < 5%)
sum(rate(lynqnode_reconcile_duration_seconds_count{result="error"}[5m]))
  / sum(rate(lynqnode_reconcile_duration_seconds_count[5m]))
See the Monitoring Guide for the complete metrics reference.
Troubleshooting Slow Performance
Symptom: Slow Node Creation
Check:
- Database query performance
- waitForReady timeouts
- Dependency chain depth
Solution:
# Check reconciliation times
kubectl logs -n lynq-system -l control-plane=controller-manager | grep "Reconciliation completed"
# Lengthen the sync interval to reduce load on a slow database
kubectl patch lynqhub my-hub --type=merge -p '{"spec":{"source":{"syncInterval":"2m"}}}'
Symptom: High CPU Usage
Check:
- Reconciliation frequency
- Template complexity
- Total node count
Solution:
# Check CPU usage
kubectl top pods -n lynq-system
# Increase resource limits
kubectl edit deployment -n lynq-system lynq-controller-manager
Symptom: Memory Growth
Possible causes:
- Too many cached resources
- Large template outputs
- Memory leak (file an issue)
Solution:
# Restart operator to clear cache
kubectl rollout restart deployment -n lynq-system lynq-controller-manager
# Monitor memory over time (refreshes every 30 seconds)
watch -n 30 kubectl top pods -n lynq-system
Best Practices Summary
- ✅ Start with defaults - Only optimize if you see issues
- ✅ Keep templates simple - Avoid complex logic and non-deterministic functions
- ✅ Use shallow dependency trees - Enable parallel resource creation
- ✅ Set appropriate timeouts - Balance speed vs reliability
- ✅ Monitor key metrics - Watch reconciliation duration and error rates
- ✅ Index your database - Improve sync query performance
- ✅ Use creationPolicy: Once - For immutable resources
See Also
- Monitoring Guide - Complete metrics reference and dashboards
- Prometheus Queries - Ready-to-use queries
- Configuration Guide - All operator settings
- Troubleshooting Guide - Common issues and solutions
