You Can’t Optimize What You Don’t Measure
You’ve implemented all the optimizations from this series: the infrastructure is solid, caching is aggressive, and the database is tuned. But how do you know it’s working? How do you catch problems before they impact users?
The answer: Continuous monitoring and regular testing.
This final post covers the monitoring strategies and testing practices that keep enterprise WordPress sites performing consistently at scale.
Understanding Monitoring Types
Different monitoring approaches reveal different insights.
Synthetic Monitoring
What it is: Automated tests run from specific locations to check whether the site is accessible and performing well.
Tools: Pingdom, UptimeRobot, StatusCake
What it measures:
- Is the site up or down?
- Response time from various locations
- Uptime percentage
Limitations:
- Tests from specific locations (not real user distribution)
- Simple checks (load homepage, check response code)
- Doesn’t reflect real user experience
When to use: Uptime monitoring, basic performance checks, alerting on outages.
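Under the hood these checks are simple; a minimal sketch in Node.js 18+ (URL and thresholds are placeholders), loading the homepage and recording status code and response time:

const URL_TO_CHECK = 'https://yoursite.com/';

async function syntheticCheck() {
  const start = Date.now();
  try {
    const response = await fetch(URL_TO_CHECK);
    const elapsedMs = Date.now() - start;
    console.log(`${URL_TO_CHECK} -> ${response.status} in ${elapsedMs}ms`);

    if (!response.ok || elapsedMs > 2000) {
      // Hook your alerting here (Slack webhook, PagerDuty event, etc.)
      console.error('Synthetic check failed or exceeded the response-time threshold');
    }
  } catch (err) {
    console.error(`Synthetic check error: ${err.message}`);
  }
}

syntheticCheck();
setInterval(syntheticCheck, 60 * 1000); // repeat once a minute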
Real User Monitoring (RUM)
What it is: Collects performance data from actual users’ browsers as they browse your site.
Tools: Google Analytics (basic), New Relic, Datadog RUM, SpeedCurve
What it measures:
- Actual load times users experience
- Performance across different devices/browsers/locations
- Core Web Vitals from real traffic
- Conversion correlation with performance
Advantages:
- Real data from real users
- Captures full diversity of devices, networks, locations
- Shows impact of performance on business metrics
When to use: Understanding actual user experience, finding regional performance issues, correlating performance with conversion.
Application Performance Monitoring (APM)
What it is: Monitors application internals—database queries, PHP execution, external API calls.
Tools: New Relic APM, Datadog APM, Scout APM
What it measures:
- Slow database queries
- PHP function execution time
- External service dependencies
- Memory usage patterns
When to use: Diagnosing performance problems, optimizing code, finding bottlenecks.
Key Performance Metrics to Track
Core Web Vitals (Google’s Metrics)
Largest Contentful Paint (LCP):
- Measures loading performance
- Time until largest element renders
- Target: <2.5 seconds
- Typical issue: Large unoptimized images
Interaction to Next Paint (INP):
- Measures interactivity/responsiveness
- Time from user interaction to visual response
- Target: <200ms
- Typical issue: Heavy JavaScript blocking main thread
Cumulative Layout Shift (CLS):
- Measures visual stability
- How much page layout shifts during loading
- Target: <0.1
- Typical issue: Images without dimensions, late-loading ads
Why Core Web Vitals matter:
- Google ranking factor
- Strong correlation with user satisfaction
- Industry-standard metrics
Monitoring Core Web Vitals:
// Collect Core Web Vitals using the web-vitals library (v3+ exposes onCLS/onINP/onLCP)
import { onCLS, onINP, onLCP } from 'web-vitals';

function sendToAnalytics(metric) {
  // Send each metric to your analytics endpoint as it becomes available
  fetch('/analytics', {
    method: 'POST',
    body: JSON.stringify(metric),
  });
}

onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);
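In production, metrics reported late in the page lifecycle can be lost; the web-vitals documentation recommends sending them with navigator.sendBeacon() (or fetch with keepalive: true) so the request survives the user navigating away.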
Time to First Byte (TTFB)
Measures server response time: how long it takes for the browser to receive the first byte of the response.
Target: <200ms
What affects TTFB:
- Server processing time (PHP execution, database queries)
- Network latency
- CDN performance
High TTFB indicates:
- Slow server
- Database bottlenecks
- Cache not working
Page Load Time
Total time until the page fully loads.
Measure with:
- Navigation Timing API
- Google Analytics
- RUM tools
Target: <3 seconds (faster is better)
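Both TTFB and total load time can be read directly in the browser from the Navigation Timing API mentioned above; a minimal sketch, assuming a placeholder /analytics endpoint:

window.addEventListener('load', () => {
  // loadEventEnd is only populated after load handlers finish, so defer the read
  setTimeout(() => {
    const [nav] = performance.getEntriesByType('navigation');
    if (!nav) return;

    const metrics = {
      ttfb: nav.responseStart,      // time to first byte, in ms
      domComplete: nav.domComplete, // DOM fully built and parsed, in ms
      loadTime: nav.loadEventEnd,   // full page load, in ms
    };

    // sendBeacon survives the user leaving the page
    navigator.sendBeacon('/analytics', JSON.stringify(metrics));
  }, 0);
});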
Database Performance Metrics
Query time:
- Average query response time
- Slow query count (>100ms)
Connection count:
- Active database connections
- Connection pool utilization
Replication lag:
- Time between write on primary and replication to replicas
- Target: <1 second
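A minimal sketch of a lag check from Node.js, assuming the mysql2 library and placeholder connection details (MySQL 8.0.22+ uses SHOW REPLICA STATUS and Seconds_Behind_Source; older versions use SHOW SLAVE STATUS and Seconds_Behind_Master):

const mysql = require('mysql2/promise');

async function checkReplicationLag() {
  // Connect to the replica you want to monitor (placeholder credentials)
  const conn = await mysql.createConnection({
    host: 'replica.db.example.com',
    user: 'monitor',
    password: process.env.DB_MONITOR_PASSWORD,
  });

  const [rows] = await conn.query('SHOW REPLICA STATUS');
  await conn.end();

  // null means replication is stopped; anything over 1s misses the target
  const lag = rows.length ? rows[0].Seconds_Behind_Source : null;
  if (lag === null || lag > 1) {
    console.error(`Replication lag: ${lag === null ? 'replication stopped' : lag + 's'} (target: <1s)`);
  }
  return lag;
}

checkReplicationLag().catch(console.error);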
Server Resource Metrics
CPU usage:
- Application server CPU
- Database server CPU
- Target: <70% average (leaves headroom for spikes)
Memory usage:
- Available vs used memory
- Watch for memory leaks
Disk I/O:
- Read/write operations
- High disk I/O can indicate the database working set isn’t fully cached in memory
Monitoring Tools and Setup
New Relic (Recommended for Most)
What it monitors:
- Application performance (APM)
- Real user monitoring (RUM)
- Infrastructure metrics
- Database performance
- External service dependencies
WordPress setup:
- Install New Relic agent on servers
- Add JavaScript snippet to WordPress header
- Configure application in New Relic dashboard
Cost: Free tier available, paid tiers scale with usage.
Datadog
Similar to New Relic, but with a stronger infrastructure focus:
- Server metrics (CPU, memory, disk)
- Application performance
- Log management
- Custom dashboards
WordPress setup:
- Install Datadog agent on servers
- Enable PHP APM
- Configure WordPress integration
Cost: Paid service, pricing based on hosts monitored.
Google Analytics + PageSpeed Insights
Free option with Google-centric metrics:
- Google Analytics: Basic performance timing
- PageSpeed Insights: Core Web Vitals from the Chrome User Experience Report (CrUX)
Limitations:
- Less detailed than paid tools
- Delayed data (not real-time)
- Google ecosystem only
WordPress-Specific Tools
Query Monitor plugin:
- Shows all database queries on page
- Highlights slow queries
- Identifies duplicate queries
- Development/staging only (performance overhead)
Debug Bar:
- PHP warnings and errors
- WordPress hook timing
- Cache statistics
Setting Up Effective Alerts
Alert Fatigue is Real
Too many alerts = team ignores all alerts.
Bad alerts:
- CPU is >50% (too sensitive)
- Any database query >50ms (too noisy)
- Alert on every minor issue
Good alerts:
- Average response time >500ms for 5 minutes (actionable)
- Error rate >1% for 2 minutes (urgent)
- Database CPU >90% for 3 minutes (concerning)
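The “for N minutes” qualifier is what makes these actionable: the alert only fires when the condition holds for the whole window, not on a single spike. A minimal sketch of that evaluation logic, assuming a placeholder metrics endpoint and thresholds:

const THRESHOLD_MS = 500;        // average response time threshold
const WINDOW_MS = 5 * 60 * 1000; // condition must hold for 5 minutes
let breachedSince = null;

async function evaluateAlert() {
  // Placeholder: pull the current rolling average from your metrics store
  const res = await fetch('https://metrics.example.com/avg-response-time');
  const { averageMs } = await res.json();

  if (averageMs > THRESHOLD_MS) {
    breachedSince = breachedSince ?? Date.now();
    if (Date.now() - breachedSince >= WINDOW_MS) {
      // Fire the alert (PagerDuty, Slack webhook, etc.), then reset the timer
      console.error(`ALERT: avg response time ${averageMs}ms > ${THRESHOLD_MS}ms for 5 minutes`);
      breachedSince = null;
    }
  } else {
    breachedSince = null; // condition cleared, start over
  }
}

setInterval(() => evaluateAlert().catch(console.error), 60 * 1000); // check once a minute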
Alert Priority Levels
Critical (wake someone up):
- Site completely down
- Error rate >5%
- Database unavailable
High (investigate immediately during work hours):
- Response time degraded significantly
- Traffic spike causing issues
- Security incident detected
Medium (investigate within hours):
- Slow queries trending up
- Cache hit rate decreasing
- Disk space low
Low (investigate when convenient):
- Minor performance degradation
- Non-critical features failing
- Informational notices
Alert Channels
PagerDuty: Critical alerts, on-call rotation
Slack: High/medium alerts, team collaboration
Email: Low priority alerts, daily summaries
Load Testing: Simulating Traffic Before It Happens
Don’t discover performance problems during real traffic spikes.
Why Load Test
Find breaking points:
- At what traffic level does site slow down?
- Which component fails first?
Validate optimizations:
- Did caching improvement actually help?
- How much more traffic can the site handle now?
Prevent surprises:
- Test before product launches
- Verify infrastructure changes don’t degrade performance
Load Testing Tools
k6 (recommended):
- Open-source, powerful
- Write tests in JavaScript
- Run locally or in cloud
Example k6 test:
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '5m', target: 100 },  // Ramp to 100 users
    { duration: '10m', target: 100 }, // Stay at 100 users
    { duration: '5m', target: 500 },  // Spike to 500 users
    { duration: '10m', target: 500 }, // Stay at 500 users
    { duration: '5m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests <500ms
    http_req_failed: ['rate<0.01'],   // Error rate <1%
  },
};

export default function () {
  let response = http.get('https://yoursite.com/');
  check(response, {
    'status is 200': (r) => r.status === 200,
    'response time <500ms': (r) => r.timings.duration < 500,
  });
  // Simulate realistic user behavior
  sleep(Math.random() * 5 + 5); // 5-10 seconds between requests
}
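Save the script (for example as loadtest.js) and run it locally with k6 run loadtest.js; k6 can also execute the same script from the cloud for distributed, multi-region load.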
Loader.io:
- Cloud-based, simple UI
- No coding required
- Free tier for basic tests
JMeter:
- Enterprise-grade load testing
- Complex scenarios
- Steeper learning curve
Load Testing Best Practices
Realistic scenarios:
- Mix of homepage, articles, and category pages, not just the homepage (see the k6 sketch after this list)
- Different user behaviors (browsing, search, direct traffic)
- Mobile vs desktop traffic ratios
Gradual ramp-up:
- Start at normal traffic level
- Slowly increase (10% every 5 minutes)
- Find where performance degrades
Test from multiple locations:
- Closer to CDN edge = better performance
- Far from CDN = realistic worst-case
Don’t just test peak:
- Test sustained load (2 hours at high traffic)
- Verify no memory leaks or resource exhaustion
Test production-like environment:
- Staging should match production infrastructure
- Same caching, same database size
- Realistic data volume
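Building on the earlier k6 script, a sketch of a scenario that mixes page types, with placeholder URLs and weights you would replace with your real traffic distribution:

import http from 'k6/http';
import { check, sleep } from 'k6';

// Placeholder URL mix, weighted roughly by traffic share
const pages = [
  { url: 'https://yoursite.com/', weight: 0.4 },                // homepage
  { url: 'https://yoursite.com/sample-article/', weight: 0.4 }, // article
  { url: 'https://yoursite.com/category/news/', weight: 0.2 },  // category archive
];

// Pick a page at random according to its weight
function pickPage() {
  const r = Math.random();
  let cumulative = 0;
  for (const page of pages) {
    cumulative += page.weight;
    if (r < cumulative) return page.url;
  }
  return pages[0].url;
}

export default function () {
  const response = http.get(pickPage());
  check(response, { 'status is 200': (r) => r.status === 200 });
  sleep(Math.random() * 5 + 5); // 5-10 seconds between requests
}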
Interpreting Load Test Results
Good results:
- Response time stays flat as users increase
- Error rate remains <0.1%
- Resources scale linearly
Concerning results:
- Response time increases with users (caching not effective)
- Errors spike at certain threshold (resource exhaustion)
- CPU/memory maxes out (need more capacity)
After load test:
- Identify bottleneck (database, cache, CPU)
- Optimize bottleneck
- Re-test to verify improvement
- Repeat until performance goals met
Performance Budgets
Set limits on key metrics and enforce them.
What is a Performance Budget?
Defined limits:
- Total page weight: <1MB
- JavaScript bundle: <300KB
- LCP: <2.5s
- Requests per page: <50
Why budgets matter:
- Prevents performance regression
- Forces prioritization (can’t add more features without optimizing)
- Team accountability
Enforcing Performance Budgets
Lighthouse CI:
- Runs Lighthouse on every commit
- Fails build if budget exceeded
// lighthouserc.json
{
  "ci": {
    "assert": {
      "assertions": {
        "first-contentful-paint": ["error", {"maxNumericValue": 2000}],
        "largest-contentful-paint": ["error", {"maxNumericValue": 2500}],
        "cumulative-layout-shift": ["error", {"maxNumericValue": 0.1}],
        "total-byte-weight": ["error", {"maxNumericValue": 1000000}]
      }
    }
  }
}
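With the @lhci/cli package installed, lhci autorun collects the reports, evaluates these assertions, and exits non-zero when one fails, so wiring that command into your CI pipeline is enough to block changes that blow the budget.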
WebPageTest budget:
- Set budgets in WebPageTest
- Get alerts when exceeded
Continuous Performance Optimization Workflow
Performance optimization isn’t a one-time project; it’s a continuous process.
Monthly Routine
Week 1: Monitor and analyze
- Review performance dashboards
- Check Core Web Vitals trends
- Identify any degradation
Week 2: Prioritize issues
- List performance issues found
- Prioritize by impact (what affects most users?)
- Plan fixes
Week 3: Implement fixes
- Optimize slow queries
- Reduce asset sizes
- Fix caching issues
Week 4: Test and verify
- Load test improvements
- Verify in production
- Document changes
Performance Review Meetings
Monthly performance review:
- Review key metrics vs targets
- Discuss user complaints related to performance
- Plan next month’s optimizations
Quarterly deep dives:
- Comprehensive performance audit
- Load test at higher scale
- Review infrastructure capacity
Monitoring at Enterprise Scale
At 10M+ daily visitors:
Must monitor:
- Core Web Vitals (LCP, INP, CLS)
- TTFB and page load time
- Error rates (4xx, 5xx)
- Database performance
- Cache hit rates
- Server resources (CPU, memory, disk)
Best practices:
- Real User Monitoring (RUM) for actual user experience
- APM for diagnosing issues
- Synthetic monitoring for uptime
- Load testing monthly
- Performance budgets enforced in CI/CD
Alert on:
- Site down
- Error rate >1%
- Response time >500ms sustained
- Database issues
- Traffic spikes
Review:
- Daily: Quick dashboard check
- Weekly: Detailed metrics review
- Monthly: Performance optimization sprint
- Quarterly: Load testing and capacity planning
Conclusion: The Complete Enterprise WordPress Stack
Over this 10-post series, we’ve covered everything needed to scale WordPress to 10M+ daily visitors:
- Infrastructure: Managed vs self-managed hosting decisions
- Caching: Page, object, and CDN caching strategies
- Database: Read replicas, query optimization, cleanup
- Assets: Image, JavaScript, CSS optimization
- Traffic spikes: Auto-scaling, pre-warming, monitoring
- WordPress: Plugin management, theme optimization, core config
- Personalization: ESI, JavaScript, microservices for logged-in users
- Security: DDoS protection, WAF, monitoring, incident response
- Monitoring: RUM, APM, metrics, alerting
- Testing: Load testing, performance budgets, continuous optimization
The common thread: WordPress absolutely can scale to enterprise traffic with proper architecture.
It’s not about WordPress—it’s about building proper infrastructure, implementing aggressive caching, optimizing continuously, and monitoring everything.
If you’re scaling WordPress or planning for enterprise traffic, we’d be happy to help review your architecture and recommend optimizations.
Connect with Matt Dorman on LinkedIn or reach out at ndevr.io/contact
Let’s build WordPress platforms that scale.