Making PHP scream: OPcache, optimized builds and PGO
When working with heavy applications, especially WordPress, I have frequently run into performance issues, with response times taking upwards of a second, and in extreme cases, more than 5 or 10 seconds.
Those problems are hard to fix with conventional optimization techniques: using a profiler and editing the code to remove bottlenecks.
First of all, because you'll have to patch plugins, which you will then no longer be able to update.
And secondly because performance issues in large sites are frequently due to the sheer number of function calls triggered by WordPress' hooks system. There is no generic way to fix that, because hooks cannot be removed without potentially breaking a plugin.
For those reasons, it is more effective to work outside the system, and look at optimizing PHP to simply run all that code faster.
First of all, make sure that OPcache is enabled.
Then, building PHP with optimizations for your CPU family might help significantly.
To squeeze out a few additional percent, use PGO, but make sure to generate the profile by benchmarking your own app.
Enable the OPcache
This is a no-brainer, and by far the most important tool in the toolbox.
The PHP OPcache caches the bytecode (opcodes) that is generated from PHP scripts. Thus it is no longer necessary to read and parse all scripts on every request. That gives a huge performance boost with little effort.
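How OPcache gets enabled depends on how PHP was installed. As a minimal sketch, assuming PHP 7.2 or later and a setup that scans a directory for additional .ini files (the directory reported by php --ini), the function below writes a small ini fragment. The function name and the file name 10-opcache.ini are made up for illustration. Many distribution packages already ship an equivalent file, in which case the zend_extension line should be dropped to avoid loading the extension twice.

```shell
# enable_opcache DIR: write a minimal OPcache ini fragment into DIR.
# DIR is the "Scan for additional .ini files in" directory from `php --ini`
# (installation-specific; 10-opcache.ini is a hypothetical file name).
enable_opcache() {
  cat > "$1/10-opcache.ini" <<'EOF'
; Load the extension (the suffix-less name works on PHP 7.2 and later).
zend_extension=opcache
; Cache compiled opcodes in shared memory for web requests.
opcache.enable=1
EOF
}
```

A call could look like enable_opcache /etc/php/conf.d, where the path is only an example. After restarting PHP-FPM or the web server, the Zend OPcache section of phpinfo() should report the cache as enabled.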
Testing PHP 7.2.15 running a fresh install of WordPress 5.0.3, these are the 50th percentile response times with and without OPcache:
- Without OPcache: 74 ms
- With OPcache: 9 ms (-88%)
I've also tested the same PHP build with a heavier app, a site with WooCommerce, a fancy theme and many other plugins:
- Without OPcache: 1106 ms
- With OPcache: 641 ms (-42%)
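The post does not say which tool produced these numbers. One possible way to measure a 50th percentile is sketched below, assuming curl, sort and awk are available and that sequential requests from a single client are representative enough. The function names are invented for this example.

```shell
# median: read one number per line on stdin, print the 50th percentile.
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) { print v[(NR + 1) / 2] } else { print (v[NR / 2] + v[NR / 2 + 1]) / 2 }
    }'
}

# time_requests URL COUNT: request URL COUNT times, one after the other,
# and print each total response time in whole milliseconds.
time_requests() {
  i=0
  while [ "$i" -lt "$2" ]; do
    # curl reports time_total in seconds; awk converts it to milliseconds.
    curl -s -o /dev/null -w '%{time_total}\n' "$1" |
      awk '{ printf "%.0f\n", $1 * 1000 }'
    i=$((i + 1))
  done
}
```

For example, time_requests http://localhost/ 100 | median prints the median of 100 requests against a local site (the URL is a placeholder). Sending a few warm-up requests first keeps the initial, uncached compilations out of the OPcache measurement.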
Building PHP with optimizations
Building with CFLAGS='-O2 -march=native' produces a version of PHP that is optimized for the CPU of the machine you build on. Distribution packages need to be compatible with all CPUs of your architecture, so they cannot include these optimizations.
In my testing, this makes no difference for a basic WordPress install, since the bottlenecks are presumably elsewhere (in the database, or code that cannot be optimized).
However, on the heavier WooCommerce site I tested, the results are good:
- -O2/-march=native without OPcache: 732 ms (-34% vs defaults without OPcache)
- -O2/-march=native with OPcache: 433 ms (-32% vs defaults with OPcache)
- -O3/-march=native with OPcache: 429 ms (-33% vs defaults with OPcache)
The effect is probably bigger with newer CPUs, as new CPU families have more features that can be used for optimizations.
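As a rough sketch of such a build, assuming an unpacked PHP source tree and a standard autoconf workflow, the function below passes the flags to configure through the environment. The function name, the install prefix and the absence of further configure options are assumptions. A real build would add whatever options the site needs.

```shell
# build_php SRC PREFIX [CFLAGS]: configure, build and install PHP from SRC.
# SRC and PREFIX are hypothetical paths; CFLAGS defaults to the flags
# used in the measurements above.
build_php() {
  (
    cd "$1" || exit 1
    cflags='-O2 -march=native'
    if [ -n "$3" ]; then cflags=$3; fi
    # configure records CFLAGS from the environment for the whole build.
    CFLAGS="$cflags" ./configure --prefix="$2" || exit 1
    # Build with one job per online CPU, then install into PREFIX.
    make -j"$(getconf _NPROCESSORS_ONLN)" || exit 1
    make install
  )
}
```

Calling build_php ~/src/php-7.2.15 ~/php-native would build with -O2, and passing '-O3 -march=native' as a third argument would correspond to the -O3 measurement. Because -march=native targets the build machine, the resulting binaries may not run on a different or older CPU.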
Build PHP with profile-guided optimization (PGO)
Building PHP with make prof-gen first, then benchmarking the application to generate profiles, and finally running make prof-use should produce a version of PHP that is optimized for the application.
Again, with a simple WordPress site, this makes no difference, but with the same heavy WooCommerce site, it does have a small impact:
- -O2/-march=native/PGO(WP) with OPcache: 452 ms (-29% vs defaults with OPcache)
- -O2/-march=native/PGO(app) with OPcache: 406 ms (-37% vs defaults with OPcache)
In the first case, the PGO build was done with WordPress only. In the second case it was done on the app. As you can tell, it is important to use a representative workload for PGO.
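The PGO steps can be sketched as a small function, assuming an already configured PHP source tree. The function name and the workload command are placeholders. The prof-clean step is not mentioned above; it is included on the assumption that the PHP Makefile provides such a target to remove object files while keeping the profile data, so check the Makefile of your PHP version before relying on it.

```shell
# pgo_build SRC WORKLOAD...: profile-guided build inside the configured
# source tree SRC. WORKLOAD is a command of your choosing (hypothetical)
# that serves the app with the freshly built binaries and sends
# representative requests to it. It is run from inside SRC.
pgo_build() {
  (
    cd "$1" || exit 1
    shift
    make prof-gen || exit 1    # 1. instrumented build
    "$@" || exit 1             # 2. run the representative workload
    make prof-clean || exit 1  # 3. assumed target: drop objects, keep profiles
    make prof-use              # 4. rebuild using the collected profiles
  )
}
```

A call might look like pgo_build ~/src/php-7.2.15 ./run-benchmark.sh, where run-benchmark.sh is a script you would have to write for your own site. The instrumented PHP processes need to exit cleanly before the final step, because the profile data is written at process exit.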
Clang is slower than GCC
Out of interest, I tested with Clang.
Using Clang 7.0.1 instead of GCC 8.2.1 yields a build that is ever so slightly slower, but only when using the OPcache:
- GCC: 74 ms
- Clang -O2 -march=native: 74 ms
- GCC with OPcache: 9 ms
- Clang -O2 -march=native with OPcache: 10 ms