A number of performance improvements landed in Keystone Mitaka, including caching the catalog, which should make token creation faster according to the Keystone developers. In this blog post, I will test that assertion.
My setup is specific to how I run Keystone: you may be using different token formats, different backends, different web servers, and a different load balancer architecture. The point here is simply to compare Mitaka against Liberty in my setup.
I’m running a 3-node Keystone cluster on virtual machines in my OpenStack cloud, fronted by another virtual machine running haproxy with round-robin load balancing. I request tokens from a separate virtual machine via the VIP provided by haproxy. The Keystone nodes have 2 vCPUs + 4G RAM.
Keystone runs inside a Docker container under uwsgi, which is configured with 2 static threads.
- The Mitaka code is based on stable/mitaka from March 22, 2016.
- The Liberty code is based on stable/liberty from March 16, 2016.
Note: I retested with branches from April 17 and April 15 respectively; the results were the same.
Keystone is configured to use Fernet tokens and the mysql backend.
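For reference, that configuration looks roughly like the following keystone.conf fragment; the connection string and host names here are placeholders, not my actual values:

```ini
[token]
provider = fernet

[database]
# placeholder credentials/host
connection = mysql+pymysql://keystone:secret@db-host/keystone
```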
I did not rebuild the machines; the Mitaka runs use nodes upgraded in place from Liberty.
I am doing 20 benchmark runs against each setup, with a 120-second delay between runs. The goal is to smooth out performance variation that comes from these being virtual machines running in a cloud. The tests run as follows:
- Create 200 tokens serially
- Validate 200 tokens serially
- Create 1000 tokens concurrently (20 at once)
- Validate 500 tokens concurrently (20 at once)
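The shape of those runs can be sketched as below. This is a minimal illustration of the serial/concurrent pattern, not the actual benchmark script; the endpoint, user, password, and project names are placeholders.

```python
# Sketch of the benchmark shape: create tokens serially or concurrently
# against Keystone's v3 API and measure requests per second.
# KEYSTONE_URL, USER, PASSWORD, and PROJECT are hypothetical placeholders.
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

KEYSTONE_URL = "http://keystone-vip:5000"  # placeholder haproxy VIP
USER, PASSWORD, PROJECT = "admin", "secret", "admin"

def create_token():
    """POST /v3/auth/tokens and return the token from X-Subject-Token."""
    body = json.dumps({
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {
                    "name": USER,
                    "domain": {"id": "default"},
                    "password": PASSWORD,
                }},
            },
            "scope": {"project": {
                "name": PROJECT,
                "domain": {"id": "default"},
            }},
        }
    }).encode()
    req = urllib.request.Request(
        KEYSTONE_URL + "/v3/auth/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["X-Subject-Token"]

def throughput(task, total, concurrency=1):
    """Run `task` `total` times across `concurrency` workers; return req/sec."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(task) for _ in range(total)]
        for f in futures:
            f.result()  # re-raise any request failure
    return total / (time.monotonic() - start)

# Serial create:     throughput(create_token, 200)
# Concurrent create: throughput(create_token, 1000, concurrency=20)
```

Validation works the same way, substituting a GET to /v3/auth/tokens with the token in the X-Subject-Token header.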
The code for running these benchmarks, which I borrowed from Dolph Mathews and made a bit easier to use, is available on github. Patches welcome.
Is Mitaka faster? Answer: No.
Something is amiss with Fernet token performance in Mitaka; there is a serious degradation here.
The charts tell the story. Each chart below shows how many requests per second can be handled. Concurrent validation is the most concerning number, because it reflects the standard workload of a cloud: dozens of API calls being made at once to tens of services, each of which wants to validate a token.
So you can see that concurrent validation is much slower. Let’s also compare with memcache enabled:
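(By "memcache enabled" I mean Keystone's caching layer pointed at memcached, roughly like the fragment below; the server addresses are placeholders for my actual nodes.)

```ini
[cache]
enabled = true
backend = dogpile.cache.memcached
# placeholder addresses
memcache_servers = 10.0.0.1:11211,10.0.0.2:11211
```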
Let’s look at the raw data, which is more damning than the charts suggest given their scale:
I spent some time thinking about why Mitaka might be slower, and I see one clue: the traffic to memcache (shown using the stats command) in Mitaka is 3-4x what it is in Liberty. Perhaps Keystone is caching too much, or too often? I don’t really know, but that is an interesting difference.
I’m hopeful that this gets fixed or looked at in Newton and backported to Mitaka.
Possible sources of error:
- These are VMs and could have noisy neighbors. Mitigation: Run it a whole lot. Re-run on new VMs. Run at night.
- Revocation events. Mitigation: Check and clear revocation table entries before running perf tests.
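Checking for leftover revocation events can be done directly against the keystone database. A sketch, assuming the default schema's table name:

```sql
-- count pending revocation events before a benchmark run
SELECT COUNT(*) FROM revocation_event;
-- clear them out so token validation is not penalized
DELETE FROM revocation_event;
```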
I’d really like someone else to reproduce this, especially using a different benchmarking tool, to confirm this data. If you do, please let me know.