The awesome Keystone team has been working hard in Ocata to improve overall keystone performance when token revocations are present. If you haven’t read my previous post on why this is an issue, you should start there. Here’s the tl;dr background: when any token revocations are present in your database, token validation performance suffers, and suffers greatly. Token validations are at the heart of your cloud. Every single OpenStack API call requires a token validation. Revocations happen when a user or project is deleted, a token revoke API call is made, or, until recently, someone logged out of Horizon.
So here’s the simplification of this path: Revoked tokens slow down token validation which slows down all OpenStack API calls, ergo, revoked tokens slow down your OpenStack APIs.
Here is what this looks like in Liberty. Can you see when our regression tests run and generate revocations?
Can you tell in Cinder when we have revoked tokens?
Fortunately, the team focused on fixing this in Ocata, and the good news is that it appears to have worked. In Ocata (currently on master), token validation performance no longer correlates with the number of revoked tokens.
The experimental setup is the same as in my previous post, just with different software versions. The nodes run keystone in docker containers with uwsgi. The Newton round uses stable/newton as of Nov 12 2016, and the Ocata round uses master as of commit 498d700c. Both rounds use Fernet tokens with caching.
Validations Per Second as a Function of Number of Revocations
The first chart shows the number of token validations that can be completed per second. For this test more is better: more validations get pushed through and the test completes faster.
As you can see, we no longer have the exponential decay, which is good. Now the rate is steady, and we will not have the spike in timings that we see after our regression tests run. You may also notice that the base rate is a bit slower in Ocata. If you never have any token revokes this may be concerning, but this timing is still pretty fast. As I said before, I was running 20 threads at a time; if this is raised to 50 or 100, the rate would be much higher. In other words, this is not a performance ceiling you are seeing, just a comparison of N to O under the same conditions.
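To make the thread-count point concrete, here is a minimal sketch of how a validations-per-second figure can be measured. It is not the harness used for these charts: `measure_rate` and `fake_validate` are hypothetical names, and `fake_validate` just sleeps for 5 ms instead of calling keystone. With a stub like this the rate scales almost directly with the number of threads; against a real keystone the server eventually becomes the limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_rate(validate, total_calls, threads=20):
    """Call validate() total_calls times across a thread pool.

    Returns the observed rate in calls per second.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every call and re-raises any worker exception.
        list(pool.map(lambda _: validate(), range(total_calls)))
    elapsed = time.perf_counter() - start
    return total_calls / elapsed


def fake_validate():
    # Hypothetical stand-in for one token validation request (assumed 5 ms).
    time.sleep(0.005)


if __name__ == "__main__":
    for threads in (20, 50, 100):
        rate = measure_rate(fake_validate, 1000, threads)
        print(f"{threads} threads: {rate:.0f} validations/sec")
```

Swapping `fake_validate` for a function that performs a real validation request would turn this into a rough load test, under the same caveat that the numbers only compare like with like.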
99th Percentile Time to Complete a Validation
This chart examines the same data in a different way. It shows the time in milliseconds within which 99% of the token validations are completed. In this chart, lower is better. You can see a linear progression in the amount of time to complete a token validation. In Newton, by the time you have 1000 revocations, validating a token goes from 99 ms to 1300 ms.
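For readers unfamiliar with the metric, the 99th percentile can be computed from a list of per-request timings as sketched below. This assumes the nearest-rank definition, which may differ slightly from whatever the charting tool used here, and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values stay in range.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up validation timings in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13, 14]

print(percentile(latencies_ms, 50))  # 13
print(percentile(latencies_ms, 99))  # 250
```

The median hides the slow request entirely, while the 99th percentile surfaces it, which is why this view shows the revocation slowdown so clearly.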
More Fixes to Come
This work is great news for having predictable keystone token performance. I won’t have to tell anyone to go truncate the revocation_event table when things get slow, and we shouldn’t have graph spikes anymore. There is also more work to come: the Keystone team is working on more fixes and improvements in this area. You can track the progress of that here: https://review.openstack.org/#/q/project:openstack/keystone+branch:master+topic:bug/1524030