Monthly Archives: November 2016

Token Revocation Performance Improvements in Keystone Ocata

The awesome Keystone team has been working hard in Ocata to improve overall keystone performance when token revocations are present. If you’ve not read my previous post as to why this is an issue you should start there. Here’s the tl;dr background: when any token revocations are present in your database, token validation performance suffers and suffers greatly. Token validations are at the heart of your cloud. Every single OpenStack API call requires a token validation. Revocations happen when a user or project is deleted, a token revoke API call is made, or until recently, someone logged out of Horizon.

So here’s the simplification of this path: Revoked tokens slow down token validation which slows down all OpenStack API calls, ergo, revoked tokens slow down your OpenStack APIs.

Here is what this looks like in Liberty, can you see when our regression tests run and generate revocations?

Can you tell in Cinder when we have revoked tokens?

Fortunately the team focused on fixing this in Ocata and the good news is that it seemed to work. In Ocata (currently on master) there is now no longer a correlation between revoked tokens and token validation performance.

Experimental Setup

The experimental setup is the same as my previous post, except different software. The nodes are running keystone in docker containers with uwsgi using stable/newton as of Nov 12 2016. The Ocata round is using master as of commit 498d700c. Both tests are using Fernet tokens with caching.

Results

Validations Per Second as a Function of Number of Revocations

The first chart will show the number of token validations that can be completed per second. For this test more is better, it means more validations get pushed through and the test completes faster.

As you can see we no longer have the exponential decay which is good. Now the rate is steady and we will not have the spike in timings that we see after we our regression tests run. You may also notice that the base rate is a bit slower in Ocata. If you never have any token revokes this may be concerning, but this timing is still pretty fast. As I said before I was doing 20 threads at a time, if this is raised to 50 or 100 the rate would be much higher. In other words this is not a performance ceiling you are seeing, just comparing N to O under the same conditions.

99% Percentile Time to Complete a Validation

This chart examines the same data in a different way. This chart shows the time in milliseconds in which 99% of the token revocations are completed. In this chart, lower is better. You can see a linear progression in the amount of time to complete the token validation. In Newton, by the time you have 1000 revocations it goes from 99ms to validate a token to 1300 ms.

image-1

More Fixes to Come

This work is great news for having predictable keystone token performance. I won’t have to tell anyone to go truncate the revocation_event table when things get slow and we shouldn’t have graph spikes anymore. Also there is more work to come. The Keystone team is working on more fixes and improvements in this area. You can track the progress of that here: https://review.openstack.org/#/q/project:openstack/keystone+branch:master+topic:bug/1524030

Tagged ,

New PCI-DSS Features in Keystone Newton

Keystone Newton offers up some new PCI-DSS features which will help secure your Keystone deployment. I recently built a Newton dev environment (we’re running Mitaka still for the next few weeks) and tried them out. All of these features are disabled by default (commented out in the config file) and need to be enabled in order to use.

Before diving into this post you may want to read the docs on the new features which are available here. These new features are not mentioned in the Newton release notes or config guides (bug). However you can look at the generated config files which explain more about the settings here. Also note if you have a custom identity driver or you use the ldap identity driver, some of these features will not work (but the password expiry/complexity stuff should still work). You need to use the standard sql identity driver. Finally when you consider my conclusions, please keep in mind that I’m not a PCI-DSS expert, but I do know how to deploy and run OpenStack clouds.

And now the features…

Account Lockout

The first feature we will cover is lock-out. If a user fails N login attempts they will be disabled. They will be disabled for the specified duration (in seconds) and if that is not set they will be disabled indefinitely. Let’s try this one out:

Modify your config file and set:

lockout_failure_attempts=3
lockout_duration=1800

Then bounce keystone.

[DEV] root@dev01-keystone-001:~# service keystone restart
...

Next create a user and give them a role in a project:

[DEV] root@dev01-build-001:~# openstack user create --password secretpass bob
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 630b22fe7f814feeb5a498dc124d814c |
| name                | bob                              |
| password_expires_at | None                             |
+---------------------+----------------------------------+
[DEV] root@dev01-build-001:~# openstack role add --user bob --project admin _member_

Now let’s have bob use the wrong password a few times and get locked out:

[DEV] root@dev01-build-001:~# export OS_PASSWORD=wrongpassword
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-077dc218-65dd-44d0-ba23-2020abb125c3)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-abf8e31f-a846-4ba3-9b08-8a60fcc48b5c)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-e19a2d8b-4bb7-4ce2-bd1a-590facf2e149)
[DEV] root@dev01-build-001:~# openstack token issue
The account is locked for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-1b891922-01f0-485e-91f2-52bb1b6c3f79)

So bob is now locked out. One thing that surprised me is that bob is not in fact disabled if you look at his user object (openstack user show bob), just locked out. It’s a separate table entry in the local_user table. In order to unlock bob, you need to wait for the lock to expire or manually change the database.

mysql> select * from local_user where name="bob";
+----+----------------------------------+-----------+------+-------------------+---------------------+
| id | user_id                          | domain_id | name | failed_auth_count | failed_auth_at      |
+----+----------------------------------+-----------+------+-------------------+---------------------+
| 22 | 630b22fe7f814feeb5a498dc124d814c | default   | bob  |                 3 | 2016-11-09 16:21:41 |
+----+----------------------------------+-----------+------+-------------------+---------------------+

Interestingly once locked you can’t change your password either:

[DEV] root@dev01-build-001:~# openstack user password set
The account is locked for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-d6433510-615f-48af-b2ea-b4fd3b8407c7)

DOSing Your Cloud for Fun & Profit

To be honest this feature scares me because of the way it can DOS your cloud. The first way is what if someone (or some tool) misconfigures a service account for say, nova. Nova tries to do its job and it starts trying to get tokens. After it fails 3 times, you’ve now essentially bricked nova for 30 minutes. Here’s a worse way, what if I just pretend I’m nova by setting OS_USERNAME and trying to get a token.

[DEV] root@dev01-build-001:~# export OS_TENANT_NAME=services
[DEV] root@dev01-build-001:~# export OS_USERNAME=nova
[DEV] root@dev01-build-001:~# export OS_PASSWORD=evil_hax0r_dos_nova
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-47390107-236b-4317-b97a-b5e811af87b2)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-99f88665-d653-4e73-8e03-1b897e14440f)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-7ca5878a-f633-4f23-9b67-fdb5bec13dcf)
[DEV] root@dev01-build-001:~# openstack token issue
The account is locked for user: 8935e905d8ab416d96509022fa6c00d1 (HTTP 401) (Request-ID: req-cbff7497-88c2-45ce-bf9e-14bfee9d1e78)

Nova is now broken. Repeat for all services or even better lock the admin account or lock the account of the guy next to you who chews too loudly at lunch…

So with that in mind, here’s some things to consider before enabling this:

  • The failed_auth_count resets anytime there is a successful login.
  • There’s no method I saw for an admin to unlock the user without hacking the database
  • When the user is locked they can’t change their password
  • This feature can DOS your cloud

Inactive Users

The next feature added is the ability to disable users who have not logged in within the past N days. To enable this, set the feature and restart keystone, I will choose 1 day for this example.

disable_user_account_days_inactive = 1

Once enabled Keystone will begin to track last login with day granularity in the user table.

+----------------------------------+---------+---------------------+----------------+
| id                               | enabled | created_at          | last_active_at |
+----------------------------------+---------+---------------------+----------------+
| 630b22fe7f814feeb5a498dc124d814c |       1 | 2016-11-09 15:51:35 | 2016-11-09     |
+----------------------------------+---------+---------------------+----------------+

Since this has day granularity only, you will end up with an effective timeout of < the number of days you put in. In other words although bob logged in above at 3pm, he will get expired at 12:01AM the next day, not 3pm the next day. (Also don't set this to 1 day this is just an experiment). I was too impatient to wait a whole day so I just hacked the DB and so it looked like bob hadn't logged in since yesterday. Just like the failed logins feature, the account ends up disabled.

[DEV] root@dev01-build-001:~# openstack token issue
The account is disabled for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-8ce84bd3-911e-4bab-a972-664ff4ff03f1)

You should consider if you have both this and the failed logins enabled you could end up with a user who is essentially 2x disabled. First for failing logins and then for not having successfully logged in.

Password Expiration

By default all user objects returned by the v3 APIs will include a password expiry field with Keystone Newton. The field is null unless this feature is enabled however. Here’s what it looks like:

{"user": {"password_expires_at": null, "links": {"self": "http://dev01-admin.os.cloud.twc.net:35357/v3/users/3607079988c140a3a49311a2c6b75f86"}, "enabled": true, "email": "root@localhost", "id": "3607079988c140a3a49311a2c6b75f86", "domain_id": "default", "name": "admin"}}

To enable this feature, we set “password_expires_days = 1” and bouncing keystone, but then also we need to change the existing password or make a new user. This setting is not retroactive on existing accounts and passwords.

So to make this take full effect we will reset bob’s password.

[DEV] root@dev01-build-001:~# openstack user password set
Current Password:
New Password:
Repeat New Password:

And then look at his user object:

[DEV] root@dev01-build-001:~# openstack user show bob
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 630b22fe7f814feeb5a498dc124d814c |
| name                | bob                              |
| password_expires_at | 2016-11-10T17:28:49.000000       |
+---------------------+----------------------------------+

Note that unlike the inactive user settings this stores data with second-level granularity.

Interestingly when the password was updated it kept the old one in the DB and just expired it, so in the DB bob has 2 passwords, one valid, one expired. This is new behavior.

mysql> select * from password where local_user_id=22;
+----+---------------+-------------------------------------------------------------------------------------------------------------------------+---------------------+--------------+---------------------+
| id | local_user_id | password                                                                                                                | expires_at          | self_service | created_at          |
+----+---------------+-------------------------------------------------------------------------------------------------------------------------+---------------------+--------------+---------------------+
| 23 |            22 | $6$rounds=10000$eif6VK1cIZshn4v.$4B5zaiGTiTc7BFD5OAP0uro0HAdNzh0SttL6Lt3CjYKDt8Esvt./y3rTlQS7XTqVhhVJvpvpxb7UDeeATZVxn1 | 2016-11-09 17:28:50 |            0 | 2016-11-09 15:51:35 |
| 24 |            22 | $6$rounds=10000$anOfjEPsi92JXxO2$HySHiUPI6JI4wmWoRJkMc7X4lvOgFbXu.AXTSBbAuYwYUUEjbcg4xhqkrjlAKFXhM0Mbvb/J0pzQXv1uq65mD. | 2016-11-10 17:28:49 |            1 | 2016-11-09 17:28:50 |
+----+---------------+-------------------------------------------------------------------------------------------------------------------------+---------------------+--------------+---------------------+

Now lets try to login using the new password we set:

[DEV] root@dev01-build-001:~# export OS_PASSWORD=newsecretpass
[DEV] root@dev01-build-001:~# openstack token issue
The password is expired and needs to be reset by an administrator for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-ef525f73-d6bf-4f07-b70f-34e7d1ae29c2)

This is a clear error message that the user will hopefully see and take action on. Note the message that an admin has to do the reset. bob cannot reset his own password once it expires.

There is also a related setting here called password_expires_ignore_user_ids which exempts certain user ids from password expiry requirements. You would probably set this for your service accounts. Note that this is id not name which may make your deployment automation to be more complex since ids vary across environments..

Password Complexity & Uniqueness

One of the most requested features was enforcing password complexity for users and now with Keystone Newton it is available. There are two settings here that can be enabled, the first is a regex for complexity and the 2nd is a user-readable description of that regex. This is used for error messages. To enable let’s set both.

As an admin, I’ve decided that all passwords must have “BeatTexas” in them because I’m a WVU fan and they play on Saturday. So lets set the settings, bounce keystone, and try it:

password_regex = ^.*BeatTexas.*$
password_regex_description = "passwords must contain the string BeatTexas"

Now let’s have bob try to set his password:

[DEV] root@dev01-build-001:~# openstack user password set --original-password newsecretpass --password HookEmHorns
Password validation error: The password does not meet the requirements: passwords must contain the string BeatTexas (HTTP 400) (Request-ID: req-9bfcf550-e822-4053-b566-f08f3b52e26e)

UT fan denied! But it works if you follow my rules:

[DEV] root@dev01-build-001:~# openstack user password set --original-password newsecretpass --password BeatTexas100-0

Password Uniqueness/Frequency of Change

Password uniqueness works along the same lines as the complexity rules, it will keep you from setting it to something that was previously used. If you recall from above Keystone is storing old passwords, even when this setting is not enabled. If you have a lot of users and a lot of them change passwords a lot you may want to prune the password table at some point.

unique_last_password_count= 5

You can also enforce a limit to how often a password is changed, this is set in days. Be cautious that this value is less than the length of the password expiry.

minimum_password_age = 1

User Messaging and Tooling

One thing that is unclear here is how the user will know that they are disabled and what they should do about it. Especially if they don’t use the CLI. Once disabled the user cannot tell if it is from bad logins or from inactivity. Will Horizon show any of this? Answer: yes it will: https://review.openstack.org/#/c/370973/

Additionally, as an operator you will probably also want a report of people who are disabled or about to be disabled in the next week so you can email them. There will need to be Horizon changes, user education, and tooling around these features.

Summary

Keystone has added some long-requested PCI-DSS features in Newton. I’m happy with the direction these are moving but you should use caution and consider tooling and user messaging before enabling them. Look for more improvements on these in the future.

Tagged , ,