Token Revocation Performance Improvements in Keystone Ocata

The awesome Keystone team has been working hard in Ocata to improve overall keystone performance when token revocations are present. If you’ve not read my previous post about why this is an issue, you should start there. Here’s the tl;dr background: when any token revocations are present in your database, token validation performance suffers, and suffers greatly. Token validations are at the heart of your cloud. Every single OpenStack API call requires a token validation. Revocations happen when a user or project is deleted, when a token revoke API call is made, or, until recently, when someone logged out of Horizon.

So here’s the simplification of this path: Revoked tokens slow down token validation which slows down all OpenStack API calls, ergo, revoked tokens slow down your OpenStack APIs.

Here is what this looks like in Liberty. Can you see when our regression tests run and generate revocations?

Can you tell in Cinder when we have revoked tokens?

Fortunately the team focused on fixing this in Ocata, and the good news is that it seems to have worked. In Ocata (currently on master) there is no longer a correlation between revoked tokens and token validation performance.

Experimental Setup

The experimental setup is the same as my previous post, except different software. The nodes are running keystone in docker containers with uwsgi using stable/newton as of Nov 12 2016. The Ocata round is using master as of commit 498d700c. Both tests are using Fernet tokens with caching.

Results

Validations Per Second as a Function of Number of Revocations

The first chart will show the number of token validations that can be completed per second. For this test more is better, it means more validations get pushed through and the test completes faster.

As you can see, we no longer have the exponential decay, which is good. Now the rate is steady and we will not have the spike in timings that we used to see after our regression tests run. You may also notice that the base rate is a bit slower in Ocata. If you never have any token revokes this may be concerning, but this timing is still pretty fast. As I said before, I was doing 20 threads at a time; if this were raised to 50 or 100, the rate would be much higher. In other words, this is not a performance ceiling you are seeing, just a comparison of Newton to Ocata under the same conditions.

99th Percentile Time to Complete a Validation

This chart examines the same data in a different way. It shows the time in milliseconds within which 99% of the token validations complete. In this chart, lower is better. In Newton you can see a linear progression in the time to complete a validation: by the time you have 1000 revocations, validating a token goes from 99 ms to 1300 ms.


More Fixes to Come

This work is great news for having predictable keystone token performance. I won’t have to tell anyone to go truncate the revocation_event table when things get slow, and we shouldn’t have graph spikes anymore. There is also more work to come: the Keystone team is working on further fixes and improvements in this area. You can track the progress of that here: https://review.openstack.org/#/q/project:openstack/keystone+branch:master+topic:bug/1524030
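For anyone still on an older release, a quick way to check whether revocations are what’s hurting you is to look at the size of that table (a sketch, assuming the default keystone database name):

mysql -u root keystone -e "SELECT COUNT(*) FROM revocation_event;"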


New PCI-DSS Features in Keystone Newton

Keystone Newton offers up some new PCI-DSS features which will help secure your Keystone deployment. I recently built a Newton dev environment (we’re running Mitaka still for the next few weeks) and tried them out. All of these features are disabled by default (commented out in the config file) and need to be enabled in order to be used.

Before diving into this post you may want to read the docs on the new features, which are available here. These new features are not mentioned in the Newton release notes or config guides (bug). However, you can look at the generated config files, which explain more about the settings, here. Also note that if you have a custom identity driver or you use the ldap identity driver, some of these features will not work (but the password expiry/complexity stuff should still work); you need to use the standard sql identity driver. Finally, when you consider my conclusions, please keep in mind that I’m not a PCI-DSS expert, but I do know how to deploy and run OpenStack clouds.

And now the features…

Account Lockout

The first feature we will cover is lock-out. If a user fails N login attempts, they will be locked out for the specified duration (in seconds); if no duration is set, they are locked out indefinitely. Let’s try this one out:

Modify your config file and set:

lockout_failure_attempts=3
lockout_duration=1800

Then bounce keystone.

[DEV] root@dev01-keystone-001:~# service keystone restart
...

Next create a user and give them a role in a project:

[DEV] root@dev01-build-001:~# openstack user create --password secretpass bob
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 630b22fe7f814feeb5a498dc124d814c |
| name                | bob                              |
| password_expires_at | None                             |
+---------------------+----------------------------------+
[DEV] root@dev01-build-001:~# openstack role add --user bob --project admin _member_

Now let’s have bob use the wrong password a few times and get locked out:

[DEV] root@dev01-build-001:~# export OS_PASSWORD=wrongpassword
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-077dc218-65dd-44d0-ba23-2020abb125c3)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-abf8e31f-a846-4ba3-9b08-8a60fcc48b5c)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-e19a2d8b-4bb7-4ce2-bd1a-590facf2e149)
[DEV] root@dev01-build-001:~# openstack token issue
The account is locked for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-1b891922-01f0-485e-91f2-52bb1b6c3f79)

So bob is now locked out. One thing that surprised me is that bob is not in fact disabled if you look at his user object (openstack user show bob), just locked out. It’s a separate table entry in the local_user table. In order to unlock bob, you need to wait for the lock to expire or manually change the database.

mysql> select * from local_user where name="bob";
+----+----------------------------------+-----------+------+-------------------+---------------------+
| id | user_id                          | domain_id | name | failed_auth_count | failed_auth_at      |
+----+----------------------------------+-----------+------+-------------------+---------------------+
| 22 | 630b22fe7f814feeb5a498dc124d814c | default   | bob  |                 3 | 2016-11-09 16:21:41 |
+----+----------------------------------+-----------+------+-------------------+---------------------+
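If you need to unlock bob before the timer runs out, resetting that counter is the only knob I found short of waiting. A sketch against the table shown above, assuming the default keystone database name; test it somewhere safe first:

mysql -u root keystone -e "UPDATE local_user SET failed_auth_count = 0 \
    WHERE user_id = '630b22fe7f814feeb5a498dc124d814c';"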

Interestingly once locked you can’t change your password either:

[DEV] root@dev01-build-001:~# openstack user password set
The account is locked for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-d6433510-615f-48af-b2ea-b4fd3b8407c7)

DOSing Your Cloud for Fun & Profit

To be honest this feature scares me because of the way it can DOS your cloud. The first way: what if someone (or some tool) misconfigures a service account for, say, nova? Nova tries to do its job and it starts trying to get tokens. After it fails 3 times, you’ve now essentially bricked nova for 30 minutes. Here’s a worse way: what if I just pretend I’m nova by setting OS_USERNAME and trying to get a token?

[DEV] root@dev01-build-001:~# export OS_TENANT_NAME=services
[DEV] root@dev01-build-001:~# export OS_USERNAME=nova
[DEV] root@dev01-build-001:~# export OS_PASSWORD=evil_hax0r_dos_nova
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-47390107-236b-4317-b97a-b5e811af87b2)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-99f88665-d653-4e73-8e03-1b897e14440f)
[DEV] root@dev01-build-001:~# openstack token issue
The request you have made requires authentication. (HTTP 401) (Request-ID: req-7ca5878a-f633-4f23-9b67-fdb5bec13dcf)
[DEV] root@dev01-build-001:~# openstack token issue
The account is locked for user: 8935e905d8ab416d96509022fa6c00d1 (HTTP 401) (Request-ID: req-cbff7497-88c2-45ce-bf9e-14bfee9d1e78)

Nova is now broken. Repeat for all services or even better lock the admin account or lock the account of the guy next to you who chews too loudly at lunch…

So with that in mind, here’s some things to consider before enabling this:

  • The failed_auth_count resets anytime there is a successful login.
  • There’s no method I saw for an admin to unlock the user without hacking the database
  • When the user is locked they can’t change their password
  • This feature can DOS your cloud

Inactive Users

The next feature added is the ability to disable users who have not logged in within the past N days. To enable this, set the option and restart keystone; I will choose 1 day for this example.

disable_user_account_days_inactive = 1

Once enabled, Keystone will begin to track the last login, with day granularity, in the user table.

+----------------------------------+---------+---------------------+----------------+
| id                               | enabled | created_at          | last_active_at |
+----------------------------------+---------+---------------------+----------------+
| 630b22fe7f814feeb5a498dc124d814c |       1 | 2016-11-09 15:51:35 | 2016-11-09     |
+----------------------------------+---------+---------------------+----------------+

Since this has day granularity only, you will end up with an effective timeout of less than the number of days you set. In other words, although bob logged in above at 3pm, he will get expired at 12:01AM the next day, not at 3pm the next day. (Also, don't set this to 1 day; that is just for this experiment.) I was too impatient to wait a whole day, so I just hacked the DB so that it looked like bob hadn't logged in since yesterday. Just like the failed logins feature, the account ends up disabled.

[DEV] root@dev01-build-001:~# openstack token issue
The account is disabled for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-8ce84bd3-911e-4bab-a972-664ff4ff03f1)
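For reference, the DB hack was just backdating the column shown above. Something like this does it (a sketch, and obviously only for a throwaway environment):

mysql -u root keystone -e "UPDATE user SET last_active_at = DATE_SUB(CURDATE(), INTERVAL 2 DAY) \
    WHERE id = '630b22fe7f814feeb5a498dc124d814c';"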

Consider that if you have both this and the failed-login lockout enabled, you could end up with a user who is essentially disabled twice: first for failing logins, and then for not having successfully logged in.

Password Expiration

With Keystone Newton, all user objects returned by the v3 API include a password expiry field by default. The field is null unless this feature is enabled, however. Here’s what it looks like:

{"user": {"password_expires_at": null, "links": {"self": "http://dev01-admin.os.cloud.twc.net:35357/v3/users/3607079988c140a3a49311a2c6b75f86"}, "enabled": true, "email": "root@localhost", "id": "3607079988c140a3a49311a2c6b75f86", "domain_id": "default", "name": "admin"}}

To enable this feature, we set “password_expires_days = 1” and bounce keystone, but we also need to change the existing password or make a new user; this setting is not retroactive for existing accounts and passwords.
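In config-file form that looks like the other settings in this post. I believe the generated Newton config puts these PCI-DSS options under a [security_compliance] section, but check your generated keystone.conf to be sure:

[security_compliance]
password_expires_days = 1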

So to make this take full effect we will reset bob’s password.

[DEV] root@dev01-build-001:~# openstack user password set
Current Password:
New Password:
Repeat New Password:

And then look at his user object:

[DEV] root@dev01-build-001:~# openstack user show bob
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| enabled             | True                             |
| id                  | 630b22fe7f814feeb5a498dc124d814c |
| name                | bob                              |
| password_expires_at | 2016-11-10T17:28:49.000000       |
+---------------------+----------------------------------+

Note that unlike the inactive user settings this stores data with second-level granularity.

Interestingly, when the password was updated, Keystone kept the old one in the DB and just expired it, so in the DB bob has two passwords: one valid, one expired. This is new behavior.

mysql> select * from password where local_user_id=22;
+----+---------------+-------------------------------------------------------------------------------------------------------------------------+---------------------+--------------+---------------------+
| id | local_user_id | password                                                                                                                | expires_at          | self_service | created_at          |
+----+---------------+-------------------------------------------------------------------------------------------------------------------------+---------------------+--------------+---------------------+
| 23 |            22 | $6$rounds=10000$eif6VK1cIZshn4v.$4B5zaiGTiTc7BFD5OAP0uro0HAdNzh0SttL6Lt3CjYKDt8Esvt./y3rTlQS7XTqVhhVJvpvpxb7UDeeATZVxn1 | 2016-11-09 17:28:50 |            0 | 2016-11-09 15:51:35 |
| 24 |            22 | $6$rounds=10000$anOfjEPsi92JXxO2$HySHiUPI6JI4wmWoRJkMc7X4lvOgFbXu.AXTSBbAuYwYUUEjbcg4xhqkrjlAKFXhM0Mbvb/J0pzQXv1uq65mD. | 2016-11-10 17:28:49 |            1 | 2016-11-09 17:28:50 |
+----+---------------+-------------------------------------------------------------------------------------------------------------------------+---------------------+--------------+---------------------+

Now lets try to login using the new password we set:

[DEV] root@dev01-build-001:~# export OS_PASSWORD=newsecretpass
[DEV] root@dev01-build-001:~# openstack token issue
The password is expired and needs to be reset by an administrator for user: 630b22fe7f814feeb5a498dc124d814c (HTTP 401) (Request-ID: req-ef525f73-d6bf-4f07-b70f-34e7d1ae29c2)

This is a clear error message that the user will hopefully see and act on. Note the message says an admin has to do the reset; bob cannot reset his own password once it expires.

There is also a related setting here called password_expires_ignore_user_ids which exempts certain user ids from the password expiry requirements. You would probably set this for your service accounts. Note that this takes ids, not names, which may make your deployment automation more complex since ids vary across environments.
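A hypothetical example of what that might look like, using the nova and admin ids that appear earlier in this post as stand-ins (you would substitute your own service account ids per environment):

password_expires_ignore_user_ids = 8935e905d8ab416d96509022fa6c00d1,3607079988c140a3a49311a2c6b75f86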

Password Complexity & Uniqueness

One of the most requested features was enforcing password complexity for users, and with Keystone Newton it is now available. There are two settings that can be enabled: the first is a regex for complexity and the second is a user-readable description of that regex, which is used for error messages. Let’s set both.

As an admin, I’ve decided that all passwords must have “BeatTexas” in them because I’m a WVU fan and they play on Saturday. So let’s set the settings, bounce keystone, and try it:

password_regex = ^.*BeatTexas.*$
password_regex_description = "passwords must contain the string BeatTexas"

Now let’s have bob try to set his password:

[DEV] root@dev01-build-001:~# openstack user password set --original-password newsecretpass --password HookEmHorns
Password validation error: The password does not meet the requirements: passwords must contain the string BeatTexas (HTTP 400) (Request-ID: req-9bfcf550-e822-4053-b566-f08f3b52e26e)

UT fan denied! But it works if you follow my rules:

[DEV] root@dev01-build-001:~# openstack user password set --original-password newsecretpass --password BeatTexas100-0

Password Uniqueness/Frequency of Change

Password uniqueness works along the same lines as the complexity rules: it will keep you from setting your password to something that was previously used. If you recall from above, Keystone stores old passwords even when this setting is not enabled. If you have a lot of users who change their passwords frequently, you may want to prune the password table at some point.

unique_last_password_count = 5
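On the pruning note above, here is a quick way to gauge how much old-password cruft you are carrying before deciding what to do about it (a sketch; I would not delete anything without keeping at least the last unique_last_password_count entries per user):

mysql -u root keystone -e "SELECT COUNT(*) FROM password WHERE expires_at IS NOT NULL \
    AND expires_at < NOW();"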

You can also enforce a limit on how often a password can be changed; this is set in days. Make sure this value is less than the password expiry period.

minimum_password_age = 1

User Messaging and Tooling

One thing that is unclear here is how the user will know that they are disabled and what they should do about it, especially if they don’t use the CLI. Once disabled, the user cannot tell if it is from bad logins or from inactivity. Will Horizon show any of this? Answer: yes it will: https://review.openstack.org/#/c/370973/

Additionally, as an operator you will probably also want a report of users who are disabled or about to be disabled in the next week so you can email them. There will need to be Horizon changes, user education, and tooling around these features.

Summary

Keystone has added some long-requested PCI-DSS features in Newton. I’m happy with the direction these are moving but you should use caution and consider tooling and user messaging before enabling them. Look for more improvements on these in the future.


Writing a Nova Filter Scheduler for Trove

In the process of deploying Trove, we had one simple requirement: “Only run Trove instances on Trove nodes”. Surprisingly, this is a difficult requirement to meet. What follows are our attempts to fix it and what we ended up doing. Some of the things mentioned do not work because of how we want to run our cloud and may not apply to you. Also, this is not deployed in production yet; if I end up trashing or significantly changing this idea, I will update the blog.

Option 1 – Use Special Trove Flavors

So you want to only run Trove instances on Trove compute nodes; nova can help you with this. The first option is to enable the deftly named AggregateInstanceExtraSpecsFilter in Nova. If you turn on this filter and then attach extra specs to your flavors, it will work as designed. As an aside, if you’re a software developer, placing warnings in CLI tools that end users run is only going to cause consternation, for example the warnings below.

mfischer@Matts-MBP-4:~$ openstack aggregate list
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
+------+-------+-------------------+
|   ID | Name  | Availability Zone |
+------+-------+-------------------+
|  417 | M4    | None              |
|  423 | M3    | None              |
| 1525 | Trove | None              |
+------+-------+-------------------+
mfischer@Matts-MBP-4:~$ openstack aggregate show 1525
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
+-------------------+--------------------------------------------------------------------------------------+
| Field             | Value                                                                                |
+-------------------+--------------------------------------------------------------------------------------+
| availability_zone | None                                                                                 |
| created_at        | 2016-05-11T20:35:07.000000                                                           |
| deleted           | False                                                                                |
| deleted_at        | None                                                                                 |
| hosts             | [u'bfd02-compute-trove-005', u'bfd02-compute-trove-004', u'bfd02-compute-trove-003'] |
| id                | 1525                                                                                 |
| name              | Trove                                                                                |
| properties        | CPU='Trove'                                                                          |
| updated_at        | None                                                                                 |
+-------------------+--------------------------------------------------------------------------------------+

Note the properties portion here. This then matches with the special Trove flavors that we made. On the flavors we set the again deftly named aggregate_instance_extra_specs.

mfischer@Matts-MBP-4:~$ openstack flavor show 9050
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
+----------------------------+-------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                     |
+----------------------------+-------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                     |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                         |
| access_project_ids         | None                                                                                      |
| disk                       | 5                                                                                         |
| id                         | 9050                                                                                      |
| name                       | t4.1CPU.512MB                                                                             |
| os-flavor-access:is_public | True                                                                                      |
| properties                 | aggregate_instance_extra_specs:CPU='Trove'                                                |
| ram                        | 512                                                                                       |
| rxtx_factor                | 1.0                                                                                       |
| swap                       |                                                                                           |
| vcpus                      | 1                                                                                         |
+----------------------------+-------------------------------------------------------------------------------------------+

We do all this currently with puppet automation and facter facts. If you are a trove compute node you get a fact defined and then puppet sticks you in the right host aggregate.

So this solution works but has issues. The problem with new flavors is that everyone sees them, so someone can nova boot anything they want and it will end up on your Trove node, thus violating the main requirement. Enter Option 2.

Option 2 – Set Image Metadata + a Nova Scheduler

In combination with Option 1, we can set special image metadata such that nova will only schedule those images onto those nodes. The scheduler filter that kinda does this is the obviously named AggregateImagePropertiesIsolation (pro-tip: do not let Nova devs name your child). This filter matches metadata like the flavors above, except it does it on images. Trove images would be tagged with something like trove=true, for example:

openstack image set --property trove=true cirros-tagged

[DEV] root@dev01-build-001:~# openstack image list
+--------------------------------------+----------------+--------+
| ID                                   | Name           | Status |
+--------------------------------------+----------------+--------+
| 846ee606-9559-4fdc-83b9-1ca57895cf92 | cirros-no-tags | active |
| a12fda2c-d2ff-4b7b-b8f0-a8400939df78 | cirros-tagged  | active |
+--------------------------------------+----------------+--------+
[DEV] root@dev01-build-001:~# openstack image show a12fda2c-d2ff-4b7b-b8f0-a8400939df78
+------------------+-----------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                               |
+------------------+-----------------------------------------------------------------------------------------------------+
<snips abound>
| id               | a12fda2c-d2ff-4b7b-b8f0-a8400939df78                                                                |
| properties       | description='', direct_url='rbd://b589a8c7-9b74-49dd-adbf-90733ee1e31a/images/a12fda2c-d2ff-4b7b-   |
|                  | b8f0-a8400939df78/snap', trove='true'                                                               |
+------------------+-----------------------------------------------------------------------------------------------------+

The problem is that the AggregateImagePropertiesIsolation filter considers images that do not have the tag at all to be a match. So while this is solvable for images we control and automate, it is not solvable for images that customers upload; they will end up on Trove nodes because they will not have the trove property. You could solve this with cron, but that’s terrible for a number of reasons.

Option 2a – Write Your Own Scheduler

So now we just bite the bullet and write our own scheduler. Starting with the AggregateImagePropertiesIsolation we hacked it down to the bare minimum logic and that looks like this:

    def host_passes(self, host_state, spec_obj):
        """Run Trove images on Trove nodes and not anywhere else."""

        image_props = spec_obj.get('request_spec', {}).\
            get('image', {}).get('properties', {})

        is_trove_host = False
        for ha in host_state.aggregates:
            if ha.name == 'Trove':
                is_trove_host = True

        # debug prints for is_trove_host here

        is_trove_image = 'tesora_edition' in image_props.keys()

        if is_trove_image:
            return is_trove_host
        else:
            return not is_trove_host

So what does it do? First it determines whether this is a Trove compute host or not; this is a simple check: are you in a host aggregate called Trove or not? Next we determine whether someone is booting a Trove image. For this we use the tesora_edition tag, which is present on our Trove images. Note that we don’t really care what it’s set to, just that it exists. This logic could clearly be reworked or made more generic and/or configurable #patcheswelcome.
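For context, here is roughly how that method sits inside a filter module. This is a sketch rather than our exact file (the body is a condensed version of the code above, with the debug print filled in); the only hard requirement is that the module path and class name line up with the config settings discussed below.

from oslo_log import log as logging

from nova.scheduler import filters

LOG = logging.getLogger(__name__)


class TroveImageFilter(filters.BaseHostFilter):
    """Run Trove images on Trove nodes and not anywhere else."""

    def host_passes(self, host_state, spec_obj):
        image_props = spec_obj.get('request_spec', {}).\
            get('image', {}).get('properties', {})

        # Is this host in the Trove host aggregate?
        is_trove_host = any(agg.name == 'Trove'
                            for agg in host_state.aggregates)
        LOG.debug("%s is%s a trove node", host_state.host,
                  "" if is_trove_host else " NOT")

        # Is this a Trove image? We only care that the tag exists,
        # not what it is set to.
        is_trove_image = 'tesora_edition' in image_props

        # Trove images pass only on Trove hosts; everything else passes
        # everywhere except Trove hosts.
        return is_trove_host if is_trove_image else not is_trove_host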

Deploying

A few notes on deploying this. Once your python code is shipped you will need to configure it. There are two settings that you need to change:

  • scheduler_available_filters - Defines filter classes made available to the scheduler. This setting can be used multiple times.
  • scheduler_default_filters - Of the available filters, defines those that the scheduler uses by default.

The scheduler_available_filters option defaults to a setting that basically means “all”, except that doesn’t include your filter, just the default ones that ship with Nova, so when you turn this on you need to change both settings. It is a multi-valued string option, which in basic terms means you set it multiple times in your configs, like so:

scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova_utils.scheduler.trove_image_filter.TroveImageFilter

(Note for Puppet users: The ability to set this as a MultiStringOpt in Nova was not landed until June as commit e7fe8c16)

Once that’s set you need to make it available, I added it to the list of things we’re already using:

scheduler_default_filters = <usual stuff>,TroveImageFilter

Note that available takes the path to the class and default takes the class name; get this wrong and the scheduler will error out saying it can’t find your filter.

Once you make these settings, I also highly recommend enabling debug and then bouncing nova-scheduler. With debug on, you will see nova walk the filters and see how it picks the node. Unsurprisingly, it is nearly impossible to debug without this enabled.

In Action

With this enabled and with 3 compute nodes I launched 6 instances. My setup was as follows:

compute3 – Trove host-aggregate
compute1,2 – Not in Trove host-aggregate

Launch 3 instances with the tagged image; note they all go to compute3.
Launch 3 instances with the un-tagged image; note they all go to compute1,2.

Here’s some of the partial output from the scheduler log with debug enabled.

2016-09-23 01:45:56.763 1 DEBUG nova_utils.scheduler.trove_image_filter 
(dev01-compute-002, dev01-compute-002.os.cloud.twc.net) ram:30895 disk:587776 
io_ops:1 instances:1 is NOT a trove node host_passes 
/venv/local/lib/python2.7/site-packages/nova_utils/scheduler/trove_image_filter.py:47
2016-09-23 01:45:56.763 1 DEBUG nova_utils.scheduler.trove_image_filter 
(dev01-compute-003, dev01-compute-003.os.cloud.twc.net) ram:31407 disk:588800
io_ops:0 instances:0 is a trove node host_passes
/venv/local/lib/python2.7/site-packages/nova_utils/scheduler/trove_image_filter.py:44

Conclusion

So although I didn’t really want to, we wrote our own filter scheduler. Since there are lots of good examples out there, we had it working in less than an hour. In fact it took me longer to cherry-pick the puppet fixes I needed and figure out the config options than to write the code.

Writing a nova scheduler filter let us solve a problem that had been bugging us for some time. If you plan on writing your own filter, you could look at the barebones docs for new filter writing here; note that there’s no section header for this, so look for “To create your own filter”. (When this lands there will be section headers on the page: https://review.openstack.org/#/c/375702/) I’d also recommend that when you’re first working on it, you just copy an existing filter and hack on it in the same folder; then you don’t have to deal with changing the scheduler_available_filters setting, since it loads everything in the filters folder.


Keep Your OpenStack API Databases Tidy

After running a cloud for 2+ years, our OpenStack API databases are full of cruft. Deleted instances, deleted networks, deleted volumes: they are all still in the databases. OpenStack has no periodic clean-up for this stuff; it’s left up to you. This is partly because there’s no unified way to do it and also because each operator has different requirements on how long to retain data. Over the past few weeks I’ve been cleaning up our records and would like to share what I’ve found.

Warning: This is an advanced operation. Before doing this: You should backup everything. You should test it in a dev environment. You should assess the impact to your APIs. I did all of those before attempting any of this, including importing prod data into a test OpenStack environment to assess the impact there.

Each service has its own database, and depending on how the code is written and how the service is used, some services store more data than others. Here are our databases ordered from largest to smallest by on-disk size (which we will discuss later).

Pre-cleaning DB sizes
3.6G /var/lib/mysql/nova
1.1G /var/lib/mysql/heat
891M /var/lib/mysql/cinder
132M /var/lib/mysql/designate
131M /var/lib/mysql/neutron
103M /var/lib/mysql/glance
41M /var/lib/mysql/keystone
14M /var/lib/mysql/horizon

So with this in mind I started digging into how to clean this stuff up. Here’s what I found. I’m noting in here what release we’re on for each because the tooling may be different or broken for other releases.

Heat – Mitaka

Heat was the first one I did, mainly because if Heat blows up, I can probably still keep my job. Heat has a great DB cleanup tool and it works very well. Heat lets you say “purge deleted records > X days/months/etc old”. When I did this heat had so much junk that I “walked it in”, starting with 365 days, then 250, etc etc. Heat developers win the gold medal here for best DB clean-up tool.

heat-manage purge_deleted -g days 30
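If you want to script the walk-in, something like this works (a sketch; adjust the schedule to how much cruft you have and how long each pass takes):

# Purge in progressively bigger bites so no single run has to delete too much.
for days in 365 250 180 90 30; do
    heat-manage purge_deleted -g days $days
done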

Keystone – All Versions

Guess what? Keystone doesn’t keep ANY deleted junk in its database; once it’s gone, it’s gone. This can actually be an issue when you find a 2-year-old instance that has a userid you can’t track down, but that’s how it is. So as long as you’re not storing tokens in here, you’re good. We’re using Fernet tokens, so no issues here.

Cinder – Liberty

Cinder’s DB cleanup tool is broken in Liberty. It is supposed to be fixed in Mitaka, but we’re not running Mitaka. Hoping to try this after we upgrade. We have a lot of volume cruft laying around.

Glance – Liberty

Glance has no cleanup tool at all that I can find. So I wrote one, but we ended up not using it. Why? Well because it seems that Glance can and will report deleted images via the V2 API and I could never quite convince myself that we’d not break stuff by doing a cleanup. Anyone know otherwise?

Here’s my code to do the cleanup; be careful with it! Like Heat, you should probably “walk this in” by changing “1 MONTH” to “1 YEAR” or “6 MONTHS”. These deletes will lock the tables, which will hang up API calls while they are running, so plan appropriately. Note that if you look on the internet you might find other versions that disable foreign key constraints; don’t do that.

mysql -u root glance -e "DELETE FROM image_tags WHERE image_id in\
    (SELECT images.id FROM images WHERE images.status='deleted'\
    AND images.deleted_at <DATE_SUB(NOW(),INTERVAL 1 MONTH));"
mysql -u root glance -e "DELETE FROM image_properties WHERE image_id in\
    (SELECT images.id FROM images WHERE images.status='deleted'\
    AND images.deleted_at <DATE_SUB(NOW(),INTERVAL 1 MONTH));"
mysql -u root glance -e "DELETE FROM image_members WHERE image_id in\
    (SELECT images.id FROM images WHERE images.status='deleted'\
    AND images.deleted_at <DATE_SUB(NOW(),INTERVAL 1 MONTH));"
mysql -u root glance -e "DELETE FROM image_locations WHERE image_id in\
    (SELECT images.id FROM images WHERE images.status='deleted'\
    AND images.deleted_at <DATE_SUB(NOW(),INTERVAL 1 MONTH));"
mysql -u root glance -e "DELETE FROM images WHERE images.status='deleted'\
    AND images.deleted_at <DATE_SUB(NOW(),INTERVAL 1 MONTH);"

Nova – Liberty

Like Heat, Nova also has a clean-up tool, and also like Heat, Nova has a huge database. Unlike Heat, Nova’s clean-up tool is more limited. The only thing you can tell it is “don’t delete more than this many rows”. Actually, Nova calls its tool “archiving” because it doesn’t delete records; it moves them to shadow tables. So even if you use this, you need to go back and truncate all the shadow tables.

Also, as near as I can tell, nova just tries to archive up to the max number of rows, paying no attention to any database constraints, so when you use it you will get warnings. These appear safe to ignore. Also, the Nova archive (in Liberty) doesn’t tell you anything (I think this is fixed in Mitaka), so figuring out when you are done is some interesting guesswork. Basically I just re-ran it over and over and compared the sizes of the shadow tables; when they stop changing, we’re done.

Also one quick note, when this finishes and you run du, you’re going to find out that you are now using more disk space. That’s because you just did a bunch of inserts into the shadow tables.

Like everything else, walk this in.

$ nova-manage db archive_deleted_rows 50000
2016-08-30 21:49:01.404 1 WARNING nova.db.sqlalchemy.api [req-f329a277-4fe2-45d6-ba3a-51f93827ed2f - - - - -] IntegrityError detected when archiving table aggregate_metadata
2016-08-30 21:49:11.900 1 WARNING nova.db.sqlalchemy.api [req-f329a277-4fe2-45d6-ba3a-51f93827ed2f - - - - -] IntegrityError detected when archiving table instances
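To take some of the guesswork out of “are we done yet”, you can compare shadow table row counts between runs; when they stop growing, the archive has caught up. A sketch:

# table_rows is only an estimate for InnoDB, but it is good enough to see
# when the numbers level off between archive runs.
mysql -u root nova -e "SELECT table_name, table_rows FROM information_schema.tables \
    WHERE table_schema='nova' AND table_name LIKE 'shadow_%';"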

How Much Disk Will I Get Back

Surprise, you get nothing back! That’s because the disk space is already allocated. If this is important to you, then you will need to OPTIMIZE the tables. This ends up doing a full recreate (depending on what DB you are using) and this WILL lock your tables and hang API calls. Be very careful when doing this. How much space can you save? Well, for Heat it was about 6-7x smaller, 1.1G to 170M; the gain in Nova was more like 30%. Glance was also about 8x, but I was too chicken to take that past our dev environment because of the API concern mentioned above.
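The optimization itself is just OPTIMIZE TABLE run table by table, for example (a sketch; remember this locks the table while it runs, so pick a quiet window and go one table at a time):

# For InnoDB tables the server notes it is "doing recreate + analyze instead";
# that message is expected.
mysql -u root nova -e "OPTIMIZE TABLE instances;"
mysql -u root nova -e "OPTIMIZE TABLE shadow_instances;"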

Why?

This is a question you should ask yourself before attempting this. Some of these operations are risky, but it’s also going to hurt your performance if you let these grow without bound. Some of you may want to just do the cleanup and skip the optimization steps. If you do the optimizations I’d recommend you know how long it takes for each table/service. If you can export your prod data onto a test node that will give you a better idea.

Others

  • Horizon just stores ephemeral session data so it’s pretty clean.
  • Neutron – the DB is small so I’ve not looked into it, anyone tried it? Comments welcome.
  • Designate – we’re on an ancient version (Juno) so any experiments here will happen on something more modern

Evolution of a Cloud As Seen Through Gerrit

Over the past two+ years our cloud has evolved, we’ve expanded, we’ve containerized, we’ve added services. Last week I thought it would be interesting to see this evolution by looking at Gerrit reviews over time, so like sand through an hourglass, these are the reviews of our cloud. Note: I used some artistic license to skip reviews that were abandoned or boring.

Review 1000 – Initial version of JDG jobs for git-crypt – Oct 2014

git-crypt is a tool that’s useful when you need to store stuff like certificates in git. We couldn’t find a package for it, so this commit sets up Jenkins Debian Glue jobs to build it.

Review 2000 – Adds monasca rabbitmq checks – Jan 2015

We’ve been running Monasca now for some time, it’s a great tool for monitoring, especially time-series data, like seeing how long it takes to build an instance. Seems like we added checks for rabbit around this time. I’m fairly sure that we were the first non-HP team to roll this out and we are active contributors to this and authored the puppet module.

Review 3010 – Enable keystone auth for designate – March 2015

Around this time we were rolling out Designate, the DNS as a Service project. It’s been pretty solid, so solid that we’re still running the Juno version today.

Review 4001 – Remove scenario variable – May 2015

Our first install was deployed using Puppet OpenStack Builder. As time permitted, we removed all the references to it. This review was to remove a scenario variable. A scenario was like “2 nodes” or “single node” or “full HA”, since we were no longer doing a reference architecture, we didn’t need this code anymore.

Review 5000 – Revert “Adding new launch instance dialog for testing” – July 2015

Even when we do testing, the code is pushed out via the standard process. It’s reverted that way too.

Review 6000 – Add more dnvrco03 test boxes – Sep 2015

Around this time we were standing up our v2 network/system architecture. We setup a fake environment here called dnvrco03 that lets us do burn-in testing on systems.

Review 7002 – Implements A10 templates for NTP server – November 2015

We believe that you need to automate all infrastructure, in this review we’re working on automation for our hardware load balancers. This was a key part of moving to our second network architecture.

Review 8000 – Merge branch ‘import/import-2015121002341449714853’ – December 2015

We use git-upstream to track and merge in changes to projects like Horizon, and also all the puppet modules. This was a merge of puppet-cinder into our local copy.

Review 9000 – Parallelize cephosd/swift/compute deploy – Feb 2016

As you grow hardware you need to speed up the deploy process. This ansible commit changes how we parallelize puppet runs on some servers. We’re still tweaking the deploy to this day.

Review 10000 – Add tesora docker support tooling – March 2016

We’re working on Trove with Tesora now and may roll it out in Docker containers. This was prep work for that effort.

Review 11000 – Update IP for bfd02-trove-002 – June 2016

This one seems boring but it’s interesting for me. The trove box needs a new IP because it’s an OpenStack VM. We’re going to run some services, like Trove, as VMs on top of OpenStack. This avoids having to chew up hardware for just a simple API server. We’re specifically doing Trove as a separate node because we want it to have a separate Rabbit cluster.

Review 12000 – Update heat/docker with new avi-heat libs August 2016

Right now we’re testing out new Heat resources for Avi Networks load balancing. The repo is here: https://github.com/avinetworks/avi-heat

We should have review 20000 in 2017, I’m really curious to see what it will be!

Keystone Token Performance: Liberty vs Mitaka

A number of performance improvements were made in Keystone Mitaka, including caching the catalog, which should make token creation faster according to the Keystone developers. In this blog post, I will test this assertion.

My setup is unique to how I run keystone, you may be using different token formats, different backends, different web servers, and a different load balancer architecture. The point here is just to test Mitaka vs Liberty in my setup.

Keystone Setup

I’m running a 3 node Keystone cluster on virtual machines running in my OpenStack cloud. The nodes are fronted by another virtual machine running haproxy. The keystone cluster is using round-robin load balancing. I am requesting the tokens from a third virtual machine via the VIP provided by haproxy. The keystone nodes have 2 VCPUs + 4G RAM.

Keystone is running inside a docker container, which runs uwsgi. uwsgi has 2 static threads.

  • The Mitaka code is based on stable/mitaka from March 22, 2016.
  • The Liberty code is based on stable/liberty from March 16, 2016.

Note: I retested again with branches from April 17 and April 15 respectively, results were the same.

Keystone is configured to use Fernet tokens and the mysql backend.

I did not rebuild the machines, the mitaka runs are based on nodes upgraded to Mitaka from Liberty.

Experimental Setup

I am doing 20 benchmark runs against each setup, delaying 120 seconds in between each run. The goal here is to even out performance changes based on the fact that these are virtual machines running in a cloud. The tests run as follows:

  • Create 200 tokens serially
  • Validate 200 tokens serially
  • Create 1000 tokens concurrently (20 at once)
  • Validate 500 tokens concurrently (20 at once)

The code for running these benchmarks, which I borrowed from Dolph Mathews and made a bit easier to use, is available on github. Patches welcome.
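If you just want to sanity-check the concurrent validation numbers without the full harness, the core of that test is small enough to sketch with plain requests against the v3 API. This is illustrative only, not the actual benchmark code, and the endpoint and credentials are placeholders:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

KEYSTONE = "http://keystone.example.com:5000"  # placeholder: your keystone VIP
AUTH = {"auth": {"identity": {"methods": ["password"],
                              "password": {"user": {"name": "admin",
                                                    "domain": {"id": "default"},
                                                    "password": "secret"}}},
                 "scope": {"project": {"name": "admin",
                                       "domain": {"id": "default"}}}}}

# Issue one token and then validate it over and over.
token = requests.post(KEYSTONE + "/v3/auth/tokens",
                      json=AUTH).headers["X-Subject-Token"]

def validate(_):
    # GET /v3/auth/tokens validates whatever token is in X-Subject-Token.
    return requests.get(KEYSTONE + "/v3/auth/tokens",
                        headers={"X-Auth-Token": token,
                                 "X-Subject-Token": token}).status_code

start = time.time()
with ThreadPoolExecutor(max_workers=20) as pool:  # 20 at once, like the real test
    results = list(pool.map(validate, range(500)))
elapsed = time.time() - start
print("%d validations in %.1fs (%.1f/sec)" % (len(results), elapsed, len(results) / elapsed))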

Results

Is Mitaka faster? Answer: No.

Something is amiss in Mitaka Fernet token performance and there is a serious degradation here.

The charts tell the story. Each chart below shows how many requests per second can be handled, and concurrent validation is the most concerning because this is the standard model of what a cloud is doing: dozens of API calls being made at once to tens of services, and each one wants to validate a token.

Liberty vs Mitaka: No Caching

So you can see that concurrent validation is much slower. Let’s also compare with memcache enabled:

Liberty vs Mitaka with Caching

Let’s look at raw data which is more damning due to the scale on the charts:

Data

Notes

I spent some time thinking about why this might be slower and I see one clue: the traffic to memcache (shown using the stats command) in Mitaka is 3-4x what it is in Liberty. Perhaps Keystone is caching too much or too often? I don’t really know, but that is an interesting difference.

I’m hopeful that this gets fixed or looked at in Newton and backported to Mitaka.

Possible sources of error:

  • These are VMs, could have noisy neighbors. Mitigation: Run it a whole lot. Re-run on new VMs. Run at night.
  • Revocation events. Mitigation: Check and clear revocation table entries before running perf tests.

I’d really like someone else to reproduce this, especially using a different benchmarking tool to confirm this data. If you did, please let me know.


OpenStack Deployments Book: The Home Stretch

Over the past year, I’ve been working with Elizabeth K Joseph (pleia on IRC) on a book about OpenStack deployments. The idea of this book is that sysadmins and engineers can read it and follow along by setting up OpenStack. This way they can get a feel for how it works, how to diagnose issues, and plan their deployments.

Deployments Book

So how did this project develop? Well, Liz had approached me in Vancouver about the book and we met to discuss it. During a summit lunch overlooking the harbor, we went over scope and responsibilities and with some caveats, I signed on as a contributing author. My main role would be to write puppet code to install and run OpenStack. The idea is that this code would be in a form where users could choose the bits and pieces they wanted to try out and match that against cloud use cases. This would be incorporated into the book’s idea of “OpenStack recipes”. So off and on over the next 12 months, I’ve been reviewing content, discussing strategy, and writing puppet code. Much of the code that I did for this book was actually done in the upstream OpenStack puppet modules, and being core in those modules made that much easier. I’ve watched those modules get more and more mature during the 12 months we worked on this book. During the time we worked on the book we’ve switched from Kilo to Liberty to Mitaka, and although there’s always some bugs during a switch, each one has had less and less.

So where is this book at now? Well the good news is that as of tonight we’re officially code complete and mostly text complete. We’ll continue to work out some kinks and bugs over the next month or so and the book looks like it’s on track for a summer release.

Aside from pre-ordering, you can follow along for updates on the book in a few different ways. Check the website for some updates from Liz (coming soon): http://deploymentsbook.com/. Also follow the official OpenStack Deployments Book twitter feed: @deploymentsbook

Once we publish, I’ll also post an update here.

My Summer Sabbatical from Open Source and Volunteerism

tl;dr open source vacation

I’ve always enjoyed being part of communities. Whether it’s in software, the Ubuntu community, the Puppet community, the OpenStack community, etc. Or whether it’s outside of software, doing volunteer work for the US Forest Service or my son’s scout group. While I enjoy these activities and consider them to also be hobbies, they are massive time sinks, and that time has to come from somewhere. I have the unfortunate personal habit of signing up for more work than I can or should do, at the expense of other activities. I don’t want to call it burn-out, but it does cause me to be frustrated, and I’ve noticed that more and more. So I’ve decided that I want to enjoy my summer to its fullest extent, and to that end I announced that I am detaching from open source/volunteer work for the summer.

I’ve gotten a bunch of questions already so here goes:

Q: Why?
A: See Above

Q: Why announce it?
A: Because it makes it easier for me to say “no” to something and people won’t wonder where I disappeared to.

Q: Do you still like us?
A: Yes.

Q: Will you still be on IRC/mailing lists etc.
A: Yes, this is still my full time job. I will be online and available during normal work hours. I will probably stop my IRC bouncer.

Q: Will you still be pushing commits to opensource stuff?
A: Yes, for my day job.

Q: Okay so what won’t you be doing?
A: I won’t be on IRC 24/7. I probably won’t be replying to emails on the operators mailing list. I won’t be pushing stuff to osops. I won’t be doing any packaging work. I won’t be on my laptop every evening working on stuff.

Q: So what will you be doing?
A: Riding bikes with my wife & kids. Hiking and camping. Working on beer recipes. Trying new breweries. Cleaning my basement. Sitting on my patio. In other words, like a normal person, work during work hours, don’t work during non-work hours.

OpenStack Summit Talks

It’s become tough for me to go back and find links to talks I’ve given at past OpenStack summits and conferences so I’m making this page as an index that I can point people at. So here they all are in chronological order. A quick note, if you hear us talking about things like CI process and the talk is old, there’s likely to be significant improvements since then.

OpenStack Live – Sunnyvale – April 2015

Deploying an OpenStack Cloud at Scale at Time Warner Cable (no video, slides only)

OpenStack Summit – Vancouver – April 2015

Building Clouds with OpenStack Puppet Modules – good overview of the OpenStack puppet modules and how they will mix with your code to build a running cloud.

Getting DNSaaS to Production with Designate – the process we went through to get Designate into production and how we wrote a Designate sink handler.

A CI/CD Alternative to Push & Pray for OpenStack – great overview of a full CI/CD process for OpenStack deployments, using Ansible, puppet, gerrit, and Jenkins. This contains updated information from the OpenStack Live talk above.

Real World Experiences with Upgrading OpenStack at Time Warner Cable – an overview of our upgrade process and pitfalls in going from the Juno to Kilo release.

PuppetConf – Portland – September 2015

Practical CI/CD with Puppet Code and Configuration – a deep dive into our CI/CD process with a puppet focus. If you use Puppet code this applies generally, if you use Puppet & OpenStack watch this and the talk from Vancouver.

OpenStack Summit – Tokyo – October 2015

Proud to be a Noob: How to Make the Most of Your First OpenStack Summit – a panel discussion on how to get the most out of your first OpenStack summit, watch before Austin!

Duct-tape, Bubblegum, and Bailing Wire: 12 Steps in Operating OpenStack – A humorous look at the trials and tribulations of running OpenStack as an operator.

OpenStack Summit – Austin – April 2016 – coming soon

Videos coming soon! Hope to see you there.

Get Ready for Fernet Tokens – Fernet is coming. Are you prepared? Come to this talk to understand how they work and how to operate them.

Moving a Running OpenStack Cloud to a New Data Center – Our data center is out of space, but we need more capacity. What to do? Move the cloud of course! But don’t upset customers in the process, it has to stay running! Come hear about how we did it.

Experiences and Priorities for Private Cloud Keystone and Public Cloud Keystone – a panel discussion with other operators about Keystone’s present, priorities, and futures.


Consuming Keystone CADF Events From RabbitMQ

This started with a simple requirement: “I’d like to know when users or projects are added or removed and who did the action”

As it turns out there’s no great way to do this. Sure you can log it when a user is deleted:

"DELETE /v2.0/users/702b12ec7f0e4f7d93945eebb95705e1 HTTP/1.1" 204 - "-" "python-keystoneclient"

The only problem is that ‘702b12ec7f0e4f7d93945eebb95705e1’ is meaningless without the DB entry which is now conveniently gone.

But if you had an async way to get events from Keystone, you could solve this yourself. That was my idea with my Keystone CADF Event Logger tool. Before we dive into the tool, some quick background on CADF events. You can read the DMTF mumbo-jumbo at the link in the previous sentence, but just know, Keystone CADF events log anything interesting that happens in Keystone. They also tell you who did it, from where they did it, and when they did it. All important things for auditing. (This article from Steve Martinelli has some more great background)

So how does this solve my problem? CADF events still just log ids, not names. My solution was a simple rabbit-consuming async daemon that caches user and project names locally and uses them to do lookups. Here’s an example of what it does:

Logs user auth events

Note that V2 doesn’t log much info on these, although that is fixed in Liberty I believe.

INFO 2015-09-24 15:09:27.172 USER AUTH: success: nova
INFO 2015-09-24 15:09:27.524 USER AUTH: success: icinga
INFO 2015-09-24 15:09:27.800 USER AUTH: success: neutron
INFO 2015-09-24 15:09:27.800 USER AUTH: failure: neutron

Log user/project crud events

Note again V2 issues here with Kilo leave us with less than full info.

USER CREATED: success: user ffflll at 2015-09-18 16:00:10.426372 by unknown (unknown) (project: unknown (unknown)).
USER DELETED: success: user ffflll at 2015-09-18 16:02:13.196172 by unknown (unknown) (project: unknown (unknown)).

Figures it out when rabbit goes away

INFO 2015-11-11 20:46:59.325 Connecting to 1.2.3.4:5672
ERROR 2015-11-11 22:16:59.514 Socket Error: 104
WARNING 2015-11-11 22:16:59.515 Socket closed when connection was open
WARNING 2015-11-11 22:16:59.515 Disconnected from RabbitMQ at top-secret-internal-company-url.com:5672 (0): Not specified
WARNING 2015-11-11 22:16:59.516 Connection closed, reopening in 5 seconds: (0) Not specified

Setup

This requires that Keystone is configured to talk to rabbit and emit CADF events. The previously referenced blog from Steve Martinelli has good info on this. Here’s what I set:

notification_format=cadf
notification_driver=messaging
notification_topics=keystone_to_cadf_logger

This code also assumes that /var/log/keystone_cadf exists and is writable. I set this up with puppet in my environment.

You should ensure Keystone is talking to Rabbit and has made the queues and exchanges before trying the program.

Usage

I designed this to run in a docker container, which explains the overly full requirements.txt; you can probably get away with the requirements.txt.ORIG. After you build it (python ./setup.py build && python ./setup.py install), just run it by passing in creds for Keystone and for RabbitMQ. You can also use environment variables, which is how I ran it in my docker container.

source openrc
keystone-cadf-logger --rabbit_user rabbit --rabbit-pass pass1 --rabbit-host dev-lb.twc.net

Issues

So what issues exist with this? First, some small ones: the code that parses the events is horrible and I hate it, but it worked. You can probably improve it. Second, the big issue: in our environment this code introduced a circular dependency between our control nodes, where rabbit runs, and our keystone nodes, which now need to talk to rabbit. For this reason, we ended up not deploying this code, even though I had all the puppet and docker portions working. If you don’t have this issue, then this code will work well for you. I also don’t have much operating experience with it; it might set all your disks on fire and blow up in spectacular fashion. I had planned to deploy it to our dev environment and tweak things as needed. So if you operate it, do it cautiously.

Tweaks

If you are interested in more event types, just change the on_message code. You might also want to change the action that happens. Right now it just logs, but how about emailing the team anytime a user is removed, or noting it in your team chat?
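For example, a handler along these lines could key off the action and fire a notification instead of just logging. This is a hypothetical sketch, not the tool’s actual on_message code; notify_team() is made up, and the exact action strings and envelope unwrapping may differ depending on your keystone and oslo.messaging versions:

import json


def notify_team(message):
    # Made-up helper; wire this up to email, chat, etc.
    print("ALERT: %s" % message)


def on_message(channel, method, properties, body):
    notification = json.loads(body)
    # Some oslo.messaging versions wrap the notification in an
    # 'oslo.message' envelope; unwrap it if so.
    if "oslo.message" in notification:
        notification = json.loads(notification["oslo.message"])

    event = notification.get("payload", {})
    action = event.get("action", "")            # e.g. "deleted.user"
    outcome = event.get("outcome", "unknown")   # "success" or "failure"
    initiator = event.get("initiator", {}).get("id", "unknown")

    if action == "deleted.user" and outcome == "success":
        notify_team("user deleted by %s" % initiator)

    channel.basic_ack(delivery_tag=method.delivery_tag)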

Conclusion

This code consists of a few parts and I hope at least some of it is useful to someone. It was fun to write, and I was a bit disappointed that we couldn’t fully use it, but I hope that something in here, even if it’s just the async rabbit code, is useful to you. But what about our requirement? Well, we’ll probably still log CADF events locally on the Keystone node and consume them, or we might write a pipeline filter that does something similar; whatever we decide, I will update on this site. So please pull the code and play with it!

Github Link
