Monthly Archives: September 2016

Writing a Nova Filter Scheduler for Trove

In the process of deploying Trove, we had one simple requirement: “Only Run Trove instances on Trove nodes”. Surprisingly this is a difficult requirement to meet. What follows is our attempts to fix it and what we ended up doing. Some of these things mentioned do not work because of how we want to run our cloud and may not apply to you. Also this is not deployed in production yet, if I end up trashing or significantly changing this idea I will update the blog.

Option 1 – Use Special Trove Flavors

So you want to only run Trove instances on Trove compute nodes, nova can help you with this. The first option is to enable the deftly named AggregateInstanceExtraSpecsFilter in Nova. If you turn on this filter and then attach extra-specs to your flavors it will work as designed. As an aside, if you’re a software developer, placing warnings in CLI tools that end-users use is only going to cause consternation, for example the warnings below.

mfischer@Matts-MBP-4:~$ openstack aggregate list
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
|   ID | Name  | Availability Zone |
|  417 | M4    | None              |
|  423 | M3    | None              |
| 1525 | Trove | None              |
mfischer@Matts-MBP-4:~$ openstack aggregate show 1525
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
| Field             | Value                                                                                |
| availability_zone | None                                                                                 |
| created_at        | 2016-05-11T20:35:07.000000                                                           |
| deleted           | False                                                                                |
| deleted_at        | None                                                                                 |
| hosts             | [u'bfd02-compute-trove-005', u'bfd02-compute-trove-004', u'bfd02-compute-trove-003'] |
| id                | 1525                                                                                 |
| name              | Trove                                                                                |
| properties        | CPU='Trove'                                                                          |
| updated_at        | None                                                                                 |

Note the properties portion here. This then matches with the special Trove flavors that we made. On the flavors we set the again deftly named aggregate_instance_extra_specs.

mfischer@Matts-MBP-4:~$ openstack flavor show 9050
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
| Field                      | Value                                                                                     |
| OS-FLV-DISABLED:disabled   | False                                                                                     |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                         |
| access_project_ids         | None                                                                                      |
| disk                       | 5                                                                                         |
| id                         | 9050                                                                                      |
| name                       | t4.1CPU.512MB                                                                             |
| os-flavor-access:is_public | True                                                                                      |
| properties                 | aggregate_instance_extra_specs:CPU='Trove',              |
|                            |                                                         |
| ram                        | 512                                                                                       |
| rxtx_factor                | 1.0                                                                                       |
| swap                       |                                                                                           |
| vcpus                      | 1                                                                                         |

We do all this currently with puppet automation and facter facts. If you are a trove compute node you get a fact defined and then puppet sticks you in the right host aggregate.

So this solution works but has issues. The problem with new flavors is that everyone sees them, so someone can nova boot anything they want and it will end up on your Trove node, thus violating the main requirement. Enter Option 2.

Option 2 – Set Image Metadata + a Nova Scheduler

In combination with Option 1, we can set special image metadata such that nova will only schedule images to that node. The scheduler that kinda does this is obviously AggregateImagePropertiesIsolation (pro-tip: do not let Nova devs name your child). This scheduler matches metadata like the flavors above except does it on images. Trove images would be tagged with something like trove=true, for example:

openstack image set --property trove=true cirros-tagged

[DEV] root@dev01-build-001:~# openstack image list
| ID                                   | Name           | Status |
| 846ee606-9559-4fdc-83b9-1ca57895cf92 | cirros-no-tags | active |
| a12fda2c-d2ff-4b7b-b8f0-a8400939df78 | cirros-tagged  | active |
[DEV] root@dev01-build-001:~# openstack image show a12fda2c-d2ff-4b7b-b8f0-a8400939df78
| Field            | Value                                                                                               |
<snips abound>
| id               | a12fda2c-d2ff-4b7b-b8f0-a8400939df78                                                                |
| properties       | description='', direct_url='rbd://b589a8c7-9b74-49dd-adbf-90733ee1e31a/images/a12fda2c-d2ff-4b7b-   |
|                  | b8f0-a8400939df78/snap', trove='true'                                                |                                                                                            |

The problem is that the AggregateImagePropertiesIsolation scheduler considers images that do not have the tag at all to be a match. So while this is solvable for images we control and automate, it is not solvable for images that customers upload, they will end up on Trove nodes because they will not have the trove property. You could solve this with cron but thats terrible for a number of reasons.

Option 2a – Write Your Own Scheduler

So now we just bite the bullet and write our own scheduler. Starting with the AggregateImagePropertiesIsolation we hacked it down to the bare minimum logic and that looks like this:

    def host_passes(self, host_state, spec_obj):
        """Run Trove images on Trove nodes and not anywhere else."""

        image_props = spec_obj.get('request_spec', {}).\
            get('image', {}).get('properties', {})

        is_trove_host = False
        for ha in host_state.aggregates:
            if == 'Trove':
                is_trove_host = True

        # debug prints for is_trove_host here

        is_trove_image = 'tesora_edition' in image_props.keys()

        if is_trove_image:
            return is_trove_host
            return not is_trove_host

So what does it do. First it determines if this is a trove compute host or not, this is a simple check, are you in a host-aggregate called Trove or not? Next we determine if someone is booting a Trove image or not. For this we use the tesora_edition tag which is present on our Trove images. Note we don’t really care what it’s set to, just that it exists. This logic could clearly be re-worked or made more generic and/or configurable #patcheswelcome.


A few notes on deploying this. Once your python code is shipped you will need to configure it. There are two settings that you need to change:

- scheduler_available_filters - Defines filter classes made available to the
scheduler. This setting can be used multiple times.

- scheduler_default_filters - Of the available filters, defines those that the
scheduler uses by default.

The scheduler_available_filters defaults to a setting that basically means “all” except that doesn’t mean your scheduler, just the default ones that ship with Nova, so when you turn this on you need to change both settings. This is a multi-value string option which means in basic terms you set it multiple times in your configs, like so:


(Note for Puppet users: The ability to set this as a MultiStringOpt in Nova was not landed until June as commit e7fe8c16)

Once that’s set you need to make it available, I added it to the list of things we’re already using:

scheduler_default_filters = <usual stuff>,TroveImageFilter

Note that available has the path to the class and default has the class name, do this wrong and the scheduler will error out saying it can’t find your scheduler.

Once you make this settings I also highly recommend enabling debug and then bouncing nova-scheduler. With debug on, you will see nova walk the filters and see how it picks the node. Unsurprisingly it will be impossible to debug without this enabled.

In Action

With this enabled and with 3 compute nodes I launched 6 instances. My setup was as follows:

compute3 – Trove host-aggregate
compute1,2 – Not in Trove host-aggregate

Launch 3 images with the tagged images, note they all go to compute3.
Launch 3 images with the un-tagged images, note they all go to compute1,2

Here’s some of the partial output from the scheduler log with debug enabled.

2016-09-23 01:45:56.763 1 DEBUG nova_utils.scheduler.trove_image_filter 
(dev01-compute-002, ram:30895 disk:587776 
io_ops:1 instances:1 is NOT a trove node host_passes 
2016-09-23 01:45:56.763 1 DEBUG nova_utils.scheduler.trove_image_filter 
(dev01-compute-003, ram:31407 disk:588800
io_ops:0 instances:0 is a trove node host_passes


So although I didn’t really want to, we wrote our own filter scheduler. Since there’s lots of good examples out there we had it working in less than an hour. In fact it took me longer to cherry-pick the puppet fixes I need and figure out the config options than to write the code.

Writing a nova scheduler filter let us solve a problem that had been bugging us for some time. If you plan on writing your own filter too you could look at the barebones docs for new filter writing here, note that there’s no section header for this so look for “To create your own filter”. (when this lands there will be section headers on the page: I’d also recommend when you’re first working on it, just copy an existing one and hack on it in the same folder, then you don’t have to deal with changing the scheduler_available_filters setting, it loads everything in the filters folder.

Tagged , ,