My Key Learning from Paris

I learned a bunch of things in Paris: French food is amazing, always upgrade OVS, but the most important thing I learned in Paris?

All operators have the same problems.

Every operator session was a revelation for me. As it turns out, I’m not the only one writing Icinga check jobs, Ansible scripts, trying to figure out how to push out updates, or fighting with ovs. These sessions not only provided by validation to let me know that we’re not doing stuff totally wrong, but also allowed everyone to share solutions. Below are some of the themes that I found particularly interesting or relevant to this premise.

Upgrades

One topic in which operators have a common interest was upgrades. Only a few of the operators have any real experience with this, but its been a pain point before for many. Some have resorted to fork-lift style “bring up a new cluster” upgrades, which is not nice for customers and requires extra hardware. How do we solve this? It seems that some projects have upgrade guides, but finding documentation on a holistic upgrade is difficult, especially gathering information on ordering (Cinder before Nova, for a contrived example). Issues specifically called out in the session were config changes (deprecations and new additions) and database migrations (including rollback). Rollback is especially worrying as its not well tested. There was no solution for this except a resolve to share information on upgrading via the Operators Mailing List.

Etherpad for Upgrades

CI/CD & Packaging

Another issue that operators face after an Openstack deployment is how to get fixes and new code out to the nodes. An upstream bug fix might take a few days, plus a week for a backport, plus a month for the distro to pick it up. This means that even if the operator fixes it themselves, they still have a delay in getting it releases. During this delay you might be impacting customers who may not be that patient. The solution for many is using a custom CI/CD system that builds “packages” of some sort, whether distro packages or custom built vdevs. It was interesting to hear that people have a myriad of solutions here. We use a toolchain quite similar to the upstream Openstack tool chain that outputs Ubuntu packages. However even with this method we still rely on dependencies and libraries provided by the Ubuntu Cloud Archive, as much as possible anyway.

There was a bunch of talks on this subjects, not just operator talks, here are a link to the ones I attended:

  • CI/CD Pipeline to Deploy and Maintain an OpenStack Iaas Cloud (HP)
  • CI/CD in Practice (Comcast)
  • Building the RackStack (Rackspace)
  • If you know other ones that I should go back and watch please comment here.

    Automation (Puppet)

    We use Puppet to configure and manage our Openstack deployment, so this was an area of interest to me. It was great to see a 100% full room with everyone focused on improving how Openstack is configured with puppet. From discussions on new check jobs to a conversation about how to better handle HA, it was great to see the real sense of community around Puppet.

    Etherpad

    The second puppet session was more of a “how are you solving X” and “how could it be better”. This was also a great session with some interesting notes.

    Towards a Project?

    Finally, one of the more interesting things happened later in the week when Michael Chapman and Dan Bode grabbed me in the lobby and wanted me to preview an email that was about to go out. In brief, they were proposing an operations project. This was born of the realization that I’d also made, we’re all solving the same issues. This email led to an impromptu meeting of about 30 people in the lobby of the Meridien hotel and the beginnings of an Operations Project.

    Birth of a New Project

    Birth of a New Project

    It was not the ideal venue, but we still had a great discussion on a few topics. The first was, can we have a framework that will allow us to share code? We all agreed that this project’s purpose is not to bless specific tools (for example Icinga) but instead to allow us to share Icinga scripts alongside other monitoring tools. Although this had been tried before with a github repo it had only a couple contributions, hopefully this will improve. We then dove into all the different tools that people use for packaging and whether or not any could be adapted as general purpose. Having a good tool for Ubuntu/Debian packages seemed to be something in great demand. Adding debian support for Anvil seemed to be something worth investigating further.

    Other topics of discussion included:

  • log aggregation tools and filters
  • ops dashboarding tools
  • Ansible playbooks
  • You can see the Etherpad link below to see what options for each were discussed, but the best idea is to catch up on the mailing list threads which are still ongoing and add if you have to share anything.

    Etherpad

    Operations Wiki

    A Path Towards Contributing (via Commits) in Open Stack

    For 2014, I had a goal of doing 12 contributions (commits) to OpenStack. I set this goal because I wanted to learn more about the different components of OpenStack and I wanted to contribute back to the project. In order to do this, I continued a pattern that I started when I first began working on Ubuntu, one which I consider to be a great way to come up to speed on and contribute to OpenStack. Sometime around May I stopped counting contributions and consider my goal to have been met(*). Since I consider that a success, I’d like to share my process with you here and hopefully it can inspire someone else.

    Step 1: Learn About the Project
    The first step was to learn more about the project, reading is interesting, but diving in and using is better, so I started with DevStack. This should be everyone’s first step. When you encounter something you don’t understand, go find a blog post or OpenStack Summit video about the subject.

    Step 2: Contributing with Bug Triage
    Once I had a basic understanding of the parts of OpenStack, I started looking at bugs. Joining BugSquad and BugControl was the first thing I did when I did community work on Ubuntu and doing triage for OpenStack is also a good first step. Start with this wikipage, especially if you’re new to Launchpad. By triaging bugs you can get an idea where the issues are and get a second level understanding of the projects. A bug report might cause you to ask yourself, why is neutron talking to nova in this way? It could be something that was not obvious from playing with DevStack. During this process you can mark bug dupes, ask follow-up questions, and perform other triage work. For details on OpenStack bug triage work, read more here. The best part about Bug Triage for new committers? There is an unlimited and never-ending supply of bugs. So which to pick? I like to focus on areas that I have a basic understanding of and interest in. Pick a few projects that make sense to you and look there.

    Step 3: Finding a Bug
    The real mission during bug triage is digging for a golden nugget, your first bug. This is more difficult than you’d think. You need to find a bug that you can fix, something relatively simple, because your goal in fixing your first bug is to learn the process. During your bug triage work you may have seen some bugs tagged with “low-hanging-fruit”, these are issues that have been identified as ones that would be simple to fix, a good first choice. If you don’t see an obvious bug tagged thusly, I’d recommend starting with the python-*client projects. I find that this code is easier to understand, easier to test, and has more unclaimed issues. You can also even just pull the source for one you like and look for FIXME notes. Another idea is unit tests, all projects love more tests. And finally, the docs can always use help.

    Step 4: Working on the Bug
    After finding your bug, get to work on fixing it. Before doing so you have some pre-work to do, like creating a launchpad account and signing the developer agreement. I’m not going to list all these steps, but it’s not too difficult. The next steps are fairly obvious, testing, reviewing, fixing, etc. A brief interlude on testing: I do all my testing during this process against DevStack or in DevStack. I have an up-to-date DevStack VM with me at all times and a git repo with my config changes for it. I’d recommend you do the same. Note, when you make your DevStack VM, give it enough RAM and disk, 2GB RAM + 4GB swap and a 20GB disk should be enough. As for code reviews, keep in mind that you will have to likely go multiple rounds on your code reviews. Most of my changes go at least 3-4 rounds, some more than 10, but don’t lose hope, it will eventually land and it will be awesome!

    Anyway that’s my process for getting to a first commit with OpenStack. Hopefully if you’re just getting started this method will work for you. Good luck finding your bug and doing your first commit! If you have some other ideas on any of the stuff I covered, please comment below.

    * – I’m including StackForge commits in my count.

    Tagged

    Quick Update on Stacked Auth

    A quick update on Stacked/Hybrid auth for Keystone. I spent some time this week updating the driver for Icehouse. That branch is available here and includes packaging info. I’ve been too lazy to make a non-packaging copy of the branch but it should be easy to do. During my debugging of the new driver, I realized one thing, even though this driver does a bind using the credentials of the user you are trying to authenticate, you still need a service account to bind with if you are using subtree search. The reason is that the LDAP code does an initial bind to map your user id to a dn so that then it can do the “real” bind. If you see any issues with this code please let me know!

    Tagged , ,

    Keystone: Using Stacked Auth (SQL & LDAP)

    Most large companies have an Active Directory or LDAP system to manage identity and integrating this with OpenStack makes sense for some obvious reasons, like not having multiple passwords and automatically de-authenticating users who leave the company. However doing this in practice can be a bit tricky. OpenStack needs some basic service accounts. Then your users probably want their own service accounts, since they don’t want to stick their AD password in a Jenkins config. So multiple the number of teams that may use your OpenStack by 4 or 5 and you may have a ballpark estimate on the number of service accounts you need. Then take that number to your AD team and be sure to run it by your security guys too. This is the response you might get.

    Note: All security guys wear cloaks.

    The way you can tell how a conversation with a security guy is going is to count the number of facepalms they do.

    In this setup you probably also do not have write access to your LDAP box. This means that you need a way to track role assignment outside of LDAP, using the SQL back-end (a subject I’ve previous covered)

    So this leaves us with an interesting problem, fortunately one with a few solutions. The one we chose was to use a “stacked” or “hybrid” auth in Keystone. One where service accounts live in Keystone’s SQL backend and if users fail to authenticate there they fallback to LDAP. For role Assignment we will store everything in Keystone’s SQL back-end. The good news is that Ionuț Arțăriși from SUSE has already written a driver that does exactly what you need. His driver for Identity and Assignment is available here on github and has a nice README to get you started. I did have a few issues getting it to work, which I will cover below which hopefully saves you some time in setting this up.

    The way Ionut’s driver works is pretty simple, try the username and password first locally against SQL. If that fails, try to use the combo to bind to LDAP. This saves the time of having to bind with a service account and search, so it’s simpler and quicker. You will need to be able to at least bind RO with your standard AD creds for this to work though.

    From this point forward, you probably will only have the issues I have if you have an AD/LDAP server with a large number of users. If you’re in a lab environment, just following the README in Ionut’s driver is all you need. For everyone else, read on…

    Successes and Problems

    After configuring the LDAP driver, I switched and enabled hybrid identity and restarted keystone. The authentication module seemed to be working great. I did my first test by doing some basic auth and then running a user-list. The auth worked, although my user lacked any roles that allowed him to do a user list. When I set the SERVICE_TOKEN/ENDPOINT and re-ran the user list, well that was interesting. I noticed that since I had enabled the secret LDAP debug option (hopefully soon to be non-secret) that it was doing A TON of queries and I eventually hit the server max before setting page_size=1000. Below is an animation that shows what it was like running keystone user-list:

    This isn't really animated, but is still an accurate representation of what happened.

    This isn’t really animated, but is still an accurate representation of how slow it was

    It did eventually finish, but it took 5 minutes and 25 seconds, and ain’t nobody got time for that. One simple and obvious fix for this is to just modify the driver to not pull LDAP users in the list. There are too many to manage anyway, so I made the change and now it only showed local accounts. This was a simple change to the bottom of the driver, to just return SQL users and not query LDAP.

    root@mfisch:~# keystone user-list
    +----------------------------------+------------+---------+--------------------+
    | id | name | enabled | email |
    +----------------------------------+------------+---------+--------------------+
    | e0ede62ebcb64e489a41f4a9b18cd63c | admin | True | root@localhost |
    | f9d76feb7081463f973eeda82c918547 | ceilometer | True | root@localhost |
    | 176e232176c74164a42e2f773695671b | cinder | True | cinder@localhost |
    | fc8007f82b7542e88ca28fab5eda1b5c | glance | True | glance@localhost |
    | 635b4022aafc4f5488c048109c7b1665 | heat | True | heat@localhost |
    | b3175fc7b62748679b4c958fe89fbdf0 | heat-cfn | True | heat-cfn@localhost |
    | 4873cd322d0d4582908002b619e3940a | neutron | True | neutron@localhost |
    | 37ef623c60474bf5a384a2047719acf4 | nova | True | nova@localhost |
    | ff07bd356b8d4959be4326a43c297db0 | swift | True | swift@localhost |
    +----------------------------------+------------+---------+--------------------+

    After making this change, I went to add the admin role to my AD user and found I had a second problem. It seemed to be relying on my user being in the user list before it would assign them a role.

    root@mfisch:~# keystone user-role-add --user-id MyADUser --role-id admin --tenant-id 6031675b7618458b8bb85b92857532b06No user with a name or ID of 'MyADUser' exists.

    I didn’t really want to wait five minutes to assign or list a role, so this needed fixing.

    Solutions

    It was late on Friday afternoon when I started toying with some solutions for the role issue. The first solution is to add OpenStack users into a special ou so that you can limit the scope of the search. This is done via the user_filter setting in keystone.conf. If this is easy for your organization, it’s a simple fix. It also allows tools like Horizon to show all your users. If you do this you can obviously undo the change to not return LDAP users.

    The second idea that came to mind was that every time an AD user authenticated, I could just make a local copy of that user, without a password and setup in a way that they’d be unable to authenticate locally. I did manage to get this code functional, but it has a few issues. The most obvious one is that when users leave the company there’s no corresponding clean-up of their records. The second one is that it doesn’t allow for any changes to things like names or emails once they’ve authenticated. In general it’s bad to have two copies of something like this info anyway. I did however manage to get this code working and it’s available on github if you ever have a use for it. It is not well tested and it’s not very clean in the current state. With this change, I had this output (truncated):

    root@mfisch:~# keystone user-list
    +----------------------------------+------------+---------+--------------------+
    | id | name | enabled | email |
    +----------------------------------+------------+---------+--------------------+
    | MyADUser | MyADUser | True | my corporate email |
    | e0ede62ebcb64e489a41f4a9b18cd63c | admin | True | root@localhost |
    | f9d76feb7081463f973eeda82c918547 | ceilometer | True | root@localhost |
    ...
    +----------------------------------+------------+---------+--------------------+

    Due to the concerns raised above, we decided to drop this idea.

    My feelings on this solution...

    My feelings on this solution…

    The final idea which came after some discussions was to figure out why the heck a user list was needed to add a role. I dug into the client code, hacked a fix, and then asked on IRC. It turns out I wasn’t the first person to notice it. The bug (#1189933) is that when the user ID is not a UUID, it thinks you must be passing in a name and does a search. The fix for this is to pull a copy of python-keystoneclient thats 0.4 or newer. We had 0.3.2 in our repo. If you upgrade this package to the latest (0.7x) you will need a copy of python-six that’s 1.4.0 or newer. This should all be cake if you’re using the Ubuntu Cloud Archive.

    Final State

    In the final state of things, I have taken the code from Ionut and:

    1. disabled listing of LDAP users in the driver
    2. upgraded python-six and python-keystoneclient
    3. packaged and installed Ionut’s drivers (as soon as I figure out the license for them, I’ll publish the packaging branch)
    4. reconfigured keystone via puppet

    So far it’s all working great with just one caveat so far. If users leave your organization they will still have roles and tenants defined in Keystone’s SQL. You will probably need to define a process for cleaning those up occasionally. As long as the users are deactivated in AD/LDAP though the roles/tenant mappings for them should be relatively harmless cruft.

    Tagged , , ,

    Announcing forth-keystoneclient

    Over the past few weeks I’ve been working on a new client implementation for keystone. Python is a bit bloated for nimble CLIs like keystone, and so I went back to a language I used at my first job back in 1999, Forth. Forth is interpreted like python but it’s also very simple and memory efficient. And unlike python, forth can fit on a floppy disk. This allows this client to be installed onto satellites and other memory constrained environments. For embedded systems, forth also makes a great alternative to C. Now, onto the client.

    Mastering Forth

    Mastering Forth

    As you probably know, forth is stack-based, and no real programmer likes someone else managing the stack for them. I’ve taken this paradigm into the client design. So let’s dive in. Below is an example usage so you can see how simple it is:

    First we remember the definition of what user-role-list does. Python offers the “help” and “dir” for introspection, forth has ‘ and DUMP which work just as well. First let’s make sure user-role-list is defined:

    ' user-role-list  ok
    

    Then let’s see what it’s doing:

    ' user-role-list 64 cells dump
    7F0A31F7F300: D5 44 40 00  00 00 00 00 - 00 00 00 00  00 00 00 00  .D@.............
    7F0A31F7F310: 25 46 40 00  00 00 00 00 - 20 DE F2 31  0A 7F 00 00  %F@..... ..1....
    7F0A31F7F320: 88 46 40 00  00 00 00 00 - 99 99 99 99  99 99 99 99  .F@.............
    ...
     ok
    

    As you can simply tell from the above, we need to pass in the tenant-id and user-id, so let’s make the call.

    S" de4442c6e54a43459eaab97e26dc21bc2"  ok
    S" 2ebadab9148d421287eee6d264f29736d"  ok
    user-role-list 33  
    +----------------------------------+--------------------+----------------------------------+----------------------------------+
    | id | name | user_id | tenant_id |
    +----------------------------------+--------------------+----------------------------------+----------------------------------+
    | 006eaf0730e44756bc679038477d3bbd | Member | 2ebadab9148d421287eee6d264f29736d | de4442c6e54a43459eaab97e26dc21bc2 |
    | 03e83b65036a4e0cbd7cff5bff858c76 | admin | 2ebadab9148d421287eee6d264f29736d | de4442c6e54a43459eaab97e26dc21bc2 |
    +----------------------------------+--------------------+----------------------------------+----------------------------------+
    ok
    

    I’ll be handing this out on floppy disks at the OpenStack conference in Atlanta if you’re interested, come find me and I’ll make you a copy.

    Prepping for distribution at the conference

    Prepping for distribution at the conference

    A note on debugging, if you plan on installing this on satellites remember how the speed of light can interact with your token expiration timeout.

    Tagged ,

    Keystone: User Enabled Emulation Can Lead to Bad Performance

    An update on my previous post about User Enabled Emulation, tl;dr, don’t use it. It’s slow. Here’s what I found:

    I just spent parts of today debugging why keystone user-list was so slow. It was taking between 8 and 10 seconds to list 20 users. I spent a few hours tweaking the caching settings, but the database was so small that the cache never filled up, and so I realized that this was not the main issue. A colleague asked me if basic ldapsearch was slow, and no it was fine. Then I dug into what Keystone is doing with the enabled emulation code. Unlike a user_filter, Keystone appears to be querying the user list and then re-querying LDAP for each user to check if they’re in the enabled_emulation group. This leads to a lot of extra queries which slows things down. When I disabled this setting, the query performance improved dramatically to between 2-2.5 seconds, about a 4x speed-up. If you are in a real environment with more than 20 users, the gain will be pretty good in terms of real seconds.

    Disabling the enabled_emulation leaves us with no user enabled information. In order to get the Enabled field back, I’m going to add a field to the schema to emulate what AD does with the enabled users. Since this portion of Keystone was designed for AD, this blog post may help clear up what exactly it expects here from an AD point of view. Read that page to the end and you get a special treat, how to use Logical OR in LDAP, see if it makes less sense than the bf language does.

    Also to reduce my user count, I did enable a user_filter, which since it’s just part of the initial query does NOT appear to slow things down. You could skip the new field and just use the filter if you want, however it’s not clear what impact a “blank” for Enabled has, other than perhaps some confusion. If it has a real impact, PLEASE comment here!

    Tagged , , ,

    Landscape Tags to Puppet Facter Facts

    I’ve been playing around this week with using Landscape to interact with puppet. I really like Landscape as an administration tool, but I also really like using puppet to manage packages and configuration. So the question is how to get puppet to do something based on Landscape. Landscape tags seemed like an obvious choice for this, tag a system as “compute-node” and it becomes a nova compute node, but how do you make this happen?

    Standing on Shoulders

    After running through a couple of ideas, I was inspired by a great article from Mike Milner. Mike’s blog post uses Landscape tags to do node classification and I wanted to use tags to set facter facts so I needed a few changes.

    Make Some Tags

    First I went to Landscape and added some basic tags to my box:

    Just a couple tags

    Just a couple tags

    Get Some Tags

    Next, I sat down to write a script that would run on each system and get the tags for the system it was running on. I did this first before I looked into the glue with Facter. This turned out to be pretty since since the Landscape API, is easy to use, well documented, and comes with a CLI implementation that makes testing easy.

    ubuntu@mfisch:~$ get_tags.py
    nova-compute
    vm
    

    Facter Time

    Once that worked, it was time to look at Facter. A colleague told me about executable facter facts. The tl;dr for this is, drop a script into /etc/facter/facts.d, make it executable and facter will take the output from it and turn it into facter facts. This was super cool. I had planned on some complex hooks being required, but all I had to do was print the tags to stdout. However, Facter wants a key=value pair and Landscape tags are more like just values, so I decided on a method to generate keys by prepending landcape_tagN, where N is an iterated number. With that change in, I ran facter:

    ubuntu@mfisch:~$ facter | grep landscape
    landscape_tag0 => nova-compute
    landscape_tag1 => vm
    

    Values or Keys?

    The puppet master will not know what to do with “landscape_tag0″ most likely, so we’ll need a convention to make this more useful. One idea that my colleague had was to actually set a tag with an = sign, like this, “type=compute”. Alas Landscape won’t let us do this, so instead we’ll probably just set a convention that the last _ is the key/value boundary. That would map like this:

  • datacenter_USEast -> datacenter=USEast
  • node_type_compute -> node_type=compute
  • foo_bar -> foo=bar
  • Note the current version of my script that’s in github doesn’t have this convention yet, you’ll probably want to choose your own anyway.

    Issues

    My script is pretty simple, it retrieves a list of computers, filtering on the hostname, as long as it only finds one match, it proceeds. This matching is the weakpoint of the current iteration of this script. It’s possible to have more than one system registered with the same hostname, so I’m thinking of adding a better filter here when I get time. If you’ll note the original script that I based mine on solved this by making you pass in the hostname or id as an argument. My current plan is to look at the system title in /etc/landscape/client.conf and do a 2nd level filter on that, but even that I don’t think will be guaranteed to be unique.

    What is This Good For?

    You probably use an automated tool to install and boot a node and it registers with Landscape (via puppet of course). Now what. We have a box running Ubuntu but not doing much else. Let’s say I want a new compute node in an openstack cluster, so all I’d have to do is tag that box with a pre-determined tag, say “compute-node”, let puppet agent run and wait. The puppet master will see the facter facts that let us know it should be a compute node and act accordingly.

    Code

    Here’s the code and patches always welcome.

    Keystone: User Enabled Emulation (follow-up)

    Last week I wrote about Keystone using LDAP for Identity. Thanks to a helpful comment from Yuriy Taraday and a quick email exchange I have solved the issue of the empty “Enabled” column. Here’s how it works.

    Emulated user enable checking is useful if your LDAP system doesn’t have a simple “enabled” attribute. It works by using a separate group of users or tenants that you must be a member of in order to be enabled. Yuriy has a simple example for this which shows how we can mark user2 as enabled and user1 as disabled.

    *ou=Users
    +-*cn=enabled_users
    -member=cn=user2,ou=Users
    +-*cn=user1
    +-*cn=user2

    To make this setup work, add these to your keystone.conf file:

    user_enabled_emulation = True
    user_enabled_emulation_dn = cn=enabled_users,cn=groups,cn=accounts,dc=example,dc=com

    The default value for foo_enabled_emulation_dn is cn=enabled_foos,$tree_dn, in other words, user_enabled_emulation_dn has a default of enabled_users (note the s).

    Keep in mind that even when you remove a user from the enabled_users group they are still a valid user to LDAP/AD. This means that your service account, which you’re using for ldaps authentication, does not need to be a member of this group.

    The user_enabled_emulation_* fields appear to be undocumented to me, so I’ll work on that this week so that the official docs are more helpful.

    Tagged ,

    Keystone: LDAP for Identity, SQL for Assignment

    This week I volunteered to work on playing around with integrating keystone with LDAP (via a FreeIPA box). Since I basically knew nothing about LDAP nor keystone before starting, I learned a few lessons along the way. And here they are:

    The Setup

    I started with a FreeIPA server that our team uses, I have full admin rights to it. I was also running an “All In One” openstack Havana instance on a VM that I setup using puppet_openstack_builder.

    The Goal

    My basic goal here was to learn more about both LDAP and keystone and in the process get the AIO instance to authenticate against LDAP as a preliminary test to authenticating against AD. In our environment, I wanted LDAP to manage Idenity (users, groups, and group memberships) and Keystone’s SQL backend to manage Assignment (roles, tenants, domains). This will work well when we integrate with AD since we cannot just create accounts on the corporate AD server willy-nilly. This is called Read Only LDAP and is covered in more detail here.

    Diving In

    There are several awesome blog entries about using FreeIPA and Keystone from Adam Young and these helped me get started. I’d configure keystone, restart it, then tail the logs while running keystone user-list.

    Configuration

    The very basic config is done like this:

    First, enable the LDAP identity driver:
    [identity]
    driver = keystone.identity.backends.ldap.Identity
    #driver = keystone.identity.backends.sql.Identity

    Then you need to tell Keystone to use SQL for assignment:
    [assignment]
    driver = keystone.assignment.backends.sql.Assignment

    Next we setup the user to which AD will authenticate since we’re using ldaps:

    [ldap]
    url = ldaps://example.com:636
    user = uid=service_acct,cn=users,cn=accounts,dc=example,dc=com
    password = UbuntuRulez

    Then we tell it about the user and group schema:

    user_tree_dn = cn=users,cn=accounts,dc=example,dc=com
    user_filter = (memberOf=cn=openstack,cn=groups,cn=accounts,dc=example,dc=com)
    user_objectclass = inetUser
    user_id_attribute = uid
    # this is what is searched on
    user_name_attribute = uid
    user_mail_attribute = mail
    user_pass_attribute =
    # XXX FIXME -mfisch: wont work on freeIPA like this
    #user_enabled_attribute = (nsAccountLock=False)
    user_allow_create = False
    user_allow_update = False
    user_allow_delete = False
    ...
    group_tree_dn = cn=groups,cn=accounts,dc=example,dc=com
    group_filter =
    group_objectclass = groupOfNames
    group_id_attribute = cn
    group_name_attribute = cn
    group_member_attribute = member
    group_desc_attribute = description
    # group_attribute_ignore =
    group_allow_create = False
    group_allow_update = False
    group_allow_delete = False

    Lastly we point at the cert info:
    use_tls = False
    tls_cacertfile = /etc/ssl/certs/ca-certificates.crt

    The First Problem

    The first stumbling block I hit is that I needed to use ldaps. This is a pretty basic fix, just grab the cert and put it on your box where keystone is running. Unfortunately I found out that unless I had the cert path also defined in /etc/ldap/ldap.conf, the query didn’t work. The fix is to make a pretty much empty /etc/ldap/ldap.conf and add this one line that points to the cert you pulled down:

    TLS_CACERT /etc/ssl/certs/ca-certificates.crt

    user_name_attribute

    My next issue is that the default user_name_attribute is not right for FreeIPA. I checked this using Apache Directory Studio which made it easy for me to browse the tree and test queries. I needed to use uid and not cn. The issue here is that if you have this wrong, then the initial authentication of your LDAP user for the ldaps query fails and the resulting output provides no clue as to the issue, even with debug enabled. Once I solved this, I had a permission issue.

    Bootstrapping

    The problem once you switch the authentication mechanism to LDAP, you’ve “lost” all the old roles and users that puppet setup, like the admin user who actually has permissions to do stuff like list users. The fix as any keystone expert will know is to bypass the main API and use the service token directly. In the keystone.conf a service token is defined as admin_token. To use it, unset everything like OS_USERNAME and set:

    export SERVICE_TOKEN=
    export SERVICE_ENDPOINT=http://localhost:35357/v2.0

    Then you need to give your user, in my case, “id=mfischer” permission to do stuff. I ended up giving mfischer the admin and _member_ role in both of my tenants since he/me is the new admin. Finally, I switched my settings back to what’s in my rc file, mfischer as the user, my LDAP password, and the normal keystone endpoint, and… finally my user-list query worked and I got results from LDAP.

    root@landscape-03:/var/log/keystone# keystone user-list
    +------------+------------+---------+--------------------------+
    | id | name | enabled | email |
    +------------+------------+---------+--------------------------+
    | mfischer | mfischer | | matt.fischer@example.com |
    ....

    User Enabled?

    keystone wants to know if a user is enabled and shows this as a column in the output of user-list. All my results were blank, and it’s not clear that FreeIPA has a straightforward “enabled” column in our setup. At a glance this stuff seemed optimized for AD’s user-enabled mask stuff. (If you know a fix, let me know please).

    Trying Other Services

    I decided to try nova list next and guess what, it failed. It’s obvious to me now, but the puppet AIO has users created for almost all the services, like nova, cinder, glance, etc. I needed users for these. This is easy to do in FreeIPA, so I made a batch of users and restarted the node. I bet you can guess, but stuff still failed because these users didn’t have the roles that they needed. At this point, I switched the identity backend back to SQL, restarted keystone and made a map of roles to users and tenants. In my case it was pretty basic, everything needed admin in the services tenant and some needed more, here’s how to do it simply:

    for I in 'glance' 'nova' 'cinder' 'neutron' 'heat' 'heat-cfn' 'swift';
    do; keystone user-role-add --user-id=$I --tenant-id=de4442c6e54a43459eaab97e26dc21f8 --role id=a5a8ea228b1942e28289ba63fba9b3c0; done

    I did this and then bounced the node again (which is frankly simpler than restarting everything but unlikely to work in the real world). And again nothing worked! I’d forgotten one more step, each service has a password defined in it’s config file as “admin_password”. I signed back into FreeIPA and set the passwords and then bounced the node a final time.

    This time, I could sign into Horizon and everything worked great! Finally, I was using LDAP only for identity and didn’t need to switch back.

    Obviously this is much better to be done before you start your puppet install but in my case it was a great learning experience about roles/tenants/users and the tools that keystone provides.

    Debugging Tools/Hints

    Here are some more debugging tools and hints that I came across and maybe will help you if you’re dealing with LDAP and keystone:

    1. Enable debug and verbose in keystone, of course.
    2. I’ve mentioned Apache Directory Studio so you can test LDAP queries. This command also helps show what fields are available and is simpler than using ADS: “ipa user-show –all –raw”.
    3. The LDAP code in keystone has double secret logging which you can enable in /usr/lib/python2.7/dist-packages/keystone/common/ldap/core.py. You can look for “ldap.set_option(ldap.OPT_DEBUG_LEVEL, 4095)” and uncomment it. These logs only show on stdout, so you’ll need to stop the service and run it by hand to see this output.
    4. I also traced some code in pdb, in addition to the file listed above you should also look at /usr/lib/python2.7/dist-packages/keystone/identity/backends/ldap.py

    Good luck everyone!

    Tagged , ,

    Updating Keystone Endpoints

    This morning we ran into an issue when trying to get our jenkins server to talk to an Openstack cloud we have setup. The issue is that the systems are in different data centers with different firewalls and other assorted networking challenges. The network guys solved this for my by giving the control node in the openstack cloud a NAT address that I could ping. However, after doing all this, I still had issues with keystone and assorted commands (like nova) hanging.

    The first step to debugging this was to figure out why it was hanging. To solve this, I used the debug option on nova. (Technically to be honest, the first thing I did was check to see if iptables was blocking ports I needed, you may also need to do this) However, with iptables looking good, I used –debug in keystone endpoint-list.

    The first thing you can see in the output is that its successfully retrieving the endpoint list from keystone. The second thing, which was causing the hang, is that it was trying to ping an internal IP and not the natted one. Here’s some of that output with a lot of cruft removed and fake IPs:

    [mfischer@puppet-ci-01 ~]$ keystone --debug --os-auth-url http://1.2.3.4:5000/v2.0/ --os-username mfischer --os-password ubuntu4life --os-tenant-name test endpoint-list
    REQ: curl -i http://1.2.3.4:5000/v2.0/tokens -X POST -H "Content-Type: application/json" -H "User-Agent: python-keystoneclient"

    REQ BODY: {"auth": {"tenantName": "test", "passwordCredentials": {"username": "mfischer", "password": "ubuntu4life"}}}

    connect: (1.2.3.4, 5000) ************

    bunch of cruft here that basically means it connected

    connect: (10.0.0.10, 35357) ************

    hang here

    The last line was troubling because that’s an internal IP that I cannot ping from this box. The info on that line comes from keystone, in the endpoint list. This is what the list of endpoints looks like, truncated and formatted for a normal screen.

    | id | region | publicurl | internalurl | adminurl |service_id|
    | 28c...| RegionOne | http://10.0.0.10:8080| http://10.0.0.10:8080| http://10.0.0.10:8080 | 9f13... |

    So the fix here is that I needed to change these internal URLs to a more friendly and portable hostname. This is doable in two ways that I know of:

        Delete and re-create the endpoints
        Hack the mysql db

    Since Option 2 sounded more exciting, I got to work. A great overview on how to do just that is here. After reading this, I realized that I’d have to hand-update every line for the public and admin URLs. The issue I have with this process if that changing all the URLs is error prone and tedious, so instead I wrote a tool.

    The tool is called update-endpoints, and it’s hosted here. If you try it, please be careful, back up or dump your DB. I’ve only done limited testing on it and it could break your system. The basic usage is that it connects to your DB and updates the hostname/IP portion of the URL for a class of endpoints (admin, public, or internal). For example, to point all public endpoints to foo.com,

    ./update-endpoints.py --username root --password debb4rpm --host localhost --endpoint foo.com --type public

    This change should not change the port or other parts of the URL, so if you want to change those this tool won’t work for you. It seems by default on my installs that mysql can only be talked to from localhost, so I’ve been running this on my control nodes.

    After I ran the tool against the public and admin URLs, I rechecked my endpoints, and now they pointed to foo.com, which was conveniently NATd for me.


    | id | region | publicurl | internalurl | adminurl |service_id|
    | 28c...| RegionOne | http://foo.com:8080| http://foo.com:8080| http://foo.com:8080 | 9f13... |

    Better yet, all the nova and keystone commands I wanted to run worked and Jenkins was able to talk to the controller!

    If you’d like to improve the tool, please do a pull request.

    Tagged