Announcing forth-keystoneclient

Over the past few weeks I’ve been working on a new client implementation for keystone. Python is a bit bloated for nimble CLIs like keystone, so I went back to a language I used at my first job back in 1999: Forth. Forth is interpreted like Python, but it’s also very simple and memory efficient. And unlike Python, Forth can fit on a floppy disk, which allows this client to be installed on satellites and other memory-constrained environments. For embedded systems, Forth also makes a great alternative to C. Now, onto the client.

Mastering Forth

As you probably know, Forth is stack-based, and no real programmer likes someone else managing the stack for them. I’ve carried this paradigm into the client design. So let’s dive in; below is an example usage so you can see how simple it is:

First we need to recall what user-role-list does. Python offers help() and dir() for introspection; Forth has ' and DUMP, which work just as well. Let’s make sure user-role-list is defined:

' user-role-list  ok

Then let’s see what it’s doing:

' user-role-list 64 cells dump
7F0A31F7F300: D5 44 40 00  00 00 00 00 - 00 00 00 00  00 00 00 00  .D@.............
7F0A31F7F310: 25 46 40 00  00 00 00 00 - 20 DE F2 31  0A 7F 00 00  %F@..... ..1....
7F0A31F7F320: 88 46 40 00  00 00 00 00 - 99 99 99 99  99 99 99 99  .F@.............
...
 ok

As you can simply tell from the above, we need to pass in the tenant-id and user-id, so let’s make the call.

S" de4442c6e54a43459eaab97e26dc21bc2"  ok
S" 2ebadab9148d421287eee6d264f29736d"  ok
user-role-list 33  
+----------------------------------+--------------------+----------------------------------+----------------------------------+
| id | name | user_id | tenant_id |
+----------------------------------+--------------------+----------------------------------+----------------------------------+
| 006eaf0730e44756bc679038477d3bbd | Member | 2ebadab9148d421287eee6d264f29736d | de4442c6e54a43459eaab97e26dc21bc2 |
| 03e83b65036a4e0cbd7cff5bff858c76 | admin | 2ebadab9148d421287eee6d264f29736d | de4442c6e54a43459eaab97e26dc21bc2 |
+----------------------------------+--------------------+----------------------------------+----------------------------------+
ok

I’ll be handing this out on floppy disks at the OpenStack conference in Atlanta. If you’re interested, come find me and I’ll make you a copy.

Prepping for distribution at the conference

A note on debugging: if you plan on installing this on satellites, remember how the speed of light can interact with your token expiration timeout.
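For a rough sense of the scale involved (back-of-the-envelope numbers of my own, not measurements from orbit):

# Light lag to a geosynchronous satellite, using rough assumed numbers
altitude_km = 35786.0      # approximate GEO altitude
c_km_per_s = 299792.458    # speed of light
round_trip_s = 2 * altitude_km / c_km_per_s
print "round-trip light lag: %.3f s" % round_trip_s   # ~0.239 s, per request

Budget your token lifetimes accordingly.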


Keystone: User Enabled Emulation Can Lead to Bad Performance

An update on my previous post about User Enabled Emulation. tl;dr: don’t use it, it’s slow. Here’s what I found:

I just spent parts of today debugging why keystone user-list was so slow. It was taking between 8 and 10 seconds to list 20 users. I spent a few hours tweaking the caching settings, but the database was so small that the cache never filled up, so I realized that this was not the main issue. A colleague asked me if basic ldapsearch was slow; no, it was fine. Then I dug into what Keystone is doing with the enabled emulation code. Unlike a user_filter, which the LDAP server applies as part of a single query, Keystone appears to query the user list and then re-query LDAP once per user to check whether each one is in the enabled_emulation group. All those extra round trips slow things down. When I disabled this setting, the query time dropped to between 2 and 2.5 seconds, about a 4x speed-up. In a real environment with more than 20 users, the gain in real seconds will be even larger.
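To make the difference concrete, here is a rough sketch of the two query patterns in python-ldap (simplified, with made-up DNs; this is not Keystone’s actual code):

import ldap

conn = ldap.initialize("ldaps://example.com:636")
conn.simple_bind_s("uid=service_acct,cn=users,cn=accounts,dc=example,dc=com", "secret")
base = "cn=users,cn=accounts,dc=example,dc=com"

# user_filter: the server applies the filter, one round trip total
users = conn.search_s(base, ldap.SCOPE_SUBTREE,
                      "(memberOf=cn=openstack,cn=groups,cn=accounts,dc=example,dc=com)")

# enabled_emulation: one query for the list, then one more query per user
users = conn.search_s(base, ldap.SCOPE_SUBTREE, "(objectClass=inetUser)")
for dn, attrs in users:
    # N extra round trips to check membership in the enabled group
    conn.search_s("cn=enabled_users,cn=groups,cn=accounts,dc=example,dc=com",
                  ldap.SCOPE_BASE, "(member=%s)" % dn)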

Disabling the enabled_emulation leaves us with no user enabled information. In order to get the Enabled field back, I’m going to add a field to the schema to emulate what AD does with enabled users. Since this portion of Keystone was designed for AD, this blog post may help clear up what exactly Keystone expects here from an AD point of view. Read that page to the end and you get a special treat: how to use logical OR in LDAP. See if it makes less sense than the bf language does.

Also, to reduce my user count, I enabled a user_filter, which, since it’s just part of the initial query, does NOT appear to slow things down. You could skip the new field and just use the filter if you want; however, it’s not clear what impact a blank Enabled column has, other than perhaps some confusion. If it has a real impact, PLEASE comment here!


Landscape Tags to Puppet Facter Facts

I’ve been playing around this week with using Landscape to interact with puppet. I really like Landscape as an administration tool, but I also really like using puppet to manage packages and configuration. So the question is how to get puppet to do something based on Landscape. Landscape tags seemed like an obvious choice for this, tag a system as “compute-node” and it becomes a nova compute node, but how do you make this happen?

Standing on Shoulders

After running through a couple of ideas, I was inspired by a great article from Mike Milner. Mike’s blog post uses Landscape tags to do node classification; I wanted to use tags to set Facter facts, so I needed a few changes.

Make Some Tags

First I went to Landscape and added some basic tags to my box:

Just a couple tags

Get Some Tags

Next, I sat down to write a script that would run on each system and get the tags for the system it was running on. I did this first, before I looked into the glue with Facter. This turned out to be pretty easy, since the Landscape API is easy to use, well documented, and comes with a CLI implementation that makes testing simple.

ubuntu@mfisch:~$ get_tags.py
nova-compute
vm
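
The guts of get_tags.py are small. Here’s a sketch of the idea (hedged: the landscape_api.base.API binding and the get_computers call are from memory, so double-check them against the Landscape API docs):

#!/usr/bin/env python
import socket
from landscape_api.base import API  # from the landscape-api package

api = API("https://landscape.example.com/api/", "my_access_key", "my_secret_key")

# filter on this system's hostname; see "Issues" below for why this is weak
computers = api.get_computers(query="hostname:%s" % socket.gethostname())
if len(computers) == 1:
    for tag in computers[0]["tags"]:
        print tag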

Facter Time

Once that worked, it was time to look at Facter. A colleague told me about executable Facter facts. The tl;dr for this is: drop a script into /etc/facter/facts.d, make it executable, and Facter will take the output from it and turn it into facts. This was super cool. I had planned on some complex hooks being required, but all I had to do was print the tags to stdout. However, Facter wants key=value pairs and Landscape tags are just values, so I generate keys of the form landscape_tagN, where N is an incrementing number. With that change in, I ran facter:

ubuntu@mfisch:~$ facter | grep landscape
landscape_tag0 => nova-compute
landscape_tag1 => vm
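
For reference, the facts.d script that produces those facts boils down to something like this sketch (with a hypothetical get_tags() standing in for the Landscape lookup above):

#!/usr/bin/env python
# /etc/facter/facts.d/landscape_tags.py -- must be executable
# facter runs this and parses each key=value line on stdout as a fact

def get_tags():
    # stand-in for the Landscape API call from get_tags.py
    return ["nova-compute", "vm"]

for n, tag in enumerate(get_tags()):
    print "landscape_tag%d=%s" % (n, tag)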

Values or Keys?

The puppet master will most likely not know what to do with “landscape_tag0”, so we’ll need a convention to make this more useful. One idea my colleague had was to set a tag with an = sign in it, like “type=compute”. Alas, Landscape won’t let us do this, so instead we’ll probably just set a convention that the last _ is the key/value boundary. That would map like this:

  • datacenter_USEast -> datacenter=USEast
  • node_type_compute -> node_type=compute
  • foo_bar -> foo=bar

Note: the current version of my script that’s in github doesn’t have this convention yet (you’ll probably want to choose your own anyway); a sketch of the split is below.
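
The split itself is a one-liner on rsplit; a minimal sketch of the convention (hypothetical, since as noted the script doesn’t do this yet):

def tag_to_fact(tag):
    # "node_type_compute" -> ("node_type", "compute"); the last _ is the boundary
    if "_" not in tag:
        return None  # no boundary, treat as a bare value tag
    return tuple(tag.rsplit("_", 1))

print tag_to_fact("datacenter_USEast")  # ('datacenter', 'USEast')
print tag_to_fact("node_type_compute")  # ('node_type', 'compute')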

    Issues

    My script is pretty simple: it retrieves a list of computers, filtering on the hostname, and as long as it finds exactly one match, it proceeds. This matching is the weak point of the current iteration of the script. It’s possible to have more than one system registered with the same hostname, so I’m thinking of adding a better filter here when I get time. Note that the original script I based mine on solved this by making you pass in the hostname or id as an argument. My current plan is to look at the system title in /etc/landscape/client.conf and do a second-level filter on that, but even that I don’t think is guaranteed to be unique.

    What is This Good For?

    You probably use an automated tool to install and boot a node, and it registers with Landscape (via puppet, of course). Now what? We have a box running Ubuntu but not doing much else. Say I want a new compute node in an OpenStack cluster: all I’d have to do is tag that box with a pre-determined tag, say “compute-node”, let puppet agent run, and wait. The puppet master will see the Facter facts that tell us it should be a compute node and act accordingly.

    Code

    Here’s the code; patches are always welcome.

    Keystone: User Enabled Emulation (follow-up)

    Last week I wrote about Keystone using LDAP for Identity. Thanks to a helpful comment from Yuriy Taraday and a quick email exchange I have solved the issue of the empty “Enabled” column. Here’s how it works.

    Emulated user enable checking is useful if your LDAP system doesn’t have a simple “enabled” attribute. It works by using a separate group of users or tenants that you must be a member of in order to be enabled. Yuriy has a simple example for this which shows how we can mark user2 as enabled and user1 as disabled.

    *ou=Users
    +-*cn=enabled_users
    -member=cn=user2,ou=Users
    +-*cn=user1
    +-*cn=user2

    To make this setup work, add these to your keystone.conf file:

    user_enabled_emulation = True
    user_enabled_emulation_dn = cn=enabled_users,cn=groups,cn=accounts,dc=example,dc=com

    The default value for foo_enabled_emulation_dn is cn=enabled_foos,$tree_dn; in other words, user_enabled_emulation_dn has a default of cn=enabled_users,$tree_dn (note the s).

    Keep in mind that even when you remove a user from the enabled_users group they are still a valid user to LDAP/AD. This means that your service account, which you’re using for ldaps authentication, does not need to be a member of this group.

    The user_enabled_emulation_* fields appear to be undocumented, so I’ll work on that this week so that the official docs are more helpful.


    Keystone: LDAP for Identity, SQL for Assignment

    This week I volunteered to play around with integrating keystone with LDAP (via a FreeIPA box). Since I basically knew nothing about either LDAP or keystone before starting, I learned a few lessons along the way. Here they are:

    The Setup

    I started with a FreeIPA server that our team uses and to which I have full admin rights. I was also running an “All In One” OpenStack Havana instance on a VM that I set up using puppet_openstack_builder.

    The Goal

    My basic goal here was to learn more about both LDAP and keystone, and in the process get the AIO instance to authenticate against LDAP as a preliminary test before authenticating against AD. In our environment, I wanted LDAP to manage Identity (users, groups, and group memberships) and Keystone’s SQL backend to manage Assignment (roles, tenants, domains). This works well when integrating with AD, since we cannot just create accounts on the corporate AD server willy-nilly. This is called Read Only LDAP and is covered in more detail here.

    Diving In

    There are several awesome blog entries about using FreeIPA and Keystone from Adam Young and these helped me get started. I’d configure keystone, restart it, then tail the logs while running keystone user-list.

    Configuration

    The very basic config is done like this:

    First, enable the LDAP identity driver:
    [identity]
    driver = keystone.identity.backends.ldap.Identity
    #driver = keystone.identity.backends.sql.Identity

    Then you need to tell Keystone to use SQL for assignment:
    [assignment]
    driver = keystone.assignment.backends.sql.Assignment

    Next we set up the service account that keystone will bind with, since we’re using ldaps:

    [ldap]
    url = ldaps://example.com:636
    user = uid=service_acct,cn=users,cn=accounts,dc=example,dc=com
    password = UbuntuRulez

    Then we tell it about the user and group schema:

    user_tree_dn = cn=users,cn=accounts,dc=example,dc=com
    user_filter = (memberOf=cn=openstack,cn=groups,cn=accounts,dc=example,dc=com)
    user_objectclass = inetUser
    user_id_attribute = uid
    # this is what is searched on
    user_name_attribute = uid
    user_mail_attribute = mail
    user_pass_attribute =
    # XXX FIXME -mfisch: wont work on freeIPA like this
    #user_enabled_attribute = (nsAccountLock=False)
    user_allow_create = False
    user_allow_update = False
    user_allow_delete = False
    ...
    group_tree_dn = cn=groups,cn=accounts,dc=example,dc=com
    group_filter =
    group_objectclass = groupOfNames
    group_id_attribute = cn
    group_name_attribute = cn
    group_member_attribute = member
    group_desc_attribute = description
    # group_attribute_ignore =
    group_allow_create = False
    group_allow_update = False
    group_allow_delete = False

    Lastly we point at the cert info:
    use_tls = False
    tls_cacertfile = /etc/ssl/certs/ca-certificates.crt

    The First Problem

    The first stumbling block I hit is that I needed to use ldaps. This is a pretty basic fix: just grab the cert and put it on the box where keystone is running. Unfortunately, I found out that unless I also had the cert path defined in /etc/ldap/ldap.conf, the query didn’t work. The fix is to make a pretty much empty /etc/ldap/ldap.conf and add this one line pointing to the cert you pulled down:

    TLS_CACERT /etc/ssl/certs/ca-certificates.crt

    user_name_attribute

    My next issue was that the default user_name_attribute is not right for FreeIPA. I checked this using Apache Directory Studio, which made it easy to browse the tree and test queries; I needed to use uid and not cn. The trouble is that if you have this wrong, the initial authentication of your LDAP user for the ldaps query fails, and the resulting output provides no clue as to the issue, even with debug enabled.
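
    If you’d rather test the attribute from the command line instead of Directory Studio, a few lines of python-ldap will do it (a quick sketch using my FreeIPA DNs; adjust for your tree):

    import ldap

    conn = ldap.initialize("ldaps://example.com:636")
    conn.simple_bind_s("uid=service_acct,cn=users,cn=accounts,dc=example,dc=com", "UbuntuRulez")

    # FreeIPA matches users on uid; the same search on (cn=mfischer) comes back empty
    print conn.search_s("cn=users,cn=accounts,dc=example,dc=com",
                        ldap.SCOPE_SUBTREE, "(uid=mfischer)", ["uid", "mail"])

    Once I solved this, I had a permission issue.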

    Bootstrapping

    The problem is that once you switch the authentication mechanism to LDAP, you’ve “lost” all the old roles and users that puppet set up, like the admin user who actually has permission to do things like list users. The fix, as any keystone expert will know, is to bypass the main API and use the service token directly. In keystone.conf the service token is defined as admin_token. To use it, unset everything like OS_USERNAME and set:

    export SERVICE_TOKEN=
    export SERVICE_ENDPOINT=http://localhost:35357/v2.0

    Then you need to give your user (in my case “id=mfischer”) permission to do stuff. I ended up giving mfischer the admin and _member_ roles in both of my tenants, since he/me is the new admin. Finally, I switched my settings back to what’s in my rc file (mfischer as the user, my LDAP password, and the normal keystone endpoint), and… finally my user-list query worked and I got results from LDAP.

    root@landscape-03:/var/log/keystone# keystone user-list
    +------------+------------+---------+--------------------------+
    | id | name | enabled | email |
    +------------+------------+---------+--------------------------+
    | mfischer | mfischer | | matt.fischer@example.com |
    ....

    User Enabled?

    keystone wants to know if a user is enabled and shows this as a column in the output of user-list. All my results were blank, and it’s not clear that FreeIPA has a straightforward “enabled” attribute in our setup. At a glance this stuff seems optimized for AD’s userAccountControl bitmask. (If you know a fix, please let me know.)

    Trying Other Services

    I decided to try nova list next, and guess what: it failed. It’s obvious to me now, but the puppet AIO has users created for almost all the services (nova, cinder, glance, etc.) and I needed those users in LDAP too. This is easy to do in FreeIPA, so I made a batch of users and restarted the node. I bet you can guess: stuff still failed, because these users didn’t have the roles that they needed. At this point, I switched the identity backend back to SQL, restarted keystone, and made a map of roles to users and tenants. In my case it was pretty basic: everything needed admin in the services tenant, and some needed more. Here’s how to do it simply:

    for I in glance nova cinder neutron heat heat-cfn swift; do
        keystone user-role-add --user-id=$I --tenant-id=de4442c6e54a43459eaab97e26dc21f8 --role-id=a5a8ea228b1942e28289ba63fba9b3c0
    done

    I did this and then bounced the node again (which is frankly simpler than restarting everything, but unlikely to work in the real world). And again nothing worked! I’d forgotten one more step: each service has a password defined in its config file as “admin_password”. I signed back into FreeIPA, set the passwords, and bounced the node a final time.

    This time, I could sign into Horizon and everything worked great! Finally, I was using LDAP only for identity and didn’t need to switch back.

    Obviously this is much better done before you start your puppet install, but in my case it was a great learning experience about roles/tenants/users and the tools that keystone provides.

    Debugging Tools/Hints

    Here are some more debugging tools and hints that I came across and maybe will help you if you’re dealing with LDAP and keystone:

    1. Enable debug and verbose in keystone, of course.
    2. I’ve mentioned Apache Directory Studio so you can test LDAP queries. This command also helps show what fields are available and is simpler than using ADS: “ipa user-show --all --raw”.
    3. The LDAP code in keystone has double secret logging which you can enable in /usr/lib/python2.7/dist-packages/keystone/common/ldap/core.py. You can look for “ldap.set_option(ldap.OPT_DEBUG_LEVEL, 4095)” and uncomment it. These logs only show on stdout, so you’ll need to stop the service and run it by hand to see this output.
    4. I also traced some code in pdb, in addition to the file listed above you should also look at /usr/lib/python2.7/dist-packages/keystone/identity/backends/ldap.py

    Good luck everyone!


    Updating Keystone Endpoints

    This morning we ran into an issue when trying to get our jenkins server to talk to an OpenStack cloud we have set up. The issue is that the systems are in different data centers, with different firewalls and other assorted networking challenges. The network guys solved this for me by giving the control node in the OpenStack cloud a NAT address that I could ping. However, after all this, I still had issues with keystone and assorted commands (like nova) hanging.

    The first step to debugging this was to figure out why it was hanging. (Technically, to be honest, the first thing I did was check whether iptables was blocking ports I needed; you may also need to do this.) With iptables looking good, I ran keystone endpoint-list with --debug.

    The first thing you can see in the output is that it’s successfully retrieving the endpoint list from keystone. The second thing, which was causing the hang, is that it then tries to connect to an internal IP and not the NATed one. Here’s some of that output, with a lot of cruft removed and fake IPs:

    [mfischer@puppet-ci-01 ~]$ keystone --debug --os-auth-url http://1.2.3.4:5000/v2.0/ --os-username mfischer --os-password ubuntu4life --os-tenant-name test endpoint-list
    REQ: curl -i http://1.2.3.4:5000/v2.0/tokens -X POST -H "Content-Type: application/json" -H "User-Agent: python-keystoneclient"

    REQ BODY: {"auth": {"tenantName": "test", "passwordCredentials": {"username": "mfischer", "password": "ubuntu4life"}}}

    connect: (1.2.3.4, 5000) ************

    bunch of cruft here that basically means it connected

    connect: (10.0.0.10, 35357) ************

    hang here

    The last line was troubling because that’s an internal IP that I cannot ping from this box. The info on that line comes from keystone, in the endpoint list. This is what the list of endpoints looks like, truncated and formatted for a normal screen.

    | id | region | publicurl | internalurl | adminurl |service_id|
    | 28c...| RegionOne | http://10.0.0.10:8080| http://10.0.0.10:8080| http://10.0.0.10:8080 | 9f13... |

    So the fix here is that I needed to change these internal URLs to a more friendly and portable hostname. This is doable in two ways that I know of:

    1. Delete and re-create the endpoints
    2. Hack the mysql db

    Since Option 2 sounded more exciting, I got to work. A great overview on how to do just that is here. After reading it, I realized that I’d have to hand-update every line for the public and admin URLs. The issue I have with this process is that changing all the URLs is error prone and tedious, so instead I wrote a tool.

    The tool is called update-endpoints, and it’s hosted here. If you try it, please be careful: back up or dump your DB first. I’ve only done limited testing on it and it could break your system. The basic usage is that it connects to your DB and updates the hostname/IP portion of the URL for a class of endpoints (admin, public, or internal). For example, to point all public endpoints at foo.com:

    ./update-endpoints.py --username root --password debb4rpm --host localhost --endpoint foo.com --type public

    The tool only changes the hostname portion of the URL, not the port or other parts, so if you want to change those it won’t work for you. By default on my installs mysql can only be talked to from localhost, so I’ve been running this on my control nodes.
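
    The core of the update is just a host swap on each URL. Here’s a sketch of the idea (a hypothetical helper, not the tool’s actual code):

    from urlparse import urlparse

    def swap_host(url, new_host):
        # keep the scheme, port, and path; replace only the hostname
        parts = urlparse(url)
        netloc = new_host if parts.port is None else "%s:%d" % (new_host, parts.port)
        return parts._replace(netloc=netloc).geturl()

    print swap_host("http://10.0.0.10:8080", "foo.com")  # http://foo.com:8080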

    After I ran the tool against the public and admin URLs, I rechecked my endpoints, and now they pointed to foo.com, which was conveniently NATed for me.


    | id | region | publicurl | internalurl | adminurl |service_id|
    | 28c...| RegionOne | http://foo.com:8080| http://foo.com:8080| http://foo.com:8080 | 9f13... |

    Better yet, all the nova and keystone commands I wanted to run worked and Jenkins was able to talk to the controller!

    If you’d like to improve the tool, please do a pull request.


    OpenStack: How to Grant or Deny Permissions to Features

    I’m now working on OpenStack (still on Ubuntu too, in my free time) and part of that switch is lots of learning. Today I tried to answer a basic question: “How do I prevent a user from being able to create a router?” After some mostly fruitless searching and asking, I stumbled upon a policy.json file in the neutron config, which looked promising. So, from that start to a functional solution, follow along below.

    First, Ask the Right Question

    As I found out later, the right way to ask this is “how do I deny everyone the right to create a router, and then allow it for some people?” This uses the role-based security model that OpenStack employs. Previously I’d only used the standard admin and _member_ roles; now I had a use for a new role.

    Create and Assign the Role

    Now I needed to create a role. I did this in Horizon, by clicking on Admin->Roles. I called my new role “can_create_router”, which is probably too specific for the real world but works fine here. After creating the role, I needed to grant the role to a user. For my example, I have two users, mfisch, who cannot create routers and router_man, who can. Since I could not find how to grant a role in Horizon (Havana), I used the CLI.

    Find tenant ID

    [root@co-control-01 ~(keystone_admin)]# keystone tenant-list
    +----------------------------------+------------+---------+
    | id | name | enabled |
    +----------------------------------+------------+---------+
    | 0a7685cce7c341fd94a83b5dc5f4b18f | admin | True |
    | 0b85137ded2b45418ebfc3278675679e | proj | True |
    | 03fae7b406814ea48a4c10255dd855cf | services | True |
    +----------------------------------+------------+---------+

    Find the user-id for router_man

    [root@co-control-01 ~(keystone_admin)]# keystone user-list --tenant-id 0b85137ded2b45418ebfc3278675679e
    +----------------------------------+------------+---------+----------------------------+
    | id | name | enabled | email |
    +----------------------------------+------------+---------+----------------------------+
    | 01faa8477702463fa12ea5d6b6950416 | mfisch | True | |
    | 0586608123e343f2a0be8237029bdc2d | router_man | True | |
    +----------------------------------+------------+---------+----------------------------+

    Find the ID for the “can_create_routers” role

    [root@co-control-01 ~(keystone_admin)]# keystone role-list
    +----------------------------------+--------------------+
    | id | name |
    +----------------------------------+--------------------+
    | 0fe2ff9ee4384b1894a90878d3e92bab | _member_ |
    | 0c580f80022a4705b49b920772936178 | admin |
    | 03e83b65036a4e0cbd7cff5bff858c76 | can_create_routers |
    +----------------------------------+--------------------+

    Finally, grant the “can_create_routers” role to router_man

    keystone user-role-add --user-id 0586608123e343f2a0be8237029bdc2d --tenant-id 0b85137ded2b45418ebfc3278675679e --role-id 03e83b65036a4e0cbd7cff5bff858c76

    And validate the new role

    [root@co-control-01 ~(keystone_admin)]# keystone user-role-list --user-id 0586608123e343f2a0be8237029bdc2d --tenant-id 0b85137ded2b45418ebfc3278675679e
    +----------------------------------+--------------------+----------------------------------+----------------------------------+
    | id | name | user_id | tenant_id |
    +----------------------------------+--------------------+----------------------------------+----------------------------------+
    | 006eaf0730e44756bc679038477d3bbd | Member | 0586608123e343f2a0be8237029bdc2d | 0b85137ded2b45418ebfc3278675679e |
    | 03e83b65036a4e0cbd7cff5bff858c76 | can_create_routers | 0586608123e343f2a0be8237029bdc2d | 0b85137ded2b45418ebfc3278675679e |
    +----------------------------------+--------------------+----------------------------------+----------------------------------+

    Configure Neutron’s Policy File

    Now we need to configure neutron to allow this new role and to block everyone without it. This is not so easy since there’s no CLI yet for this. Like most really cool stuff in OpenStack, it’s probably 6 months away. For now, do it manually.

    We start by making a backup of /etc/neutron/policy.json on the control node, because we don’t want to break stuff. After that, open the file and look for the line that has “create_router”: on it. This is the feature we’d like router_man to have. How this file works is explained here in more detail, but what we need to know for now is that we only want admins and anyone with the “can_create_routers” role to be able to do it. I ended up doing it like this:

    "create_router": "rule:context_is_admin or role:can_create_routers",

    “rule:context_is_admin” basically boils down to “role:admin” so that will also work. Save the file and exit.
    Here’s my diff of the file if you’d rather see it that way:
    105c105
    < "create_router": "rule:regular_user",
    ---
    > "create_router": "rule:context_is_admin or role:can_create_routers",

    Restart neutron

    I can never remember all the neutron services, so I usually run service --status-all | grep neutron and restart everything that’s running. This is my set from today:

    service neutron-server restart
    service neutron-dhcp-agent restart
    service neutron-metadata-agent restart
    service neutron-openvswitch-agent restart

    If you know a way to restart all of neutron/nova/etc with a pseudo-service, please let me know.

    Try It!

    Log into Horizon as mfisch and then try to create a router.

    No router for you!

    Now let’s sign in as router_man and see if we have the power.

    Great Success

    Conclusion

    I’m just scratching the surface of what you can do with these roles. The policy.json files are full of things that you can allow or deny. In the future, Horizon is supposed to take these into account when laying out the UI, because ideally mfisch in this scenario shouldn’t even see the “Create Router” button. Until then, the error message will suffice. Also thanks to Dave Lyle at HP for pointing me in the right direction this morning when I was fumbling around.


    pbuilder via pbuilder-scripts: A Short Howto

    There are a myriad of ways to do cross-compiles and a smaller myriad that can do chrooted debian package builds. One of my favorite tools for this is pbuilder and I’d like to explain how (and why) I use it.

    A pbuilder environment is a chrooted environment which can have a different distroseries or architecture than your host system. This is very useful, for example, when your laptop is running raring x64 and you need to build binaries for saucy armhf to run on Ubuntu Touch. Typically pbuilders are used to build debian packages, but they can also provide you a shell in which you can do non-package compilations. When you exit a pbuilder (typically) any packages you’ve installed or changes you’ve made are dropped. This makes it the perfect testing ground when building packages to ensure that you’ve defined all your dependencies correctly. pbuilder is also smart enough to install deps for you for package builds, which makes your life easier and also avoids polluting your development system with lots of random -dev packages. So if you’re curious, I recommend that you follow along below and try a pbuilder out, it’s pretty simple to get started.

    Getting Setup

    First install pbuilder and pbuilder-scripts. The scripts add-on really simplifies setup and usage and I highly recommend it. This guide makes heavy use of these scripts, although you can use pbuilder without them.

    sudo apt-get install pbuilder pbuilder-scripts

    Second, you need to set up your ~/.pbuilderrc file. This file defines a few things, mainly a set of extra default packages that your pbuilder will install and which directories are bind-mounted into your pbuilder. By default pbuilder-scripts looks in ~/Projects, so make that directory at this point as well and set it in the .pbuilderrc file.

    Add the following to .pbuilderrc, substitute your username for user:

    BINDMOUNTS="${BINDMOUNTS} /home/user/Projects"
    EXTRAPACKAGES="${EXTRAPACKAGES} pbuilder devscripts gnupg patchutils vim-tiny openssh-client"

    I like having openssh-client in my pbuilder so I can copy stuff out to target boxes more easily, but it’s not strictly necessary. A full manpage for ~/.pbuilderrc is also available if you want to set more advanced options.

    Don’t forget to make the folder:
    mkdir ~/Projects

    Making your First Pbuilder

    Now that you’re set up, it’s time to make your first pbuilder. You need to select a distroseries (saucy, raring, etc.) and an architecture; I’m going to make one for raring i386. To do this we use pcreate. I use a naming scheme here so that when I see the 10 builders I have, I can keep some sanity. I recommend you do the same, but if you want to call your pbuilder “bob”, that’s fine too.

    cd ~/Projects
    pcreate -a i386 -d raring raring-i386

    Running this will drop you into an editor. Here you can add extra sources, for example, if you need packages from a PPA. Any sources list you add here will be permanent anytime you use this pbuilder. If you have no idea what I mean by PPA, then just exit your editor here.

    At this point pcreate will be downloading packages and setting up the chroot. This may take 10-30 minutes depending on your connection speed.

    This is a good time to make coffee or play video games

    Using your pbuilder

    pbuilders have two main use cases that I will cover here:

    Package Builds

    pbuilder for package builds is dead simple. If you place the package code inside ~/Projects/raring-i386, pbuilder will automagically guess the right pbuilder to use. Anywhere else and you’ll need to specify it.

    Aside: To avoid polluting the root folder, I generally lay the folders out like this:

    ~/Projects/raring-i386/project/project-0.52

    Then I just do this


    cd ~/Projects/raring-i386/project/project-0.52
    pbuild

    This will unpack the pbuilder, install all the deps for “project” and then attempt to build it. It will exit the pbuilder (and repack it) whether it succeeds or fails. Any debs built will be up one level.

    Other – via a Shell

    The above method works great for building a package once, but if you are building over and over to iterate on changes, it’s inefficient. This is because every build has to unpack the chroot and install dependencies (it is at least smart enough to cache the downloaded debs). In this case, it’s faster to drop into a shell and stay there between builds.

    cd ~/Projects/raring-i386
    ptest

    This drops you into a shell inside the chroot, so you’ll need to manually install build-deps.

    apt-get build-dep project
    dpkg-buildpackage

    ptest also works great when you need to do non-package builds, for example, I build all my armhf test code in a pbuilder shell that I’ll leave open for weeks at a time.

    Updating your pbuilder

    Over time the packages in your pbuilder may get out of date. You can update it simply by running:

    pupdate -p raring-i386

    This is the equivalent of running apt-get upgrade on your system.

    Caveats

    A few caveats for starting with pbuilder.

    • Ownership – files built by pbuilder will end up owned as root, if you want to manipulate them later, you’ll need to chown them back or deal with using sudo
    • Signing – unless you bind mount your key into your pbuilder you cannot sign packages in the pbuilder. I think the wiki page may cover other solutions.
    • Segfaults – I use pbuilders on top of qemu a lot so that I can build for ARM devices, however, it seems that the more complex the compile (perhaps the more memory intensive?) the more likely it is to segfault qemu, thereby killing the pbuilder. This happened to a colleague this week when trying to pbuild Unity8 for armhf. It’s happened to me in the past. The only solution I know for this issue is to build on real hardware.
    • Speed – For emulated builds, like armhf on top of x86_64 hardware (which I do all the time), pbuilds can be slow. Even for non-emulated builds, the pbuilder needs to uncompress itself and install deps every time. For this reason if you plan on doing multiple builds, I’d start with ptest.
    • Cleanup – When you tire of your pbuilder, you need to remove it from /var/cache/pbuilder. It also caches debs in here and some other goodies. You may need to clean those up manually depending on disk space constraints.

    Summary

    I’ve really only scratched the surface here on what you can do with pbuilder. Hopefully you can use it for package builds or non-native builds. The Ubuntu wiki page for pbuilder has lots more details, tips, and info. If you have any favorite tips, please leave them as a comment.


    Hacking the initrd in Ubuntu Touch

    This week I’ve been hacking some of the initrd scripts in Ubuntu Touch and I thought that I’d share some of the things I learned. All of this work is based on using Image Update images, which are flashable by doing phablet-flash ubuntu-system. First, why would you want to do this? Well, the initrd includes a script called “touch” which sets up all of the partitions and does some first boot migration. I wanted to modify how this process works for some experiments on customizing the images.

    Before getting started, you need the following packages installed on your dev box: abootimg, android-tools-adb, android-tools-fastboot

    Note: I was told after posting this that it won’t work on some devices, including Samsung devices, because they use a non-standard boot.img format.

    Getting the initrd

    The initrd is inside the boot.img file. I pulled mine from here, but you can also get it by dd’ing it off of the phone. You can find the boot partition on your device with the following scriptlet, taken from flash-touch-initrd:

    for i in $BOOT; do                                                              
        path=$(find /dev -name "*$i*"|grep disk| head -1)                           
        [ -n "$path" ] && break                                                     
    done
    echo $path
    

    Once you have the boot.img file by whatever means you used, you need to unpack it. abootimg is the tool to use here, so simply run abootimg -x [boot.img]. This will unpack the initrd, kernel and boot config file.

    Unpacking and Hacking the initrd

    Now that you have the initrd, you need to unpack it so you can make changes. You can do this with some cpio magic, but unless you have a UNIX-sized beard, just run abootimg-unpack-initrd. This will dump everything into a folder named ramdisk. (UNIX beard guys: mkdir ramdisk; cp initrd.img ramdisk; cd ramdisk; cat initrd.img | gzip -d | cpio -i)

    To make changes, simply cd into ramdisk and hack away. For this example, I’m going to add a simple line to ramdisk/scripts/touch. My line is:

    echo "mfisch: it worked!" > /dev/kmsg || true
    

    This will log a message to /var/log/kern.log, which helps us make sure it worked. Your change will probably be less trivial.

    Repacking

    Repacking the initrd is simple: just run abootimg-pack-initrd [initrd.img.NEW]. Once you do this you’ll notice that the initrd size is quite different, even if you didn’t make any changes. After discussing this with some people, the best I can figure is that the newly packed cpio file has owners and non-zero datestamps, which make it slightly larger. One clue: when compared to mkinitramfs, abootimg-pack-initrd does not use the -R 0:0 argument, and there are other differences. If you want to do this the hard way, you can also repack by doing: cd ramdisk; find . | cpio -o -H newc | gzip -9 > ../initrd.img.NEW

    Rebuilding the boot image

    The size change we discussed above can be an issue that you need to fix. In the file bootimg.cfg, which you extracted with abootimg -x, there is a line called bootsize. This line needs to be >= the size of the boot.img (not initrd). If the initrd file jumped by 4k or so, like mine did, be sure to bump this as well. I bumped mine from 0x837000 to 0x839000 and it worked. If you don’t do this step, you will wind up with a non-booting image. Once you correct this, rebuild the image with abootimg:

    abootimg --create saucy-new.img -f bootimg.cfg -k zImage -r initrd.img.NEW

    I’ve found that if your size is off, it will sometimes complain during this step, but not always. It’s best to check the size of saucy-new.img with the line you changed in bootimg.cfg at this point.

    Flashing and testing

    To flash the new boot image, reboot the device and use fastboot.

    adb reboot bootloader
    fastboot flash boot saucy-new.img
    

    Use the power button to boot the device now.

    Once booted you can go check out the kern.log and see if your change worked.

    Aug 13 16:11:04 ubuntu-phablet kernel: [    3.798412] mfisch: it worked!
    

    Looks good to me!

    Thanks to Stéphane Graber and Oliver Grawert for helping me discover this process.


    Getting the PID and Process Name From a dbus Caller in C

    Over the past few months, I’ve been working on a dbus service (powerd) for Ubuntu Touch. Something that came up recently was the need to get the PID of the processes that call us. We were using this for statistics, tracking who was holding requests, until today, when we decided to go a different direction. So this code has not landed in powerd, but perhaps it is still useful to someone. I present: how to get the PID and process name of someone who calls you on dbus, in C.

    This code assumes a few things. You need to have a working server that handles a call of some sort; we will plug into that call to get the PID of the caller. With that in mind, let’s get started. If you want the version of powerd that does this fully async, it’s here: lp:~mfisch/+junk/powerd-pids. Note that this branch also incorporates some statistics code for powerd that is not going into trunk in its current form. Anyway, onto the code:

    Create a dbus proxy to make the PID look-up request to

    We need a dbus proxy object to talk to. This is the service where we can look up the PID given the dbus name of the connection. I connect to this proxy asynchronously. In my “main”, I start the connection:

        /* proxy for getting PID info */
        g_dbus_proxy_new_for_bus(G_BUS_TYPE_SYSTEM,
            G_DBUS_PROXY_FLAGS_DO_NOT_LOAD_PROPERTIES,
            NULL,
            "org.freedesktop.DBus",
            "/org/freedesktop/DBus",
            "org.freedesktop.DBus",
            NULL,
            (GAsyncReadyCallback)dbus_proxy_connect_cb,
            NULL);
    

    And then finish it later; the main result here is that dbus_proxy is set so I can use it:

    void
    dbus_proxy_connect_cb(GObject *source_object,
                   GAsyncResult *res,
                   gpointer user_data)
    {
        GError *error = NULL;
    
        dbus_proxy = g_dbus_proxy_new_finish (res, &error);
        if (error) {
            g_warning("dbus_proxy_connect_cb failed: %s", error->message);
            g_error_free(error);
            dbus_proxy = NULL;
        }
        else {
            g_debug("dbus_proxy_connect_cb succeeded");
        }
    }
    

    In the call that your service handles, do the lookup synchronously

    I have the synchronous lookup listed first, then an async one. You should use the async one because you’re a good coder… unless you need to block until you find out who is calling you for some reason. I’ve left some powerd-isms in the function; the source is from the requestSysState method that powerd supports. We will use the dbus_proxy object we created above to request the PID.

    gboolean
    handle_request_sys_state (PowerdSource *obj, GDBusMethodInvocation *invocation, int state)
    {
        const gchar *owner;
        GVariant *result;
        GError *error = NULL;
        guint owner_pid;

        // get the unique dbus name of the caller
        owner = g_dbus_method_invocation_get_sender(invocation);
        if (dbus_proxy) {
            result = g_dbus_proxy_call_sync(dbus_proxy,
                    "GetConnectionUnixProcessID",
                    g_variant_new("(s)", owner),
                    G_DBUS_CALL_FLAGS_NONE,
                    -1,
                    NULL,
                    &error);
            if (error) {
                // g_error() would abort the process; warn and carry on instead
                g_warning("Unable to get PID for %s: %s", owner, error->message);
                g_error_free(error);
                error = NULL;
            }
            else {
                g_variant_get(result, "(u)", &owner_pid);
                g_variant_unref(result);
                g_info("request is from pid %d\n", owner_pid);
            }
        }
        ...
    }
    

    Once we have the PID, we can look up the command line by reading /proc/PID/cmdline; my powerd code does this in the async example below.

    async dbus for fun and profit

    As I stated, synchronous is bad because it makes everyone wait, so here’s the async version.

    gboolean
    handle_request_sys_state (PowerdSource *obj, GDBusMethodInvocation *invocation, int state)
    {
        const gchar *owner;

        // get the unique dbus name of the caller
        owner = g_dbus_method_invocation_get_sender(invocation);
        g_dbus_proxy_call(dbus_proxy,
            "GetConnectionUnixProcessID",
            g_variant_new("(s)", owner),
            G_DBUS_CALL_FLAGS_NONE,
            -1,
            NULL,
            (GAsyncReadyCallback)get_pid_from_dbus_name_cb,
            NULL);
        ...
    }
    

    Here’s our callback where we handle the result. I left in the code that reads the process name from /proc; it uses a powerd utility function called sysfs_read.

    void
    get_pid_from_dbus_name_cb(GObject *source_object,
                   GAsyncResult *res,
                   gpointer user_data)
    {
        GError *error = NULL;
        GVariant *result = NULL;
        guint pid;
        gchar process_name[PROCESS_NAME_LENGTH] = "";
        gchar proc_path[64] = "";
        int ret;
    
        result = g_dbus_proxy_call_finish (dbus_proxy, res, &error);
        if (error) {
            powerd_warn("get_pid_from_dbus_name_cb failed: %s", error->message);
            g_error_free(error);
        }
        else if (result) {
            g_variant_get(result, "(u)", &pid);
            g_variant_unref(result);
            /* safety check */
            if (pid != 0) {
                sprintf(proc_path, "/proc/%u/cmdline", pid);
                ret = sysfs_read(proc_path, process_name, PROCESS_NAME_LENGTH);
                if (ret < 0)
                {
                    powerd_debug("error reading process name from %s: %d",
                        proc_path, ret);
                    strcpy(process_name, "UNKNOWN");
                }
                g_debug("PID: %u, Process Name: %s", pid, process_name);
            }
            else {
                /* not sure this can happen */
                powerd_debug("unable to get pid info");
            }
        }
    }
    

    With that magic, I can get output like this:

    PID: 4434, Process Name: ./powerd-cli
    PID: 4436, Process Name: ./powerd-cli
    ...
    

    But what about Python?

    C is too hard you say. If you got carpal tunnel just from reading that code, I have a simple python call to do this for you, synchronously.

    #!/usr/bin/python

    import dbus
    import sys

    # proxy for the bus daemon itself, which knows the PID behind each connection
    proxy = dbus.SystemBus().get_object('org.freedesktop.DBus', '/org/freedesktop/DBus')
    print dbus.Interface(proxy, 'org.freedesktop.DBus').GetConnectionUnixProcessID(sys.argv[1])
    

    I use dbus-monitor to find someone interesting on dbus to test this call with. :1.17 looks like upower, so let’s see if it worked:

    mfisch@caprica:~$ ./foo.py :1.17
    1988
    mfisch@caprica:~$ cat /proc/1988/cmdline
    /usr/lib/upower/upowerd

    Looks right to me!

    With the python, you can plug in the caller’s dbus name for “sys.argv[1]” and be on your way, or use the C code if you don’t want python’s overhead and think that managing pointers is entertaining.

    Special thanks to Ted Gould who pointed me to this method.
