Thursday, September 27, 2012

OpenStack Nova Internals – pt2 – Services

This is a long one, so go grab a coffee.

Recently, I was working on a little OpenStack instrumentation problem. Let me tell you about it.

"You can't manage what you can't measure." is how the old saying goes. Certainly no greater truth than trying to increase software performance. This is what I was tasked with ... measuring OpenStack performance.

As I mentioned in Part 1, OpenStack uses RabbitMQ to send messages between services (a service being an OpenStack Nova component like scheduler, compute, network, api, etc.)

I could already measure times from the client perspective using novaclient's --timings option. I could measure times inside a service using tach, which can send its measurements up to statsd or graphite. But there's another black hole that calls can disappear into ... Rabbit itself. As I mentioned, we stuff an RPC call on the queue and the related service picks it up and handles it. As every admin knows, if the processing service (the "worker") is too slow, the queues can grow quite large. We need to track that.

For our RPC calls, the "inflight" time is pretty simple on the surface:

Total Call Time = Time in Queue + Time in Service (+ Time in Queue for the response on a two-way call)

We could certainly monitor the message rates in Rabbit and look for changes in processing speed, but we can get more information by injecting a real message into the system. For a normal production system you'd certainly want to watch both. Ideally, we can find out what the cause of the slowdown is, or at least get some hints.

So, I thought I'd write a little utility program to inject a ping()-like message into the queue and send it to each service (you'll see why in a bit). We'll record the most basic information:
1. Time to the service
2. Time from the service
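In rough terms (the names here are illustrative, not from the actual patch), the bookkeeping is just timestamp arithmetic:

    # Given three timestamps -- when we put the ping on the queue, when the
    # worker picked it up, and when the reply made it back to us:
    def split_times(sent_at, received_at, reply_at):
        time_to_service = received_at - sent_at      # queue time on the way in
        time_from_service = reply_at - received_at   # service + return-queue time
        return time_to_service, time_from_service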

But non-HTTP OpenStack services are built on eventlet which, as I mentioned before, is an async library. While these libraries make programming easier by eliminating most classic thread/locking problems, the downside is that the number of greenthreads can grow, and a call can get held up behind all the others. Most nova services are pretty lightweight, but some have to make HTTP or out-of-process binary calls to get the work done.

For example,
  • the network service may have to talk to the switch, 
  • the compute node may have to talk to a busy hypervisor or 
  • the image service might be slow. 
The services have to be able to handle this rush of calls.

Can we track that?

The most basic way would be to run a time.sleep(1) call, during which eventlet will just pass control to another greenthread and revisit it later. But what if eventlet is busy due to a lot of greenthreads? Our 1 second call will take longer. That's our overhead. (Later, we can talk to eventlet directly and ask how many greenthreads are active. But that's another post.)
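A minimal sketch of that measurement (assuming eventlet has already been monkey patched by the service framework):

    import time

    import eventlet

    def eventlet_overhead(expected=1.0):
        # Ask for a one second sleep; eventlet yields to other greenthreads
        # in the meantime and wakes us when it gets around to it.
        start = time.time()
        eventlet.sleep(expected)
        elapsed = time.time() - start
        # Anything beyond the requested second is time the hub spent servicing
        # other greenthreads ... our scheduling overhead.
        return elapsed - expected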

That was the plan ... so how to implement this? Here were my requirements:
  • I want something that can get a list of all active / enabled services and ping them. But that means I need to put a ping() method in every service in OpenStack Nova. Where should that live? 
  • Also, this program is going to run a long time, so I'd like to leverage existing service deployment and process control scripts (puppet/chef/etc). 
  • I'd like to reuse the Nova RPC library rather than duplicate that effort. 
  • But I also need to talk to the database ... can I reuse the Nova DB library? 
  • This is all going to require a configuration file to set up ... could I reuse the Nova configuration mechanism? 
  • I'll need unit tests. Nova has a very nice testing framework that integrates nicely with the continuous integration / code review system. The mechanism will have to change as Nova changes. I don't want to have to hear from users that the protocols are out of sync (in Nova, the API is strictly versioned, but the RPC protocol is only loosely versioned).
Perhaps just writing a utility program isn't the correct approach. Perhaps I should make this another Nova service.

Perhaps I can bite off more than I can chew and try to get this accepted as a core service? If not, I'll refactor and pull it out as an optional external service.

Also, since the service is so simple I thought it would make for a good "how to" post ... so here we are!

Alright, enough chat ... how do we do that?

Let's start with the launcher: the ./bin file that's going to fire this puppy up. We're not an HTTP service, so we don't need paste or auth or any of that stuff. We just need to get our configuration and spin up eventlet. Just look at the launcher for the compute node: https://github.com/openstack/nova/blob/master/bin/nova-compute
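Paraphrased from memory (so treat this as a sketch of the shape, not the exact source), it does something like this:

    # Rough shape of a nova service launcher like ./bin/nova-compute (sketch).
    import eventlet
    eventlet.monkey_patch()

    import sys

    from nova import flags
    from nova.openstack.common import log as logging
    from nova import service
    from nova import utils

    if __name__ == '__main__':
        # The exact parse helper has moved around between releases.
        flags.parse_args(sys.argv)   # read nova.conf and the command line
        logging.setup('nova')        # configure logging
        utils.monkey_patch()         # apply any configured monkey patches
        server = service.Service.create(binary='nova-compute')
        service.serve(server)        # start the service and its RPC consumers
        service.wait()               # pump events until we die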

Pretty simple: load stuff, load stuff, load stuff ... we're "main" so parse the args, configure logging, place our monkey patches (another time :), create the service, make it available, pump the events until we die.

Of course, the magic line is service.Service.create(binary='nova-compute'). Something funky is going on there. Somehow that thing is finding the implementation for the compute service. Let's look at the ./nova/service.py create() method. Yes, that's magic in there.
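In spirit, it's a little name-mangling dance (this is a toy illustration of the mapping, not the actual source):

    # Toy illustration of what Service.create() works out from the binary name.
    def manager_flag_for(binary):
        # "nova-compute" -> topic "compute"
        topic = binary.rpartition('nova-')[2]
        # topic "compute" -> flag "compute_manager"; the value of that flag is
        # the dotted path of the manager class to import and instantiate.
        return '%s_manager' % topic

    print(manager_flag_for('nova-compute'))   # compute_manager
    print(manager_flag_for('nova-inflight'))  # inflight_manager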



"nova-compute" turns into "compute", which evaluates to "compute_manager", the flag "--compute_manager" is looked up for the code to load. The default is defined in nova/flags.py

And, nova.compute.manager.ComputeManager is loaded (or whatever you set it to in nova.conf)

Oh, notice the --periodic_interval flag in there? That's pretty cool, that's how often our internal timer should trigger. It's like our built-in cron service. We're going to use that to issue our ping()'s.

So what does a nova.???.manager.???Manager look like? Let's make one. In our case it will be nova.inflight.manager.InflightManager (we'll have to make a new ./nova/inflight directory for it with an empty __init__.py file)

The core of the inflight service I wanted is sketched below (there's some other sugar in there to actually do the work, but that's not really important here). What's important is the handler for the periodic task event that fires every N seconds. In our case, we send a ping to the first item in the list of services and then move that service to the end of the list. A circular queue of pings.
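Roughly (reconstructed from the description above, with illustrative names, rather than the exact patch):

    # nova/inflight/manager.py -- sketch only, names are illustrative
    from nova import db
    from nova import manager
    from nova.openstack.common import log as logging

    LOG = logging.getLogger(__name__)


    class InflightManager(manager.Manager):
        """Periodically pings active services and records the round trip."""

        def __init__(self, *args, **kwargs):
            super(InflightManager, self).__init__(*args, **kwargs)
            self._services = []   # circular list of services to ping

        @manager.periodic_task
        def check_inflight(self, context):
            # Refill our work list from the Service table when it runs dry.
            if not self._services:
                self._services = db.service_get_all(context, disabled=False)
            if not self._services:
                return

            # Ping the head of the list, then rotate it to the back.
            service = self._services.pop(0)
            self._services.append(service)
            LOG.debug("pinging %s on %s", service['topic'], service['host'])
            self._ping_service(context, service)

        def _ping_service(self, context, service):
            # Hand off to the rpcapi layer (shown below) to send the actual
            # inflight() call to the right topic.
            pass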

Of course, to make this work we will need to add an --inflight_manager=nova.inflight.manager.InflightManager flag to our nova.conf.

And, for illustration, you can see how we can add new flags to nova. The nice part is we only have to define them in the place they are used and the framework will include them in the grand configuration.
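For example, the options can be registered right in the module that uses them (a sketch in the cfg style nova used at the time; the option names are ours):

    # Near the top of nova/inflight/manager.py -- sketch only
    from nova import flags
    from nova.openstack.common import cfg

    inflight_opts = [
        cfg.StrOpt('inflight_manager',
                   default='nova.inflight.manager.InflightManager',
                   help='Class that implements the inflight service'),
        cfg.StrOpt('inflight_topic',
                   default='inflight',
                   help='Topic (queue name) the inflight service listens on'),
    ]

    FLAGS = flags.FLAGS
    FLAGS.register_opts(inflight_opts)

    # In nova.conf these then show up as:
    #   inflight_manager = nova.inflight.manager.InflightManager
    #   inflight_topic = inflight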

When we run the ./bin/inflight-manager executable it will launch the framework and load our new InflightManager class. And then, every few seconds, the check_inflight() method will get called.

Next, there's a little dictionary that maps the service type to the api for that service. If we look at the topic column in the Service table (or service['topic'] in the result set) we'll have the type of the service we are hoping to talk to. This lets us talk to it in the correct way.
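Something along these lines (a sketch; the real table covers whichever services the patch supported, and inflight is our new, hypothetical one):

    # Map a service's topic (from the Service table) to the rpcapi that
    # knows how to talk to it.
    from nova.compute import rpcapi as compute_rpcapi
    from nova.inflight import rpcapi as inflight_rpcapi
    from nova.network import rpcapi as network_rpcapi
    from nova.scheduler import rpcapi as scheduler_rpcapi

    SERVICE_APIS = {
        'compute': compute_rpcapi.ComputeAPI,
        'network': network_rpcapi.NetworkAPI,
        'scheduler': scheduler_rpcapi.SchedulerAPI,
        'inflight': inflight_rpcapi.InflightAPI,
    }

    # Later, given a row from the Service table:
    #   api = SERVICE_APIS[service['topic']]()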

Now it gets a little tricky. Since I wrote part 1 of this series there have been some changes to the RPC abstraction in OpenStack. Previously I mentioned there was an API onto the service which was responsible for marshalling parameters and results to/from AMQP. This is still the case, but now there's a new layer just underneath it. Almost all services now also have a related rpcapi.py file. The difference between my_service/api.py and my_service/rpcapi.py is as follows:

  • api.py is the thing that other services should use to talk to the service, just like before.
  • rpcapi.py handles the light-weight versioning of the RPC protocol that I hinted at earlier.
The idea is that, eventually, we'll have a means to mix old and new services in large deployments by impedance-matching the RPC protocols. Currently, it'll just puke.

Since our inflight service should be able to test itself, and it will show up in the list of Services in the database, we're going to need a ./inflight/rpcapi.py file. The meat of it is pretty simple:
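Something like this (a sketch modelled on the other rpcapi files of the time; InflightAPI and the inflight_topic flag are our own names):

    # nova/inflight/rpcapi.py -- sketch only
    from nova import flags
    import nova.openstack.common.rpc.proxy

    FLAGS = flags.FLAGS


    class InflightAPI(nova.openstack.common.rpc.proxy.RpcProxy):
        """Client side of the inflight RPC API."""

        BASE_RPC_API_VERSION = '1.0'

        def __init__(self):
            super(InflightAPI, self).__init__(
                topic=FLAGS.inflight_topic,
                default_version=self.BASE_RPC_API_VERSION)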

Note the version number in there. We'll need to bump that whenever we change the api. The topic is just the name of the queue within Rabbit that the message will be written to. In the same way, the service framework will listen on the "inflight" topic queue for incoming calls. We'll have to add that --inflight_topic flag to our nova.conf.

Where are we?

We have a service that can launch and generate calls to other services periodically. We have an RPC API on our service so other services can call us (in reality we'll only be calling ourselves).

What we're missing is an inflight() call in each of the other service API's. OOP 101 says we can put this in a common base class and make it available to all services. And it's almost just that simple :) Then, we need to put a handler in each of the service ???Manager classes to actually do the work.

All of the RPC stuff within nova has been moved into openstack.common, since it's something that glance, quantum and other components can use if desired.

In order to add this common inflight() method to our base RPC API class, we're going to have to make a change to openstack.common. This is a different project. What happens behind the scenes is this: changes to openstack.common get submitted and merged in that separate project, and our CI tools then layer the openstack.common code on top of nova. So, when we git clone nova we get a copy of openstack.common ... but we shouldn't make changes to it from within nova itself. Instead we need to make our changes to openstack.common separately and re-sync with nova trunk to see those changes in our working branch.

Tricky and somewhat confusing, I know. But it makes sense in the big picture.

Just remember, don't mess with stuff in ./nova/nova/openstack/* unless you're in the openstack.common repository.

Again, the flow is:
Service API -> Service RPC Proxy -> AMQP -> Service Manager

Let's add inflight() to the rpc proxy object in openstack/common/rpc/proxy.py of the openstack.common project (the base class for all rpcapi implementations).
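A sketch of the shape of that method (not the merged patch; the rest of RpcProxy stays as-is):

    # Method added to class RpcProxy in openstack/common/rpc/proxy.py -- sketch
    import time

    def inflight(self, context, topic=None):
        """Send a timing ping to whatever service sits behind this proxy."""
        sent_at = time.time()
        reply = self.call(context,
                          self.make_msg('inflight', sent_at=sent_at),
                          topic=topic)
        # The manager returns its own timestamps; tack on the full round trip.
        reply['round_trip'] = time.time() - sent_at
        return reply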

There. Now we can call inflight() from our InflightManager class and put the message on the wire.

Next we need to add a common inflight() implementation to each of the service managers (including our InflightManager). Once again, we'll add it to the base class of all the Managers. Remember, the Managers are the part of the service that contains the implementation of the service methods.

Fortunately, this part is pretty easy. All Managers derive from nova/manager.py and don't really have to sweat much about the versioning since that's the sender's job. We just have to add our implementation which, in this case, will just spawn a greenthread that sleeps for a second and returns the actual time.
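A sketch of what that might look like in nova/manager.py (illustrative; the dictionary of timestamps is our own choice of payload):

    # Handler added to the base Manager class in nova/manager.py -- sketch only
    import time

    import eventlet

    def inflight(self, context, sent_at):
        """Answer a timing ping from the inflight service."""
        received_at = time.time()

        def _sleeper():
            # Ask for one second; if the hub is busy with other greenthreads
            # we get control back late, and the difference is our overhead.
            start = time.time()
            eventlet.sleep(1)
            return time.time() - start

        slept = eventlet.spawn(_sleeper).wait()
        return {'sent_at': sent_at,
                'received_at': received_at,
                'slept': slept}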

Phew. We did it.

In reality, there are some other minor tweaks and actual code to make this service really useful, but the purpose here was to illustrate making your own service. If you like, you can look at the whole thing in the review branch.

If this was at all useful I look forward to your feedback in the comments or via Twitter.

Next time, we'll look at the HTTP interface and how REST calls are dealt with.


6 comments:

John HTran said...

The review patch got abandoned?

@TheSandyWalsh said...

@John, yeah, it was too close to Folsom's release, so I decided to wait until Grizzly opened to fight for it :) I'll sync it again soon.

Anton E. Self said...

Great post, Sandy.

Brent told me you were working on Openstack stuff. Cool.

Cheers, Anton

ChocWarrior said...

Thank you for a clean looking post. Seems like esoteric stuff for a 'cheap' ping, but I have never looked under the hood in openstack.

I imagine it would be simpler in C on top of TCP/IP if I have to use some other RPC to do the same job, should it seem so hard!

Bill Moy said...

Has anything further been done with this 'inflight' code within OpenStack?

I also noticed your wiki entry https://wiki.openstack.org/wiki/UnifiedInstrumentationMetering
Has anything been further developed as far as unifying instrumentation?

For instrumentation, would you recommend StackTach with Ceilometer?

Thanks.

Sandy Walsh said...

@Bill ... We found we could get all the same information from the *.start / *.end notifications. So, we added some reports to StackTach to handle this.

For classic instrumentation (aka "how are things running") I would strongly recommend using statsd/graphite. They're the most mature products for this. Also look at the Tach project which can monkey-patch openstack to generate statsd data. (link in the Debugging Openstack post)

There is Ceilometer, but it's way overkill and the data collected is questionable.

Hope it helps!