imoverclocked

Friday, July 18, 2025

How Reordering Files Reduced My Archive Size by Half

Background

I have a solar inverter made by Fronius that uploads JSON to an FTP server every ten seconds. There are two different JSON files (*.powerflow and *.inverter) with slightly different information. Every day, this means there are about 10k files; the file sizes vary but are usually between 500 and 1000 bytes each. On the local filesystem that equates to about 40MB per day. Over a year, we are talking about 3-4 million files (14-15GB), which can exhaust inodes on a reasonably sized ext4 filesystem far before space is a concern.

To manage this, I roll stats up daily. The stats are published by the inverter into a folder named YYYYMMDD; I slurp those up into InfluxDB and then create a compressed tar archive called YYYYMMDD.tar.xz as an ultimate backup.

This is a good system and each year compresses into around 50-100 MB. Already, we see that we are 1000x better in terms of space and inode usage. Maybe we can do better?

Example File Listing

/anon-ftp/fronius/photosynth/processed/20250717# ls -la | head
total 40540
drwxr-xr-x 2 root root 512000 Jul 18 08:05 .
drwxr-xr-x 10 root root 40960 Jul 19 01:35 ..
-rw------- 1 ftp ftp 738 Jul 17 17:14 100450.solarapi.v1.inverter
-rw------- 1 ftp ftp 802 Jul 17 17:14 100450.solarapi.v1.powerflow
-rw------- 1 ftp ftp 738 Jul 17 17:14 100500.solarapi.v1.inverter
-rw------- 1 ftp ftp 802 Jul 17 17:14 100500.solarapi.v1.powerflow
-rw------- 1 ftp ftp 738 Jul 17 17:15 100510.solarapi.v1.inverter
-rw------- 1 ftp ftp 802 Jul 17 17:15 100510.solarapi.v1.powerflow
-rw------- 1 ftp ftp 738 Jul 17 17:15 100520.solarapi.v1.inverter
...

When we do the obvious tar invocation, the files are added in a somewhat random order.

# tar -cJvf 20250717-obvious.tar.xz 20250717/ | head
20250717/
20250717/152720.solarapi.v1.inverter
20250717/193700.solarapi.v1.inverter
20250717/154110.solarapi.v1.powerflow
20250717/163020.solarapi.v1.inverter
20250717/230850.solarapi.v1.inverter
20250717/130120.solarapi.v1.inverter
20250717/102720.solarapi.v1.inverter
20250717/213220.solarapi.v1.powerflow
20250717/110950.solarapi.v1.inverter
...

and we are left with the resulting tar.

Does ordering the files make a difference?

Based on how compression technology works, it makes sense (in my head) that similar content right next to each other would be compressed better than mixed content. If I do a simple experiment, maybe I can validate this and see how much of an effect it has.

# tar -cJf 20250717-manual-sort.tar.xz 20250717/*.powerflow 20250717/*.inverter
# du -c *.tar.xz
116 20250717-manual-sort.tar.xz
212 20250717.tar.xz

Holy smokes! With almost no work, we are using about 54% of the space of the original tar invocation! (Yes, I did validate that both produce the same directory structure/content with diff -r)
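For anyone who wants to repeat the experiment without shell globbing, here is a minimal Python sketch of the same idea. It is not what I ran above; the function names are made up for illustration, and it assumes the directory contains only regular files. It builds two in-memory .tar.xz archives, one in whatever order the filesystem returns and one grouped by extension, and returns both sizes.

```python
import io
import os
import tarfile


def grouped_order(names):
    """Sort file names by extension first, then by name (i.e. by timestamp)."""
    return sorted(names, key=lambda n: (os.path.splitext(n)[1], n))


def tar_xz_size(directory, names):
    """Build an in-memory .tar.xz with members in the given order; return its size in bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name in names:
            tar.add(os.path.join(directory, name), arcname=name)
    return len(buf.getvalue())


def compare_orderings(directory):
    """Return (size in directory order, size grouped by extension)."""
    names = os.listdir(directory)  # arbitrary order, much like a plain `tar -c dir/`
    return tar_xz_size(directory, names), tar_xz_size(directory, grouped_order(names))
```

Calling compare_orderings("20250717") on a day of stats should show whether grouping helps for your data; how large the difference is will depend on how similar neighbouring files are.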

Can I apply this anywhere?

I decided to download a dataset with lots of files. The easiest to think of was a linux kernel tarball for 6.16-rc6 which has 89,677 files and unpacks to 1.7G on my local filesystem. To order files here, I needed to write a utility since I couldn't use as simple a tar invocation as I did for my stats collection.

The script: https://github.com/imoverclocked/orderfs

I played around with different orderings and binning based on sizes. As it turns out, the linux kernel source is structured in a way that is pretty optimal already:

# du -h *.tar.gz
241M linux-6.16-rc6.recompress.tar.gz
249M linux-6.16-rc6.sorted-bins.tar.gz
264M linux-6.16-rc6.sorted.tar.gz
241M linux-6.16-rc6.tar.gz

The variants above:

recompress - decompress the original tarball and then recompress it locally with vanilla gzip
sorted - apply a rough sort with filetype based on extension and then by size
sorted-bins - apply an extension-sort and prefer to keep smaller files sorted by path

The vanilla recompression was within a couple hundred bytes of the original, so my gzip is only slightly different from the one kernel.org is using. Maybe they use --best or some other flag that I didn't bother with.

Sorting by extension and then putting files in ascending size actually made things worse. I suspect (SWAG) that the pathnames in random order made it harder for gzip to find larger common chunks in the tar headers.

Sorting by extension and then by path made things better but still not as good as the default directory structure. (Well done kernel folks!)
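To make the two orderings concrete, here is a small Python sketch of the sort keys involved. This is an illustration only, not the code from the orderfs script, and it leaves out the size binning that the sorted-bins variant uses. Entries are assumed to be (path, size) pairs; the paths and sizes in the usage note are invented.

```python
import os


def extension(path):
    """Return the file extension, e.g. '.c' for 'kernel/fork.c'."""
    return os.path.splitext(path)[1]


def by_extension_then_size(entries):
    """Group by extension, then ascending size (the 'sorted' variant in spirit)."""
    return sorted(entries, key=lambda e: (extension(e[0]), e[1]))


def by_extension_then_path(entries):
    """Group by extension, then by path, so neighbours share path prefixes."""
    return sorted(entries, key=lambda e: (extension(e[0]), e[0]))
```

With entries such as ("kernel/fork.c", 900), ("mm/slab.c", 100) and ("kernel/exit.c", 500), the size ordering interleaves directories (mm/, kernel/, kernel/) while the path ordering keeps kernel/ files adjacent, which is the property I suspect gzip benefits from in the tar headers.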

Summary

While not extensively tested, the script does provide a definite win for my local stats in terms of percentages. The technique already seems to be used by the kernel.org structure, which is cool. Maybe others can use the script to see if restructuring their archives makes a significant difference for them.

Friday, May 6, 2022

IPv6, WiFi access point, nftables, and proxy_ndp

For reasons I don't care to defend, my internet connection looks something like this:

Since my firewall is connected to the hotspot via WiFi, bridging to my internal network is out of the question. While there are many tutorials and approaches to getting IPv6 into a topology that looks like this, none of them work for me for various reasons, e.g. the hotspot will not delegate a prefix via DHCPv6.

I really want IPv6 so ... let's see what we can do!

proxy_ndp

First of all, we need to fool the hotspot into thinking that the Linux WiFi link has a bunch of addresses that actually belong to other computers behind it. IPv6 uses the Neighbor Discovery Protocol (NDP) and the Linux kernel provides a feature called proxy_ndp. When enabled and configured, it does exactly what we want!

I couldn't find any real automation surrounding this technology. However, it does allow my wifi interface to expose a number of IPv6 addresses that are behind the Linux firewall. Manual configuration looks something like this:

# enable proxy_ndp on the wireless link
sysctl -w net.ipv6.conf.wlan0.proxy_ndp=1

# tell the kernel to be discoverable for an IP it
# doesn't actually have
ip -6 neigh add proxy 2001:db8::100 dev wlan0

There is a user space daemon which looks somewhat unmaintained called ndppd that I didn't have a lot of luck with. In theory, it automatically provides neighbor discovery between two network interfaces. We'll bridge that gap in a bit.

radvd

For our second step, we need to configure the nodes on the internal network to use the same prefix as we are getting from the hotspot. This is going to make routing a little fun but we'll get back to that in a moment.

There are a lot of things which will advertise routes for you. For example, dnsmasq can do it. I'm going to use radvd since that's all it does. Feel free to get fancy with something else here.

# Find the current prefix for wlan0
ip -6 route show dev wlan0
(for illustration, maybe it shows 2001:db8::/64)

# install/configure radvd
apt install radvd
vi /etc/radvd.conf
systemctl start radvd

sample radvd.conf:

# eth0 is the internal network interface

interface eth0
{
AdvSendAdvert on;
AdvDefaultPreference low;
AdvHomeAgentFlag off;

# Auto-configured prefix from wlan0
prefix 2001:db8::/64
{
AdvOnLink on;
AdvAutonomous on;
};
};

more-specific routes

For our next step, we need to get the linux firewall to pass packets in the right directions. Since we only have a single IPv6 /64 prefix to work with, our default route and prefix is configured on the WiFi link while the internal network nodes are also provisioned on our internal link. This means we need a bunch of specific routes. Manual configuration looks something like this:

# Add a specific route for 2001:db8::100 to the internal network

ip -6 route add 2001:db8::100 dev eth0
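Each internal node needs the same pair of commands: a proxy neighbor entry on the WiFi side and a host route on the internal side. As a rough sketch (the helper name and its defaults are my own invention; the prefix and interface names are the example values used in this post), the pair can be generated like this, with a check that the address really belongs to the shared prefix:

```python
import ipaddress


def node_commands(address, prefix="2001:db8::/64", wan_dev="wlan0", lan_dev="eth0"):
    """Return the two commands needed to put one internal node online."""
    addr = ipaddress.IPv6Address(address)
    if addr not in ipaddress.IPv6Network(prefix):
        raise ValueError(f"{addr} is not inside {prefix}")
    return [
        # answer neighbor solicitations for this address on the WiFi link
        f"ip -6 neigh add proxy {addr} dev {wan_dev}",
        # send traffic for this address to the internal network
        f"ip -6 route add {addr} dev {lan_dev}",
    ]
```

The function only builds strings; running them (as root) is left to the caller.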

nftables

Next, we need to get the kernel to actually forward packets. Let's use some firewall rules (these are quite loose, you'll probably want to tighten them up for your needs) and get our internal network online.

example /etc/nftables.conf:

define DEV_INTERNAL = eth0
define DEV_INTERNET = wlan0
define V6_PREFIX = { 2001:db8::/64 }

table inet filter {
    chain forward {
        type filter hook forward priority 0;
        policy drop;

        # stateless acceptance of all traffic to/from the
        # internal network on the configured prefix
        ip6 saddr $V6_PREFIX iifname $DEV_INTERNAL accept;
        ip6 daddr $V6_PREFIX oifname $DEV_INTERNAL accept;

        # these can help with debugging
        ip6 saddr $V6_PREFIX log prefix "filter-forward (s) " drop;
        ip6 daddr $V6_PREFIX log prefix "filter-forward (d) " drop;
        log prefix "inet-forward (dropped) ";
    }
}

You will also need to enable forwarding and the nftables service for this.

# Enable ipv6 forwarding for all interfaces
sysctl -w net.ipv6.conf.all.forwarding=1
# reload firewall rules
systemctl restart nftables

Adding more nodes

At this point, only one node (2001:db8::100) is online. It would be annoying to add each node manually, so we'll grab connection attempts and turn those into the requisite commands to run.

example nftables.conf:

define DEV_VETH = veth0

table netdev v6traffic {
   chain ingress-internal {
       type filter hook ingress device $DEV_INTERNAL priority 0;
       ip6 saddr $V6_PREFIX tcp flags syn dup to $DEV_VETH;
   }
}

example /etc/network/interfaces.d/veth0:

auto veth0
iface veth0 inet manual
   pre-up ip link add veth0 type veth peer name veth1
   pre-up ip link set veth1 up
   pre-down ip link del veth0

example hackish script (ymmv):

#!/bin/sh

tcpdump --immediate-mode -n -l -i veth1 'ip6' 2> /dev/null | \
awk -W interactive -F. '
/IP6/ {
sub(/.*IP6 /, "");
print "ip -6 neigh add proxy " $1 " dev wlan0";
print "ip -6 route add " $1 " dev eth0";
}' | sh -v

Save the above script somewhere (heck, rewrite it and make it better!) and let it run in the background. As nodes attempt to connect through the linux firewall, the nftables-dup-to-veth0 will trigger the proxy_ndp and specific route lines. There will be lots of duplication but fixing that is an exercise left to the comments :)
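As one possible take on the duplication problem, here is a Python sketch that could sit where the awk stage is. It is untested against a live setup and rests on a few assumptions: tcpdump runs with -n so sources are numeric, lines look like "... IP6 <addr>.<port> > ...", and the prefix and interface names are the example values from this post. It remembers which sources it has already seen and emits the two commands only once per address.

```python
import ipaddress


def source_address(line):
    """Return the IPv6 source address from a `tcpdump -n` line, or None."""
    _, marker, rest = line.partition("IP6 ")
    if not marker:
        return None
    endpoint = rest.split(" > ", 1)[0]   # e.g. "2001:db8::100.54321"
    host = endpoint.rsplit(".", 1)[0]    # drop the trailing ".port"
    try:
        return ipaddress.IPv6Address(host)
    except ValueError:
        return None


def new_node_commands(lines, prefix="2001:db8::/64"):
    """Yield proxy/route commands the first time each in-prefix source is seen."""
    network = ipaddress.IPv6Network(prefix)
    seen = set()
    for line in lines:
        addr = source_address(line)
        if addr is None or addr not in network or addr in seen:
            continue
        seen.add(addr)
        yield f"ip -6 neigh add proxy {addr} dev wlan0"
        yield f"ip -6 route add {addr} dev eth0"
```

Feeding it sys.stdin and printing each yielded command (flushed) would let it replace the awk step in the pipeline, with the output still piped to sh -v. The seen set lives only in memory, so a restart will re-issue commands for known nodes.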

Wednesday, October 14, 2020

Using Telegraf to collect local PurpleAir sensor data

If you are like me, you enjoy breathing. Perhaps you are not as keen on it but you certainly notice when it starts getting uncomfortable to breathe from heavy smoke or smog. After the latest round of California wildfires, I decided to purchase a PurpleAir sensor so I could know how the air inside my house and outside of my house were likely to affect me. I also have no central AC so understanding what happens when I open my windows turned out to be fairly transformative. I chose PurpleAir because a running friend linked me to it before coordinating a run and it has a fair network of sensors whose data is aggregated and displayed on a map for all to see. The data is also collected for WUnderground and its supporting entities. There was no node within 10 miles of my house and there are now several within 3 miles.

Enough backstory! Let's dive in...

Goals:

1) Understand current and historical data from my shiny new sensor

2) Find sources of air pollution in my home

3) Measure effectiveness of different filters at reducing AQI

I like the aggregated data on the PurpleAir website but they make it difficult to really get the most from the data. Also, they don't expose the full range of measurements on the website. While I appreciate this from a slight privacy standpoint, I need access to it for my own purposes.

My solution:

Set up Grafana, InfluxDB and Telegraf on a small local server (think: RaspberryPi or RockPro.)

There are many tutorials on getting these technologies set up with one-another so I'll just focus on the Telegraf configuration as well as displaying an example Grafana dashboard. It's up to you to fill in the rest!

Telegraf config:

[[inputs.http]]
interval = "10s"
urls = [
"http://purple-outside.example.com/json?live=true"
]
method = "GET"
timeout = "2s"
data_format = "json"
name_override = "purpleair"
tag_keys = [
"SensorId",
"Geo",
"Mem",
"place",
"version",
"hardwareversion",
"hardwarediscovered",
"wlstate",
"status_0",
"status_1",
"status_2",
"status_3",
"status_4",
"status_5",
"status_6",
"status_7",
"status_8",
"status_9",
"ssid",
]
## Unclear that this does the right thing with timezones/utc as the reference
## time in go is specific to MST.
# json_time_key = "DateTime"
# json_time_format = "2006-01-02T15:04:05z"

"purple-outside.example.com" can be replaced with your sensor's name or IP address depending on your network setup.

This configuration takes the JSON data from the sensor every 10 seconds and treats SensorId/Geo/Mem/place/etc... as tag data in InfluxDB.
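To illustrate what that tag/field split means, here is a small Python sketch of the behaviour as I understand it from Telegraf's JSON parser: keys listed in tag_keys become tags, numeric values become float fields, and other strings and booleans are dropped. It is an approximation for explanation, not Telegraf's actual code. The tag set is a subset of the config above; the sample values in the usage note are invented.

```python
# Subset of the tag_keys from the Telegraf config above
TAG_KEYS = {"SensorId", "Geo", "place", "version"}


def split_tags_and_fields(document, tag_keys=TAG_KEYS):
    """Split one sensor JSON object into (tags, fields), roughly as Telegraf would."""
    tags, fields = {}, {}
    for key, value in document.items():
        if key in tag_keys:
            tags[key] = str(value)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[key] = float(value)
        # remaining strings and booleans are ignored
    return tags, fields
```

For a document like {"SensorId": "aa:bb:cc:dd:ee:ff", "place": "outside", "pm2.5_aqi": 42, "DateTime": "2020/10/14T12:00:00z"}, the first two keys end up as tags, "pm2.5_aqi" becomes a field, and "DateTime" is dropped because it is a string that is not in tag_keys.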

Grafana dashboard AQI config:

For those that like to edit queries, the textual version of this config is:

SELECT mean("pm2.5_aqi") FROM "purpleair" WHERE ("url" = 'http://purple-outside.example.com/json?live=true') AND $timeFilter GROUP BY time($__interval) fill(linear)

If you have a fancy two-sensor version, you can add a second query with "pm2.5_aqi_b" instead of "pm2.5_aqi"

AQI graph

I purchased two sensors: one for inside and one for outside. As such, I added three queries and colored the two outside sensors as blue and the inside one as yellow. The resulting graph looks like this:

Can you tell when the stove is being used? (Hint: it's those tall yellow peaks!)

Temp/Humidity

The sensor also exposes fine-grained counts of different particle sizes, temperature, humidity, dewpoint and more. For example, instead of "pm2.5_aqi" you can select "current_dewpoint_f", "current_temp_f", and "current_humidity" and create a dual-axis graph:

Pressure

Particle Size Counts

Results

I found that one of my filters does next to nothing for my air quality while my Honeywell HEPA filter brings the AQI of my house into the single digits from triple digits after some hours of running.

Also, cooking without adequate ventilation releases a surprising amount of fine particulate matter (but it sure smells good!)

Final Remarks

I hope this helps someone else breathe easier. Just knowing what your air quality is can be very helpful for understanding sources of congestion/sleep/wheezing etc. Thanks for reading!

Friday, December 11, 2015

For the love of bits, stop using gzip!

Every time you download a tar.gz, Zod kills a kitten. Every time you generate a tar.gz, well, let's keep this family-safe.

In 2015, there should be very few reasons to generate .tar.gz files. We might as well just use .zip for all the progress we have made since 1992 when gzip was initially released. xz has been a thing since 2009, yet I still see very little adoption of the format. Today, I downloaded Meteor and noticed that it downloads a .tar.gz. I manually downloaded the file and then recompressed it using xz:

$ du -h meteor-bootstrap-os.osx.x86_64.tar.*
139M meteor-bootstrap-os.osx.x86_64.tar.gz
67M meteor-bootstrap-os.osx.x86_64.tar.xz

Seriously, less than half the size! Maybe it's the amount of time it takes to compress? Let's see:

$ cat meteor-bootstrap-os.osx.x86_64.tar | time xz -9 -c > /dev/null
xz -9 -c > /dev/null 165.21s user 1.14s system 99% cpu 2:47.70 total

$ cat meteor-bootstrap-os.osx.x86_64.tar | time gzip -9 -c > /dev/null
gzip -9 -c > /dev/null 35.03s user 0.23s system 99% cpu 35.583 total

Ok, so compressing takes longer. You have to do it once. It's still on the order of reasonable for something that compiles a 600MB tarball in the first place. What about decompressing?

$ time xz -d -c meteor-bootstrap-os.osx.x86_64.tar.xz > /dev/null
4.25s user 0.08s system 99% cpu 4.327 total

$ time gzip -d -c meteor-bootstrap-os.osx.x86_64.tar.gz > /dev/null
1.35s user 0.04s system 99% cpu 1.389 total

... and decompressing takes a little longer. But, wait a second, how long does it take to download the file in the first place? I'm on a decent connection and the file is being hosted on something that delivers the content at an average of (say) 1.5 MB/s. That's 88 seconds for the .tar.gz and 42 seconds for the tar.xz. Since the content is streamed directly to tar (a la: curl ... | tar -xf - ), we actually don't see a time slowdown because xz is slower, we see an overall speedup because the slowest operation is getting the bits in the first place!
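The argument boils down to a little arithmetic, sketched below in Python. The model is a simplification I am assuming for illustration: when the download is piped straight into the decompressor, the two stages overlap and the slower one sets the total. The sizes and decompression times are the measurements above; the rate is the rounded 1.5 MB/s, so the estimates come out a few seconds higher than the 88 and 42 seconds quoted, which suggests the real rate was slightly above 1.5 MB/s.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to download size_mb at a steady rate."""
    return size_mb / rate_mb_per_s


def streamed_install_seconds(size_mb, rate_mb_per_s, decompress_s):
    """Download piped into decompression: the slower stage dominates."""
    return max(transfer_seconds(size_mb, rate_mb_per_s), decompress_s)


RATE = 1.5  # MB/s, the rounded figure from the post
gz_total = streamed_install_seconds(139, RATE, 1.389)  # .tar.gz
xz_total = streamed_install_seconds(67, RATE, 4.327)   # .tar.xz
```

In both cases the download dominates, so the extra three seconds xz spends decompressing never shows up in the total. Only on a link fast enough to deliver the .tar.xz in under about four seconds would decompression become the bottleneck.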

What about tooling?

OSX: tar -xf some.tar.xz (WORKS!)

Linux: tar -xf some.tar.xz (WORKS!)

Windows: ? (No idea, I haven't touched the platform in a while... should WORK!)

Why am I picking on Meteor? Well, they place the tagline of "Build apps that are a delight to use, faster than you ever thought possible" right on their homepage. I just ran their install incantation and timed it:

./install.sh 2.32s user 7.67s system 14% cpu 1:10.53 total

70 seconds! Nice job! I must have downloaded it slightly faster than in my initial testing. It also means that the install is extremely limited by download speeds. So ... I can easily imagine this being twice as fast. All that needs to be done is change the compression format and I should be able to install this in 33 seconds!

So, who *does* use xz? kernel.org. Also, the linux kernel itself optionally supports xz compression of initrd images. Vendors just need to pay attention and turn the flags on. Anyone else want to be part of the elite field of people who use xz? Please?

Sunday, July 27, 2014

Runners Manifesto -- draft

It seems like the running community needs something along the lines of what the Hacker Manifesto was meant to be, except for running. There are many articles describing running to runners but perhaps this can be a document to introduce non-runners to runners... or even a document for runners to cherish.

Hi, I am a runner. I may not run faster than you, but I might run a lot further. Or maybe I can run more consistently every morning or evening. Or maybe I just enjoy lacing up a pair of shoes, or running barefoot.

First, let's be clear. I don’t run away from things, I run towards them. I make goals, have dreams and connect with other runners because we have a fundamental understanding of each other. We like to run.

Why do I run? Well, why do you breathe? Running is kinda like breathing. Yeah, I could stop for a while but I would start getting light headed and dizzy… kinda like before a marathon. You really don’t want to bother me then.

I, like so many others, used to not understand runners. In high school, I played soccer and I could run. In college, I biked around quite a bit. It wasn’t until life forced me to discover new things that I finally found my inner runner. Once I did, I realized that running fast was fun. Running far was fun. Trying to do both was dangerous… and also fun. Ultimately, neither was fundamental to what I did on a daily basis but somehow running became a way of life.

I was lucky when I started; I met a group of runners who had run a long time. There were wise old owls to give sage advice and young yet dedicated runners to keep the energy up. Just about everyone could whup my butt except I found that it didn’t matter. We were all friends and supported each other in our goals. Were we running the same race? Cool! Let’s get together and keep each other motivated to do better. Running similar races? Awesome … let’s get together and help keep each other motivated. Completely different races? Let’s get together and keep each other motivated. Essentially, I walked into what was a hidden community that hides in plain sight and supports itself.

What defines a runner? For me, it's someone who gets out there and pushes themselves to achieve their goals. It may start with running and seeing a 5k to accomplish, and then trying to beat their 5k time. Maybe it takes you twice as long to do a 5k as me; that means you are working for twice as long to do that and I respect that. Maybe you respect that I can do it twice as fast. Either way, you rock for doing it.

If you stop running, are you still a runner? Yeah, I think so. Running is a way of life, not an activity you do. I’ve met people who used to run and I can tell they are runners. They say they used to be runners but I know better. They still run, just not on their feet.

If you are a non-runner, that’s cool too. Please don’t ask why I run and I won’t ask why you don’t. Also, if you want to challenge me to a race, that’s awesome. Come and train with me and we’ll run whatever I’ve signed up for next.

See you out there!

Wednesday, March 19, 2014

Host configuration -- use the hostname to configure the network

As I imagine many people to be, when bringing up hosts, I'm still stuck in the days of:

Step 1: (node) configure networking
Step 2: (node) configure the hostname
Step 3: (service1) configure DNS to match
Step 4: (service2) plug this into some form of configuration management
Step 5: profit

The problem is that you often have a complicated back and forth between configuration of the node and configuration of the services. If you decide to semi-automate this you might try to add DHCP into the mix:

Step 1: (service1) Allocate IP address for a node
Step 2: (node) Get MAC address from node
Step 3: (service2) Plug MAC+IP into DHCP configuration
Step 4: (service1+service2) Push out changes to dhcp/dns
Step 5: (node) initialize networking using DHCP
Step 6: profit

Adding in IPv6 and SLAAC, things get worse since you have to grab the SLAAC address after networking is brought up and that means two different changes to DNS. Pretty soon, you are spending several minutes just moving basic data between services on your network.

ENTER IPv6 and a slight amount of thought:

I recently had some other push factors (related to our private cloud) to try and minimize the effort spent here. I now use SLAAC and an IPv6 DNS address as a temporary configuration point. Then I query DNS for the information I need to configure the node. This basically just comes down to IPv4 and IPv6 address at the moment:

Step 1: (service1) Allocate IPv4 and IPv6 address
Step 2: (node) bring up node with temporary (SLAAC) config
Step 3: (node) query DNS for addresses and plug the values into a static configuration
Step 4: profit

Now all I need is a script on the new node to make this much faster and voilà! My configuration is down to two steps:

Step 1: (service1) Allocate addresses
Step 2: (node) run: init_host new-hostname
Step 3: profit

Of course, init_host runs dig and plugs values into the key places. Finally, it runs some form of configuration management (puppet in our case) to get the rest of the host configured. Since many hosts are virtual instances under OpenStack, we can simply leave an init_host script in the base image for convenience. Time taken to bring up a new node has gone from minutes of error-prone copy/paste to seconds of error-prone typing. I'm much more likely to be careful over a period of seconds with a few steps than a period of minutes with many steps. Hopefully our infrastructure will benefit ... and hopefully yours will too!
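The real init_host is specific to our environment, but the core of it can be sketched in a few lines of Python. Everything here is an assumption for illustration: it uses the system resolver through socket.getaddrinfo rather than dig, it writes a Debian-style ifupdown stanza, and the interface name and prefix lengths are placeholders. Gateways, DNS servers and the configuration management hand-off are left out.

```python
import socket


def lookup_addresses(hostname):
    """Return (ipv4, ipv6) for hostname via the system resolver; either may be None."""
    found = {}
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            infos = socket.getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            infos = []
        found[family] = infos[0][4][0] if infos else None
    return found[socket.AF_INET], found[socket.AF_INET6]


def render_static_config(ipv4, ipv6, dev="eth0", v4_prefixlen=24, v6_prefixlen=64):
    """Render a static ifupdown stanza for whichever addresses were found."""
    lines = [f"auto {dev}"]
    if ipv4:
        lines += [f"iface {dev} inet static", f"    address {ipv4}/{v4_prefixlen}"]
    if ipv6:
        lines += [f"iface {dev} inet6 static", f"    address {ipv6}/{v6_prefixlen}"]
    return "\n".join(lines) + "\n"
```

A wrapper would call lookup_addresses("new-hostname") while the node is still on its temporary SLAAC address, write the rendered stanza into place, and then restart networking.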

Monday, March 3, 2014

Napa Valley Marathon - 2014

I live in a beautiful city that is practically ideal for runners in the winter. There are long flat areas to run as well as hilly and mountainous regions to train in depending on your goals. There is also an abundance of sunshine and ever so slight lack of oxygen at 2500' above sea level. To clarify, it's an ideal place for winter training for an early spring marathon since the summer gets pretty hot.

This training season has been non-ideal in many ways: a stress fracture (which meant one month with no running) and very little speed/strength training. I made my focus entirely on my long runs, which I was just barely able to schedule before the marathon with two 20 mile runs on two successive weekends. The last one was on flat-as-a-pancake pavement with two weeks (one weekend) worth of tapering before the marathon. Given my somewhat ad-hoc training plan, I didn't have a solid goal pace in mind and was planning on winging it somewhat.

On race weekend, my diet varied from very decent hotel breakfasts to dark chocolate peanut butter cups. Suffice to say, I should have been a little more regimented with my food. Pre-race dinner wasn't bad but I probably could have done better by adding a potato somehow for potassium. I had chicken, rice and veggies. I did manage to avoid wine and drank plenty of water throughout the day so that I would be able to spend less time worrying about hydration during the run. In hindsight, I may have had a little too much water and not enough salt/potassium to stay balanced.

Ok, ready, set, go! I woke up on race morning with very little sleep. My neighbors in the hotel decided to come back at 2 am the previous night and have an hour long conversation. This night wasn't as bad since they started at midnight instead of 2 am. Of course, I had to wake up at 4:00 so I could leave by 4:30 to walk to the buses that left the finish area at 5:15 for the race that started at 7. To be fair to the race directors, I think this was just enough time for all logistics to be worked out for the crowd but I did spend 30 minutes on the bus after we arrived waiting to be kicked off. I think I will try to stay in Calistoga (close to the start) instead of Napa for those precious extra few hours of sleep if I run this again.

Chatter on the bus while driving up was the usual range of topics between runners. "This is my first marathon" ... "doesn't matter if it's your first, your 50th or you are 20 or 60, the feeling at the beginning is always the same" ... "does anyone know what the weather is supposed to be?" Ok, the last one may not be as typical on race day but the weather report had varied wildly in the past week. The forecast went from "no rain" on race day to "it's gonna rain" and everywhere in between. The night before the race I checked and there was a 0% chance. I took that and planned for only slightly worse. Thankfully I did plan for slightly worse.

At the start, I saw some friends in passing and made a few new ones. I saw a guy standing on his own and decided to make simple conversation. Jim would be happy for a sub-4 and hadn't run a marathon in a while. He asked what I was planning on and I let him know that I was hoping for 3:20 - 3:40 but was going to just see what happened. He seemed impressed and we continued chatting for a little bit. I saw another friend a little bit ahead, went to wish her good luck and then made my way back through the pack so I didn't get caught up in the rush of the start. I tend to go out too fast if I am too far in the front and so I compensate by starting a little further back and taking it easy for a mile.

After some poetry, the national anthem and the usual positive heckling of the announcer, we were off! I started further back than usual because people were walking all the way up to the starting line (which was chip timed). I wasn't trying to PR so I did my best to just go with it and then gently pass people until I reached a slightly faster group. I did my usual routine of identifying people by the race shirts they had on and saying, "go Ragnar!" to people with Ragnar shirts. This worked well for a while until about mile 2 where I was able to hit my target range.

I caught up with some friends who were running the course for the heck of it. They originally signed up for the race but had injuries during training and thus planned on not doing it. The night before, there was wine and "good" decision making so the whole group decided to start the race and see what happened. I caught up to a few people in the group and passed them without realizing it was them. Once I had gone a little ways ahead they called me on it and caught up. We ran together for a short period and they definitely had their own conversation going. I decided to let the ladies talk while I kept focusing on my run, which was starting to feel increasingly difficult.

I was breathing much harder than I normally would have at the same pace, which seemed odd given that I was 2000' of elevation lower than usual. Eventually they pulled past and I started throttling my pace back. My body wasn't hurting or feeling tired but I was starting to feel a lot of mental resistance, far more than should happen around mile 3 or 4. I couldn't really lock on to what was problematic and so I equated the stress to negative thoughts and tried pulling myself out of them.

I passed Jim and he surprised me, "Hey Tim, way to go!" I had to look back a little to recognize him and cheered him on too. There was still a lot of positive energy in the air and some surrounding runners chuckled at the exchange. I think he finished with, "I'll see you at the end!"

The spectators were great along the sides of the road. Nice posters and positive (but not over the top) comments for the passing runners. One group was even offering an entire line of high-fives. I think that was the fastest part of the course for me. I am a total sucker for that sort of thing and love to play into it. My gps log shows a single peak at 2 minutes faster than my target pace. I'm glad only one crowd was that organized with the high-fives.

I kept the pace, or at least I thought I was for a while. The resistance was getting unusually difficult to wade through. I tried everything to get positive thoughts flowing. I looked around at the beautiful grape fields and hills, enjoyed the cool temperatures, appreciated the runners around me, and came back to thinking of wine at the end. Just pushing through wasn't working either. Nothing helped. It's a downhill race with an uphill battle. Finally, I dropped my pace by two minutes for a half mile to try and reset a little bit. There was no way I could sustain this mentally even though my body was feeling mostly ok. Finally, I decided to walk. That's a tough decision for me, especially before mile 13.

The resistance I felt just would not give up. I started employing tactics such as 1 mile on and then a short walk/jog rest to try and break out of the head space. Once I figured out that I could walk, I started walking more and my overall pace was starting to really suffer. Jim caught up to me and asked what was wrong. I just kinda shook my head and said, "it's all mental." He knew exactly what I was saying and gave me an encouraging pat on the back. After talking for a little bit he continued on, "I'll wait for you at the finish!" At that point, a course motorcycle monitor looked back and asked for a thumbs up/thumbs down. I gave him a thumbs up but I noticed he took a moment to believe me and he didn't ask someone ahead of me that was also walking.

Finally, I suspected that this may be food related. I didn't plan on taking anything but salt pills and aid-station water, which had worked fine for long runs in the past. There was an emergency GU in my pocket but I did not want to resort to that if I didn't have to. I started accepting Gatorade and eventually bananas and even went back for seconds a few times. By mile 19, I was starting to feel a little less mental fatigue but all the walk breaks and cold had told my body, "ok, we are basically done." I then walked for half a mile trying to just focus on gradually increasing steam. I was hoping I could build up a little momentum and start into a jog that would eventually become a run again. I started an easy pace and decided that I had to hold it for a mile to see what happened.

I walked again for a brief period, started up and then it happened: my right calf started knotting up when I would land/push off. No longer was it just a question in my mind about running vs not; my body was enforcing that I could not run. By this point I had gotten past the "maybe I should drop" thoughts and graduated to "if I have to walk 6 miles then so be it." My mental state was getting much better but my body worse, which reinforced the fuel-related diagnosis.

What was nice about passing mile 20 is that others were also hitting a wall. I wasn't alone anymore! There were people passing and walking ahead. Some people walked around me and someone even passed and said, "go Ragnar!" since I had my oldest Ragnar shirt on. That managed to bring a smile to my face.

I walked for two miles and tried again. Nope! Not happening. Potassium doesn't come from mid-air but rather from aid-station volunteers. Finally, an aid station! Lots of gatorade, water to help dilute it, a salt pill and as many bananas as I could hold with a cup of water. I gave myself 10 minutes to let the salt be absorbed into my system and let the food ease its way into a safe place. "Ok, let's try running again," I thought to myself. Nope. Nope nope nope! One more time? Nope. Fine, little more walking, but damnit, I'm making it!

In the final stretch, once I got over all the people passing me, I found new motivation. If I couldn't run, I was going to walk as fast as I could. Constant forward motion was my goal. I tried running a few times and after analyzing the knots trying to form in my calf I finally figured out that if I used my heel a little more and my forefoot a little less, I could jog. Normally, this is not how I optimize my run. I wear minimal footwear and heel strikes are considered dangerous. The shoes I was wearing had several millimeters of padding, however, so I figured I could get away with it.

I thought to myself, "I'm going to make it, no matter what I need to do." The light at the end of the tunnel was starting to twinkle through slowly and the only thing keeping me from shedding a few tears of happiness was the well-placed crowds of people cheering on runners. They probably wouldn't have noticed given the light but constant rain that was falling by now. Runners themselves were saying motivating things as they passed. Sometimes it was for themselves via a mantra to keep going, other times it was definitely aimed at me. Either way, it was inspiring.

Ok, new goal with new information: I'm going to run through the finish. Oh, someone has homemade sorbet! I stopped and asked briefly about flavors. I finally chose the lightly colored one and it was SO GOOD. If the person who made that and stood out in the cold to serve sorbet is reading this, "Thank you! Something about it made me feel instantly better." I jogged out of sight and the cramps came back. So close! Walk for a moment... then start up again. Nope. Ok, let's walk and conserve a little energy. I'm running through this finish no matter what.

One spectator had measured 0.2 miles beyond the 25 mile post. She was trying to inspire people to keep going since there was exactly one mile. Again, being a sucker for spectators, I played into it, got a fist bump and started running. Oops, slowly, right ... cramps. Ok, I can do a mile! I can always do a mile!

With the finish area in sight, both of my calves were starting to cramp up even with my modified run. I finally decided I didn't care and just did whatever I had to in order to keep my body moving forward through the finish. I vaguely saw the finish clock in so much as to recognize a "4" as the first number... which I was well aware of anyway. I crossed all three finish mats and was greeted by someone immediately after I stopped running. She was looking me very intently in the eyes and I was trying really hard to recognize this person. Did I know her? Was she part of the group I came with? She had a lot of stuff on to stay warm/dry but I finally figured out she was trying to assess my state. I mentioned some cramping and she mentioned some medical tents. Someone placed a medal around my neck while another placed a heat blanket around my shoulders. The medal is cool but, hands down, it was the heat blanket that won in my mind.

This was definitely not the marathon I planned to run but there is no way you could get me to change a thing about it for the world. My time was an hour slower than anticipated, my body still hurts and I'm pretty sure I'll have a few nagging injuries for a while but the memories will last a lifetime.

Tuesday, February 25, 2014

OpenStack and Glance Image Types vs Size

I have looked around for good information on images and support for sparse storage and compression. After experimenting a bit I decided to compile a few key results. There are two attributes I care about here: storage and network transfer.

To store an image efficiently, the image must be as small as possible. If there is a run of no data on an image (free space that has been filled with zeros) then that should not reserve bits on the filesystem. The key term here is "sparse image." However, once you transfer a sparse image to another host, the "sparse bits" actually get transmitted as long strings of zeros... thus you transfer 10 GB of data for a 10 GB sparsely allocated image with (say) 6 GB of actual filesystem data.
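To make the "sparse" idea concrete, here is a minimal Python sketch (not part of my original experiment) that creates a file consisting of one big hole and compares its apparent size with the space actually allocated. The function name and the 10 MiB size are just for illustration, and the allocated figure depends on the filesystem; some filesystems do not support holes at all.

```python
import os
import tempfile


def sparse_file_sizes(apparent_bytes=10 * 1024 * 1024):
    """Create a file that is all hole; return (apparent, allocated) in bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        # Extending a file with truncate() does not write any data blocks
        # on filesystems that support sparse files.
        f.truncate(apparent_bytes)
        path = f.name
    try:
        st = os.stat(path)
        # st_size is what "ls -l" reports; st_blocks counts 512-byte units
        # actually allocated, which is what "du" reports.
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)


if __name__ == "__main__":
    apparent, allocated = sparse_file_sizes()
    print("apparent size :", apparent)
    print("allocated size:", allocated)
```

Reading such a file back (for example, to copy it over the network) still yields the full run of zeros, which is why the transfer cost follows the apparent size rather than the allocated size.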

In order to be efficient for a network transfer, the image has to actually be small which ends up meaning compression. (eg: gzip/bzip2/...)
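As a rough illustration of why zero-filled free space matters for compression, this small sketch compares how well a megabyte of zeros compresses against a megabyte of random bytes. It uses zlib from the Python standard library as a stand-in for gzip; the helper name is made up for this example and the numbers will not match what qemu-img produces.

```python
import os
import zlib


def compressed_size(data, level=6):
    """Return the size in bytes of data after zlib compression."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    one_mib = 1024 * 1024
    zeros = bytes(one_mib)       # what zeroed free space looks like
    noise = os.urandom(one_mib)  # stand-in for incompressible data
    print("zeros :", compressed_size(zeros), "bytes")
    print("random:", compressed_size(noise), "bytes")
```

Free space that still holds stale data from deleted files behaves more like the second case, which is the reason for zeroing it before converting the image.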

I used an LVM backed instance for my source data and in order to fill zeros into the free space of the block storage I ran this inside the instance:

dd if=/dev/zero of=/tmp/zeros bs=4M; rm /tmp/zeros; halt

This basically just wrote zeros into a file under /tmp, removed it and then halted the instance. You may need to use a different location on your host because /tmp is sometimes mounted as a different filesystem (eg: tmpfs.)

Now, my LV is ready to convert to different image types. I used qemu-img to read from the block device and create images of different varieties:

qemu-img convert -c -O qcow2 /dev/vg/kvm-woof.snap host-woof.compressed.qcow2
qemu-img convert -O qcow2 /dev/vg/kvm-woof.snap host-woof.qcow2
qemu-img convert -O vdi /dev/vg/kvm-woof.snap host-woof.vdi
qemu-img convert -O vmdk /dev/vg/kvm-woof.snap host-woof.vmdk

The differences between sizes somewhat surprised me:

root@os-ph1:/staging# ls -lh host-woof.*
-rw-r--r-- 1 root root 2.1G Feb 25 17:01 host-woof.compressed.qcow2
-rw-r--r-- 1 root root 6.3G Feb 25 16:52 host-woof.qcow2
-rw-r--r-- 1 root root 10G Feb 25 16:42 host-woof.raw
-rw-r--r-- 1 root root 6.9G Feb 25 16:49 host-woof.vdi
-rw-r--r-- 1 root root 6.3G Feb 25 16:49 host-woof.vmdk

A raw image is pretty much the worst thing you can do. It turns out that only the qcow2 format supports compression within the image itself. There are some downsides to qcow2 with performance but certainly if you are creating/destroying VMs a lot then you can save yourself some network bandwidth by transferring smaller images. Creating the compressed qcow2 image took roughly twice as long (no hard numbers here) as the uncompressed version. YMMV as not all data is created equal (or equally compressible).

It's also interesting to note that a compressed qcow2 image may slowly expand through use. New writes to the image may not be compressed.

-- EDIT --

I use LVM to back instances on hosts in OpenStack. It turns out that regardless of the format chosen, the whole (raw) image still needs to be written out to the logical volume. Unless you have really fast disks, this is by far the slowest part of the process. At least in my environment (with spinning rust) all I gain is some space on my image store and less overall used network bandwidth. Also, images are cached in a raw format on each node. The image also acts something like a sparse image:

root@os-ph12:/var/lib/nova/instances/_base# ls -lh abcca*
-rw-r--r-- 1 nova nova 10G Feb 25 21:05 abcca5bbe40c5b147f8a110bf81dab8bbb65db25
root@os-ph12:/var/lib/nova/instances/_base# du -h abcca*
6.2G abcca5bbe40c5b147f8a110bf81dab8bbb65db25

Tuesday, January 7, 2014

"Unavailable console type spice" -- Another OpenStack Error

Tonight, I found myself trying to implement the spice proxy out of curiosity in Havana. I found a nice post that talked about how to do this after banging my head on the wall for a while: http://joshrestivo.com/?p=32

Anyway, this led to getting "console is currently unavailable. Please try again later. Reload" when I tried to load a console of an already running host. I dove into the stack trace I received in the nova-compute logs:

2014-01-07 22:41:51.402 5706 ERROR nova.openstack.common.rpc.amqp [req-4ad10aaf-60c0-4f88-8964-cb3f6dd06814 6c978326923a4fa997a6a83b3fdbd11e 47eedd8414b84466a731289a5d6dee35] Exception during message handling
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp **args)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/common.py", line 439, in inner
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp return catch_client_exception(exceptions, func, *args, **kwargs)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/common.py", line 420, in catch_client_exception
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp return func(*args, **kwargs)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp payload)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 73, in wrapped
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 271, in decorated_function
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 258, in decorated_function
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3579, in get_spice_console
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp connect_info = self.driver.get_spice_console(instance)
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2199, in get_spice_console
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp ports = get_spice_ports_for_instance(instance['name'])
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2197, in get_spice_ports_for_instance
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp raise exception.ConsoleTypeUnavailable(console_type='spice')
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp ConsoleTypeUnavailable: Unavailable console type spice.
2014-01-07 22:41:51.402 5706 TRACE nova.openstack.common.rpc.amqp
2014-01-07 22:41:51.404 5706 ERROR nova.openstack.common.rpc.common [req-4ad10aaf-60c0-4f88-8964-cb3f6dd06814 6c978326923a4fa997a6a83b3fdbd11e 47eedd8414b84466a731289a5d6dee35] Returning exception Unavailable console type spice. to caller

and the final hint was a comment in the source:

# head -2197 /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py | tail -3
# NOTE(rmk): We had Spice consoles enabled but the instance in
# question is not actually listening for connections.
raise exception.ConsoleTypeUnavailable(console_type='spice')

Turns out that running VMs have VNC or SPICE nailed into their configuration:

root@os-ph12:/# EDITOR=cat virsh edit instance-000000a1 | egrep 'graphics.*(spice|vnc)'

root@os-ph12:/# EDITOR=cat virsh edit instance-000000a0 | egrep 'graphics.*(spice|vnc)'

thus the VMs needed to be rebuilt on the host with spice.
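If you would rather check this programmatically than grep, here is a small sketch that pulls the graphics types out of a libvirt domain definition. The XML below is a trimmed, made-up sample (not copied from my hosts) and the function name is hypothetical; in practice you would feed it the output of "virsh dumpxml" for the instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition for illustration only.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>instance-example</name>
  <devices>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'/>
  </devices>
</domain>
"""


def graphics_types(domain_xml):
    """Return the list of graphics types (e.g. 'vnc', 'spice') in a domain."""
    root = ET.fromstring(domain_xml.strip())
    return [g.get("type") for g in root.findall("./devices/graphics")]


if __name__ == "__main__":
    types = graphics_types(SAMPLE_DOMAIN_XML)
    print("graphics types:", types)
    if "spice" not in types:
        print("this instance was defined without spice")
```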

Friday, September 13, 2013

OpenStack Error Reporting

OpenStack is a conceptually neat piece of software. It's very distributed and fault-tolerant and loosely coupled. That also means that getting back meaningful errors can be a tough process when something goes wrong.

Today, I upgraded my development cluster from Grizzly to Havana. I spent several hours understanding new and changed config options throughout the landscape and updated my puppet definitions to spread these new changes across the nodes. I finally decided to go home and get some food before attempting to finish debugging some of the errors coming up from various nodes. I've complained to colleagues in the past about error reporting and how non-trivial it is to trace things through OpenStack. Well, here is a prime example.

In Horizon (the Web-GUI for mortals to use OpenStack), I tried creating an instance. It simply said it couldn't do it and left a cryptic error with a uuid listed. I'm used to this by now so I jump to the error logs in the apache log for Horizon. Not much there either. Hmm ... well, I just upgraded *everything* so where do I start? How about a compute node... after all, that's where the instance is supposed to start. Let's see if anything is going on there. Probe around a little and notice that nova-compute isn't actually running. Simple! Let's just start that up. Oh wait, the daemon is still not running. Now I have something in my error logs:

2013-09-13 19:07:08.959 4504 ERROR nova.openstack.common.threadgroup [-] Remote error: UnsupportedRpcVersion Specified RPC version, 1.50, not supported by this endpoint.
[u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data\n rval = self.proxy.dispatch(ctxt, version, method, **args)\n', u' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 138, in dispatch\n raise rpc_common.UnsupportedRpcVersion(version=version)\n', u'UnsupportedRpcVersion: Specified RPC version, 1.50, not supported by this endpoint.\n'].
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 117, in wait
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup x.wait()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 49, in wait
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 168, in wait
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 65, in run_service
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup service.start()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/service.py", line 156, in start
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup self.manager.init_host()
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 752, in init_host
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup context, self.host, expected_attrs=['info_cache'])
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/objects/base.py", line 90, in wrapper
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup args, kwargs)
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 507, in object_class_action
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup return self.call(context, msg, version='1.50')
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/proxy.py", line 126, in call
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup result = rpc.call(context, real_topic, msg, timeout)
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/__init__.py", line 140, in call
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup return _get_impl().call(CONF, context, topic, msg, timeout)
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 824, in call
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup rpc_amqp.get_connection_pool(conf, Connection))
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 539, in call
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup rv = list(rv)
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 504, in __iter__
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup raise result
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup RemoteError: Remote error: UnsupportedRpcVersion Specified RPC version, 1.50, not supported by this endpoint.
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data\n rval = self.proxy.dispatch(ctxt, version, method, **args)\n', u' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 138, in dispatch\n raise rpc_common.UnsupportedRpcVersion(version=version)\n', u'UnsupportedRpcVersion: Specified RPC version, 1.50, not supported by this endpoint.\n'].
2013-09-13 19:07:08.959 4504 TRACE nova.openstack.common.threadgroup
2013-09-13 19:07:08.962 4504 DEBUG amqp [-] Closed channel #1 _do_close /usr/lib/python2.7/dist-packages/amqp/channel.py:88

Yup, clear as mud. I try the usual, "plug this into a popular search engine and see who else is getting the error." Nobody! Wow, a brand new one! Woohoo... Happy Friday. Fine, this is open source and written in python, let's just dive into the code a little and see where the culprit is. Aha! This is an error we get when we don't know what is going on. It is the last case (in fact, the last line) of the file.

So, let's try a little harder. We are dealing with RPC and we are getting a version issue. Something... Somewhere... doesn't like talking to me. Maybe it's the message server itself? *poke around with that* Other things seem fine, no errors to report there. Let's run nova-compute in debug mode in the foreground and see if we can glean anything about where it is failing. Hmm, one of the last things it reports on is talking to nova-conductor. Let's have a look at nova-conductor logs ... WOAH! That same error message! That's right, nova-conductor is *returning* this error for some reason:

2013-09-14 02:27:19.394 ERROR nova.openstack.common.rpc.amqp [req-cd3c0d15-fa20-486a-8935-2ed5fec76f7d None None] Exception during message handling
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 138, in dispatch
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp raise rpc_common.UnsupportedRpcVersion(version=version)
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp UnsupportedRpcVersion: Specified RPC version, 1.50, not supported by this endpoint.
2013-09-14 02:27:19.394 17496 TRACE nova.openstack.common.rpc.amqp
2013-09-14 02:27:19.397 ERROR nova.openstack.common.rpc.common [req-cd3c0d15-fa20-486a-8935-2ed5fec76f7d None None] Returning exception Specified RPC version, 1.50, not supported by this endpoint. to caller
2013-09-14 02:27:19.399 ERROR nova.openstack.common.rpc.common [req-cd3c0d15-fa20-486a-8935-2ed5fec76f7d None None] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data\n rval = self.proxy.dispatch(ctxt, version, method, **args)\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 138, in dispatch\n raise rpc_common.UnsupportedRpcVersion(version=version)\n', 'UnsupportedRpcVersion: Specified RPC version, 1.50, not supported by this endpoint.\n']

I guess this should have been clear given all of the blatant references to nova-conductor. In fact, I was able to find the term "conductor" once in the backtrace which is pretty good for this kind of problem. So, this *must* be the source of my issue since this is where the error is being generated. Well, tonight, I got lucky. A package wasn't updated and doing a dist-upgrade to pull in matching packages fixed the issue. However, this is a scary place to be:

Upgrading one component (or not upgrading another) can cause components to completely fail on remote nodes. If I had 100 nova-compute nodes and a single node with an outdated nova-conductor running, presumably the 100 nodes would have crashed nova-compute daemons.

The take-away here is: make sure all components across the board are the same version for your cloud. The CLI commands seem to be able to accept a little bit of version lag but the internal components have issues with different versions running.
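For what it's worth, the check that produced the error can be sketched in a few lines. This is my understanding of the usual rule (same major version, and the requested minor version no newer than what the endpoint implements), written as a standalone function rather than copied from the nova source. The "1.49" value is a made-up example of an outdated endpoint, not the version my stale package actually implemented.

```python
def version_is_compatible(implemented, requested):
    """Return True if an endpoint implementing `implemented` can serve
    a call that asks for `requested` (both are 'major.minor' strings)."""
    imp_major, imp_minor = (int(part) for part in implemented.split("."))
    req_major, req_minor = (int(part) for part in requested.split("."))
    # A different major version is never compatible; within the same major
    # version the endpoint must be at least as new as the caller expects.
    return imp_major == req_major and req_minor <= imp_minor


if __name__ == "__main__":
    # An up-to-date caller asking an outdated (hypothetical 1.49) endpoint.
    print(version_is_compatible("1.49", "1.50"))
    # Matching packages on both sides.
    print(version_is_compatible("1.50", "1.50"))
```

Under this rule a single stale service rejects every newer caller, which matches what I saw: the error is raised on the conductor side and handed back to nova-compute.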

The less scary part in all of this is that I managed to keep my previously-started instances running because kvm processes were not directly affected. If I had customers that needed to start/stop processes, this would have been not fun. Of course, that's why we all keep a complete development/testing OpenStack cluster around, right?

Monday, August 26, 2013

OpenStack is ready for prime (test) time -- quantum/neutron rant

I've been spending the last year poking and prodding OpenStack in hopes of having a management layer for the various hosts with instances throughout my data center. OpenStack has so many neat little bells and whistles and ... green developers.

My DC is pretty well established and so I need to be careful about introducing new technology. I'm amused at the toy that is devStack. It seems to be the only streamlined way to install a cloud. The other way is to use pre-established puppet templates that make assumptions about how your hardware is set up. My use-case is covered by neither one so I get the third option: install everything and write my own puppet templates. This isn't even my real problem with OpenStack. My real problem is that there is no way to accomplish basic tasks that you would expect in migrating your infrastructure and while the documentation is there and pretty, it's not always clear.

Let's dive into my infrastructure for a moment. I run hosts with link aggregation and vlans. There are no unused ethernet ports on the back of my hosts thus there is no "management port." The management network is simply another vlan on that same trunk. Between trying to figure out whether or not to use quantum and some plugin under quantum or nova-network with one of its plugins (and which version of OpenStack to go along with that decision) I was flailing miserably trying to figure out how OpenStack would allow me to manage infrastructure in the same way: link-aggregation and vlans. There is a VLAN Manager plugin and there is an Open vSwitch plugin ... both of which seemed to be promising. After hosing my entire network with the Open vSwitch plugin (it bridged all of my vlans together by connecting them directly to br-int, something I still don't understand) I knew I had to hold this entire project with kid-gloves.

Finally, I found out that Quantum and the LinuxBridge plugin would do what I needed. That was quite a relief once I finally figured out that the terminology used in the docs is not what a system-administrator with a networking bent would expect to see. Ok, time to get dirty! I can bring up VMs, I've got my image service (glance) running on top of my distributed object store (swift) and it's all authenticated via a central/replicated authentication service (keystone with mysql db.) Wow, I can create a tiny little subnet with 3 IPs for testing. Ok, let's bring up a VM ... and then another. Oops, I need more IPs! No problem, there seems to be a "quantum subnet-update" command! Oh, hmm ... it won't let me update the allocation pool. Alright, let's remove the allocation and replace it with a larger one. No dice, IPs are allocated. How about adding another IP allocation pool to the same subnet definition? Nope, can't have the same subnet CIDR declared for two allocation pools.
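To put numbers on the distinction between a subnet's CIDR and its allocation pool, here is a small sketch using Python's ipaddress module. It does not talk to Quantum at all; the addresses come from the documentation range 192.0.2.0/24 and the helper names are invented for this example.

```python
import ipaddress


def pool_size(start, end):
    """Number of addresses in an inclusive start..end allocation pool."""
    return int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(start)) + 1


def usable_hosts(cidr):
    """Usable host addresses in an IPv4 subnet (network/broadcast excluded)."""
    return ipaddress.ip_network(cidr).num_addresses - 2


def pool_fits(cidr, start, end):
    """True if the whole pool lies inside the subnet and is well ordered."""
    net = ipaddress.ip_network(cidr)
    first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
    return first in net and last in net and first <= last


if __name__ == "__main__":
    cidr = "192.0.2.0/24"
    print("usable hosts in subnet:", usable_hosts(cidr))
    print("tiny test pool        :", pool_size("192.0.2.10", "192.0.2.12"))
    print("bigger pool would fit :", pool_fits(cidr, "192.0.2.10", "192.0.2.200"))
```

The arithmetic says growing the pool inside the existing CIDR is perfectly sensible; the obstacle I hit was in the tooling, not in the addressing.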

This is a *huge* problem if you are migrating hosts from a more than half-filled subnet into OpenStack. I guess it's not really a problem if you just have toy VMs on a toy network but I'd really like to think that something with this much effort put into it could be put into at least a semi-production environment.

A few other *big* problems with OpenStack:

* logging - errors show up anywhere/everywhere except where you are looking
* meaningful errors - many errors show up like, "couldn't connect to host" ... which host?!
* stable interfaces - configuration files, configuration terms, backend daemons, ... all change between releases of OpenStack.
* decent FAQ with answers - however, there are many launchpad bugs/discussions and IRC

If, after a year, I can't figure out how to get this thing safely into my infrastructure, I seriously doubt my abilities. Well, that is, until I realized that I wasn't the only one with serious fundamental problems with the architecture fitting into my current network.

What it seems to come down to is, either I need to significantly change the way I do things or I need to not do OpenStack.

Thursday, December 15, 2011

execve under OSX Snow Leopard

Background:

A colleague and I are developing a piece of code (called pbsmake) in python that ends up interpreting something that looks like a Makefile. We will use pbsmake to help distribute jobs to a local scheduler (each makefile target gets its own job) but we found that we may want to use the pbsmake interpreter as a shell interpreter itself so we can simply execute certain commands (which are really like makefiles) with a target.

Most of our development has been under GNU/Linux and this works just fine. However, as soon as we do this under OSX Snow Leopard, the top-level makefile starts being executed as if it were a bash script.

How to replicate:

create a simple interpreter that is a shell script itself:

/Users/imoverclocked/simple-interp.sh:

#!/bin/cat
This is my simple interpreter, it simply spits the file onto STDOUT

create a script that uses this interpreter:

/Users/imoverclocked/simple-script.sh:

#!/Users/imoverclocked/simple-interp.sh
This is a simple script that is interpreted (really, just spit out by cat)

try and execute the simple-script.sh

./simple-script.sh

Badly placed ()'s.

change the interpreter to first invoke /usr/bin/env as a work-around

/Users/imoverclocked/simple-script.sh:

#!/usr/bin/env /Users/imoverclocked/simple-interp.sh
This is a simple script that is interpreted (really, just spit out by cat)

executing the script now gives the same output as on other unix-like architectures.

$ ./simple-script.sh

#!/bin/cat

This is my simple interpreter, it simply spits the file onto STDOUT

#!/home/pirl/tims/sh/simple-interp.sh

This is a simple script that is interpreted (really, just spit out by cat)

My guess as to what is happening is that execv* can't handle the case where a script calls a script to interpret a script. One work-around is to use /usr/bin/env in the "simple-script.sh" header, which works since /usr/bin/env is a binary executable that then invokes the script in its own execvp call.
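If you want to test your own platform, the steps above can be sketched as a self-contained Python script. It writes the two files into a temporary directory (so the paths differ from the ones I used), marks them executable and runs the second one directly, without a shell in between. The function name is made up, and treating an OSError as "the kernel refused the nested interpreter" is my assumption about how the failure would surface when no shell is there to fall back on.

```python
import os
import subprocess
import tempfile


def run_nested_shebang_demo():
    """Run a script whose interpreter is itself a '#!' script.

    Returns the captured stdout, or None if the exec call was refused.
    """
    with tempfile.TemporaryDirectory() as tmp:
        interp = os.path.join(tmp, "simple-interp.sh")
        script = os.path.join(tmp, "simple-script.sh")
        with open(interp, "w") as f:
            f.write("#!/bin/cat\nThis is my simple interpreter\n")
        with open(script, "w") as f:
            f.write("#!" + interp + "\nThis is a simple script\n")
        for path in (interp, script):
            os.chmod(path, 0o755)
        try:
            # No shell involved: the kernel has to resolve both '#!' lines.
            result = subprocess.run([script], capture_output=True, text=True)
        except OSError:
            return None
        return result.stdout


if __name__ == "__main__":
    output = run_nested_shebang_demo()
    if output is None:
        print("nested interpreter scripts are not supported here")
    else:
        print(output, end="")
```

Where nested interpreters are supported, cat prints both files, just like the output shown above.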

Apparently this works under Lion but we can't quite make the plunge across our infrastructure yet. Hope this helps someone else out there! Maybe Apple is listening?

Wednesday, September 14, 2011

Getting ethernet bonding to work under Debian

I have a small cluster of different hardware running Debian squeeze/wheezy and several machines have been really simple to set up with bonding (aka: trunking/link aggregation/port trunking/ ... the list goes on) using mode=4 or "IEEE 802.3ad Dynamic link aggregation". One machine, however, would not come up with a working bond at boot. My final clue to solving the problem was some dmesg output:

[   14.777381] e1000e 0000:06:00.1: eth0: changing MTU from 1500 to 9000
[   15.072476] e1000e 0000:06:00.1: irq 80 for MSI/MSI-X
[   15.128059] e1000e 0000:06:00.1: irq 80 for MSI/MSI-X
[   15.129468] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   15.129473] 8021q: adding VLAN 0 to HW filter on device eth0
[   16.290994] e1000e 0000:06:00.0: eth1: changing MTU from 1500 to 9000
[   16.586584] e1000e 0000:06:00.0: irq 79 for MSI/MSI-X
[   16.640053] e1000e 0000:06:00.0: irq 79 for MSI/MSI-X
[   16.641411] ADDRCONF(NETDEV_UP): eth1: link is not ready
...
[   20.530343] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[   20.530350] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
[   20.710398] bonding: bond0: Adding slave eth0.
[   20.710415] e1000e 0000:06:00.1: eth0: changing MTU from 9000 to 1500
[   21.006430] e1000e 0000:06:00.1: irq 80 for MSI/MSI-X
[   21.060058] e1000e 0000:06:00.1: irq 80 for MSI/MSI-X
[   21.061374] 8021q: adding VLAN 0 to HW filter on device eth0
[   21.061445] bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
[   21.061462] bonding: bond0: enslaving eth0 as an active interface with an up link.
[   21.242433] bonding: bond0: Adding slave eth1.

If I brought bond0 down/up again later I would get a working link. I tried adding "sleep" commands into the network init sequence to try and figure out if this was just some quiescent state that the NIC driver was in during initialization. This didn't help... so I finally read the warning about needing miimon/arp_interval/arp_ip_target for bonding to work. This was odd because my /etc/network/interfaces file looks like this:

iface bond0 inet manual
        bond-slaves eth0 eth1 eth2 eth3
        bond-mode 4
        bond-miimon 100
        bond-xmit-hash-policy layer3+4
        mtu 9000
        dns-search lpl.arizona.edu
        post-up ip link set $IFACE mtu 9000
        post-up sysctl -w net.ipv6.conf.all.autoconf=0
        post-up sysctl -w net.ipv6.conf.default.accept_ra=0

As it turns out, miimon is not set at the moment the bonding driver is loaded, despite the bond-miimon line above. To solve this problem I created a new file, /etc/modprobe.d/bonding, with the following content:

alias bond0 bonding
options bonding mode=4 miimon=100

This fixes the issue of bonding not working on boot and should probably be the subject of a Debian/Linux bug report.
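
To confirm that the module options took effect after a reboot, the bonding driver's status file can be checked. The following is a minimal sketch, assuming the usual /proc/net/bonding/bond0 layout with "Bonding Mode:" and "MII Polling Interval (ms):" lines; the function name bonding_settings is my own and not part of any tool.

```python
import re


def bonding_settings(proc_text):
    """Pull the bonding mode and miimon value out of the text of
    /proc/net/bonding/<bond>. Missing fields come back as None."""
    mode = re.search(r"^Bonding Mode:[ \t]*(.+)$", proc_text, re.MULTILINE)
    miimon = re.search(
        r"^MII Polling Interval \(ms\):[ \t]*(\d+)", proc_text, re.MULTILINE
    )
    return {
        "mode": mode.group(1).strip() if mode else None,
        "miimon_ms": int(miimon.group(1)) if miimon else None,
    }
```

On the bonded machine, pass it the contents of the status file, for example bonding_settings(open("/proc/net/bonding/bond0").read()). A miimon_ms of 0 or None would suggest the options in /etc/modprobe.d/bonding were not picked up.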

Thursday, August 11, 2011

Blender 2.5 Compositing from Python

If you are like me then you hate doing the same thing over and over ... which means you like to automate these kinds of processes when it makes sense to do so. I wrote a Blender plugin that imports a HiRISE DTM directly into Blender 2.5x and allows you to easily make fly-throughs of the various Mars locations that a DTM exists for.

My work wants to make automated fly-throughs with a trivial piece of compositing to place a foreground and background image into the Blender scene. I came across very few working examples or documentation on how to use the compositing part of Blender from Python so ... here we go! I'll simply place the final example code here with comments and describe each section a little more in depth below.

import bpy

# 1) Use compositing for our render, set up some paths
bpy.context.scene.use_nodes = True

fgImageLoc = "/path/to/foreground.tiff"

bgImageLoc = "/path/to/background.tiff"

# 2) Get references to the scene and its node tree, then remove the
# default link between Render Layers and Composite
Scene = bpy.context.scene
Tree = Scene.node_tree
Tree.links.remove( Tree.links[0] )

# 3) The default env will have an input and an output (Src/Dst)
Src = Tree.nodes["Render Layers"]
Dst = Tree.nodes["Composite"]

# 4) Let's create two groups to encapsulate our work
FG_Node = bpy.data.node_groups.new(
    "ForegroundImage", type='COMPOSITE')
BG_Node = bpy.data.node_groups.new(
    "BackgroundImage", type='COMPOSITE')

# 5) The foreground group has one input and one output
FG_Node.inputs.new("Source", 'RGBA')
FG_Node.outputs.new("Result", 'RGBA')

# 6) The foreground node contains an Image and an AlphaOver node
FG_Image = FG_Node.nodes.new('IMAGE')
FG_Image.image = bpy.data.images.load( fgImageLoc )
FG_Alpha = FG_Node.nodes.new('ALPHAOVER')

# 7) The Image and the Group Input are routed to the AlphaOver
# and the AlphaOver output is routed to the group's output
FG_Node.links.new(FG_Image.outputs["Image"], FG_Alpha.inputs[2])
FG_Node.links.new(FG_Node.inputs["Source"], FG_Alpha.inputs[1])
FG_Node.links.new(FG_Node.outputs["Result"], FG_Alpha.outputs["Image"])

# 8) Add foreground image compositing to the environment
newFGGroup = Tree.nodes.new("GROUP", group = FG_Node)

# 9) Route the default render output to the input of the FG Group
Tree.links.new(newFGGroup.inputs[0], Src.outputs["Image"])

# 10) The background group has one input and one output
BG_Node.inputs.new("Source", 'RGBA')
BG_Node.outputs.new("Result", 'RGBA')

# 11) The background group contains an Image and AlphaOver node
BG_Image = BG_Node.nodes.new('IMAGE')
BG_Image.image = bpy.data.images.load( bgImageLoc )
BG_Alpha = BG_Node.nodes.new('ALPHAOVER')

# 12) Create links to internal nodes
BG_Node.links.new(BG_Image.outputs["Image"], BG_Alpha.inputs[1])
BG_Node.links.new(BG_Node.inputs["Source"], BG_Alpha.inputs[2])
BG_Node.links.new(BG_Node.outputs["Result"], BG_Alpha.outputs["Image"])

# Add background image compositing, similar to 8/9
newBGGroup = Tree.nodes.new("GROUP", group = BG_Node)
Tree.links.new(newBGGroup.inputs[0], newFGGroup.outputs[0])
Tree.links.new(newBGGroup.outputs[0], Dst.inputs["Image"])

When you run this you will end up with a pipeline that looks like this:

The rendered scene feeds the foreground image group, which feeds the background image group, which in turn feeds the Composite output that writes to the file path specified in Blender. Each group is a composite of primitives in Blender. When expanded (via selecting the group and pressing Tab) you will see this:

The group input and the Image node's output are routed into the Alpha Over node. The Alpha Over output is routed to the group's output. This will overlay the Image node onto the scene. A similar setup is produced for the background image.
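
For readers unfamiliar with what an "alpha over" does to each pixel, here is a minimal sketch of the textbook Porter-Duff "over" operator on a single straight-alpha RGBA pixel. This is an illustration only and makes no claim about how Blender implements its Alpha Over node internally (which also has premultiply options); the function name alpha_over is my own.

```python
def alpha_over(background, foreground):
    """Composite one RGBA pixel over another (Porter-Duff "over").

    Both pixels are (r, g, b, a) tuples with straight (non-premultiplied)
    alpha and components in the range 0.0-1.0.
    """
    br, bg, bb, ba = background
    fr, fg, fb, fa = foreground

    # Coverage of the result: the foreground plus whatever background
    # still shows through it.
    out_a = fa + ba * (1.0 - fa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(f, b):
        # Weight each colour by its coverage, then un-premultiply.
        return (f * fa + b * ba * (1.0 - fa)) / out_a

    return (blend(fr, br), blend(fg, bg), blend(fb, bb), out_a)
```

A half-transparent red pixel over opaque blue, alpha_over((0, 0, 1, 1), (1, 0, 0, 0.5)), gives an even mix of the two, while a fully transparent foreground leaves the background untouched. That is why text with a transparent surround can sit on top of the render without hiding it.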

Here is a slightly more detailed breakdown of the script:

1. Tell blender that we have a special compositing setup
   - also, store info about where our foreground/background images are kept
2. Blender's environment is not empty by default.
   - Get a reference to it
   - Clear the link between the Render Layer and the Composite output
3. Get a reference to the default nodes for later use
   - Src is the source for rendered content from the scene
   - Dst is the node that takes an output and generates a file (or preview)
4. To simplify our compositing graph, create two groups to encapsulate each function
5. The group we just created has one input and one output
6. Create two nodes
   - Image - acts as an output with a static image
   - AlphaOver - overlays two images using the alpha channel defined in the second image
7. Create links between nodes.
   - It's easier to see these in the image above.
8. Instantiate the new group in the compositing environment
   - This is where I was a little lost, the group needs to be instantiated and then a new object is returned. The input/output of the returned object are the external input/output ports of the group. If you use the previous object you will make bad connections to the internal structures of the group. Don't do it!
9. Connect the rendered image to the foreground group input.
10. through 12. are pretty much the same as 5. through 9.
   - Different connections make the image a background image instead of a foreground image
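
Since the background steps repeat steps 5) through 7) of the script with only the AlphaOver inputs swapped, the group construction could be factored into one helper. This is a sketch only: it assumes the Blender 2.5x API exactly as used in the script above, and the name build_overlay_group and its parameters are hypothetical. The bpy module is passed in as an argument so the function can be read and dry-run outside Blender.

```python
def build_overlay_group(bpy, name, image_path, image_on_top):
    """Create a node group that combines a static image with its input.

    Mirrors steps 5) to 7) of the script above. With image_on_top=True the
    wiring matches the foreground group; with False, the background group.
    """
    group = bpy.data.node_groups.new(name, type='COMPOSITE')

    # One external input and one external output, as in steps 5) and 10)
    group.inputs.new("Source", 'RGBA')
    group.outputs.new("Result", 'RGBA')

    # The Image node holds the static picture; AlphaOver does the overlay
    image = group.nodes.new('IMAGE')
    image.image = bpy.data.images.load(image_path)
    alpha = group.nodes.new('ALPHAOVER')

    # In the script above, whatever is wired to inputs[2] ends up on top
    top, bottom = (2, 1) if image_on_top else (1, 2)
    group.links.new(image.outputs["Image"], alpha.inputs[top])
    group.links.new(group.inputs["Source"], alpha.inputs[bottom])
    group.links.new(group.outputs["Result"], alpha.outputs["Image"])
    return group
```

Inside Blender this would be called as build_overlay_group(bpy, "ForegroundImage", fgImageLoc, True). The returned group still has to be instantiated with Tree.nodes.new("GROUP", group=...) as in step 8), and the links to the rest of the tree must use that instance, not the group itself.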

Thanks to Uncle_Entity and Senshi in #blendercoders@freenode for fixing my initially poor usage.

Here are a few links that use the compositing above. Notice that the text hovers above the DTM while the background ... stays in the background: