Tag Archives: technology

Boot Environments on Linux

I have been busy working on a bunch of exciting tech for the last few years, with very little time for anything else apart from family. The typical startup grind. However, better late than never: I finally found an opportunity to write up something I have been hacking on (along with a couple of others) and that has been released as open source.

We first saw the concept of Boot Environments with ZFS on Solaris. Because the filesystem supports snapshotting and cloning, it is possible to create a virtual copy of the entire root filesystem as a clone of the currently booted root. It is then possible to mount that clone, upgrade packages inside that prefix, and finally reboot into the upgraded clone after adding a GRUB entry for it. There are several advantages to this approach.

  • The operation of cloning the filesystem is instantaneous due to the Copy-On-Write design. There is no need to perform a costly copy of the entire root filesystem image.
  • During the upgrade the box is up and running and in production use because the currently booted environment remains untouched. The only downtime is when we reboot.
  • Since the currently booted environment is untouched, the upgrade is very safe. In the worst case, a failure or breakage during the upgrade means destroying the clone and starting again from the beginning.
  • If we discover issues after booting into the updated clone, we can easily reboot back to the previous working Boot Environment.
  • Because of the COW design, the upgraded clone only occupies space for the packages that were actually upgraded. Unchanged package content is shared with the parent dataset of the clone, so nothing is duplicated and space is used efficiently.
  • It is possible to have multiple boot environments on the box without having to worry about partitioning.
  • Because of the pooled storage design all datasets or clones share the same pooled storage space avoiding hard partitioning and fixed space allocation headaches.
  • Multiple boot environments are *extremely* useful in a testing and development setup where multiple software releases can reside on the box. To select a certain release for testing or experimenting, just boot into the appropriate BE (Boot Environment). Messing around is also easy: create another BE from the current one, boot into it and mess around to your heart’s content. The original BE remains safe, as long as you do not screw up the disk or the bootloader of course!

I could go on, but let’s draw the line here. Now, we wanted to get the same stuff on Linux. It is possible, given that we have the nice beast called Btrfs. Until some time back, Btrfs was controversial and was criticised quite a bit. However, I noticed that it has been maturing of late. The number of serious issues has dropped to a negligible level, and many fixes and improvements have landed. All of the rants I found via Google were at least a couple of years old, and most were older than that. This gave us the confidence to start testing it and eventually to use it in the products my employer sells. We did have to go through a learning curve to get to grips with the nitty-gritty and idiosyncrasies of Btrfs. That caused some initial teething troubles and deployment issues, but it was manageable.

I looked around to see whether the BE capability already existed on Linux but found nothing. I came across things like apt-snapshot and Snapper, which are similar but not quite the same. Our hard requirement was that an upgrade must not touch the running environment. So, in the end, we came up with our own scripts to implement the BE feature.

Since our Linux environment is based on Ubuntu, the root subvolume is ‘@’. We then create a writable snapshot of ‘@’ as our initial root subvolume, and that becomes our first BE. Each subsequent upgrade creates a writable snapshot of the currently booted subvolume. In addition, the ‘@’ subvolume is always mounted at /.rootbe in our environment, and all the BE subvolumes are mounted under it, including the currently booted one, which is obviously also mounted at ‘/’.
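To make the layout a bit more concrete, here is a minimal Python sketch that builds the `btrfs subvolume snapshot` command for creating a new BE from the currently booted one. It assumes each BE is visible as a directory directly under /.rootbe. The function name and the BE names are hypothetical, and the real ‘beadm’ script may well do this differently.

```python
import shlex

ROOTBE = "/.rootbe"  # mount point of the '@' subvolume, as described in the text


def snapshot_command(current_be, new_be, rootbe=ROOTBE):
    """Return the argv for a writable snapshot of current_be named new_be.

    Without the -r flag, `btrfs subvolume snapshot` creates a writable
    snapshot. The BE names used here are hypothetical examples.
    """
    for name in (current_be, new_be):
        if not name or "/" in name:
            raise ValueError(f"invalid BE name: {name!r}")
    return [
        "btrfs", "subvolume", "snapshot",
        f"{rootbe}/{current_be}",
        f"{rootbe}/{new_be}",
    ]


if __name__ == "__main__":
    # Only print the command; actually running it needs root and a Btrfs root.
    print(shlex.join(snapshot_command("be-1", "be-2")))
```

Because the snapshot is copy-on-write, the command returns almost immediately and the new BE initially shares all of its data with the one it was taken from.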

Btrfs has the concept of a default subvolume; however, we do not change it. Rather, we just use the ‘rootflags=subvol=…’ kernel parameter to select the BE to boot. This allows us to keep the primary GRUB menu in a single place and always access it via /.rootbe/boot/grub/grub.cfg.
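As an illustration of how a BE could be selected at boot, the sketch below assembles a kernel command line that carries the subvolume in `rootflags`. The device name, the subvolume path and the extra options are hypothetical placeholders, not values taken from our scripts.

```python
def kernel_cmdline(root_device, subvol_path, extra=("ro",)):
    """Build a kernel command line that boots the given Btrfs subvolume.

    root_device and subvol_path are supplied by the caller; the values in
    the example below are hypothetical.
    """
    if not subvol_path:
        raise ValueError("subvol_path must not be empty")
    parts = [f"root={root_device}", f"rootflags=subvol={subvol_path}"]
    parts.extend(extra)
    return " ".join(parts)


if __name__ == "__main__":
    # One GRUB menu entry per BE would differ only in the subvol path.
    print(kernel_cmdline("/dev/sda1", "@/be-2"))
```

Since only the `subvol=` value changes between entries, all BEs can share one grub.cfg, and the default subvolume of the filesystem never has to be touched.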

The entire show is managed via two shell scripts: ‘beadm’, which does the BE management (create, delete, list etc.), and ‘pn-apt-get’, which upgrades the current environment by creating a new BE. Both of them are available at this URL: https://github.com/PluribusNetworks/pluribus_linux_userland/tree/master/components/bootenv-tools

The ‘pn-apt-get’ script creates a new BE and runs ‘apt-get dist-upgrade’ inside a chroot. It assumes that ‘sources.list’ has been set up properly.
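The chroot step can be sketched roughly as follows. This is not the actual ‘pn-apt-get’ logic: the bind mounts of /proc, /sys and /dev are an assumption about what a chrooted upgrade typically needs, and the BE path is a hypothetical example.

```python
import shlex


def chroot_upgrade_plan(new_root):
    """Return the commands, in order, for upgrading inside new_root.

    The bind mounts are an assumption (package scripts often expect
    /proc, /sys and /dev); the real script may handle this differently.
    """
    pseudo = ("proc", "sys", "dev")
    plan = [["mount", "--bind", f"/{fs}", f"{new_root}/{fs}"] for fs in pseudo]
    # The upgrade itself runs against the new BE, not the running system.
    plan.append(["chroot", new_root, "apt-get", "dist-upgrade"])
    # Unmount in reverse order once the upgrade has finished.
    plan += [["umount", f"{new_root}/{fs}"] for fs in reversed(pseudo)]
    return plan


if __name__ == "__main__":
    for argv in chroot_upgrade_plan("/.rootbe/be-2"):
        print(shlex.join(argv))
```

The point of the design is that every command operates on the new BE’s directory tree, so the currently booted environment is never modified.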

In addition to all this, I wanted to optimize the space used by having dpkg not replace a file in a package being upgraded if the new file is identical to the one already installed. I needed to add a small dpkg patch to achieve this: https://github.com/PluribusNetworks/pluribus_linux_userland/blob/master/components/dpkg-ubuntu/debian/patches/skip_unchanged.patch
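The idea behind the patch can be illustrated with a small Python sketch. It is only an analogy: the real change lives inside dpkg’s C code and may detect identical files in a different way. The function names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def _digest(path):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def install_if_changed(new_file, installed_file):
    """Copy new_file over installed_file only if the contents differ.

    Returns True if the file was written, False if it was skipped.
    """
    if (os.path.isfile(installed_file)
            and os.path.getsize(new_file) == os.path.getsize(installed_file)
            and _digest(new_file) == _digest(installed_file)):
        return False  # identical: leave the existing (shared) data alone
    shutil.copyfile(new_file, installed_file)
    return True


def demo():
    """Install a file, reinstall it unchanged, then install a new version."""
    with tempfile.TemporaryDirectory() as tmp:
        new = os.path.join(tmp, "new")
        installed = os.path.join(tmp, "installed")
        with open(new, "w") as f:
            f.write("v1\n")
        first = install_if_changed(new, installed)
        second = install_if_changed(new, installed)
        with open(new, "w") as f:
            f.write("v2\n")
        third = install_if_changed(new, installed)
        return first, second, third
```

On a copy-on-write filesystem, rewriting a file with identical content still allocates new blocks, so skipping the write keeps the file’s data shared between the old BE and the new one.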

All this allows us to do safe, fast, in-production upgrades and only incur downtime during the reboot. One quirk I had to deal with was BE space usage reporting. I had to turn on the ‘quota’ feature of Btrfs to get accurate space accounting, even though I do not use quotas in practice. This also meant that I hit a couple of obscure quota bugs (especially after subvolume delete) in the 4.4 kernel from Ubuntu Xenial, which we have been using (logistics issues). To work around them, I found it sufficient to do a periodic “quota rescan” every few hours.
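For reference, the Btrfs commands involved look roughly like this. The sketch only builds and optionally runs them; the mount point default and the idea of calling `run_rescan` from a scheduler such as cron are assumptions for illustration.

```python
import subprocess


def quota_commands(mountpoint="/.rootbe"):
    """Return the btrfs commands used for space accounting on mountpoint."""
    return {
        # Turn on quota tracking so per-subvolume usage is accounted.
        "enable": ["btrfs", "quota", "enable", mountpoint],
        # Recompute the accounting, e.g. after subvolume deletes.
        "rescan": ["btrfs", "quota", "rescan", mountpoint],
        # Show per-subvolume (qgroup) usage.
        "usage": ["btrfs", "qgroup", "show", mountpoint],
    }


def run_rescan(mountpoint="/.rootbe", runner=subprocess.run):
    """Run one quota rescan; meant to be called every few hours."""
    return runner(quota_commands(mountpoint)["rescan"], check=True)
```

Passing a different `runner` makes it possible to try the function without root privileges or a Btrfs filesystem.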


Speaking @ NIT Trichy

I will be making my second visit to NIT Trichy to conduct a hands-on workshop on Statistical Techniques in Technology along with our Math whiz-kid Kapileshwar Singh (or KP for short). This is part of the Pragyan 2013 Tech Fest. I will be playing an assistant role, trolling on the usual random numbers and crypto stuff along with helping KP demonstrate self-similarity or autocorrelation in network traffic, stochastic modeling concepts etc.

My past visit to NIT Trichy during Pragyan 2012 was to give a Tech Talk on Vector Computing, Supercomputers and GPGPUs. I had also delivered an expanded version of this talk at the Great Indian Developer Summit last year. NIT Trichy is one of the top-tier engineering colleges and seems to be chock full of bright geeks. This year’s Pragyan has some exciting stuff lined up.

NetApp advanced Data Compression (from BeleniX?)


I came across this article on a so-called advanced compression feature in OnTap: https://communities.netapp.com/docs/DOC-14329. There is a bunch of marketing fluff at the beginning of the article until you come to the second section, where the real technique is illustrated. This is a simple idea being drummed up a bit. There is nothing special compared to what I had implemented for BeleniX and the OpenSolaris livecd years back. It is the very same feature.

Don’t believe me? Head over to my livemedia slides: http://www.osdevcon.org/2007/slides/osol_livemedia.pdf (slide 13). This is the presentation I delivered at the First OpenSolaris Developer Conference in 2007 in Berlin. It was hosted by the good folks of the German Unix User’s Group.


At a high level, there are only a couple of differences from the NetApp implementation. The OnTap version of course provides write capability, which I did not implement since I was dealing with CDROMs. OnTap uses a compressibility threshold of 25%, while I used 12% in the livecd case.
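The threshold idea can be shown with a short sketch. It assumes the threshold means “store a block compressed only if compression saves at least that fraction of its size”, and it uses zlib purely for illustration; neither implementation is claimed to work exactly like this.

```python
import zlib


def store_block(block, threshold=0.12):
    """Return (is_compressed, payload) for one block of bytes.

    The block is kept compressed only if the space saved is at least
    `threshold` (0.12 for the livecd case, 0.25 for the OnTap figure).
    """
    if not block:
        return False, block
    compressed = zlib.compress(block)
    savings = 1.0 - len(compressed) / len(block)
    if savings >= threshold:
        return True, compressed
    # Not worth it: store raw and avoid the decompression cost on reads.
    return False, block


def load_block(is_compressed, payload):
    """Reverse of store_block."""
    return zlib.decompress(payload) if is_compressed else payload
```

A higher threshold trades some space for fewer decompressions on the read path, since more blocks end up stored raw.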

When I wanted to leave Sun and was looking for a change, I interviewed briefly with NetApp but decided not to pursue it for the sake of BeleniX at that time (OpenSolaris+ZFS was NetApp competition). However, I had put all the details of my CDROM filesystem I/O scheduler and transparent compression implementation, with URLs, in my resume. Who knows if someone at NetApp decided to pull the good ideas from that into their product – ha.

Beebdroid on the Nexus 7

For people who know what a BBC Micro is, Beebdroid is an awesome application. For the others who are not aware, this 6502-based micro is where the story of the ARM processors inside today’s mobile devices begins. Acorn Computers had worked jointly with the British Broadcasting Corporation to produce various iterations of this microcomputer. Eventually Acorn went on to produce its first ARM processor, the 32-bit ARM1, in 1985.

So it is fitting that the BBC Micro emulator be available on today’s ARM devices. My sister recently gifted me a Nexus 7 (yay, she is the greatest sis) and I am happily playing with BBC BASIC and all the retro games on it thanks to Beebdroid. I can’t forget the little box that I grew up with.