Reducing OpenSolaris ramdisk greed

Many will know that all OpenSolaris distros use a ramdisk when booting off a livecd. In fact a ramdisk is also used during a normal boot off the hard disk. For a normal hard disk boot it is called the boot archive (/platform/i86pc/boot_archive, for example); for a livecd it is called the miniroot or microroot.

These are basically compressed filesystem images that Grub decompresses and loads into RAM. The kernel loads the essential modules from this in-RAM image until it is ready to access the real root filesystem on disk and continue with the normal boot. A livecd keeps on using this pseudo RAM-disk as its root filesystem, since the root filesystem must be read-write.

The ramdisk on OpenSolaris has been growing quite large because the kernel itself and the system libraries keep growing in size and number. The livecd ramdisk is quite a bit larger than the boot archive. In addition, the ramdisk on SPARC is particularly big as RISC binaries are larger. Techniques have been introduced to tackle this for the standard Solaris install miniroot and the boot archive. One of them is Dcfs, which I blogged about earlier. The other is compressing individual files inside the boot archive with Gzip. Both are filesystem-level changes that have introduced a couple of layers in the kernel to deal with different situations. But the livecd ramdisk is still big, and using Dcfs for this purpose has issues (dcfs is read-only), as discussed in this thread: http://mail.opensolaris.org/pipermail/caiman-discuss/2008-December/008263.html

The livecd ramdisk size also affects BeleniX. It uses various techniques to reduce the number of files included, but size is still an issue. Reading the above discussion thread, it recently occurred to me that moving compression into the ramdisk module makes it transparent and avoids multiple different filesystem-level implementations. The same lofi compression approach used to compress the livecd contents can be used, but it must be tweaked to make it read-write. One easy way to allow writes is per-segment copy-on-write, since the image is broken up into fixed-size segments and each segment is compressed individually. Since this is a ramdisk, new memory can be allocated the first time a segment is written to. The segment is uncompressed and stored there, and all subsequent accesses to this segment go to this memory block. A separate index holds pointers to these alternate segments, or 0 for segments that have not been written.
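To make the copy-on-write idea concrete, here is a minimal user-space sketch in Python. It is not the kernel code from my changes, just an illustration under stated assumptions: the class name `CowCompressedRamdisk`, its methods, and the use of `None` (instead of 0) for "no alternate segment" are all made up for this example, and plain zlib stands in for the lofi compression format.

```python
import zlib

SEG_SIZE = 64 * 1024  # illustrative segment size


class CowCompressedRamdisk:
    """Hypothetical model of a compressed ramdisk with per-segment copy-on-write."""

    def __init__(self, image: bytes, seg_size: int = SEG_SIZE):
        self.seg_size = seg_size
        self.size = len(image)
        # Each fixed-size segment is compressed on its own.
        self.compressed = [
            zlib.compress(image[off:off + seg_size])
            for off in range(0, len(image), seg_size)
        ]
        # Alternate index: None means "still compressed", otherwise a
        # writable uncompressed copy of that segment.
        self.alternate = [None] * len(self.compressed)

    def _segment(self, idx):
        alt = self.alternate[idx]
        if alt is not None:
            return alt  # segment was written to earlier
        return zlib.decompress(self.compressed[idx])

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside ramdisk")
        out = bytearray()
        while length > 0:
            idx, within = divmod(offset, self.seg_size)
            chunk = self._segment(idx)[within:within + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside ramdisk")
        pos = 0
        while pos < len(data):
            idx, within = divmod(offset + pos, self.seg_size)
            if self.alternate[idx] is None:
                # First write to this segment: uncompress it into new memory.
                self.alternate[idx] = bytearray(zlib.decompress(self.compressed[idx]))
            seg = self.alternate[idx]
            n = min(len(data) - pos, len(seg) - within)
            seg[within:within + n] = data[pos:pos + n]
            pos += n
```

Segments that are never written stay compressed, so memory use only grows with the amount of data actually modified. A write that straddles a segment boundary simply triggers the copy for both segments.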

However, changing the ramdisk module alone is not enough. The ramdisk module is not available very early in boot. At this stage the kernel runtime linker is tasked with assembling the essential modules the kernel needs to function normally. It therefore uses a very simple ramdisk reading rig along with very basic filesystem support. See here: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/krtld/bootrd.c and here: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/fs/. Incidentally, you will notice a decompress.c file that implements decompression of Gzipped files in the boot archive. These things (along with the Zlib code) get compiled into the kernel binary (/platform/i86pc/kernel/unix for 32-bit x86). If we are using a compressed ramdisk then these places need to be changed as well. Note also that Grub places the initial ramdisk in contiguous physical memory, which makes the diskread function in bootrd.c very simple.

Taking advantage of a bunch of holidays I started hacking on this, and I am pleased to note that I have a working implementation now. That is, the kernel is able to boot off a lofi-compressed ramdisk, and a writable compressed livecd ramdisk also works. Making changes to the ramdisk module was fairly easy and it took me a day to get a working read-write implementation. Changing bits of the early boot code in the guts of the kernel was much trickier and took me about 6 days. Problems at this early stage are hard to debug since kmdb itself is a module and is not yet loaded, leaving only printf debugging. But too much printf output is also bad, as it scrolls off the screen! Memory corruption results in an “Unexpected trap” message, and backtraces do not appear most of the time. There was a performance issue as well, mostly caused by zlib’s inflate being called too many times. I resolved this by adding a minimal LRU cache that holds uncompressed segments. Another thing I had to tackle was the way pointers directly into the ramdisk area are doled out to support so-called “cached” reads. This does not work for a compressed ramdisk, since the uncompressed data does not exist at a fixed address inside the ramdisk image.
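The LRU cache idea can be sketched in a few lines of user-space Python. Again this is only an illustration, not the kernel code: the class name `SegmentCache`, the `capacity` parameter, and the `inflate_calls` counter are assumptions made up for this example, and `zlib.decompress` stands in for the in-kernel inflate call.

```python
import zlib
from collections import OrderedDict


class SegmentCache:
    """Hypothetical LRU cache of uncompressed segments, keyed by segment index."""

    def __init__(self, compressed_segments, capacity=4):
        self.compressed = compressed_segments
        self.capacity = capacity
        self.cache = OrderedDict()  # oldest entry first, newest last
        self.inflate_calls = 0      # counts how often we really decompress

    def get(self, idx):
        if idx in self.cache:
            # Hit: mark as most recently used, no inflate needed.
            self.cache.move_to_end(idx)
            return self.cache[idx]
        # Miss: decompress the segment and remember the result.
        self.inflate_calls += 1
        data = zlib.decompress(self.compressed[idx])
        self.cache[idx] = data
        if len(self.cache) > self.capacity:
            # Evict the least recently used segment.
            self.cache.popitem(last=False)
        return data
```

Early boot tends to read many small pieces from the same few segments, so even a tiny cache like this avoids inflating the same segment over and over.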

I have placed a tarball here: http://www.belenix.org/binfiles/cramdisk.tar.gz . This tarball contains all the modified source code and pre-built binaries. One can test the ramdisk module by following the instructions that I emailed to caiman-discuss. Testing the modified kernel can be done by folks creating livecds. These changes are against B104. This is initial code and I am still testing it. I have found that a 64K segment size gives a good balance between compression effectiveness and performance overhead.
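If you want to explore the segment-size trade-off on your own image, a rough user-space sketch like the following can help. It is not how I measured things; the function names and the list of candidate sizes are made up for this example, and zlib at its default level stands in for the actual compression used.

```python
import zlib


def compressed_size(image: bytes, seg_size: int) -> int:
    """Total size when each seg_size chunk of image is compressed on its own."""
    total = 0
    for off in range(0, len(image), seg_size):
        total += len(zlib.compress(image[off:off + seg_size]))
    return total


def compare_segment_sizes(image: bytes, sizes=(4096, 16384, 65536, 262144)):
    """Return {segment size: total compressed bytes} for each candidate size."""
    return {size: compressed_size(image, size) for size in sizes}
```

Larger segments usually compress better because the compressor sees more context, but every read of a not-yet-cached segment then has to inflate the whole segment, so random access gets more expensive.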

This approach has several advantages. It results in a smaller boot archive than the current approaches can produce. Since it is filesystem-agnostic, two different filesystem-level implementations (Dcfs and Gzip) are not needed. Grub loading time from a livecd is reduced by orders of magnitude since Grub does not have to decompress anything. Finally, it mitigates the livecd ramdisk size issue where using Dcfs is problematic.
