Category Archives: BeleniX

RIP OpenSolaris.ORG


Just noticed the site update today at http://www.opensolaris.org/. The website will be decommissioned after March 4th. This is the final symbolic nail in the coffin of what I and many others I know had worked very hard for, with great passion. The nail was driven in long ago with the Oracle acquisition of SUN; soon the post-mortem report will be filed in some dusty corner of the great hall of Oracle. It was simply a fantastic OS platform, but also a great example of Free and OpenSource software done wrong, completely wrong, by clueless management who would not listen to people who had lived and breathed OpenSource for many years.

 

I still remember the churning and frustration I felt inside when SUN launched OpenSolaris (http://desktoplinux.com/news/NS2543956200.html) without even a passing reference to BeleniX, on which it was built. The first release of OpenSolaris was simply BeleniX outfitted with new clothes. I had toiled hard, very hard, to bring all the distro and livecd technologies to the OpenSolaris platform. Granted, it was OpenSource and a labor of love, but when you take something in its entirety and completely ignore the creator/origin, it is equivalent to stealing. Forget about mentions in public; I did not even get any recognition within the company (apart from a few bones in the form of a few $$ being thrown) for all the work that made one of their flagship OS platform releases possible. The vitriol that burned me from inside is hard to describe. It was at that instant that I decided to leave SUN, where I had once planned to build a long-term career.

 

People doing hard work, good work, and not getting recognized within organizations typically happens because their work is not visible outside their silos. But everyone at SUN, right up to the CEO, knew about my work, and still nothing. I was completely and thoroughly sidelined while other teams did launches and launch parties. I never quite understood what my fault was. Or was it politics?

 

This is now long past and I was over that hump by the end of 2008. However, it still remains the most bitter memory of my life to date (apart from my father losing both his kidneys). Fortunately the work environment I went into after leaving SUN is quite different, in a positive way, and is far from these issues.

 

Anyway, may you rest in peace OpenSolaris.

Logo of OpenSolaris as a vector graphic (Photo credit: Wikipedia)

NetApp advanced Data Compression (from BeleniX?)


I came across this article on a so-called advanced compression feature in OnTap: https://communities.netapp.com/docs/DOC-14329. There is a bunch of marketing fluff at the beginning of the article until you come to the second section, where the real technique is illustrated. This is a simple idea being drummed up a bit. There is nothing special compared to what I had implemented for BeleniX and the OpenSolaris livecd years back. It is the very same feature.

Don’t believe me? Head over to my livemedia slides: http://www.osdevcon.org/2007/slides/osol_livemedia.pdf (slide 13). This is the presentation I delivered at the First OpenSolaris Developer Conference in 2007 in Berlin. It was hosted by the good folks of the German Unix User’s Group.


At a high level there are only a couple of differences from the NetApp implementation. The OnTap version of course provides write capability, which I did not implement since I was dealing with CDROMs. OnTap uses a compressibility threshold of 25% while I used 12% in the livecd case.
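The threshold rule itself is easy to sketch. The snippet below is only an illustration in Python, not the livecd code: it assumes the threshold means the minimum fraction of space a block must save before it is stored compressed, and it uses zlib and made-up function names (pack_block, unpack_block) purely for demonstration.

```python
import zlib


def pack_block(block, threshold=0.12):
    """Return (is_compressed, payload) for one block of bytes.

    Assumption: `threshold` is the minimum fraction of space that
    compression must save for the compressed form to be kept.
    """
    if not block:
        return False, block
    packed = zlib.compress(block)
    saved = 1.0 - len(packed) / len(block)
    if saved >= threshold:
        return True, packed
    # Not worth it: store the block as-is and skip decompression on read.
    return False, block


def unpack_block(is_compressed, payload):
    """Inverse of pack_block."""
    return zlib.decompress(payload) if is_compressed else payload
```

Under this reading, a lower threshold such as 12% compresses more blocks, while a higher one such as 25% leaves more blocks untouched and so avoids decompression work on reads.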

When I wanted to leave SUN and was looking for a change, I interviewed briefly with NetApp but decided not to pursue it for the sake of BeleniX at that time (OpenSolaris+ZFS was NetApp competition). However, I had put all the details of my CDROM filesystem I/O scheduler and transparent compression implementation, with URLs, in my resume. Who knows if someone in NetApp decided to pull the good ideas from that into their product. Ha.

A CASCADE of Patches

I want to get FreeCAD working on BeleniX and have been going through the dependencies. One of them is OpenCASCADE, which I started to build one week back. Since then it has been a tale of pain, until finally, one week later, I do have a successful build.

Firstly, the software is enormous, with thousands of files. For example, after a make install I find that it installs 15600+ header files! I started building it with GCC 4.4.1 on BeleniX. Secondly, their documentation mentions support for building with the Sun Forte compiler on Solaris 8, which is primitive info. Obviously the combination of the OpenSolaris platform + Mesa + GCC 4.4.1 is untested. Once I started I came across some of the usual issues: the configure script assumes Sun Studio/Forte and passes options not supported by GCC, some headers assume Sun Studio/Forte, bcopy needed a proper declaration, usage of ieee_handler had to be replaced with fex_set_handling, etc. After those I started coming across variable names like SS and CS that conflict with predefined macros for x86 register names on OpenSolaris. I have seen this on many occasions in KDE and other software.

However, after manually patching about 15 files following 15 build failures, I started to wonder how many more there were. So to test I ran a simple command: find . -type f | xargs grep -w SS. Believe it or not, there were hundreds of matches! On a hunch I searched for each of the possible register names in turn, and for the record the following are used: CS, SS, DS, GS, FS, ES, ESP, EIP, in about 465 different files. The only option now was to whip up a simple shell script to do a global search and replace. The resulting generated patchset is huge and I am not keeping it as a patch! I have embedded the script in the pkgbuild spec file.
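For illustration, here is a minimal Python sketch of that kind of whole-word rename. It is not the shell script from the spec file; the function name and the "occ_" prefix are hypothetical, chosen only to show the idea.

```python
import re

# Register names listed above; they collide with macros that the
# OpenSolaris headers define on x86.
REGISTER_NAMES = ["CS", "SS", "DS", "GS", "FS", "ES", "ESP", "EIP"]

# \b gives whole-word matching, like `grep -w`.
PATTERN = re.compile(r"\b(" + "|".join(REGISTER_NAMES) + r")\b")


def rename_registers(source, prefix="occ_"):
    """Prefix every whole-word register name found in `source`.

    `prefix` is a hypothetical choice; any string that yields
    otherwise unused identifiers would do.
    """
    return PATTERN.sub(lambda m: prefix + m.group(1), source)


print(rename_registers("Standard_Real SS = CS + ESP; // CSS untouched"))
```

A blind textual replace like this also rewrites matches inside comments and string literals, which is one more reason the generated diff gets so large.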

At the end of all this I found that the Makefile does not have 100% DESTDIR support in spite of using the GNU autotools. So I had to patch Makefile.in, and that resulted in a full build. After packaging I discovered a packaging issue and had to re-run make install. Even though the source tree was already built, that resulted in another full build! Looks like broken Makefiles.

After several exasperating days I do have a package. I have faced this problem of common variable names clashing in the namespace in several other pieces of software, for example in Celestia 1.6, which I built last week. It uses the obvious “sun” to represent Sol. This is however a predefined macro in GCC on OpenSolaris. Granted, this can be worked around by using “-Usun” (unsafe?) in CFLAGS, and OpenSolaris exposing register defines in headers by default looks like a bug; it is nevertheless a really, REALLY BAD IDEA to use obvious, common, short variable names in your software.

Ksysguard working on OpenSolaris


Anyone who has tried the earlier KDE 4.3 packages for BeleniX may have noticed that Ksysguard (CTRL+ESC or Kmenu -> Applications -> System -> System Monitor) basically shows a blank slate. The process list is empty, CPU and network stats are unavailable, and too few sensors are exposed.

I spent the last few days hacking on that component and got an initial working version that implements all the basic functionality for the OpenSolaris platform. There are still bugs to iron out and new sensors to add (using DTrace here can open up lots of possibilities). The current patch is here. The kdebase4-workspace package has been published into the BeleniX repository.

KDE 4.3.1 Available on BeleniX

The current 4.3.1 release of KDE is now available on BeleniX 0.8 Alpha. See this link for the details: http://www.belenix.org/content/KDE-431-now-available-BeleniX-08-Alpha. I have borrowed patches from the work of the KDE-Solaris team and Fedora Core 11 repository.

Currently 0.8 Alpha is only available via the network installer. We will start working on building a LiveCD ISO soon. In addition, 0.8 Alpha is based on the OpenSolaris source drop for build 114; this will be updated to a more recent build. There are other things to look at: a lofi bypass mode that should make it practical to use encrypted lofi on iSCSI targets (the advantage being end-to-end encryption); reworking the ramdisk compression piece for the latest kernels and fixing some outstanding corruption issue; developer documentation for developing software on BeleniX; a Hudson-based setup for regular bulk builds of the BeleniX repo; an installer written entirely in Python 2.6; a GCC 4 build of Firefox with profile-driven optimization; a GCC build of OpenJDK on OpenSolaris; RPM5 packaging with the Smart Package Manager; and lots of other stuff. One of the goals is to use an open toolchain end-to-end. In that respect it is also important for us to look at a GCC 4.2 build of the OpenSolaris kernel.

For me personally it is amazing to see how much BeleniX has progressed from the early days of a commandline-only ramdisk-only barebones kernel boot to single-user mode in an image assembled by hand. I manually went through and included individual files back in Sept 2005! Today some people may not realize it but BeleniX is a first-class OpenSolaris environment and a first-class KDE environment. People have been using it daily for months and it has been used in a multi-user build-server environment, like our build server in Moscow. Of course we face the problem of lack of developers, so developers are more than welcome!

In addition, few people may know that the OpenSolaris distro from SUN owes its origin to BeleniX. Every technology that I developed for BeleniX during the 2.5 years prior to the OpenSolaris distro coming out was used. In fact the first Beta release was based on BeleniX 0.4.1 with IPS and the Caiman installer put in and KDE replaced by Gnome. I was part of the core team working on that! See LiveCD Architecture Overview Diagram and LiveCD Features Timeline. Sadly there is not even a shred of information or documentation that alludes to this, except for a sole reference in the OpenSolaris Bible.

Lost on the Dee – BUS


The earlier KDE 4.2.4 that I built for BeleniX had a weird persistent issue of various DBUS clients timing out after not receiving a response to their messages. It would happen erratically, but when it did happen the desktop would be very slow to come up, applications would open only after some time and the “.xsession-errors” file would fill up with DBUS timeout messages. At that time there were many other issues to resolve and this problem got ignored, not least due to its erratic nature.

The problem however persisted even when I built the latest KDE 4.3.1. It was much less effort to update to 4.3.1 since the build recipes were already present and 4.3.1 had far fewer bugs than 4.2.4. When testing in VirtualBox I started seeing that this time the timeouts were consistent. So I started poking around with dbus-monitor and qdbusviewer and eventually found that clicking on the kded node in qdbusviewer caused a timeout, with the corresponding messages showing up in dbus-monitor. So kded4 was stuck. Next I used pstack on kded4, and its stack showed that it was stuck on a write to the Gamin file descriptor. Promptly I got a stack of the gam_server process and found that one of its threads was blocked on a write. Using pfiles I saw both descriptors pointing to the same unix domain socket. AHA, so they were deadlocked.

Now Gamin is a drop-in replacement for FAM (File Alteration Monitor) that can monitor files and directories and notify consumers of changes. Gamin uses Inotify on Linux. The OpenSolaris port done by the JDS team uses File Event Notification, which is similar to Linux Inotify but is built on the more generic Event Ports framework. Now KDE uses the KDirWatch class, which in turn communicates with Gamin, and I was using the OpenSolaris Gamin port. It appears that KDirWatch uses a single thread while Gamin can send back events at any time, even when the consumer has issued a subscribe call that has not yet returned. Indeed the OpenSolaris port of Gamin sends back events in the new subscription processing flow. There are additional calls in KDirWatch around the calls to FAMMonitorFile and FAMMonitorDirectory, with comments about avoiding a deadlock. But that is not enough, as I could clearly see. This looks like a design shortcoming to me. Ideally KDirWatch should use one thread to handle async notifications and invoke subscription requests in another thread.
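The two-thread arrangement can be sketched in a few lines of Python. This is not KDirWatch or Gamin code: the fake daemon, the request format and all names below are hypothetical, and the sketch only shows why a dedicated reader keeps both ends of a socket from blocking on writes at the same time.

```python
import socket
import threading

EVENT = b"E" * 63 + b"\n"        # one fixed-size fake event record
EVENTS_PER_SUBSCRIBE = 64        # burst sent back for every subscription


def fake_monitor(conn, n_requests):
    """Hypothetical daemon: answers each subscribe with an event burst."""
    with conn, conn.makefile("rb") as requests:
        for _ in range(n_requests):
            requests.readline()                          # wait for one subscribe
            conn.sendall(EVENT * EVENTS_PER_SUBSCRIBE)   # blocks if nobody reads


def run_client(n_requests):
    """Subscribe n_requests times; return how many events came back."""
    client, server = socket.socketpair()
    received = bytearray()

    def drain():
        # Dedicated reader: keeps emptying the socket so the daemon's
        # writes never stall while the main thread is busy subscribing.
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    monitor = threading.Thread(target=fake_monitor, args=(server, n_requests))
    reader = threading.Thread(target=drain)
    monitor.start()
    reader.start()
    for i in range(n_requests):
        client.sendall(b"subscribe /path/%d\n" % i)
    monitor.join()
    reader.join()
    client.close()
    return len(received) // len(EVENT)
```

Without the reader thread, a client that only writes subscriptions while the daemon only writes events can end up with both socket buffers full and both sides stuck in a write, which matches what pstack and pfiles showed.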

OK, all good, but now what was I to do? Having got so close to finishing the KDE 4.3.1 build, I was in no mood to sit down and start changing KDirWatch. One alternative was to disable FAM and use polling mode, but that would be horrible. Eventually I modified the Gamin patch for OpenSolaris to not send back some of the events during the subscription flow, and that did the trick for now. The DBUS timeouts are solved. I am not sure what the impact of this will be on Gnome 2.26, but at least KDE, which is the primary desktop for BeleniX, is working. BTW KDE 4.3.1 on BeleniX 0.8 Alpha is now available. I will put up a separate post on that.

Cute-E Four Point Five


Having reached a working KDE 4.2.4 desktop milestone, I have been racing to get to 4.3.1. 4.2.x has enough problems and 4.3.x has enough fixes and improvements to warrant a quick move. One of the requirements for 4.3.1 is Qt 4.5, and already having a build recipe for 4.4 I thought it wouldn’t take much time apart from the compilation time itself. But alas, badly mistaken was I!

It turned out to be a lot of “fun” for 5 days before I could get a working Qt 4.5 build. The first time I built 4.5 all text appeared as square boxes. Suspecting some locale issue in my older build env, I set up a fresh new one using the install_belenix script, but no joy. Cursing my bad luck I sat down to the arduous task of digging through the Qt text rendering and font handling code. To cut a long story short, it eventually took me 5 evenings of a wild goose chase through multiple functions in multiple libraries to identify an iconv issue. I am using GNU libiconv, and the way Qt 4.5 caches the iconv handle seems to cause a problem. Subsequent googling with more specific search terms turned up this link: http://mail.kde.org/pipermail/kde-freebsd/2009-April/005059.html

The FreeBSD developers had faced the exact same issue back in April. Eventually I patched the code just enough to avoid the caching and finally got text that a human could read (not a monkey BTW :-P). After this things have been pretty smooth and I have made good progress except for another sticky issue with building the Soprano bindings in KDEbindings. I have disabled the Soprano bindings for now.