Category Archives: BeleniX

RIP OpenSolaris.ORG

I just noticed the site update today at http://www.opensolaris.org/: the website will be decommissioned after March 4th. This is the final symbolic nail in the coffin of something that I and many others I know worked very hard on and were deeply passionate about. The nail was actually driven in long ago with Oracle’s acquisition of SUN; soon the post-mortem report will be filed in some dusty corner of the great hall of Oracle. It was simply a fantastic OS platform, but also a great example of Free and OpenSource software done wrong, completely wrong, by clueless management who would not listen to people who had lived and breathed OpenSource for many years.

I still remember the churning and frustration I felt inside when SUN launched OpenSolaris (http://desktoplinux.com/news/NS2543956200.html) without even a passing reference to BeleniX, on which it was built. The first release of OpenSolaris was simply BeleniX outfitted with new clothes. I had toiled hard, very hard, to bring all the distro and LiveCD technologies to the OpenSolaris platform. Granted, it was OpenSource and a labor of love, but when you take something in its entirety and completely ignore its creator and origin, it is equivalent to stealing. Forget about mentions in public; I did not even get any recognition within the company (apart from a few bones thrown my way in the form of a few $$) for all the work that made one of their flagship OS platform releases possible. The vitriol that burned me from inside is hard to describe. It was at that instant that I decided to leave SUN, where I had once planned to build a long-term career.

When people do hard, good work and still go unrecognized within an organization, it is typically because their work is not visible outside their silos. Yet all of SUN, right up to the CEO, knew about my work, and still nothing. I was completely and thoroughly sidelined while other teams held launches and launch parties. I never quite understood what my fault was. Or was it politics?

This is now long past, and I was over that hump by the end of 2008. However, it still remains the most bitter memory of my life to date (apart from my father losing both his kidneys). Fortunately, the work environment I moved into after leaving SUN is quite different in a positive way, and far removed from these issues.

Anyway, may you rest in peace OpenSolaris.

OpenSolaris logo as a vector graphic (Photo credit: Wikipedia)

NetApp advanced Data Compression (from BeleniX?)

I came across this article on a so-called advanced compression feature in OnTap: https://communities.netapp.com/docs/DOC-14329. There is a bunch of marketing fluff at the beginning of the article until you reach the second section, where the real technique is illustrated. It is a simple idea being drummed up a bit, and there is nothing special about it compared to what I implemented for BeleniX and the OpenSolaris LiveCD years ago. It is the very same feature.

Don’t believe me? Head over to my livemedia slides: http://www.osdevcon.org/2007/slides/osol_livemedia.pdf (slide 13). This is the presentation I delivered at the first OpenSolaris Developer Conference, held in Berlin in 2007 and hosted by the good folks of the German Unix User’s Group.

At a high level there are only a couple of differences from the NetApp implementation. The OnTap version of course provides write capability, which I did not implement since I was dealing with CDROMs. OnTap uses a compressibility threshold of 25%, while I used 12% in the LiveCD case.
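To make the threshold idea concrete, here is a minimal shell sketch of a per-chunk compress-or-store decision. It assumes the threshold means the minimum fraction of a chunk's size that compression must save (12% in the LiveCD case, 25% for OnTap); gzip as the codec, the chunk size, and the function names choose_chunk and index_file are illustrative assumptions, not the actual BeleniX LiveCD or OnTap on-disk formats.

```shell
#!/bin/sh
# Sketch: decide per fixed-size chunk whether to store it compressed or raw.
# Assumptions (not from the original implementation): gzip as the codec and
# "threshold" meaning the minimum percentage of space compression must save.

# choose_chunk FILE THRESHOLD_PERCENT
# Prints "compressed" if gzip saves at least THRESHOLD_PERCENT of FILE's
# size, otherwise "raw" (poorly compressible data is kept as-is).
choose_chunk() {
    orig=$(( $(wc -c < "$1") ))
    comp=$(( $(gzip -9 -c < "$1" | wc -c) ))
    saved=$(( orig - comp ))
    # Integer form of: saved / orig >= threshold / 100
    if [ $(( saved * 100 )) -ge $(( $2 * orig )) ]; then
        echo compressed
    else
        echo raw
    fi
}

# index_file FILE CHUNK_SIZE THRESHOLD_PERCENT
# Prints one index line per chunk: "<chunk number> <compressed|raw>".
index_file() {
    tmp=$(mktemp) || return 1
    size=$(( $(wc -c < "$1") ))
    n=0
    off=0
    while [ "$off" -lt "$size" ]; do
        # Copy chunk number n into a scratch file and classify it.
        dd if="$1" of="$tmp" bs="$2" skip="$n" count=1 2>/dev/null
        echo "$n $(choose_chunk "$tmp" "$3")"
        n=$(( n + 1 ))
        off=$(( off + $2 ))
    done
    rm -f "$tmp"
}
```

For example, index_file image.bin 65536 12 prints one line per 64 KB chunk: a run of zeros comes out as compressed, while random data stays raw because gzip cannot shrink it. Keeping incompressible chunks raw means reads of those chunks skip decompression entirely.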

When I wanted to leave SUN and was looking for a change, I interviewed briefly with NetApp but decided not to pursue it at that time for the sake of BeleniX (OpenSolaris+ZFS was NetApp’s competition). However, I had put all the details of my CDROM filesystem I/O scheduler and transparent compression implementation, with URLs, in my resume. Who knows whether someone at NetApp decided to pull the good ideas from it into their product. Ha!

A CASCADE of Patches

I want to get FreeCAD working on BeleniX and have been going through its dependencies. One of them is OpenCASCADE, which I started to build a week ago. Since then it has been a tale of pain until finally, one week later, I do have a successful build.

Firstly, the software is enormous, with thousands of files; for example, a make install installs 15600+ header files! I started building it with Gcc 4.4.1 on BeleniX. Secondly, their documentation only mentions support for building with the Sun Forte compiler on Solaris 8, which is rather outdated information. Obviously the combination of the OpenSolaris platform + Mesa + Gcc 4.4.1 is untested. Once I started, I ran into the usual issues: the configure script assumes Sun Studio/Forte and passes options not supported by Gcc, some headers assume Sun Studio/Forte, bcopy needed a proper declaration, usage of ieee_handler had to be replaced with fex_set_handling, etc. After those, I started coming across variable names like SS and CS that conflict with the macros OpenSolaris headers define for x86 register names. I have seen this on many occasions in KDE and other software.

However, after manually patching about 15 files across 15 build failures, I started to wonder how many more there were. So to check, I ran a simple command: find . -type f | xargs grep -w SS. Believe it or not, there were hundreds of matches! On a hunch I repeated the search for each of the possible register names, and for the record the following are used: CS, SS, DS, GS, FS, ES, ESP and EIP, in about 465 different files. The only option now was to whip up a simple shell script to do a global search and replace. The resulting patchset is so huge that I am not keeping it as a patch! Instead, I have embedded the script in the pkgbuild spec file.
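A global rename of that kind might look like the minimal sketch below; it is not the script embedded in the spec file. It assumes GNU grep (for --include and -w) and GNU sed (for the \b word boundary), and the OCC_ prefix, the function name rename_regs, and the list of source extensions are hypothetical choices.

```shell
#!/bin/sh
# Sketch: rename identifiers that clash with the x86 register-name macros
# exposed by OpenSolaris headers. Assumes GNU grep and GNU sed; the OCC_
# prefix and the extension list are hypothetical.

REGS="CS SS DS GS FS ES ESP EIP"

# rename_regs DIR: prefix every whole-word use of each register name in the
# C/C++ sources under DIR with OCC_, editing the files in place.
rename_regs() {
    for reg in $REGS; do
        # -r: recurse, -l: list matching files only, -w: whole words only
        grep -rlw \
            --include='*.c' --include='*.cxx' --include='*.cpp' \
            --include='*.h' --include='*.hxx' --include='*.lxx' --include='*.gxx' \
            -e "$reg" "$1" |
        while IFS= read -r f; do
            # \b keeps ES from matching inside ESP or ES_MAX; writing back
            # with cat preserves the original file's permissions.
            sed "s/\\b$reg\\b/OCC_$reg/g" "$f" > "$f.tmp" &&
                cat "$f.tmp" > "$f" &&
                rm -f "$f.tmp"
        done
    done
}
```

Because the same rename is applied across the whole tree, declarations and uses stay consistent; running rename_regs . from the top of the unpacked source before configure is one way to hook it into a build recipe.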

At the end of all this, I found that the Makefiles do not fully support DESTDIR in spite of using the GNU autotools. So I had to patch Makefile.in, which triggered a full rebuild. After packaging I discovered a packaging issue and had to re-run make install. Even though the source tree was already built, that resulted in another full build! Looks like broken Makefiles.
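Tracking down install rules that ignore DESTDIR is mostly a grep job. Here is a rough heuristic sketch, assuming GNU grep: it flags lines that invoke an $(INSTALL...), $(MKDIR_P) or $(mkinstalldirs) command without mentioning $(DESTDIR). The function name find_destdir_gaps is hypothetical, and installs done with plain cp or through other variables will slip past it.

```shell
#!/bin/sh
# Heuristic sketch (assumes GNU grep): list install commands in Makefile.in
# files that do not mention $(DESTDIR), i.e. rules whose files would escape
# the staging directory during "make install DESTDIR=...".
# Installs done with plain cp are not detected.

# find_destdir_gaps FILE...: print "file:line:text" for each suspicious line.
find_destdir_gaps() {
    grep -nH -E '\$\((INSTALL[A-Z_]*|MKDIR_P|mkinstalldirs)\)' "$@" |
        grep -v 'DESTDIR'
}
```

Running find_destdir_gaps $(find . -name Makefile.in) from the top of the source tree gives a list of candidate lines to patch; expect some false positives, such as variable definitions that merely reference $(INSTALL_DATA).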

After several exasperating days I do have a package. I have run into common variable names clashing in the namespace in several other packages, for example in Celestia 1.6, which I built last week. It uses the obvious “sun” to represent Sol, but sun is a predefined macro in Gcc on OpenSolaris. Granted, this can be worked around by adding “-Usun” (unsafe?) to CFLAGS, and OpenSolaris exposing register defines in headers by default looks like a bug; it is nevertheless a really, REALLY BAD IDEA to use obvious, common, short variable names in your software.
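One quick way to see which bare-word macros a compiler predefines, and to confirm that -Usun really removes sun, is to dump the preprocessor's macro table with gcc -dM -E on an empty input. The helper name predef_value is hypothetical. With Gcc on OpenSolaris it should report a value for sun; the register names such as SS come from system headers rather than from the compiler, so they will not show up this way.

```shell
#!/bin/sh
# predef_value MACRO COMPILER [FLAGS...]
# Prints the value the compiler predefines for MACRO (nothing if it is not
# defined) by dumping all macros with -dM -E for an empty translation unit.
predef_value() {
    macro=$1
    shift
    "$@" -dM -E - < /dev/null |
        awk -v m="$macro" '$1 == "#define" && $2 == m { print $3 }'
}

# Examples:
#   predef_value sun gcc          # prints sun's value if Gcc predefines it
#   predef_value sun gcc -Usun    # prints nothing: -Usun cancels it
```

Since -D and -U are processed in command-line order, the same check also shows whether a -Usun in CFLAGS wins over any earlier definition.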

Ksysguard working on OpenSolaris

Anyone who tried the earlier KDE 4.3 packages for BeleniX may have noticed that Ksysguard (CTRL+ESC or Kmenu -> Applications -> System -> System Monitor) basically shows a blank slate: the process list is empty, CPU and network stats are unavailable, and very few sensors are exposed.

I spent the last few days hacking on that component and got an initial working version that implements all the basic functionality for the OpenSolaris platform. There are still bugs to iron out and new sensors to add (using DTrace here can open up lots of possibilities). The current patch is here. The kdebase4-workspace package has been published to the BeleniX repository.

KDE 4.3.1 Available on BeleniX

The current 4.3.1 release of KDE is now available on BeleniX 0.8 Alpha. See this link for the details: http://www.belenix.org/content/KDE-431-now-available-BeleniX-08-Alpha. I have borrowed patches from the work of the KDE-Solaris team and from the Fedora 11 repository.

Currently 0.8 Alpha is only available via the network installer; we will start building a LiveCD ISO soon. In addition, 0.8 Alpha is based on the OpenSolaris source drop for build 114, which will be updated to a more recent build. There are other things to look at as well: a lofi bypass mode that should make it practical to use encrypted lofi on iSCSI targets, the advantage being end-to-end encryption; reworking the ramdisk compression piece for the latest kernels and fixing an outstanding corruption issue; developer documentation for developing software on BeleniX; a Hudson-based setup for regular bulk builds of the BeleniX repo; an installer written entirely in Python 2.6; a Gcc 4 build of Firefox with Profile Driven Optimization; a Gcc build of OpenJDK on OpenSolaris; RPM5 packaging with the Smart Package Manager; and lots of other stuff. One of the goals is to use an open toolchain end-to-end, so it is also important for us to look at a Gcc 4.2 build of the OpenSolaris kernel.

For me personally, it is amazing to see how much BeleniX has progressed from the early days of a command-line-only, ramdisk-only, bare-bones kernel booting to single-user mode in an image assembled by hand. I manually went through and included individual files back in Sept 2005! Some people may not realize it, but today BeleniX is a first-class OpenSolaris environment and a first-class KDE environment. People have been using it daily for months, and it has been used in a multi-user build-server environment, like our build server in Moscow. Of course, we face a shortage of developers, so developers are more than welcome!

In addition, few people may know that the OpenSolaris distro from SUN owes its origin to BeleniX. Every technology that I developed for BeleniX during the 2.5 years before the OpenSolaris distro came out was used. In fact, the first Beta release was based on BeleniX 0.4.1, with IPS and the Caiman installer put in and KDE replaced by Gnome, and I was part of the core team working on that! See the LiveCD Architecture Overview Diagram and the LiveCD Features Timeline. Sadly, there is not even a shred of information or documentation that alludes to this, except for a sole reference in the OpenSolaris Bible.

Lost on the Dee – BUS

The earlier KDE 4.2.4 that I built for BeleniX had a weird, persistent issue of various DBUS clients timing out after not receiving responses to their messages. It happened erratically, but when it did, the desktop would be very slow to come up, applications would take a while to open, and the “.xsession-errors” file would fill up with DBUS timeout messages. At that time there were many other issues to resolve, and this problem got ignored, not least due to its erratic nature.

The problem, however, persisted even when I built the latest KDE 4.3.1. Updating to 4.3.1 took much less effort, since the build recipes were already present and 4.3.1 had far fewer bugs than 4.2.4. When testing in VirtualBox, I noticed that this time the timeouts were consistent. So I started poking around with dbus-monitor and qdbusviewer, and eventually found that clicking on the kded node in qdbusviewer caused a timeout, with corresponding messages showing up in dbus-monitor. So kded4 was stuck. Next I used pstack on kded4, and its stack showed that it was stuck on a write to the Gamin file descriptor. Promptly I got a stack of the gam_server process and found that one of its threads was also blocked on a write. Using pfiles, I saw that both descriptors pointed to the same Unix domain socket. AHA, so they were deadlocked: each side was blocked writing to the other, and neither was reading.

Now, Gamin is a drop-in replacement for FAM (File Alteration Monitor), which can monitor files and directories and notify consumers of changes to them. Gamin uses Inotify on Linux. The OpenSolaris port done by the JDS team uses File Event Notification, which is similar to Linux Inotify but is built on the more generic Event Ports framework. KDE uses the KDirWatch class, which in turn communicates with Gamin, and I was using the OpenSolaris Gamin port. It appears that KDirWatch uses a single thread, while Gamin can send back events at any time, even while the consumer has issued a subscribe call that has not yet returned. Indeed, the OpenSolaris port of Gamin sends back events while processing a new subscription. KDirWatch has additional calls around FAMMonitorFile and FAMMonitorDirectory, with comments about avoiding a deadlock, but as I could clearly see, that is not enough. This looks like a design shortcoming to me: ideally KDirWatch should use one thread to handle async notifications and issue subscription requests from another.

OK, all good, but now what was I to do? Being so close to finishing the KDE 4.3.1 build, I was in no mood to sit down and start changing KDirWatch. One alternative was to disable FAM and use polling mode, but that would be horrible. Eventually I modified the Gamin patch for OpenSolaris so that it does not send back some of the events during the subscription flow, and that did the trick for now: the DBUS timeouts are solved. I am not sure what impact this will have on Gnome 2.26, but at least KDE, which is the primary desktop for BeleniX, is working. BTW, KDE 4.3.1 on BeleniX 0.8 Alpha is now available; I will write a separate post on that.

Cute-E Four Point Five

Qt 4.5

Having reached the milestone of a working KDE 4.2.4 desktop, I have been racing to get to 4.3.1: 4.2.x has enough problems, and 4.3.x has enough fixes and improvements, to warrant a quick move. One of the requirements for 4.3.1 is Qt 4.5, and since I already had a build recipe for 4.4, I thought it wouldn’t take much time apart from the compilation itself. But alas, badly mistaken was I!

It turned out to be a lot of “fun” for 5 days before I could get a working Qt 4.5 build. The first time I built 4.5, all text appeared as square boxes. Suspecting locale issues in my older build environment, I set up a fresh one using the install_belenix script, but no joy. Cursing my bad luck, I sat down to the arduous task of digging through the Qt text rendering and font handling code. To cut a long story short, it took me 5 evenings of a wild goose chase through multiple functions in multiple libraries to identify an iconv issue. I am using GNU libiconv, and the way Qt 4.5 caches the iconv handle seems to cause a problem; subsequent googling with more specific search terms turned up this link: http://mail.kde.org/pipermail/kde-freebsd/2009-April/005059.html

The FreeBSD developers had faced the exact same issue back in April. Eventually I patched the code just enough to avoid the caching and finally got text that a human could read (not a monkey BTW :-P). After this, things have gone pretty smoothly and I have made good progress, except for another sticky issue with building the Soprano bindings in KDEbindings. I have disabled the Soprano bindings for now.

KDE 4.2.4 on BeleniX

It was a very long road to a functional KDE 4.2.4 on BeleniX. The effort needed to integrate KDE 4 and iron out issues is immense indeed, and it took months, not least because of KDE 4’s humongous dependency tree. Of course the work is still ongoing: there are still bugs to fix and an update to the latest 4.3 release to do. Thanks to several guys like Sriram Narayanan, Kunal Ghosh, Kaya Saman and others who helped to test and find bugs and workarounds. Of course, all this is a part-time (weekend, evening) effort outside of my day job.

KDE 4.2.4 in BeleniX

The above screenshot shows the KDE 4.2.4 desktop on BeleniX running Konqueror with Webkit as the rendering engine, Krdc, and Lotus Notes inside Wine, along with the Clock from Google Gadgets.

One of the simplifying but interesting things was the use of Gcc 4.4 and the new Graphite optimizations in some places. In addition, I borrowed patches and build recipes from the KDE-Solaris team’s efforts and the Fedora repository. There were many challenges, and there is still quite a bit of work to do. There are several patches that I will have to submit to the upstream KDE projects.

One of the most recent adventures was getting Amarok2 built properly. First, I needed embedded MySQL, so I hacked the MySQL 5.0 Makefile in the SFW consolidation to build it. However, Amarok would refuse to start, first giving symbol errors and then, after a few hacks, coring somewhere in libQtCore after trying to initialize InnoDB. The attempt to initialize InnoDB confused me until I read the Amarok2 build guide, which states that 5.1 is needed. So it was back to the SFW repository, when it hit me again: the SFW repo compiles with Studio, and MySQL uses C++, so its libraries cannot be mixed with Gcc-compiled C++ code. Arrgh. I spent a whole day creating a new spec file for building MySQL 5.1 with Gcc 4. It was quite challenging to get right, and also to get embedded MySQL as a shared lib. Anyway, the InnoDB problem went away after that, but the coring persisted.

It was coring because the dynamic_cast operator was failing to cast a SqlCollection object to one of its parents, SqlStorage. Weird! Eventually I played with the compiler options, changed -march=pentium3 to -march=pentiumpro and added -frtti, and finally dynamic_cast started working again. Then the Amarok2 screen finally came up, only to core again after 10 seconds. This time embedded MySQL had been linked into Amarok2 as a static lib, so I rewhacked the spec file to build a shared lib and got it right after several attempts. Finally, after a silent prayer, I had Amarok2 working without coring. This could be a Gcc 4.4.0 issue as well. We will be moving to Gcc 4.4.1 shortly, with the patch to let it build Wine added.

BeleniX uses packages from the SFW repo. One of the ongoing activities is to migrate package builds from SFW to spec files in the BeleniX CVS repo and build them with Gcc. The SFW gate packaging is weird in some respects. Not all features are enabled for some packages; for example, MySQL has no embedded server. In addition, the most horrible thing is that “11.11” is used as the package version for every package! What sense does it make to do this? Since every package carries the same version, the version alone cannot tell an older build from a newer one, which requires ugliness in the spkg version comparison. There should be other ways to tie SFW package builds to ON build releases.
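The post doesn't show spkg's comparison code, but the underlying problem is easy to demonstrate. Below is a hypothetical numeric dotted-version comparison (the name version_gt is illustrative, and it assumes purely numeric fields): with real upstream versions it can recognize an upgrade, but when every package is stamped 11.11 it can never report one version as newer than another, so some other field has to carry that information.

```shell
#!/bin/sh
# version_gt A B: exit 0 if dotted version A is newer than B, else exit 1.
# Fields are compared numerically from left to right; a missing field
# counts as 0, so 1.2 and 1.2.0 are equal. Assumes numeric fields only.
version_gt() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; x=${x:-0}   # leading field of A (0 if exhausted)
        y=${b%%.*}; y=${y:-0}   # leading field of B (0 if exhausted)
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the field just compared, together with its dot.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1   # equal versions: A is not newer
}

# Usage:
#   version_gt 5.1.30 5.0.45 && echo upgrade    # prints "upgrade"
#   version_gt 11.11 11.11 || echo no-upgrade   # prints "no-upgrade"
```

Comparing fields numerically matters: a plain string comparison would rank 1.9 above 1.10.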

Another goal of ours is a working Firefox 3.5.x build on BeleniX using Gcc 4.4. We are able to get a working debug-enabled build, but the release build crashes in a stub function. In addition, since SUN Java is being used, the Java plugin won’t work in a Gcc build of Firefox on OpenSolaris, because SUN Java for OpenSolaris is built using the SUN Studio C++ compiler. We will have to investigate getting OpenJDK built using Gcc 4 on OpenSolaris. We do, however, have XULRunner built, since it is needed by Google Gadgets.

Well that is enough for now, more stories later.

BeleniX 0.8 Alpha using Network Installer

There is an Alpha release of BeleniX 0.8 available that can be installed directly from the package repo via a network installer. There is no LiveCD release yet; that will come later. The network installer can be used if you are already running OpenSolaris 2009.xx or BeleniX 0.7.1. It installs 0.8 Alpha into a new boot environment (BE), leaving your current BE untouched. While booting, you will get a GRUB menu option to boot into this environment.

The following simple steps are needed to use the network installer:

wget http://www.belenix.org/binfiles/install_belenix

chmod +x ./install_belenix

./install_belenix

You can run ‘install_belenix -help’ to get a detailed usage guide. The installer downloads over 830 packages. This release provides lots of package updates and new packages, built mostly using Gcc 4.4.0. In terms of direction, BeleniX is moving to Gcc 4.4 as the default compiler for everything except the OpenSolaris base OS itself. An open-source toolchain is preferable: it is easier to port FOSS software to OpenSolaris when using Gcc, and the Gcc 4.x series is introducing lots of good features.

This 0.8 Alpha consists of a complete KDE 4.2.4 base environment with most packages from the extragear repo, plus Amarok2, Qt4 built using Gcc-Graphite, Webkit, Google Gadgets, PDO-optimized Python 2.6, the Gtk-Qt4 engine, Gcc 4.4 with the Graphite framework, the X11 framework built with XCB support, Boost (which is required for KDE4 anyway), Boost-Python, PyQt4, DJVU document support, MySQL 5.1, XULRunner and so on. It also includes a complete GNOME 2.26 environment based on the JDS repository, with modifications. More packages will keep appearing over the next several weeks.

ALPHA ALERT: This is an Alpha release, so things may not work or may crash horribly in the new boot environment, though we are finding the KDE 4 desktop to be usable. Feedback and bug reports are welcome. Your existing boot environment is, of course, left untouched.

Gnome 2.26 on BeleniX

Apart from KDE4, Gnome 2.26 will also be available in BeleniX 0.8. I pulled the Desktop Consolidation trunk (JDS) and built it with a bunch of changes. The packages are already available in the package repository trunk, but upgrading to trunk is not yet recommended, as it is seeing a lot of churn at present. One of the pending items is replacing the default OpenSolaris branding with BeleniX branding. Below is a screenshot showing Gnome 2.26 + Compiz + Avant + Google Gadgets + Webkit on my box. The Gnome developer help documentation browser is built with Webkit support.

Gnome 2.26