One product, three people and a handful of companies

March 1, 2024
Hello, my name is Ernst Bablick and I am writing here about myself, my two founding colleagues Joachim Gabler and Daniel Gruber, a software product that most people probably know as “Grid Engine”, and a few companies that were responsible for “Grid Engine” in the past.

About a year after graduating in computer science, I started my career in HPC in early 1999 [1], when a colleague of then-professor Wolfgang Gentzsch approached me about joining his team at Genias. Since I knew the company and was familiar with Wolfgang Gentzsch’s vision, I did not need long to decide. In March 1999, I started as a software engineer in the development team for a product that was still called Codine/GRD at the time. One of my first tasks was to port the software to the Solaris operating system from Sun Microsystems.

I got to know Joachim Gabler when he joined Genias in October 1999 [2]. Like me, he started out as a software engineer, and we followed with great interest the development of Genias, which had been merged with the American sales company Chord Systems and renamed Gridware [3].

In June 2000, Sun announced that Codine/GRD was the preferred workload management solution for their Technical Compute Farms [4], and a month later we were all working for Sun Microsystems, which had acquired Gridware. During this time, our development group grew rapidly. Codine/GRD was renamed Grid Engine and Sun Microsystems released the software to the open source community in 2001 [5].

I was working as a Senior Software Engineer, and Joachim had just been promoted to lead the QA and Sustaining team, when Daniel Gruber [6] joined our group in 2008. In addition to his day-to-day development work, Daniel co-chaired the OGF working group that specified DRMAA2, the standard that governs job submission and monitoring on compute clusters for all workload management products.

Oracle finally took over Sun Microsystems in April 2009 [7]. Oracle’s perceived lack of interest in the HPC market, and the resulting dissatisfaction in our development group, eventually led the core developers responsible for Grid Engine to move from Oracle to Univa [8]. At Univa, I became technical product manager for the Grid Engine product and then development manager for the entire Grid Engine development team.

The continuous development of the product, especially in the areas of cloud computing and containerisation, ensured great customer acceptance. At the same time, the lack of commitment to open source led to the spin-off of two open source projects (Open Grid Scheduler and Son of Grid Engine) [9].

Grid Engine reached its high point to date in June 2018, when Joachim and I supported an experiment that showed how to utilize over 1 million compute cores in a single Grid Engine cluster on AWS [10].

Altair eventually acquired Univa in 2020 [11]. At that point, my enthusiasm was limited, since Altair already had two workload management products. Subsequent developments confirmed my reservations, and I felt compelled to resign in 2023, as did other colleagues.

Here we are now, in March 2024, and the key developers from the early days are still together, facing the challenges of the future of Grid Engine under their own leadership at HPC-Gridware.

We will build on the latest version of the Univa code and continue open source development under the name “Open Cluster Scheduler”. Developers of other open source forks (Open Grid Scheduler and Son of Grid Engine) are welcome to join us. As HPC-Gridware, we will also offer support and consulting under the product name “Gridware Cluster Scheduler”, as well as further developments that will reach existing customers.