21 September, 2007

Datacenter Confidential #7

As if I needed more evidence that the Project Manager title was a secret cabal intent on interfering in the affairs of operations personnel, I give you the following example:

This morning the same project manager described in Datacenter Confidential #6 sent an email to my boss complaining about, well, let's put it in his own words:
We’re running short on PROJECTNAME grid machines. I wonder if we have a status on PROJECTGRIDWORKER, which went down few weeks ago. Also, is it possible to run Linux on PROJECTMYSQLSERVER? It appears that in order to run our java programs on it, we have to modify/port the underlying scripts. We’re really have no time to do such a thing. It will be much easier if all grid machines are consistent and run on standard operating system. I’d strongly recommend we standardize on Linux for all of our production and development grid/non grid, as it is consistent and seems to be the industry standard.

We need a couple of machines to be configured for golden gate grid with sufficient memory and horse power ASAP. Can you please help us?
Regarding the shortage of grid machines, well, we have an RMA out to the vendor on the missing worker. Regarding resources, we can and will order extra RAM for all the machines. Regarding running Linux on PROJECTMYSQLSERVER, the answer is a resounding NO.

This is a development environment, and there were six machines purchased for it. The specifications for the hardware were clear: three of these machines would have RAID storage built in, and three would serve as JVM "workers", or dumb machines running Java processes which would populate the various MySQL databases on the other three large storage servers.

These machines were purchased months ago, and per the production standards, the JVM "workers" run Debian GNU/Linux and the MySQL servers run FreeBSD 6.2.

The Project Manager is raising a stink because he wants to run a JVM on the FreeBSD machines. Luckily, I have solved this problem, and in fact run JVMs on FreeBSD machines in production. I have even gone through the trouble of changing the startup scripts for the wrapper utility we use to control JVM processes to play nice with either BSD or Linux. I have even checked this code into the source revision control system, Perforce, for the exact reason of avoiding such petty OS-related debates.
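For the curious, the idea behind an OS-agnostic startup script is simple enough to sketch. What follows is not the code I checked into Perforce, just a minimal illustration in Python: the names `JVM_SETTINGS`, `jvm_settings` and `java_command`, and both install paths, are made up for the example.

```python
import platform

# Hypothetical per-OS settings. The paths are illustrative assumptions,
# not the values used by the real wrapper scripts.
JVM_SETTINGS = {
    "FreeBSD": {"java_home": "/usr/local/jdk"},
    "Linux": {"java_home": "/usr/lib/jvm/java"},
}


def jvm_settings(system_name=None):
    """Return the settings for the given OS, or for the current host."""
    # platform.system() reports "FreeBSD" or "Linux" on those systems.
    name = system_name or platform.system()
    try:
        return JVM_SETTINGS[name]
    except KeyError:
        raise RuntimeError(f"unsupported operating system: {name}")


def java_command(main_class, system_name=None):
    """Build the argument list used to launch a JVM worker."""
    settings = jvm_settings(system_name)
    return [settings["java_home"] + "/bin/java", main_class]
```

The point is that the OS-specific details live in one table, so the same script can be dropped on a BSD box or a Linux box without anyone porting anything.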

Yet the Project Manager wants to "standardize" to Linux.

I for one am all for homogeneity. I truly am. But I will be damned if I let some Project Manager tell me what I am and am not going to run on the machines I am responsible for maintaining. So I sent this to my boss:
My boss: Ahh, if only it was so simple. I will be meeting with PROJECT MANAGER. Maybe you want to join.
Not really. If someone above me wants to tell me what OS to run, they will be receiving my badge, keys and laptop. It is absolutely a waste of time for E-staff to even be discussing this, and if PROJECT MANAGER is not satisfied with what I tell him my operating environment is then that is cause for me to officially complain about cross-departmental interference or obstructionism. My word on this is the law, and if E-staff does not agree, they are free to remedy that however they can. I cannot be any more clear or unequivocal about it. PROJECT MANAGER needs to stop wasting my time, your time and THE CTO's time. Moreover, I will not tolerate E-staff second-guessing trivial details such as this: that is the very definition of micro-management, and I will not be micro-managed. Feel free to convey my feelings on this matter however you deem appropriate.
Once I had a few hours to cool off, I drafted this response and sent it to my boss:
There are a number of OS choices available to technology companies running a Java Development Platform, ranging from MS Windows 2000, XP, Server 2003, Vista, Solaris, FreeBSD 5.4, FreeBSD 6.2, Linux 2.4 Kernel, Linux 2.6 Kernel as well as platform choices like Intel X86 32-bit, AMD X86 64-bit, SPARC or even IA64 (Tru-64).

More confounding, Linux offers literally scores of distribution choices, the most popular of which range from fully-commercial, for-pay distributions like Red Hat and Novell SuSE, or free varieties like Gentoo, Fedora-Core, Debian and Ubuntu.

Unfortunately, choosing one distribution over another, or even Linux over BSD in many cases, is not as simple as making an arbitrary choice for an overall common operating environment unless you are not vendor agnostic; moreover, licensing, and therefore budget concerns, play a role in the choice. For the latter reason we have shied away from embracing Red Hat, SuSE or Solaris X86 due to the cost of licensing the number of CPUs we use.

What is also oftentimes the case is that new hardware, or varying system hardware configurations do not allow all distributions or operating environments to even install or run correctly: in many cases, we had to make distribution choices based on what would run on a system at the time the machine was needed rather than what we would have preferred.

One thing is true in the arguments for BSD versus Linux, and that is that in the majority of cases, support for CPUs, boards and peripherals is more mature, more universal and more standardized in the BSD kernel than in Linux, meaning that often newer hardware will not even run Linux. This has been the case in production nearly half a dozen times. For instance, the Sun AMD Opteron servers officially supported Red Hat and SuSE originally; we have three unlicensed SuSE servers and one FreeBSD server on the X4200 platform. By the time we acquired the X4100 and X2100 servers, Sun had added support for Ubuntu, which is based on the Debian distribution and packaging system, and is ostensibly free. The Dell 2950 servers, which are an Intel 64-bit platform, would not, at the time we purchased them, run any current, non-beta or non-development Linux distribution freely available: not Debian, not Fedora, not Gentoo, not Ubuntu, etc. As a result, those machines are all FreeBSD. Which is good, because the stated purpose of those machines is to run MySQL, and MySQL always runs on FreeBSD because that is the industry standard, and more subjectively but just as significantly, that is our standard. FREEBSDSERVER1, a content window, is a SuperMicro Intel-based 64-bit server and also would not run any Linux, so it too runs FreeBSD. I am sure there are more modern Linux distributions that would run on these machines if we installed them today, but that would not change the environment for the Dells and we are not going to take down a content window to satisfy a Linux-leaning operating environment preference.

A note about Debian Linux: newer machines and newer Linux distributions mesh better, which is why we run Debian on LINUXSERVER1 and LINUXSERVER2, and Ubuntu on LINUXSERVER3, LINUXSERVER4 and LINUXWORKER1, LINUXWORKER2, LINUXWORKER3 and LINUXWORKER4 (because these are Sun systems and they do not support Debian). Debian is a natural choice over Red Hat (commercial, bloated) and SuSE (ditto), as well as Fedora Core (bloated) or some lesser distribution such as Gentoo, etc, because Debian has better package management, and the package maintainers are more responsive. That we continue to run Red Hat and Fedora Core in production is only a function of legacy, and as always as machines running FC or RH are aged out, they will be replaced with either Debian or Ubuntu Linux, or FreeBSD.

In fact, there are few if any arguments not to run a uniformly FreeBSD environment except for legacy binary issues such as SVM, i18n, Chasen, Sleepycat and so on. One of the arguments in favor of Linux touches the UFS/FFS versus ext3 or ReiserFS debate. It is true, ext3 is somewhat faster than UFS/FFS, but ext3 is also less stable and therefore more dangerous than UFS/FFS, more so with ReiserFS. The choice, then, is clear when data-protection is made a priority, as it is in our architecture. ReiserFS is something we have used to side-step the need for an inode table for filesystems, allowing a filesystem to grow to hundreds of millions of objects in some cases. More recently, however, we have made a decision to move away from this model for several reasons: loss of objects is a very real threat with ReiserFS, filesystems with that many objects are hard to manage (move, transfer, archive) and filesystems over a certain number of gigabytes are undesirable for this same reason, not to mention Lucene has a limit of X number of objects and Y number of gigabytes it can reasonably be expected to manage per index. And, obviously, Lucene is effectively agnostic about UFS/FFS versus ext3 because the indices are held in large files where platter RPM is as much a factor as the relative speed of the filesystem for I/O, and Lucene's "MAILDIR" type object store format allows fast-traversal of actual objects (e.g., files). I'm fairly certain that if we tested a UFS Lucene index versus an ext3 index, the speeds would be marginally different, or at least within acceptable parameters.

So, in fact, the question of what the standard is can be answered succinctly enough for the sanity of upper management and project managers: if the platform allows, the OS will be one of Debian Linux or Ubuntu Linux, except for MySQL servers, which will run FreeBSD. All other machines will run FreeBSD, unless there is a package dependency that requires a Microsoft platform; however the onus is on project management to remove all Microsoft dependencies and that objective has been in place since 2005. All other distributions and platforms are supported as legacy platforms to be aged out and replaced with one of the above stated platforms.

Hopefully I have made myself clear on this issue and we can put this to rest. Thank you and have a good weekend.
Hopefully this will shut down the Project Manager's interference in my fiefdom. Or, you will see me soon blogging about job hunting in the Bay Area.

2 comments:

Mister said...

Whoa, dude. That was in-depth. I hope you got your way!

dr von drinkensnorten said...

we'll see..