I just returned from installing a basic DC-X system at a customer site in Northern Germany. My position at Digital Collections is not project manager (I work as a software developer and architect), but since we’re just starting to roll out DC-X and it’s still in beta, we decided to let developers participate in the first few projects. Getting in direct touch with customers and users should help us find the weak spots, add essential features and prepare a stable release that deserves its name.
An important lesson I had to relearn was that, especially in an enterprise environment, your software will only get onto users’ screens through the efforts of a lot of people. I’ll leave marketing and the whole sales process aside (not my expertise), but once you arrive at the client site to install the software, you realize who has to be involved: The guy who bought the hardware and put it into the datacenter. The networking, firewall, proxy guys. The one who sets up the operating system and the cluster software. A team working out failover scenarios and tests. The person in charge of SAN storage. The backup guru. The Nagios expert. A person who knows how to connect to the LDAP-capable directory service, and who creates the required user groups and test accounts. Someone who knows which browser versions are installed on client computers (and who can roll out newer ones). People who are going to train users. The customer’s support personnel. Someone helpful who knows how to get content into the system. Managers who help arrange all this. It’s no wonder that a lot of our projects get delayed in this phase, but this time it worked great: everyone was very helpful and delivered fast! (And none of this work actually matters to users, except that they want to be able to access the software and have it run reliably… Isn’t it kind of sad how much time has to be spent on infrastructure?)
By the way: Today’s servers (three servers, each with two quad-core Intel Nehalem processors with hyperthreading and 48 GB RAM) are fast! A few years ago, we measured the time it takes to import an image into DC-X in “seconds per image”. Now we had to switch to “images per second”… And it’s astounding how different an impression your software can make depending on which client computer runs it. When I was presenting DC-X from my laptop over a projector with just 1024×768 resolution, it didn’t look that great. (DC-X is designed for a minimum resolution of 1280×800.) Looking at it on the users’ screens (two 22″ monitors and brand new PCs), it felt like a totally different beast. (I’m in for another kind of surprise when the users with slow PCs and a single 15″ monitor start using DC-X…)
Our friends at Janz set up the OS (SLES), the IBM GPFS cluster filesystem and Heartbeat clustering. GPFS seems to run reliably in the background without getting in the way, giving all servers painless write access to the same SAN volumes with no need for NFS. But I’m scared by Heartbeat’s complexity and (from my newbie perspective) lack of user friendliness and polish. Definitely something I wouldn’t want to set up myself.
Setting up a cluster environment and testing it means a lot of additional work. My work on the software was frequently interrupted by someone saying “sorry, I just moved Apache processes to another server” or “we tried removing half of the cables and now the filesystem has gone away”. (Yes, they’ll soon have a separate test server…) MySQL failover and recovery worked very well. Solr not so much: it doesn’t recover damaged indexes automatically. When the clustering works, it seems like magic: you switch off one server and a few seconds later all processes are alive and happy on another server. But there were also times when Heartbeat refused to start processes and killed them when I started them manually (only an hour before the system was supposed to go live). I was certainly doing something wrong, but couldn’t help feeling slightly annoyed…
Another thing I learned: Almost every little thing that doesn’t work as expected, is confusing, ugly, inconsistent, or offers unneeded functionality will come back to haunt you. During development, we all too often dismiss the “little things” as not being that important; the focus is on getting things working. But during each demo, and much more so when you release the software and have to watch actual users try to use it, every little unfinished thing will jump out at you and embarrass you… As Jon Udell once wrote: “If you want to make software developers squirm, force them to watch people using their software.”