The article Challenge of Scale Part 7 – Digital Asset Management by Real Story Group’s Theresa Regli reminds me that our DAM software DC-X is quite good at “scalability”. What does that mean? The article lists many challenges; here are some notes on three aspects:
The article talks of “hundreds of thousands or millions of digital assets”. Since most of our customers are publishers, DC systems are built to handle decades’ worth of newspaper articles, pages, images, and news agency texts. Typical installations range from several hundred thousand assets to a few million, with the two largest containing more than 20 million documents each.
Theresa Regli also mentions “large numbers of assets […] ingested and outgested in bulk”. Our customers report ingesting up to 80,000 documents per day during regular production use. When we migrate data from older systems during installation, this number can be even larger – an older blog post on real-world import performance has an example with 50,000 images or 400,000 articles ingested per hour.
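To put those figures on a common scale, here is a small back-of-the-envelope calculation that converts them to documents per second. It assumes, purely for illustration, that the load is spread evenly over the whole day or hour, which real production traffic certainly isn't; the function and variable names are made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def per_second(count: int, seconds: int) -> float:
    """Average rate, assuming an evenly spread load (an illustrative simplification)."""
    return count / seconds


# Figures quoted in the text above.
daily_production = per_second(80_000, SECONDS_PER_DAY)       # regular production use
migration_images = per_second(50_000, SECONDS_PER_HOUR)      # migration, images
migration_articles = per_second(400_000, SECONDS_PER_HOUR)   # migration, articles

print(f"production:         {daily_production:.2f} documents/s")
print(f"migration images:   {migration_images:.1f} documents/s")
print(f"migration articles: {migration_articles:.1f} documents/s")
```

So regular production averages a little under one document per second, while the migration examples run at roughly 14 images or 111 articles per second.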
Our software is mostly installed on premises (although that’s changing slowly), running on hardware ranging from a single virtual machine with 2 GB RAM (I’m running DC-X on my laptop as well…) to a cluster of two to five physical servers. The DC-X architecture makes it easy to scale out by moving components like the database and the Solr full-text engine onto dedicated servers. Ingestion is performed by parallel worker processes, which can also run on dedicated servers.
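For readers who haven’t seen the pattern, the general idea of ingesting with parallel worker processes can be sketched in a few lines of Python. This is not DC-X code: `ingest_document` and `ingest_all` are hypothetical names, and the per-document “work” is just a checksum standing in for whatever a real ingestion step does.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
import hashlib


def ingest_document(raw: bytes) -> dict:
    """Hypothetical per-document work: record a checksum and the size."""
    return {"sha256": hashlib.sha256(raw).hexdigest(), "size": len(raw)}


def ingest_all(documents: list[bytes], executor: Executor) -> list[dict]:
    """Hand the documents to the executor's workers; results keep input order."""
    return list(executor.map(ingest_document, documents))


if __name__ == "__main__":
    # Made-up sample data standing in for incoming articles.
    docs = [f"article {i}".encode() for i in range(1000)]
    # Four worker processes; the count is an arbitrary choice for this sketch.
    with ProcessPoolExecutor(max_workers=4) as pool:
        records = ingest_all(docs, pool)
    print(len(records), "documents processed")
```

Because the workers share nothing but their input and output, the same pattern extends from several processes on one machine to workers on dedicated servers, as long as something distributes the incoming documents among them.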
Since most installations are intranet-only, they usually have no more than a few dozen to a few hundred concurrent users. The most performance-critical components are the database and Solr, which we spend some time optimizing, usually with great results. (Good that we moved away from Oracle Text, which was causing problems that were sometimes impossible to fix…)