Deduplication

Too much information can really bring your business down. The amount of data stored by businesses may surpass the world’s storage capacity by 2010, claims market research firm IDC.

Now, while it may be easy to dismiss this as the next Millennium Bug story, the quantity of data being stored is growing by a whopping 60 per cent a year. Steve Mills, head of IBM’s US$13 billion software business, says 75 per cent of that information is replicated data and can be discarded.

There are a number of reasons for this duplication, from image files clogging up email accounts to IT staff dutifully backing up an organisation’s data. Generally, they don’t stop to assess the quality of that data, so they copy huge amounts of it every time they run a backup.

Considering the scale of the problem, it’s no surprise that deduplication (or ‘dedupe’) is currently the storage industry’s favourite buzzword. It means running a program or plugging a box into your network that scans data at a sub-file level, not altogether different from a virus scanner. If it finds a copy, it removes it and replaces it with a pointer to the original, so you store just one master copy.
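To make that concrete, here is a minimal sketch in Python of hash-based deduplication on fixed-size chunks. It is purely illustrative, not any vendor’s actual product: the chunk size, file contents and function names are all assumptions for the example.

    import hashlib

    CHUNK_SIZE = 4096  # illustrative fixed chunk size; real appliances tune or vary this

    def dedupe_store(data, store):
        """Split data into chunks and keep each unique chunk once, keyed by its hash.
        Returns the list of chunk hashes ('pointers') needed to rebuild the data."""
        pointers = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:       # first time we've seen this chunk: store it
                store[digest] = chunk
            pointers.append(digest)       # repeat chunks become pointers to the master copy
        return pointers

    def rebuild(pointers, store):
        """Reassemble the original data from its chunk pointers."""
        return b"".join(store[p] for p in pointers)

    # Two 'files' that share a large common block, such as an embedded logo
    logo = b"LOGO" * 2048
    report = logo + b"Quarterly report text..."
    slides = logo + b"Sales presentation text..."

    store = {}
    report_ptrs = dedupe_store(report, store)
    slides_ptrs = dedupe_store(slides, store)

    assert rebuild(report_ptrs, store) == report
    assert rebuild(slides_ptrs, store) == slides
    raw = len(report) + len(slides)
    kept = sum(len(c) for c in store.values())
    print(f"raw: {raw} bytes, stored after dedupe: {kept} bytes")

The chunks holding the shared logo are stored once, so the second file adds only the few bytes that actually differ from the first.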

Admittedly, single-instance storage (SIS), offered by some vendors and built into Microsoft’s Exchange server, has been doing something similar for a while. But where dedupe gets clever is by scanning inside the file to see what is different. For example, SIS keeps a separate copy of your logo for every piece of corporate correspondence, office stationery and presentation slide it appears in; with dedupe, the logo is stored only once.
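To put some hypothetical numbers on that difference: if a 100KB logo is embedded in 1,000 otherwise-different documents, SIS still holds 1,000 copies of it (roughly 100MB) because no two whole files match, whereas sub-file dedupe keeps a single 100KB copy plus lightweight pointers.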

Email servers aside, the ramifications for backups, virtualisation (running lots of desktops from a single server) and disaster recovery, all of which involve large quantities of replicated data, are enormous.

Data crushing

Beth White, vice-president of marketing for Data Domain, says: ‘We’ve seen 20 times compression with a database, and very aggressive deduplication rates with VMware [virtualisation]. We really chew that up – sometimes we get 40-60 times compression.’

Surfwear brand O’Neill, credited with inventing the modern wetsuit and the surfer’s safety leash, introduced dedupe appliances into the backup environments of five of its European sites. The company reduced its stored data, including a VMware implementation, by a factor of 18 and in the process cut its backup window from 14 hours to two.

‘We used to back up 1.4 terabytes – just critical things,’ says the company’s global IT service and infrastructure manager, Peter Malijaars. ‘Now we back up all the archives as well – 5.3 terabytes. Since we’re a clothing company there’s a lot of old designs, but now we only have to back up what we’ve changed since last time.’

Marc Barber

Marc was editor of GrowthBusiness from 2006 to 2010. He specialised in writing about entrepreneurs, private equity and venture capital, mid-market M&A, small caps and high-growth businesses.
