A Guide To Data Deduplication For Businesses


If the idea of business speak or technical descriptions sends you into a bored stupor, chances are that the term ‘data deduplication’ will have you running for the hills. Well, stop running. Data deduplication is an incredibly useful technique designed to help streamline any business that uses large amounts of data.


What is it?

Data deduplication is exactly what it sounds like. An analytical process splits the data into chunks, picks out the chunks that are unique and stores them. As the process continues, whenever another chunk matches a previously stored one, the redundant copy is replaced with a small reference that points to the chunk already stored. Because some chunks might be repeated thousands of times, the process can reduce the total size of the stored data by a substantial amount.
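To make the idea concrete, here is a minimal sketch of that process in Python. It rests on a few assumptions that are ours rather than any vendor's: the data is cut into fixed-size chunks, each chunk is identified by its SHA-256 hash, and the names `deduplicate`, `restore` and `CHUNK_SIZE` are purely illustrative. Commercial products typically use more sophisticated chunking than this.

```python
import hashlib

# Assumption: a tiny fixed chunk size so the example is easy to follow.
CHUNK_SIZE = 4


def deduplicate(data, chunk_size=CHUNK_SIZE):
    """Split data into chunks, keeping one copy of each unique chunk."""
    store = {}        # hash -> the single stored copy of that chunk
    references = []   # ordered hashes that point back into the store
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk      # first time seen: store it
        references.append(digest)      # always record a small reference
    return store, references


def restore(store, references):
    """Rebuild the original data by following the references in order."""
    return b"".join(store[digest] for digest in references)


data = b"ABCD" * 3 + b"XYZ!"   # four chunks, three of them identical
store, references = deduplicate(data)
unique_bytes = sum(len(chunk) for chunk in store.values())
```

In this toy example only two of the four chunks need to be stored, and `restore(store, references)` gives back the original bytes. In practice the references themselves take up some space too, so the real saving depends on how often chunks repeat.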

How can this benefit your business?

If you have a large amount of duplicate data stored on the same drive or disk (and the chances are you do, because most modern backup programs lead to duplication), then deduplication can save you a huge amount of space. Backing up once to a different drive and then running deduplication software on your main one uses far fewer resources than keeping repeated full copies side by side.

Secondly, as well as saving hard drive space, deduplication is a great way to minimise your bandwidth use. Most large companies have their own intranet over which files are sent. If duplicated files are also being sent (and the chances are that they are), then your firm will end up paying through the nose for bandwidth it doesn’t need to be using.
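As a rough illustration of the bandwidth point, the sketch below decides which files actually need to travel over the network. It assumes, purely for the example, that the receiving side can share the SHA-256 hashes of the files it already holds; `file_digest` and `plan_transfer` are hypothetical names, not part of any particular product.

```python
import hashlib


def file_digest(content):
    """Identify a file's content by its SHA-256 hash."""
    return hashlib.sha256(content).hexdigest()


def plan_transfer(outgoing, receiver_digests):
    """Return the files worth sending and the names of those skipped.

    outgoing: dict mapping file name -> file content (bytes)
    receiver_digests: set of hashes the receiver already holds (assumed known)
    """
    seen = set(receiver_digests)
    to_send = {}
    skipped = []
    for name, content in outgoing.items():
        digest = file_digest(content)
        if digest in seen:
            skipped.append(name)       # duplicate: no need to send it again
        else:
            to_send[name] = content
            seen.add(digest)           # later copies in this batch are skipped
    return to_send, skipped


outgoing = {
    "report.doc": b"quarterly figures",
    "report_copy.doc": b"quarterly figures",   # same content, different name
    "memo.txt": b"hello",
}
already_there = {file_digest(b"hello")}
to_send, skipped = plan_transfer(outgoing, already_there)
```

Here only `report.doc` is sent: the copy is caught within the batch, and the memo is skipped because the receiver already has it.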

Finally, virtual servers can benefit substantially from the technique, because it allows the nominally separate system files for each server to be coalesced into one storage space. Again, this is a great way to decrease the amount of resources used for data storage.

Who offers this sort of technology?

For what is quite a specialist technique, there are actually a lot of different companies that offer their own variety of deduplication. Dell’s Ocarina ECOsystem is a popular product, as is Fujitsu’s ETERNUS CS data protection appliance. IBM’s ProtecTIER is another widely used version. Other companies that offer the necessary technology include Permabit Technology, Quantum, SEPATON, Symantec and Barracuda Networks.

Are there any downsides?

With this technique, there are inevitable concerns about data being lost along the way, because the design of the deduplication process determines whether a chunk of data is saved or lost. The quality of the algorithm is therefore paramount. Fortunately, the technology has matured over the past few years, and most of the major products that offer the service have proved their software’s integrity beyond any reasonable doubt.
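One way to picture the safeguards involved is the sketch below, which adds two checks to a simple chunk store: a byte-for-byte comparison whenever two chunks share a hash, and a final check that the restored data matches the original. This is only an illustration under our own assumptions (fixed-size chunks, SHA-256, and the hypothetical names `dedup_with_check` and `restores_correctly`); it does not describe how any named product works.

```python
import hashlib


def dedup_with_check(data, chunk_size=4):
    """Deduplicate fixed-size chunks, refusing to merge chunks that differ."""
    store = {}
    references = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        existing = store.get(digest)
        if existing is None:
            store[digest] = chunk
        elif existing != chunk:
            # Same hash but different bytes: merging would lose data.
            raise ValueError("hash collision detected; chunk not merged")
        references.append(digest)
    return store, references


def restores_correctly(original, store, references):
    """Rebuild the data and confirm it matches the original exactly."""
    restored = b"".join(store[digest] for digest in references)
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()


original = b"ABCDABCDEFGH"
store, references = dedup_with_check(original)
ok_before = restores_correctly(original, store, references)

# Simulate a damaged chunk in the store to show the check catching it.
store[references[0]] = b"XXXX"
ok_after = restores_correctly(original, store, references)
```

The first check passes and the second fails, which is the point: a verification step like this flags a damaged or wrongly merged chunk instead of letting it pass unnoticed.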

The other main issue raised by critics is that, with some systems, security can be compromised if hostile parties are able to guess the hash value of the data they’re trying to reach. For the best advice on these concerns, we’d recommend getting in touch with an IT support company. Consultants from a service provider will be up to date with the latest techniques and applications and will be able to suggest ways of incorporating data deduplication into your IT management strategy.