Why tiered enterprise archiving is the killer app for Hadoop
Gartner estimates that 70% of Hadoop deployments will fail to meet their objectives this year. A tiered enterprise archiving strategy can help you wrangle your data and beat the odds.
A recent Gartner report estimates that 70% of Hadoop deployments in 2018 will fail to meet key objectives, citing skills and integration challenges. Many organizations take on Hadoop projects but never bring them into full production, because they lack a clear roadmap or the skills necessary to complete them; the deployments eventually become pet projects.
That got me thinking: organizations hold a lot of data, and 80% of it is inactive (another Gartner statistic). What better killer application could there be than archiving and retiring this inactive data into a big data repository? Because big data repositories can be built on commodity storage, commodity compute, and open-source software, they are easy to deploy and bring immediate ROI, making them a quick sell for upper management.
Enterprise archiving is an information lifecycle management best practice and data ingestion strategy: you distribute your data into accessible tiers based on its importance, age, or compliance requirements, a capability made practical by Hadoop's HDFS. While every organization's data and needs differ, a good starting point for an enterprise archiving system is to base it on the following tiers, ideally spread across hybrid and multi-cloud infrastructures (see the sketch after the list):
– Data Lake tier: For active data that needs to be frequently accessed.
– Archive tier: For data that needs to be completely decoupled from the production environment, useful for big data analytics and data science projects.
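To make the tiering idea concrete, here is a minimal Python sketch of an age-based tier-selection policy. The `DataSet` record and the 90-day threshold are illustrative assumptions, not part of any Hadoop or Solix API; a real policy would also weigh compliance requirements and business importance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold -- tune to your own access patterns
# and compliance requirements.
DATA_LAKE_MAX_AGE = timedelta(days=90)

@dataclass
class DataSet:
    name: str
    last_accessed: datetime

def assign_tier(ds: DataSet, now: datetime) -> str:
    """Route a data set to the Data Lake or Archive tier by age."""
    if now - ds.last_accessed <= DATA_LAKE_MAX_AGE:
        return "data-lake"   # active data, frequently accessed
    return "archive"         # decoupled from production; analytics only

# Example: a data set untouched for two years lands in the archive tier.
ds = DataSet("clickstream-2016", last_accessed=datetime(2016, 5, 1))
print(assign_tier(ds, now=datetime(2018, 5, 1)))  # -> "archive"
```

In practice the decision function is where your information lifecycle policy lives; the storage layers underneath (HDFS, object storage) stay the same.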
Designed for low-cost, commodity hardware
Because Hadoop is built with commodity hardware in mind, deploying enterprise archiving on commodity servers, or on low-cost cloud object storage such as Amazon S3, is an extremely cost-effective way to store your data. Beyond archiving, you can expand the same platform into an enterprise data lake and open it up for analytics that predict and prevent issues rather than merely react to them. This creates new opportunities for your data scientists to do things that simply weren't possible before.
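As one way to picture the S3 cost angle, here is a minimal sketch using boto3 that uploads an archived file under a colder (cheaper) S3 storage class. The bucket name, file, and tier-to-storage-class mapping are assumptions for illustration, not Solix's implementation.

```python
import boto3

# Assumed mapping of archive tiers to S3 storage classes:
# STANDARD_IA for warm archive data, GLACIER for rarely touched data.
TIER_TO_STORAGE_CLASS = {
    "archive-warm": "STANDARD_IA",
    "archive-cold": "GLACIER",
}

def archive_to_s3(local_path: str, bucket: str, key: str, tier: str) -> None:
    """Upload an archived file to S3 with a tier-appropriate storage class."""
    s3 = boto3.client("s3")
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"StorageClass": TIER_TO_STORAGE_CLASS[tier]},
    )

# Hypothetical usage: push a retired data set to the cold tier.
archive_to_s3("clickstream-2016.parquet", "my-archive-bucket",
              "archive/clickstream-2016.parquet", "archive-cold")
```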
But enterprise archiving shouldn't be an end in itself; it's only the foundation for organizing and monetizing your data. More importantly, it's the roadmap of apps you build out on top of it that will determine the success of your Hadoop project. And it's equally important to implement a proper information governance process alongside your enterprise archiving.
The foundation for every Hadoop project
Implementing tiered enterprise archiving as the foundation of your next Hadoop project, early in the data ingestion process, is critical to ensuring the stability, security, and organization of your data. Once the data is in, enterprise archiving can serve as the basis for apps that solve problems such as GDPR compliance, shared services platforms, and much more. This matters most when you bring your project into production, where the future volume, variety, and velocity of incoming data is unknown and can impact performance, costs, and availability.
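As one example of a governance app built on the archive, here is a minimal sketch of a retention-based purge, the kind of rule GDPR compliance work often requires. The record shape and the seven-year retention window are assumptions for illustration; GDPR itself sets no single number, so the real window comes from your information governance policy.

```python
from datetime import datetime, timedelta

# Assumed retention window for illustration.
RETENTION = timedelta(days=365 * 7)

def purge_expired(records, now):
    """Split archived records into those to keep and those eligible for erasure."""
    keep, delete = [], []
    for record in records:
        if now - record["created"] > RETENTION:
            delete.append(record)   # past retention: eligible for erasure
        else:
            keep.append(record)
    return keep, delete

records = [
    {"id": 1, "created": datetime(2009, 1, 1)},
    {"id": 2, "created": datetime(2017, 6, 1)},
]
keep, delete = purge_expired(records, now=datetime(2018, 5, 1))
print([r["id"] for r in delete])  # -> [1]
```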
Learn more about Solix Enterprise Archiving here.
Learn more about the Solix Common Data Platform here.