Data lakes are no longer anomalies. Consolidating all of an organization’s data—unstructured, semi-structured, and structured—into a single repository for integration, access, and analytics purposes is rapidly emerging as the preferred way to manage big data initiatives.
There are numerous benefits to managing big data with this approach. Data access is expedited, the traditional practice of siloing data according to type and purpose is avoided, and lengthy data modeling concerns are alleviated. Data are stored with minimal preparation, which further fosters the democratization of big data while reducing the complexity of its associated technologies.
The first pitfall organizations typically encounter in the wake of these benefits is adopting such an approach without considering the long-term sustainability of their data lake. Without substantial data preparation and the data management hallmarks of governance, data quality, and security, among others, a single repository for big data can produce considerable problems, whether it is used across or within business units.
The second common pitfall in implementing data lakes arises when organizations depend on scarce data scientists to generate value from these hubs. Since data lakes store data in their native format, it is not uncommon for data scientists to spend as much as 80 percent of their time on basic data preparation. Consequently, many of the enterprise’s most valued resources are dedicated to mundane, time-consuming processes that considerably delay action on potentially time-sensitive big data.
Lastly, data lakes typify the recent technological advancements that have enabled unprecedented scalability, expedience of access, and comprehensive data storage (for all data types) at prices attractive to even small and mid-sized businesses. Consequently, traditional approaches to managing them and their data will not work. The third common pitfall is for organizations to apply conventional methods that proved successful in relational environments to an environment built for both big data and structured data.
To learn more about these pitfalls and how to avoid them, download our paper: The Three Pitfalls of Data Lakes and How to Avoid Them.