A data mart is a simple form of a data warehouse that is focused on a single subject or functional area. It draws data from a limited number of sources such as sales, finance or marketing, and is often created and controlled by a single department within an organization. Like data warehouses, data marts implement the characteristics of governed, non-volatile, and integrated data, although the static model known to be the “truth” at the time is of a smaller scope than the enterprise scope used in a data warehouse.
In addition to having the three characteristics of a data warehouse (governed, non-volatile, and integrated), data marts introduce a fourth: agility. Because they are smaller in scope (i.e. they contain only data relevant to the specific use case), they can be rebuilt more quickly and at a lower cost if the model changes. In addition, a single model of one organizational domain, defined by a domain Subject Matter Expert (SME), typically reflects reality more accurately than an enterprise model that expresses the interoperability of several domains defined by multiple domain SMEs, so the model is also less apt to change.
Of course, data marts still have many of the same issues as data warehouses, such as extensive up-front modeling and the need for data cleansing to get all the data into the required format. However, these processes are much less expensive in the creation phase than they are for the average data warehouse, since each mart addresses only one use case at a time.
Furthermore, they introduce some new management issues. Enterprise data is duplicated in multiple marts (e.g. customer data would be used by marts created to address sales trends as well as customer support resolution). If that enterprise data changes, it must be updated across all marts; otherwise the marts will provide different answers from what is nominally the same data, which only confuses decision makers. Managing this duplication of data is a key reason that Total Cost of Ownership (TCO) for a data mart approach often exceeds that of a data warehouse over the life of the systems.
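The duplication problem described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `DataMart` class, the `find_drift` helper, the mart names, and the customer records. It assumes each mart keeps its own copy of shared customer data and shows how one missed refresh leaves two marts disagreeing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataMart:
    """Hypothetical mart that holds its own copy of shared customer records."""
    name: str
    customers: Dict[str, dict] = field(default_factory=dict)

    def load(self, source: Dict[str, dict]) -> None:
        # Copy every record so the mart owns an independent duplicate.
        self.customers = {cid: dict(rec) for cid, rec in source.items()}


def find_drift(source: Dict[str, dict], marts: List[DataMart]) -> Dict[str, List[str]]:
    """Map each out-of-date mart to the customer ids that differ from the source."""
    drift = {}
    for mart in marts:
        stale = sorted(
            cid for cid, rec in source.items() if mart.customers.get(cid) != rec
        )
        if stale:
            drift[mart.name] = stale
    return drift


# Illustrative enterprise data shared by two marts.
enterprise = {"c1": {"region": "EMEA"}, "c2": {"region": "APAC"}}
sales = DataMart("sales_trends")
support = DataMart("support_resolution")
sales.load(enterprise)
support.load(enterprise)

# The enterprise record changes, but only one mart is refreshed.
enterprise["c1"]["region"] = "AMER"
sales.load(enterprise)

drift_report = find_drift(enterprise, [sales, support])
print(drift_report)  # {'support_resolution': ['c1']}
```

In practice a check like this would have to run against every mart after every change to the shared data, which is one way to picture where the ongoing maintenance cost comes from.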
In addition, each data mart has to address its own governance and provenance concerns, which in turn increase TCO. These include:
- How to define the processes for Extract-Transform-Load (ETL) for each data source;
- How to synchronize data from Online Transaction Processing (OLTP) systems to the mart;
- How to resolve user-identified inaccuracies in the data;
- What systems fed the data;
- How the data was filtered, modified, and enhanced; and
- Why the answer provided by one mart is more accurate than answers from other OLTP systems and/or other marts.
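As a rough illustration of the provenance questions in the list above, the sketch below attaches lineage metadata to a single mart table. The `ProvenanceRecord` class, its fields, and the system and table names are all assumptions made for this example, not part of any particular product; a real mart would more likely keep this information in a metadata catalog.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical lineage metadata for one table in a data mart."""
    table: str
    source_systems: List[str]  # which systems fed the data
    steps: List[str] = field(default_factory=list)  # how it was changed

    def record(self, action: str, detail: str) -> None:
        # Append one filtering, modification, or enhancement step in order.
        self.steps.append(f"{action}: {detail}")

    def describe(self) -> str:
        # Produce a one-line answer to "where did this data come from?"
        sources = ", ".join(self.source_systems)
        history = "; ".join(self.steps) if self.steps else "no transformations"
        return f"{self.table} <- {sources} | {history}"


# Illustrative names only.
lineage = ProvenanceRecord("sales_by_region", ["crm_oltp", "billing_oltp"])
lineage.record("filter", "dropped test accounts")
lineage.record("enhance", "added region from postal code")

summary = lineage.describe()
print(summary)
```

Every mart would need to keep a record like this for each of its tables, which is part of why governance effort grows with the number of marts.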
However, these challenges can be alleviated by utilizing a data lake, which copies data from existing OLTP systems into a single data structure, allowing analytics to access all of the data in a single place. The data lake approach reduces the up-front burden, in terms of development time and cost, of refining and improving data to fit a model of the truth as it is perceived at the time. Then, when semantic (or "smart") technology is applied to that data lake, a Smart Data Lake® is created that bridges the dichotomy between the rigid structure and governance found in data warehouses and marts and the chaotic power of the structure-agnostic data lake.