The Data Lake Survival Guide: The What, Why and How of the Data Lake
In times past, when thinking about digital data, it made sense to divide it into two kinds: transactional data (the data captured in business applications, stored in database tables and presented by BI tools) and all other data, such as emails, web pages, images and video. Nowadays we tend to refer to such “other data” as unstructured data.
Nevertheless, it was analyzable, and software for deriving value from such data has
crossed the chasm. It was that analytical imperative, more than anything else, that gave rise to the
original concept of a data lake: a data store for both species of data and, additionally,
for data harvested from multiple sources external to the business, some of which was
inevitably unstructured.
In this paper, we will examine how the new ecosystem created by the data lake will no longer consist entirely of the transactions (or events) of
the business. It will also include data from other sources, which the business uses to
perform analytics and to give its users the information on which decisions can
be based. The system of record will be, as it always was, the golden copy of corporate
data and the audit trail of the IT activities of the business.