Describing Data Lakes
- A data lake is a repository of data stored in its raw format
- Usually, it stores object blobs or files
- It is used for dumping all forms of data generated by the business
- Data lakes and data warehouses are conceptual forms of storage
- Databases and file systems, by contrast, are implementations of these conceptual models
- For example, we can implement a data lake with a file system
- Alternatively, we can implement a data lake with a database
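To make the split between the conceptual model and its implementation concrete, here is a minimal Python sketch of one "store raw bytes by key" interface with two backends: a file system and a database. The names `RawStore`, `FileSystemLake`, and `SqliteLake`, the key layout, and the use of SQLite as the database are illustrative assumptions, not a standard API.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Protocol


class RawStore(Protocol):
    """Hypothetical data lake interface: store and fetch raw bytes by key."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class FileSystemLake:
    """Data lake implemented on a file system: each key maps to a file path."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)  # create folders as needed
        path.write_bytes(data)  # bytes are written unchanged (raw format)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class SqliteLake:
    """Data lake implemented on a database: each key maps to a BLOB row."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS objects (key TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def put(self, key: str, data: bytes) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT OR REPLACE INTO objects (key, data) VALUES (?, ?)", (key, data)
            )

    def get(self, key: str) -> bytes:
        row = self.conn.execute(
            "SELECT data FROM objects WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]


def ingest(lake: RawStore) -> None:
    # Example data of different kinds, stored as-is with no schema imposed
    lake.put("chat_logs/day1.txt", b"alice: hi\nbob: hello\n")
    lake.put("images/logo.png", b"\x89PNG\r\n\x1a\n")  # opaque image blob


with tempfile.TemporaryDirectory() as tmp:
    for lake in (FileSystemLake(Path(tmp)), SqliteLake()):
        ingest(lake)
        assert lake.get("images/logo.png") == b"\x89PNG\r\n\x1a\n"
```

Both backends satisfy the same interface, so code that writes to the lake does not need to know whether files or database rows sit underneath.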
Common Examples of Data in Data Lakes
- Structured data feeds
- Chat logs
- Emails
- Images (as blobs)
- Videos
Defining a Data Lake
- Data lakes are used for storing raw data that will eventually be transformed and loaded into a data warehouse
- Typically, this is where raw data used for predictive analysis is stored
- This data can sit in a data lake for weeks to months without being used
- Storing data in a data lake is relatively inexpensive
- Very few business users will ever access the data lake
- Data lakes typically:
  - Grow quickly
  - Are very large
  - Contain various types of data
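The path from lake to warehouse described above can be sketched as follows. This assumes a hypothetical orders feed that lands in the lake as JSON Lines and uses an in-memory SQLite table as a stand-in for the warehouse; the feed contents, field names, and the `fact_orders` table are invented for illustration.

```python
import json
import sqlite3

# Hypothetical raw feed as it might sit in the lake: untouched JSON Lines.
# Fields are loosely typed (amount as a string, an extra "note" key).
RAW_ORDERS = b"""{"order_id": 1, "amount": "19.99", "status": "shipped"}
{"order_id": 2, "amount": "5.00", "status": "returned", "note": "damaged"}
{"order_id": 3, "amount": "42.50", "status": "shipped"}
"""


def transform(raw: bytes) -> list[tuple[int, float, bool]]:
    """Parse raw JSON Lines and shape each record to the warehouse schema."""
    rows = []
    for line in raw.decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rows.append(
            (
                int(record["order_id"]),
                float(record["amount"]),         # cast string amount to a number
                record["status"] == "returned",  # derive a boolean column
            )
        )  # unused raw fields such as "note" are dropped here
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple[int, float, bool]]) -> None:
    """Load transformed rows into the (hypothetical) warehouse table."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fact_orders "
            "(order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, is_return INTEGER NOT NULL)"
        )
        conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(RAW_ORDERS))
total = warehouse.execute(
    "SELECT ROUND(SUM(amount), 2) FROM fact_orders WHERE is_return = 0"
).fetchone()[0]
print(total)
```

The raw bytes stay in the lake unchanged, so the transformation can be rewritten and re-run later if the warehouse schema changes.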
Defining Use Cases for Data Lakes
- A cheap way to store lots of different types of data
  - This implies storing raw data that seems useful now
  - Later, we'll want to transform and load it in some way
  - This could include videos, transactions, returns, etc.
- Storing data that seems important now
  - There may not be a plan in place to transform the data yet
  - However, we still receive important data that we want to store now
- Storing data before transformations and loads
  - The lake acts as a landing area where data sits before it is further transformed and loaded