Data Lake

Describing Data Lakes

  • A data lake is a system of data stored in its raw format
  • Usually, it stores object blobs or files
  • It is used for dumping all forms of data generated by the business
  • Data lakes and data warehouses are conceptual forms of storage
  • Databases and file systems, by contrast, are implementations of these conceptual models
  • For example, we can implement a data lake with a file system
  • Or, we can implement a data lake with a database
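
As a rough illustration of the file system option above, the sketch below writes incoming data byte-for-byte into a directory tree. The function name land_raw_file, the <source>/<date>/<name> layout, and the sample file name are assumptions made for this example, not an established convention.

```python
import tempfile
from datetime import date
from pathlib import Path


def land_raw_file(lake_root: Path, source: str, name: str, payload: bytes, day: date) -> Path:
    """Store payload unchanged under <lake_root>/<source>/<YYYY-MM-DD>/<name>.

    The directory layout is a hypothetical choice for this sketch.
    """
    target_dir = lake_root / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    # Raw format: the bytes are written as received, with no parsing or schema.
    target.write_bytes(payload)
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the lake's root folder.
    root = Path(tempfile.mkdtemp(prefix="lake_"))
    path = land_raw_file(root, "chat_logs", "session-1.txt", b"hello", date(2024, 1, 2))
    print(path.relative_to(root))
```

Because nothing is parsed on the way in, the same function can accept emails, images, or videos; any structure is imposed later, when the data is read.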

Common Examples of Data in Data Lakes

  • Structured data feeds
  • Chat logs
  • Emails
  • Images (as blobs)
  • Videos

Defining a Data Lake

  • Data lakes are used for storing raw data that will be transformed and loaded into a data warehouse eventually
  • Typically, this is where raw predictive data is stored
  • This data can sit in a data lake for weeks to months without being used
  • Storing data in a data lake is relatively inexpensive
  • Very few business users will ever access the data lake directly
  • Data lakes typically:

    • Grow quickly
    • Are very large
    • Contain various types of data

Defining Use Cases for a Data Lake

  1. A cheap way to store lots of different types of data

    • This implies storing raw data that seems useful now
    • Later, we'll want to transform and load it in some way
    • This could include videos, transactions, returns, etc.
  2. Storing data that seems important now

    • There may not be a plan in place to transform the data
    • However, we may receive data that seems important enough to store right away
  3. Storing data before transformations and loads

    • Specifically, the data lake acts as a staging area: raw data lands here first, then is transformed and loaded into a data warehouse
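
The third use case can be sketched as a small transform-and-load step. This is a minimal sketch under stated assumptions: the raw files are CSVs with hypothetical id and amount columns, and an SQLite database stands in for the data warehouse. The function name load_transactions and the transactions table are also made up for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_transactions(lake_dir: Path, conn: sqlite3.Connection) -> int:
    """Read raw CSV files from the lake, clean each row, and load it into a table.

    Returns the number of rows processed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )
    loaded = 0
    for raw_file in sorted(lake_dir.glob("*.csv")):
        with raw_file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Transform: trim whitespace and convert the amount to integer cents.
                amount_cents = round(float(row["amount"].strip()) * 100)
                # Load: insert the cleaned row, replacing any earlier row with the same id.
                conn.execute(
                    "INSERT OR REPLACE INTO transactions (id, amount_cents) VALUES (?, ?)",
                    (int(row["id"]), amount_cents),
                )
                loaded += 1
    conn.commit()
    return loaded


if __name__ == "__main__":
    # A temporary directory stands in for one folder of the lake.
    lake_dir = Path(tempfile.mkdtemp(prefix="lake_"))
    (lake_dir / "day1.csv").write_text("id,amount\n1, 9.99 \n2,5\n")
    warehouse = sqlite3.connect(":memory:")
    print(load_transactions(lake_dir, warehouse))
```

The raw files stay in the lake untouched, so the same step can be rerun later with a different transformation if the warehouse schema changes.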
