DataScience_Examples

All about DataScience, DataEngineering and ComputerScience

View the Project on GitHub datainsightat/DataScience_Examples

Introduction to Data Lakes

A scalable and secure data platform that allows enterprises to ingest, store, process, and analyze any type or volume of information.

Ecosystem

Architecture

Data Lake vs Data Warehouse

| Data Lake | Data Warehouse |
| --- | --- |
| Native format | Loaded after the use case is defined |
| All data types | Processed |
| Easy changes | Faster insights |
| Application-specific | Current and historical data |
| | Consistent schema |

Data Extraction

EL (Extract and Load)

Data can be imported “as is”.

ELT (Extract, Load and Transform)

Data is transformed after it has been loaded into the target.

ETL (Extract, Transform and Load)

Data needs to be transformed before it is loaded.
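The three patterns differ only in where the transformation happens. The following is a minimal sketch, assuming an in-memory list stands in for both the source and the target; the record fields and the `transform` function are hypothetical and chosen purely for illustration.

```python
# Hypothetical in-memory source and targets; names and fields are illustrative only.

def transform(record):
    """Example transformation: tidy the name and cast the amount to float."""
    return {"name": record["name"].strip().title(), "amount": float(record["amount"])}


def extract(source):
    """Extract: read every record from the source unchanged."""
    return list(source)


def el(source, target):
    """EL: load the data as is, with no transformation."""
    target.extend(extract(source))


def elt(source, target):
    """ELT: load the raw data first, then transform it inside the target."""
    target.extend(extract(source))
    target[:] = [transform(r) for r in target]


def etl(source, target):
    """ETL: transform in flight, so only cleaned data reaches the target."""
    target.extend(transform(r) for r in extract(source))


source = [{"name": " alice ", "amount": "10.5"}, {"name": "BOB", "amount": "3"}]

raw_target, elt_target, etl_target = [], [], []
el(source, raw_target)
elt(source, elt_target)
etl(source, etl_target)
```

In this sketch ELT and ETL end with the same cleaned records; the difference is that with ELT the raw data exists in the target for a while, so the transformation can be changed and re-run later without extracting again.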

How does Cloud Storage Work?

Cloud Storage Storage Classes

Cloud Storage simulates a file system. Object paths are stored as metadata for each object; the objects themselves are not partitioned into distinct folders.
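To illustrate how a flat namespace can still look like a directory tree, here is a minimal sketch that models a bucket as a plain dictionary keyed by full object name. It is not the Cloud Storage API: the `bucket` contents and the `list_objects` helper are hypothetical, and the prefix-plus-delimiter listing is only meant to show the idea.

```python
# Hypothetical bucket: a flat mapping from full object name to content.
bucket = {
    "de/modules/02/script.sh": b"#!/bin/bash\n",
    "de/modules/02/notes.txt": b"notes",
    "de/modules/03/query.sql": b"SELECT 1;",
    "readme.txt": b"hello",
}


def list_objects(bucket, prefix="", delimiter="/"):
    """Return (objects, sub_prefixes) directly under prefix, like a folder listing."""
    objects, prefixes = [], set()
    for name in sorted(bucket):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Deeper objects are grouped under a simulated "folder".
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(name)
    return objects, sorted(prefixes)


top_objects, top_folders = list_objects(bucket)
module_objects, module_folders = list_objects(bucket, prefix="de/modules/")
```

The "folders" exist only as shared name prefixes, so renaming one amounts to rewriting the name of every object below it.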

File Access

File Access

gs://declass/de/modules/02/script.sh

Web Access

https://storage.cloud.google.com/declass/de/modules/02/script.sh
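Both address forms name the same bucket and object, so one can be derived from the other. The sketch below assumes the web form is always the host shown above followed by the bucket and object name; `gs_to_web_url` is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import quote

# Host taken from the web access example; treated here as a fixed assumption.
WEB_BASE = "https://storage.cloud.google.com"


def gs_to_web_url(gs_uri):
    """Convert a gs://bucket/object URI into the matching browser URL."""
    scheme = "gs://"
    if not gs_uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    bucket, _, object_name = gs_uri[len(scheme):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"expected gs://<bucket>/<object>, got {gs_uri!r}")
    # Percent-encode unsafe characters but keep the "/" separators readable.
    return f"{WEB_BASE}/{bucket}/{quote(object_name, safe='/')}"


url = gs_to_web_url("gs://declass/de/modules/02/script.sh")
```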

Object Management Features

Secure Cloud Storage

Access Management
Encryption
Special Cases

Storage Types

Types

Transactional vs Analytic Workload

Workload

Transactional workloads are write-heavy. Analytical workloads are read-heavy.
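The contrast can be sketched with Python's built-in `sqlite3` module. This is only an illustration on a tiny in-memory database: the `orders` table and its rows are hypothetical, and SQLite stands in for whichever transactional or analytical system is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# Transactional pattern: many small writes, each committed on its own.
for customer, amount in [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount)
        )

# Analytical pattern: a single read that scans and aggregates many rows.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
```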

$ gsutil -m cp ..

Cloud SQL as Relational Data Lake

Managed Services for RDBMS (SQL Server, MySQL, PostgreSQL).

Cloud SQL

Backup, recovery, scaling, and security are managed.

Replicas

Fully Managed vs Serverless

| Fully managed | Serverless |
| --- | --- |
| No setup | No server management |
| Automated backups | Fully managed security |
| Replicated | Pay for usage |

Serverless Data Management Architecture

Serverless