Introduction
The Role of a Data Engineer
A data engineer builds data pipelines to:
- Get the data where it can be useful
- Get the data in a usable condition
- Add new value to the data
- Manage the data
- Productionize data processes
Data Lake
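A data lake brings together data from multiple sources. On Google Cloud, a Cloud Storage bucket can serve as the data lake. Questions to ask when evaluating one:
- Does it handle all types of data?
- Can it scale?
- Does it support high-throughput ingestion?
- Does it offer fine-grained access control to objects?
- Can other tools connect to it easily?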
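To make the "all types of data" and "control to objects" questions concrete, here is a minimal in-memory sketch of a lake. It is not the Cloud Storage API: `ToyDataLake`, `object_name`, and the per-object reader allow-list are hypothetical names chosen for illustration, and the `raw/source=.../dt=.../` path layout is just one common convention, assumed here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ToyDataLake:
    """In-memory stand-in for an object store (hypothetical; NOT the Cloud Storage API)."""

    objects: dict[str, bytes] = field(default_factory=dict)
    # Hypothetical access model: each object has its own allow-list of readers.
    readers: dict[str, set[str]] = field(default_factory=dict)

    @staticmethod
    def object_name(source: str, day: date, filename: str) -> str:
        # Hive-style partitioned path, a common (not mandatory) lake convention.
        return f"raw/source={source}/dt={day.isoformat()}/{filename}"

    def put(self, name: str, payload: bytes, allowed: set[str]) -> None:
        # Objects are stored as raw bytes, so any format (CSV, JSON, images) fits.
        self.objects[name] = payload
        self.readers[name] = set(allowed)

    def get(self, name: str, user: str) -> bytes:
        # Enforce the per-object allow-list before returning the bytes.
        if user not in self.readers.get(name, set()):
            raise PermissionError(f"{user} may not read {name}")
        return self.objects[name]


lake = ToyDataLake()
name = ToyDataLake.object_name("crm", date(2024, 1, 15), "customers.csv")
lake.put(name, b"id,name\n1,Ada\n", allowed={"analyst"})
print(name)  # raw/source=crm/dt=2024-01-15/customers.csv
```

Storing opaque bytes keeps the lake schema-agnostic; interpreting the data is left to whichever tool reads it later.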
ETL
Extract, Transform, Load. Google Cloud services for this: Dataproc (managed Hadoop and Spark) and Dataflow (managed Apache Beam).
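The three ETL stages can be sketched in plain Python. This is a toy, not Dataproc or Dataflow code: an in-memory SQLite database stands in for the warehouse, and the `orders` table and the cleaning rules (drop rows with no amount, upper-case the currency) are assumptions made for the example.

```python
import csv
import io
import sqlite3

RAW_CSV = """order_id,amount,currency
1, 19.99 ,usd
2,,usd
3,5.00,EUR
"""


def extract(text):
    # Extract: parse the raw CSV into one dict per row.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: drop rows without an amount, trim whitespace, normalize currency codes.
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append((int(row["order_id"]), float(amount), row["currency"].strip().upper()))
    return clean


def load(conn, records):
    # Load: write the cleaned records into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Keeping each stage a separate function mirrors how managed services split a pipeline into steps that can be scaled or retried independently.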
Real Time Analytics
Pub/Sub: messaging service for ingesting streaming events.
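The topic/subscription idea behind Pub/Sub can be illustrated with a small in-memory model. This is not the Pub/Sub client library: `ToyPubSub` and its methods are hypothetical, and the sketch leaves out acknowledgements, redelivery, and persistence. It shows only the fan-out, where every subscription on a topic receives its own copy of each message.

```python
from collections import defaultdict, deque


class ToyPubSub:
    """In-memory topic/subscription fan-out (hypothetical; NOT the Pub/Sub client API)."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # topic -> subscription names
        self.queues = {}  # subscription name -> pending messages

    def subscribe(self, topic, subscription):
        self.subscriptions[topic].append(subscription)
        self.queues[subscription] = deque()

    def publish(self, topic, message):
        # Each subscription attached to the topic gets its own copy of the message.
        for sub in self.subscriptions[topic]:
            self.queues[sub].append(message)

    def pull(self, subscription, max_messages=10):
        # Remove and return up to max_messages pending messages, oldest first.
        queue = self.queues[subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


bus = ToyPubSub()
bus.subscribe("clicks", "dashboard")
bus.subscribe("clicks", "archive")
bus.publish("clicks", {"user": "u1", "page": "/home"})
bus.publish("clicks", {"user": "u2", "page": "/cart"})
print(len(bus.pull("dashboard")), len(bus.pull("archive", max_messages=1)))  # prints: 2 1
```

Because each subscription has its own queue, a real-time dashboard and an archiving job can consume the same stream at different speeds.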
Data Engineering Challenges
- Access to Data
- Quality
- Computational resources
- Query Performance
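The "Quality" challenge usually starts with simple checks run on incoming data. The sketch below counts missing required fields and duplicate keys; the function name `check_quality` and the choice of checks are illustrative assumptions, not a specific Google Cloud feature.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int
    missing: dict[str, int]  # required column -> number of empty values
    duplicates: int  # rows whose key was already seen


def check_quality(rows, required, key):
    missing = {col: 0 for col in required}
    seen = set()
    duplicates = 0
    for row in rows:
        # Count empty or absent values in each required column.
        for col in required:
            if row.get(col) in (None, ""):
                missing[col] += 1
        # Count repeated primary-key values.
        if row.get(key) in seen:
            duplicates += 1
        seen.add(row.get(key))
    return QualityReport(len(rows), missing, duplicates)


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "b@example.com"},
]
report = check_quality(rows, required=["id", "email"], key="id")
print(report)
```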
BigQuery
Serverless data warehouse
Data Lakes and Data Warehouses
Data Warehouse
- Can it serve as a sink for batch and streaming data?
- Can it scale?
- How is the data organized?
- Is it designed for performance?
- What is the maintenance level?
Databases vs Data Warehouses
SQL > Cloud SQL
Cloud SQL | BigQuery
Transactional database | Data warehouse
Record-based | Column-based
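The record-based vs column-based distinction can be shown with the same three records held in both layouts. Python lists and dicts are only a stand-in for physical storage, so treat this as an illustration of access patterns rather than of how Cloud SQL or BigQuery store bytes on disk.

```python
rows = [  # record-based: each entry holds a full record
    {"id": 1, "country": "FR", "revenue": 120.0},
    {"id": 2, "country": "DE", "revenue": 80.0},
    {"id": 3, "country": "FR", "revenue": 50.0},
]

# Column-based: one list per column, built from the same records.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def total_revenue_rowwise(records):
    # An analytic aggregate still walks every full record.
    return sum(r["revenue"] for r in records)


def total_revenue_columnar(cols):
    # Only the "revenue" column is read; other columns are never touched.
    return sum(cols["revenue"])


def fetch_record_rowwise(records, record_id):
    # Fetching one whole record is natural in a row layout (transactional workloads).
    return next(r for r in records if r["id"] == record_id)


print(total_revenue_rowwise(rows), total_revenue_columnar(columns))  # prints: 250.0 250.0
```

Both give the same answer; the difference is how much data must be read. Aggregates over a few columns favor the columnar layout, while single-record reads and writes favor the row layout.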
Partners (roles a data engineer works with)
- ML Engineer
- Data Analyst
- Data Engineer
BigQuery ML
BigQuery BI Engine
Data Access
- Data Catalog
- Data Loss Prevention API (manage sensitive data)
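To illustrate what "manage sensitive data" means in practice, here is a toy regex-based masker. It is not the Data Loss Prevention API and is far less thorough: the two patterns and the `[EMAIL]`/`[PHONE]` placeholders are assumptions for this example only.

```python
import re

# Simplified patterns (assumptions); a real DLP service uses many more detectors.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d -]{7,}\d")


def mask_sensitive(text):
    # Replace each detected value with a placeholder naming its type.
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


print(mask_sensitive("Contact ada@example.com or +33 6 12 34 56 78."))
# prints: Contact [EMAIL] or [PHONE].
```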
Cloud Composer (Airflow)
- How can we ensure pipeline health?
- How can we minimize maintenance?
- How can we respond to business needs?
- Are we using the latest tools?
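Orchestrators such as Airflow model a pipeline as a DAG of tasks run in dependency order, with per-task retries (Airflow's `retries` setting) to absorb transient failures. The sketch below reproduces that idea with the standard library's `graphlib`. It is not Airflow code: `run_pipeline` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order with retries (toy stand-in for an Airflow DAG run)."""
    order = list(TopologicalSorter(dependencies).static_order())
    log = []
    for name in order:
        for attempt in range(1, max_retries + 2):  # first try plus max_retries retries
            try:
                tasks[name]()
                log.append((name, attempt, "success"))
                break
            except Exception:
                log.append((name, attempt, "failed"))
        else:
            # Loop ended without a success: stop the pipeline, like a failed DAG run.
            raise RuntimeError(f"task {name} exhausted its retries")
    return log


calls = {"transform": 0}


def flaky_transform():
    # Fails on the first call only, simulating a transient error.
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise ConnectionError("transient failure")


tasks = {"extract": lambda: None, "transform": flaky_transform, "load": lambda: None}
deps = {"transform": {"extract"}, "load": {"transform"}}  # task -> its upstream tasks
log = run_pipeline(tasks, deps)
for entry in log:
    print(entry)
```

The returned log is a minimal form of pipeline health monitoring: it records which tasks needed retries, which is the kind of signal an orchestrator surfaces.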