All about DataScience, DataEngineering and ComputerScience
View the Project on GitHub datainsightat/DataScience_Examples
A batch pipeline processes a finite amount of data and then exits.
When a transformation cannot be expressed in SQL, use Dataflow as the ETL tool and land the data in BigQuery.
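As a rough illustration of this pattern, the sketch below uses the Apache Beam Python SDK, which is what Dataflow executes. The input format (`<user_id>,<json payload>`), the function names `parse_event` and `run`, and the two-column schema are assumptions made for the example, not something the notes prescribe. Dropping malformed lines while parsing is the kind of step that is awkward to write in SQL.

```python
import json


def parse_event(line):
    """Turn one raw text line into zero or one BigQuery rows.

    Assumed (hypothetical) input format: "<user_id>,<json payload>".
    Malformed lines are dropped instead of failing the whole pipeline.
    """
    user_id, sep, payload = line.partition(",")
    if not sep:
        return []  # no comma, so there is no payload to parse
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []  # payload is not valid JSON
    if not isinstance(data, dict):
        return []  # valid JSON, but not an object
    return [{"user_id": user_id.strip(), "event": str(data.get("event", "unknown"))}]


def run(input_path, output_table, argv=None):
    """Batch pipeline: read text files, transform, land rows in BigQuery."""
    # Imported here so parse_event can be used and tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Parse" >> beam.FlatMap(parse_event)
            | "Write" >> beam.io.WriteToBigQuery(
                output_table,
                schema="user_id:STRING,event:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )
```

Calling `run` with a Cloud Storage path, a `project:dataset.table` string and the usual Dataflow runner flags in `argv` would submit the job. The input is bounded, so the pipeline exits once every line has been written.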
Look Beyond Dataflow and BigQuery
Issue | Solution |
---|---|
Latency | Dataflow to Bigtable |
Spark | Dataproc |
Visual | Cloud Data Fusion |
Metadata as a service.
Bounded Data (Batch) | Unbounded Data (Stream) |
---|---|
Finite data set | Infinite data set |
Complete | Never complete |
Time of element is disregarded | Time of element is significant |
At rest | In motion |
Durable storage | Temporary storage |
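The contrast in the table can be sketched in plain Python. This is only an illustration under simplifying assumptions: `sensor_stream` is a made-up unbounded source, events are assumed to arrive in event-time order (no late data), and `fixed_windows` is a hand-written stand-in for what a streaming engine does with windowing. A bounded set can be summed once. An unbounded stream never completes, so results are emitted per window of event time.

```python
import itertools


def total(bounded):
    """A bounded data set is complete, so a single final answer exists."""
    return sum(value for _, value in bounded)


def sensor_stream():
    """Hypothetical unbounded source: yields (event_time_seconds, value) forever."""
    for t in itertools.count():
        yield t, 1


def fixed_windows(stream, size):
    """Group an event-time-ordered stream into fixed windows of `size` seconds.

    Yields (window_start, sum_of_values) each time a window closes.
    Assumes events arrive in event-time order (no late data).
    """
    current_start, acc = None, 0
    for event_time, value in stream:
        start = event_time - event_time % size
        if current_start is None:
            current_start = start
        if start != current_start:
            # An element of the next window arrived, so the current one is closed.
            yield current_start, acc
            current_start, acc = start, 0
        acc += value


# Bounded: element time is ignored and the result is final.
batch = [(0, 1), (1, 1), (2, 1)]
batch_total = total(batch)

# Unbounded: take only the first three 10-second windows; the stream itself never ends.
first_windows = list(itertools.islice(fixed_windows(sensor_stream(), 10), 3))
```

Here `batch_total` is a single number, while `first_windows` is just a prefix of a result that keeps growing for as long as the stream is read.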
Data Integration (10sec - 10min) | Data decisions (100ms - 10sec) |
---|---|
Data warehouse real-time | Real-time recommendations |
Fraud detection | |
Gaming events | |
Finance back office | |