Data Lake Knowledge Center
Data Application: for data ingestion
- An application that generates dataframes and updates a dataset.
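
As a rough illustration of an ingestion application, the sketch below reads raw CSV text, builds a dataframe, and updates a dataset. Everything named here is an assumption made for the example: the "dataframe" is a plain list of row dictionaries standing in for a real dataframe library, and `DATA_LAKE`, `read_source`, `update_dataset`, and `ingest_orders` are hypothetical names, not part of any real API.

```python
import csv
import io
from typing import Dict, List

# Stand-in for a dataframe: a list of row dictionaries (assumption for this sketch).
Frame = List[Dict[str, str]]

# Hypothetical in-memory "data lake": dataset name -> rows stored so far.
DATA_LAKE: Dict[str, Frame] = {}


def read_source(raw_csv: str) -> Frame:
    """Parse raw CSV text from an external source into a frame."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def update_dataset(name: str, frame: Frame) -> None:
    """Append the frame's rows to the named dataset, creating it if needed."""
    DATA_LAKE.setdefault(name, []).extend(frame)


def ingest_orders(raw_csv: str) -> int:
    """Ingestion application: external data in, dataset updated.

    Returns the number of rows ingested.
    """
    frame = read_source(raw_csv)
    update_dataset("orders_raw", frame)
    return len(frame)
```

Calling `ingest_orders("order_id,amount\n1,10.5\n2,4.0\n")` would append two rows to the hypothetical `orders_raw` dataset. A real application would likely write to object storage or a table format rather than a dictionary in memory.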
Data Application: for data transformation
- An application that consumes one or more datasets, generates dataframes, and updates datasets.
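
A transformation application could look roughly like the sketch below: it consumes two datasets, derives a new dataframe, and writes it back as a third dataset. The dataset names, the columns, and the functions `revenue_by_country` and `run_transformation` are all hypothetical, and a list of dictionaries again stands in for a real dataframe.

```python
from collections import defaultdict
from typing import Dict, List

# Stand-in for a dataframe: a list of row dictionaries (assumption for this sketch).
Frame = List[Dict[str, object]]

# Hypothetical data lake with two input datasets already ingested.
DATA_LAKE: Dict[str, Frame] = {
    "orders": [
        {"order_id": 1, "customer_id": "a", "amount": 10.0},
        {"order_id": 2, "customer_id": "b", "amount": 4.0},
        {"order_id": 3, "customer_id": "a", "amount": 6.0},
    ],
    "customers": [
        {"customer_id": "a", "country": "DE"},
        {"customer_id": "b", "country": "FR"},
    ],
}


def revenue_by_country(orders: Frame, customers: Frame) -> Frame:
    """Join orders to customers and sum the order amounts per country."""
    country_of = {c["customer_id"]: c["country"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        totals[country_of[order["customer_id"]]] += order["amount"]
    # Sort by country so the output frame has a stable row order.
    return [{"country": k, "revenue": v} for k, v in sorted(totals.items())]


def run_transformation() -> None:
    """Transformation application: datasets in, derived dataset updated."""
    frame = revenue_by_country(DATA_LAKE["orders"], DATA_LAKE["customers"])
    # Overwrite rather than append, so reruns give the same result.
    DATA_LAKE["revenue_by_country"] = frame
```

Overwriting the output dataset is a design choice made for this sketch: it keeps the application idempotent, which matters once it is rerun on a schedule.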
Data Pipeline
- A set of data applications with dependencies between them
- Runs on a regular basis
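
The dependency idea can be sketched with Python's standard library alone: each application is a callable, and `graphlib.TopologicalSorter` (Python 3.9+) yields an order in which every application runs after the ones it depends on. The application names, `APPS`, `DEPENDS_ON`, and `run_pipeline` are hypothetical and exist only for this illustration. Scheduling the regular runs is left to an external scheduler.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Records which applications ran, in order (for illustration only).
executed: List[str] = []

# Hypothetical data applications; real ones would read and update datasets.
APPS: Dict[str, Callable[[], None]] = {
    "ingest_orders": lambda: executed.append("ingest_orders"),
    "ingest_customers": lambda: executed.append("ingest_customers"),
    "revenue_by_country": lambda: executed.append("revenue_by_country"),
}

# Each application maps to the set of applications that must finish before it.
DEPENDS_ON: Dict[str, Set[str]] = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "revenue_by_country": {"ingest_orders", "ingest_customers"},
}


def run_pipeline(
    apps: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Run every application once, dependencies first, and return the order."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        apps[name]()
    return order
```

With the data above, `run_pipeline(APPS, DEPENDS_ON)` runs both ingestion applications before `revenue_by_country`. This sketch runs everything sequentially in one process and has no retries or logging, which are the kinds of concerns an orchestrator takes over.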
It is a good idea to use Apache Airflow to orchestrate data pipelines.