Data Lake Knowledge Center
Overview
(Diagram)

Description:
- 1 The Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
- The ETL job is a task within the Airflow DAG
- 2 The ETL executor pulls code from a repository onto local disk
- 3 The ETL executor uses the dbt library to submit the job to Apache Spark over a JDBC interface (e.g. via the Thrift Server)
- 4 The Thrift Server takes the SQL and passes it to Apache Spark for execution
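The metadata-driven DAG generation in step 1 can be sketched as follows. This is a minimal illustration, not the actual implementation: the metadata schema (`tables`, `source`, `target`), the `EtlTask` class, and the generated SQL are all hypothetical, and a real setup would emit Airflow `DAG`/operator objects and submit each statement to the Thrift Server rather than plain task records.

```python
# Hypothetical sketch: generate one ETL task per table entry in a metadata
# document. In production each task's SQL would be sent to Spark via the
# Thrift Server's JDBC interface; here we only build the task definitions.
from dataclasses import dataclass


@dataclass
class EtlTask:
    task_id: str
    sql: str  # SQL that the Thrift Server would pass to Apache Spark


def build_tasks(metadata: dict) -> list[EtlTask]:
    """Turn metadata table entries into ETL task definitions."""
    tasks = []
    for entry in metadata["tables"]:
        # Illustrative full-refresh statement; real jobs come from the dbt repo.
        sql = (
            f"INSERT OVERWRITE TABLE {entry['target']} "
            f"SELECT * FROM {entry['source']}"
        )
        tasks.append(EtlTask(task_id=f"etl_{entry['target']}", sql=sql))
    return tasks


metadata = {"tables": [{"source": "raw.orders", "target": "dl.orders"}]}
for task in build_tasks(metadata):
    print(task.task_id)  # → etl_dl.orders
```

Generating tasks from metadata this way keeps the DAG definition declarative: adding a table to the metadata adds a task on the next scheduler parse, with no pipeline code change.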