Dl/Overview: Difference between revisions


Revision as of 17:55, 25 November 2025

Data Lake Knowledge Center

Overview

Diagram Description
  • 1 The Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
    • The ETL job is a task within an Airflow DAG
  • 2 The ETL executor pulls code from a repo onto local disk
  • 3 The ETL executor uses the dbt library to submit the job to Apache Spark via a JDBC interface (e.g. via Thrift Server)
  • 4 The Thrift Server takes the SQL and passes it to Apache Spark to execute
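
As an illustration of steps 2 and 3 above, the following is a minimal sketch of what the ETL task might run, assuming the executor shells out to the git and dbt command-line tools. The names EtlJobMeta, build_commands and run_job, the metadata fields, and the example paths are hypothetical and are not taken from the actual Data Lake code. The connection to the Thrift Server (step 4) is assumed to live in the dbt profile, so it does not appear here.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class EtlJobMeta:
    """Hypothetical metadata record describing one ETL job."""
    job_id: str     # identifier of the job (assumed field)
    repo_dir: str   # local checkout the executor pulls into (assumed field)
    dbt_model: str  # dbt model to run (assumed field)


def build_commands(meta):
    """Return the commands for steps 2 and 3 as argv lists."""
    return [
        # Step 2: refresh the local copy of the code.
        ["git", "-C", meta.repo_dir, "pull"],
        # Step 3: let dbt submit the model's SQL through its configured profile.
        ["dbt", "run", "--project-dir", meta.repo_dir, "--select", meta.dbt_model],
    ]


def run_job(meta, runner=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for argv in build_commands(meta):
        runner(argv, check=True)
```

In an Airflow deployment, a function like run_job would typically be the callable behind the ETL task, invoked once per metadata record. The runner parameter is only there so that the command sequence can be checked without running git or dbt.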