Dl/Overview

From stonehomewiki
Revision as of 17:53, 25 November 2025 by Stonezhong

Data Lake Knowledge Center

Overview

Diagram Description
  • 1 The Airflow scheduler triggers the DAG (the DAG is generated from metadata)
    • The ETL job is a task within an Airflow DAG
  • 2 The ETL executor pulls the code from a repo onto local disk
  • 3 The ETL executor uses the dbt library to submit the job to Apache Spark through a JDBC interface (e.g. via the Thrift Server)
  • 4 The Thrift Server takes the SQL and passes it to Apache Spark for execution
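Step 1 says the DAG is generated from metadata. The sketch below illustrates that idea with the standard library only; it is not the Airflow API and not the author's generator. The metadata shape (a list of jobs, each with a `name` and a `depends_on` list) and the job names are hypothetical. In a real deployment the same dependency map would be turned into Airflow tasks and `>>` edges inside a DAG file.

```python
from graphlib import TopologicalSorter

# Hypothetical metadata: each ETL job lists the jobs it depends on.
JOB_METADATA = [
    {"name": "load_orders", "depends_on": []},
    {"name": "load_customers", "depends_on": []},
    {"name": "build_order_facts", "depends_on": ["load_orders", "load_customers"]},
]


def build_dag(metadata):
    """Return {task_name: set of upstream task names} built from job metadata."""
    names = {job["name"] for job in metadata}
    dag = {}
    for job in metadata:
        unknown = set(job["depends_on"]) - names
        if unknown:
            # Fail early instead of generating a DAG with dangling dependencies.
            raise ValueError(f"{job['name']} depends on unknown jobs: {sorted(unknown)}")
        dag[job["name"]] = set(job["depends_on"])
    return dag


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError if the DAG has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(execution_order(build_dag(JOB_METADATA)))
```

Validating the metadata before building the graph means a typo in a dependency name surfaces as an error at generation time rather than as a task that never runs.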
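Steps 2 and 3 can be sketched as the commands an ETL executor might run for a single task. This is a minimal sketch under stated assumptions, not the author's executor: it assumes the `git` and `dbt` CLIs are installed, and that the repo contains a dbt project whose `profiles.yml` uses the dbt-spark adapter's `thrift` connection method pointed at the Thrift Server's host and port. The repo URL, branch, work directory, and model name are hypothetical placeholders.

```python
import os
import subprocess


def build_commands(repo_url, branch, workdir, model):
    """Return the commands (as argv lists) for one ETL task run."""
    project_dir = os.path.join(workdir, "project")
    return [
        # Step 2: fetch the ETL code onto local disk (shallow clone of one branch).
        ["git", "clone", "--depth", "1", "--branch", branch, repo_url, project_dir],
        # Step 3: dbt compiles the selected model to SQL and sends it over the
        # connection defined in profiles.yml (assumed here: dbt-spark, thrift method).
        # Step 4 then happens server-side: the Thrift Server hands the SQL to Spark.
        ["dbt", "run",
         "--project-dir", project_dir,
         "--profiles-dir", project_dir,
         "--select", model],
    ]


def run_etl_task(repo_url, branch, workdir, model):
    """Run the commands in order; check=True stops on the first failure."""
    for cmd in build_commands(repo_url, branch, workdir, model):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution lets the task logic be unit-tested without a live repo or Spark cluster; only `run_etl_task` touches the outside world.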