Dl/Overview: Difference between revisions

From stonehomewiki
(Created page with "<p>Data Lake Knowledge Center</p> = Overview = {{#mermaid: graph TD Scheduler[Apache Scheduler] ETLE[ETL Executor] CR[Code Repo] Spark[Apache Spark] }}")


= Overview =
{| class="wikitable grid mono section"
|-
! Diagram
! Description
|-
|{{#mermaid:
graph TD
     Scheduler[Apache Airflow/Scheduler]
     ETLE[ETL Executor&lt;Airflow Task&gt;]
     LC[Local Code]
     JDBC[JDBC&lt;Thrift Server&gt;]
     Spark[Apache Spark]

     Scheduler --1 trigger--> ETLE
     ETLE --2 git pull--> LC
     ETLE --3 dbt--> JDBC
     JDBC --> Spark
}}
| style="vertical-align:top;" |
* 1 The Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
** The ETL job is a task within an Airflow DAG
* 2 The ETL executor pulls code from a repo onto local disk
* 3 The ETL executor uses the dbt library to submit the job to Apache Spark via a JDBC interface (e.g. via Thrift Server)
* 4 The Thrift Server takes the SQL and passes it to Apache Spark for execution
|}

Revision as of 17:53, 25 November 2025

Data Lake Knowledge Center

Overview

Diagram Description
  • 1 The Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
    • The ETL job is a task within an Airflow DAG
  • 2 The ETL executor pulls code from a repo onto local disk
  • 3 The ETL executor uses the dbt library to submit the job to Apache Spark via a JDBC interface (e.g. via Thrift Server)
  • 4 The Thrift Server takes the SQL and passes it to Apache Spark for execution
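
Steps 2 and 3 above could be sketched roughly as follows. This is a minimal illustration, not the actual executor: the function names (build_etl_commands, run_etl), the repository URL, the local path and the model selector are all hypothetical. It also assumes that the dbt project sits at the root of the checked-out repo and that the dbt profile already points at the Thrift Server, so no connection details appear here.

```python
import subprocess
from pathlib import Path


def build_etl_commands(repo_url, local_dir, select=None):
    """Return the commands for step 2 (sync code) and step 3 (run dbt).

    All arguments are illustrative; the real executor may differ.
    """
    local = Path(local_dir)
    if (local / ".git").is_dir():
        # Checkout already exists on local disk: just pull the latest code.
        sync = ["git", "-C", str(local), "pull", "--ff-only"]
    else:
        # First run on this machine: clone the repo into the local directory.
        sync = ["git", "clone", repo_url, str(local)]

    # Step 3: dbt reads its connection settings from the dbt profile,
    # which is assumed here to target the Thrift Server in front of Spark.
    dbt = ["dbt", "run", "--project-dir", str(local)]
    if select is not None:
        dbt += ["--select", select]
    return [sync, dbt]


def run_etl(repo_url, local_dir, select=None, runner=subprocess.run):
    """Run the sync and dbt commands in order, failing on the first error."""
    for cmd in build_etl_commands(repo_url, local_dir, select):
        runner(cmd, check=True)
```

Within Airflow, a function like run_etl could be used as the callable of a task (for example a PythonOperator), which would correspond to the "ETL Executor" node in the diagram. The runner parameter exists only so the command sequence can be inspected without actually invoking git or dbt.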