Dl/Overview
<p>Data Lake Knowledge Center</p>
= Overview =
{| class="wikitable grid mono section"
|-
! Diagram
! Description
|-
|{{#mermaid:
graph TD
Scheduler[Apache Airflow/Scheduler]
ETLE[ETL Executor<Airflow Task>]
LC[Local Code]
JDBC[JDBC<Thrift Server>]
Spark[Apache Spark]
Scheduler --1 trigger--> ETLE
ETLE --2 git pull -->LC
ETLE --3 dbt--> JDBC
JDBC --> Spark
}}
| style="vertical-align:top;" | | |||
* 1 The Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
** The ETL job is a task within an Airflow DAG
* 2 The ETL executor pulls code from a repo onto local disk
* 3 The ETL executor uses the dbt library to submit the job to Apache Spark via a JDBC interface (e.g. via Thrift Server)
* 4 The Thrift Server takes the SQL and passes it to Apache Spark to execute
|} | |||
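The numbered steps in the table can be sketched in plain Python. This is only an illustration of the executor's per-run flow: the repo URL, working directory, and dbt profile name are hypothetical placeholders, and a real deployment would wrap each command in an Airflow task inside the metadata-generated DAG rather than build a command list.

```python
def build_etl_commands(repo_url: str, work_dir: str, profile: str) -> list[str]:
    """Return the shell commands the ETL executor would run, in order.

    Hypothetical sketch: names and paths are placeholders, not the
    project's actual configuration.
    """
    return [
        # Step 2: pull the ETL code from the repo onto local disk
        f"git clone --depth 1 {repo_url} {work_dir}",
        # Step 3: run dbt, which submits SQL to Apache Spark through the
        # JDBC/Thrift endpoint named in the dbt profile
        f"dbt run --project-dir {work_dir} --profile {profile}",
    ]


if __name__ == "__main__":
    for cmd in build_etl_commands(
        "https://example.com/etl-code.git",  # hypothetical repo
        "/tmp/etl-code",                     # local checkout directory
        "spark_thrift",                      # hypothetical dbt profile
    ):
        print(cmd)
```

Step 4 happens inside Spark itself: the Thrift Server receives the SQL that dbt issues over JDBC and hands it to Spark for execution, so it needs no command of its own here.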
Revision as of 17:53, 25 November 2025