Dl/Overview
From stonehomewiki
= ETL =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">ETL Flow</div>
<div class="mw-collapsible-content">
{{#mermaid:
graph TD
    Scheduler["Apache Airflow / Scheduler"]
    ETLE["ETL Executor (Airflow Task)"]
    LC["Local ETL Code"]
    ER["ETL Code Repo"]
    JDBC["JDBC (Thrift Server)"]
    User["User (Data Engineer)"]
    Spark["Apache Spark"]
    User --1: git push--> ER
    Scheduler --2: trigger--> ETLE
    ETLE --3: git pull--> LC
    ETLE --4: dbt--> JDBC
    JDBC --5: SQL--> Spark
    ER --> LC
}}
<br />
* 1: The user pushes ETL code to the ETL Code Repo
* 2: The Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
** The ETL job is a task within an Airflow DAG
* 3: The ETL executor pulls the code from the ETL Code Repo onto local disk
* 4: The ETL executor uses the dbt library to submit the job to Apache Spark over its JDBC interface (e.g. via the Thrift Server)
* 5: The Thrift Server takes the SQL and passes it to Apache Spark for execution
</div>
</div>
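Inside the Airflow task, steps 3 and 4 above essentially reduce to two shell commands. A minimal sketch of how the ETL executor might build them — the repo URL, local path, and dbt target name are illustrative assumptions, and in practice these commands would be wrapped in an Airflow operator such as <code>BashOperator</code>:

```python
from pathlib import Path

# Hypothetical values -- the repo URL and checkout path are assumptions,
# not taken from this page.
ETL_REPO = "https://git.example.com/etl-code.git"
LOCAL_DIR = Path("/opt/etl/local-code")

def git_sync_cmd(repo: str, dest: Path) -> list[str]:
    """Step 3: command the ETL executor would run to mirror the repo to local disk."""
    if dest.exists():
        # Already cloned: just fast-forward the existing checkout.
        return ["git", "-C", str(dest), "pull"]
    # First run: clone the repo onto local disk.
    return ["git", "clone", repo, str(dest)]

def dbt_run_cmd(project_dir: Path, target: str = "spark_thrift") -> list[str]:
    """Step 4: dbt compiles the models to SQL and submits them to Spark via
    the Thrift Server's JDBC interface; the target name is an assumption."""
    return ["dbt", "run", "--project-dir", str(project_dir), "--target", target]

if __name__ == "__main__":
    print(git_sync_cmd(ETL_REPO, LOCAL_DIR))
    print(dbt_run_cmd(LOCAL_DIR))
```

Building the argument lists separately from executing them (e.g. with <code>subprocess.run</code>) keeps the task logic easy to test without touching git or dbt.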
<p></p>
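Step 4 assumes dbt is configured with the dbt-spark adapter's <code>thrift</code> method, so that compiled SQL reaches Spark through the Thrift Server. A minimal <code>profiles.yml</code> sketch — the profile name, host, and schema are assumptions:

```yaml
etl_project:                # assumed dbt profile name
  target: spark_thrift
  outputs:
    spark_thrift:
      type: spark           # dbt-spark adapter
      method: thrift        # connect through the Spark Thrift Server
      host: spark-thrift.example.com   # assumed hostname
      port: 10000           # default Thrift Server port
      schema: analytics     # assumed target schema
```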
= BI Connection =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">ETL Flow</div>