Dl/Overview: Difference between revisions
From stonehomewiki
Stonezhong (talk | contribs) (→ETL) |
| Line 14: | Line 14: | ||
User[User<Data Engineer>] | User[User<Data Engineer>] | ||
Spark[Apache Spark] | Spark[Apache Spark] | ||
Scheduler -- | Scheduler --2: trigger--> ETLE | ||
ETLE -- | ETLE --3: git pull -->LC | ||
ETLE -- | ETLE --4: dbt--> JDBC | ||
JDBC -- | JDBC --5:--> Spark | ||
ER --> LC | ER --> LC | ||
User -- | User --1: git push-->ER | ||
}} | }} | ||
<br /> | <br /> | ||
* 1 Airflow Scheduler trigger DAG (DAG is generated based on metadata) | * 1: User pushes ETL code into ETL Code Repo | ||
* 2: Airflow Scheduler triggers the DAG (the DAG is generated from metadata) | |||
** The ETL job is a task within an Airflow DAG | ** The ETL job is a task within an Airflow DAG | ||
* | * 3: The ETL executor pulls code from the ETL Code Repo onto local disk | ||
* | * 4: The ETL executor uses the dbt library to submit the job to Apache Spark through a JDBC interface (e.g. the Spark Thrift Server) | ||
* | * 5: The Thrift Server takes the SQL and passes it to Apache Spark for execution | ||
</div> | </div> | ||
</div> | </div> | ||
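Step 2 says the Airflow DAG is generated from metadata. The snippet below is a minimal sketch of that idea under stated assumptions. Each metadata record (the `EtlJobMeta` shape is hypothetical) becomes one task, and its declared upstream jobs become dependency edges. The code orders the tasks with the standard library's `graphlib` instead of Airflow, so it runs standalone. In a real Airflow DAG the same edges would be expressed as operator dependencies (`upstream >> downstream`).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class EtlJobMeta:
    """Hypothetical metadata record describing one ETL job."""
    name: str
    dbt_select: str                                     # dbt selector the job runs
    upstream: list[str] = field(default_factory=list)   # jobs that must finish first


def build_dag(jobs: list[EtlJobMeta]) -> list[str]:
    """Return job names in an order that respects upstream dependencies.

    Raises ValueError for references to unknown jobs and
    graphlib.CycleError (a ValueError subclass) for dependency cycles.
    """
    known = {job.name for job in jobs}
    graph: dict[str, set[str]] = {}
    for job in jobs:
        missing = set(job.upstream) - known
        if missing:
            raise ValueError(f"{job.name} depends on unknown jobs: {sorted(missing)}")
        # TopologicalSorter expects node -> set of predecessors
        graph[job.name] = set(job.upstream)
    return list(TopologicalSorter(graph).static_order())
```

For example, metadata declaring `stg_orders -> fct_orders -> rpt_daily` yields that run order regardless of the order the records are listed in. Validating the metadata before building the DAG surfaces broken references at generation time, rather than when the scheduler fires.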
<p></p> | <p></p> | ||
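Steps 3 to 5 describe what the ETL executor does inside its Airflow task. The following is a rough sketch, not the page's actual executor, assuming the executor shells out to the `git` and `dbt` command-line tools. It syncs the ETL Code Repo to local disk (`git clone` the first time, `git -C <dir> pull` afterwards) and then runs `dbt run --select <model>` against the checked-out project. The function names and the fast-forward-only pull policy are hypothetical choices. dbt itself would read the Spark connection (for example the Thrift Server host and port) from its `profiles.yml`, which is not shown here.

```python
import subprocess
from pathlib import Path


def executor_commands(repo_url: str, local_dir: Path, dbt_select: str) -> list[list[str]]:
    """Build the commands for step 3 (sync code) and step 4 (run dbt)."""
    if (local_dir / ".git").is_dir():
        # Repo already on local disk: fetch the latest pushed ETL code.
        sync = ["git", "-C", str(local_dir), "pull", "--ff-only"]
    else:
        # First run on this worker: clone the ETL Code Repo.
        sync = ["git", "clone", repo_url, str(local_dir)]
    # dbt compiles the selected models to SQL and sends it to Spark
    # over the connection configured in profiles.yml (step 5).
    run = ["dbt", "run", "--select", dbt_select, "--project-dir", str(local_dir)]
    return [sync, run]


def run_etl_job(repo_url: str, local_dir: str, dbt_select: str,
                dry_run: bool = False) -> list[list[str]]:
    """Execute the ETL job; with dry_run=True only return the commands."""
    cmds = executor_commands(repo_url, Path(local_dir), dbt_select)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # fail the Airflow task if git or dbt fails
    return cmds
```

Raising on a non-zero exit code (`check=True`) lets Airflow mark the task as failed and apply its usual retry policy.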