Dl/Overview: Difference between revisions

From stonehomewiki
Line 14: Line 14:
     User[User<Data Engineer>]
     User[User<Data Engineer>]
     Spark[Apache Spark]
     Spark[Apache Spark]
     Scheduler --1 trigger--> ETLE
     Scheduler --2: trigger--> ETLE
     ETLE --2 git pull -->LC
     ETLE --3: git pull -->LC
     ETLE --3 dbt--> JDBC
     ETLE --4: dbt--> JDBC
     JDBC --4--> Spark
     JDBC --5:--> Spark
     ER --> LC
     ER --> LC
     User --5 git push-->ER
     User --1: git push-->ER
}}
}}
<br />
<br />


* 1 Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
* 1: User pushes ETL code into the ETL Code Repo
* 2: Airflow Scheduler triggers the DAG (the DAG is generated from metadata)
** The ETL job is a task within an Airflow DAG
** The ETL job is a task within an Airflow DAG
* 2 ETL executor pulls code from a repo onto local disk
* 3: ETL executor pulls code from the ETL Code Repo onto local disk
* 3 ETL executor uses the dbt library to submit the job to Apache Spark via a JDBC interface (e.g. via Thrift Server)
* 4: ETL executor uses the dbt library to submit the job to Apache Spark via a JDBC interface (e.g. via Thrift Server)
* 4 Thrift Server takes the SQL and passes it to Apache Spark for execution
* 5: Thrift Server takes the SQL and passes it to Apache Spark for execution
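
The executor side of this flow (pull the code, then run dbt) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual job definition: the function names <code>build_etl_commands</code> and <code>run_etl_task</code>, the directory arguments, and the use of the <code>git</code> and <code>dbt</code> command-line tools are all hypothetical choices. The connection from dbt to Spark through the Thrift Server is assumed to be configured in the dbt profile and is not shown.

```python
import subprocess
from typing import Callable, List, Optional


def build_etl_commands(repo_dir: str, profiles_dir: str) -> List[List[str]]:
    """Return the commands for one ETL run, in execution order.

    repo_dir and profiles_dir are hypothetical paths on the executor's
    local disk; they are not taken from the wiki page.
    """
    return [
        # Pull the latest ETL code from the ETL Code Repo onto local disk.
        ["git", "-C", repo_dir, "pull", "--ff-only"],
        # Run dbt; the Spark/Thrift connection is assumed to live in the
        # dbt profile found under profiles_dir.
        ["dbt", "run", "--project-dir", repo_dir, "--profiles-dir", profiles_dir],
    ]


def run_etl_task(
    repo_dir: str,
    profiles_dir: str,
    runner: Optional[Callable[[List[str]], object]] = None,
) -> None:
    """Execute the ETL commands one after another.

    In a scheduler setup, a function like this would be the body of the
    task inside the DAG. The runner argument exists so the sequence can
    be checked without invoking git or dbt.
    """
    if runner is None:
        # Default: really run each command and fail on a non-zero exit code.
        def runner(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    for cmd in build_etl_commands(repo_dir, profiles_dir):
        runner(cmd)
```

Passing a recording function as <code>runner</code> (for example <code>calls.append</code> on a list) shows the command order without touching the repo or Spark, which can be handy when checking the task definition locally.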
</div>
</div>
</div>
</div>

Revision as of 18:05, 25 November 2025

Data Lake Knowledge Center

ETL