Dl/Overview

From stonehomewiki


= ETL =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">ETL Flow</div>
<div class="mw-collapsible-content">
{{#mermaid:
graph TD
    Scheduler[Apache Airflow/Scheduler]
    ETLE[ETL Executor&lt;Airflow Task&gt;]
    LC[Local ETL Code]
    ER[ETL Code Repo]
    JDBC[JDBC&lt;Thrift Server&gt;]
    User[User&lt;Data Engineer&gt;]
    Spark[Apache Spark]
    Scheduler --2: trigger--> ETLE
    ETLE --3: git pull--> LC
    ETLE --4: dbt--> JDBC
    JDBC --5: execute SQL--> Spark
    ER --> LC
    User --1: git push-->ER
}}
<br />
* 1: The user (a data engineer) pushes ETL code to the ETL Code Repo.
* 2: The Airflow Scheduler triggers the DAG (the DAG is generated from metadata).
** Each ETL job is a task within an Airflow DAG.
* 3: The ETL Executor pulls the code from the ETL Code Repo to its local disk.
* 4: The ETL Executor uses the dbt library to submit jobs to Apache Spark through a JDBC interface (e.g. the Thrift Server).
* 5: The Thrift Server receives the SQL and passes it to Apache Spark for execution.
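The steps above can be sketched in Python as follows. This is a hypothetical illustration, not the actual executor code. The metadata format, the ETLJob fields, the checkout path and the dbt model names are all assumptions. Only the git -C ... pull and dbt run --project-dir ... --profiles-dir ... --select ... invocations are real CLI commands.

The sketch builds tasks from metadata (step 2) and orders them by their upstream dependencies. It then lists the commands a single task would run (steps 3 and 4). It executes nothing itself.

```python
"""Hypothetical sketch of ETL flow steps 2-4 (not the real executor).

Assumptions: the metadata layout, ETLJob fields, checkout path and model
names are illustrative. Only the git/dbt command lines mirror real CLIs.
"""
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ETLJob:
    # One ETL job = one task inside an Airflow DAG (step 2).
    name: str
    dbt_select: str              # dbt node selector, e.g. a model name
    upstream: Tuple[str, ...]    # names of jobs that must finish first


def build_dag(metadata: Dict[str, List[dict]]) -> Dict[str, ETLJob]:
    """Turn (hypothetical) job metadata into tasks keyed by name."""
    jobs: Dict[str, ETLJob] = {}
    for spec in metadata["jobs"]:
        if spec["name"] in jobs:
            raise ValueError(f"duplicate job {spec['name']}")
        jobs[spec["name"]] = ETLJob(
            spec["name"], spec["select"], tuple(spec.get("upstream", ()))
        )
    # Reject dependencies on jobs that are not defined in the metadata.
    for job in jobs.values():
        for dep in job.upstream:
            if dep not in jobs:
                raise ValueError(f"{job.name} depends on unknown job {dep}")
    return jobs


def execution_order(jobs: Dict[str, ETLJob]) -> List[str]:
    """Depth-first topological sort: upstream jobs come before dependents."""
    order: List[str] = []
    state: Dict[str, str] = {}

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"cycle detected at {name}")
        state[name] = "visiting"
        for dep in jobs[name].upstream:
            visit(dep)
        state[name] = "done"
        order.append(name)

    for name in sorted(jobs):
        visit(name)
    return order


def task_commands(job: ETLJob, checkout_dir: str) -> List[List[str]]:
    """Command lines the ETL executor would run for one task."""
    return [
        # Step 3: refresh the local copy of the ETL code repo.
        ["git", "-C", checkout_dir, "pull"],
        # Step 4: dbt compiles the selected models to SQL and sends it to
        # Spark via the Thrift Server configured in profiles.yml (assumed
        # here to sit in the checkout directory).
        ["dbt", "run", "--project-dir", checkout_dir,
         "--profiles-dir", checkout_dir, "--select", job.dbt_select],
    ]


if __name__ == "__main__":
    meta = {"jobs": [
        {"name": "daily_sales", "select": "fct_daily_sales",
         "upstream": ["load_orders"]},
        {"name": "load_orders", "select": "stg_orders"},
    ]}
    dag = build_dag(meta)
    for name in execution_order(dag):  # load_orders runs before daily_sales
        print(name, task_commands(dag[name], "/opt/etl/repo"))
```

Building the command lists separately from executing them keeps the task logic testable without Airflow, git or a Spark cluster. The Spark connection used in step 5 is configured in dbt's profiles.yml rather than in the task itself. For the dbt-spark adapter, that means method: thrift plus the Thrift Server host and port.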
</div>
</div>
<p></p>
= BI Connection =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">ETL Flow</div>

Revision as of 18:43, 25 November 2025

Data Lake Knowledge Center

ETL

BI Connection