Dl/Best Practices: Difference between revisions

From stonehomewiki
<p> [[dl/home|Data Lake Knowledge Center]] </p>
= Platform =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">Apache Spark</div>
<div class="mw-collapsible-content">
Apache Spark is a good platform for both batch and streaming data processing. Advantages:
* Scalable
* Well supported (Databricks backs the project)
* Well adopted
* Supported by many cloud providers ([https://aws.amazon.com/emr/ AWS EMR], [https://azure.microsoft.com/en-us/products/hdinsight Azure HDInsight], [https://cloud.google.com/dataproc GCP Dataproc], [https://www.oracle.com/big-data/data-flow/ OCI Data Flow])
* Instead of building your own data lake, you can use the [https://www.databricks.com/ Lakehouse] platform provided by Databricks, which supports AWS, Azure, and GCP.
</div>
</div>
<p></p>


= Data Ingestion =

Latest revision as of 09:36, 9 September 2024

Data Lake Knowledge Center

Platform

Data Ingestion

Data Governance