Dl/Best Practices: Difference between revisions

(2 intermediate revisions by the same user not shown)
Line 3: Line 3:
= Platform =
= Platform =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">Spark</div>
<div class="mw-collapsible-preview">Apache Spark</div>
<div class="mw-collapsible-content">
<div class="mw-collapsible-content">
Apache Spark is a good platform for both batch and streaming data processing. Advantages:
Apache Spark is a good platform for both batch and streaming data processing. Advantages:
Line 9: Line 9:
* Well supported (Databricks backs this product)
* Well supported (Databricks backs this product)
* Well adopted
* Well adopted
* Supported by many cloud providers ([https://aws.amazon.com/emr/ AWS EMR], [https://azure.microsoft.com/en-us/products/hdinsight Azure HDInsight] , [https://cloud.google.com/dataproc GCP Dataproc], oci dataflow)
* Supported by many cloud providers ([https://aws.amazon.com/emr/ AWS EMR], [https://azure.microsoft.com/en-us/products/hdinsight Azure HDInsight], [https://cloud.google.com/dataproc GCP Dataproc], [https://www.oracle.com/big-data/data-flow/ OCI Data Flow])
* In stead of building your own data lake, you can use [https://www.databricks.com/ LakeHouse] provided by databricks, they support AWS, Azure and GCP.
* Instead of building your own data lake, you can use the [https://www.databricks.com/ Lakehouse] platform provided by Databricks, which supports AWS, Azure and GCP.


</div>
</div>

Latest revision as of 09:36, 9 September 2024

Data Lake Knowledge Center

Platform

Data Ingestion

Data Governance