Dl/Models

From stonehomewiki



Revision as of 03:09, 25 November 2025

Introduction

Overview

Diagram Description

Bronze Tier:

The purpose of the bronze tier is to store data downloaded from the external world in the data lake, so that all of the tools available inside the data lake can be used to process it further.

  • Raw data
  • No uniform format: the data could be CSV, JSON, Avro, Parquet, binary, or anything else
  • Data may even be unstructured
  • No data quality assurance
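A minimal sketch of what a data ingestion application might do when landing data in the bronze tier. The helper name land_raw, the directory layout <bronze_root>/<source>/<YYYY-MM-DD>/<filename>, and the use of a local filesystem are illustrative assumptions, not part of this design; a real data lake would more likely sit on object storage. What the sketch shows is the bronze contract itself: the payload is written byte-for-byte, with no parsing and no quality checks.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def land_raw(bronze_root: Path, source: str, filename: str, payload: bytes,
             ingested_at: Optional[datetime] = None) -> Path:
    """Store a downloaded payload in the bronze tier exactly as received.

    Hypothetical helper: the bytes may be CSV, JSON, Avro, Parquet, binary,
    or unstructured text. Nothing is parsed or validated here.
    """
    ingested_at = ingested_at or datetime.now(timezone.utc)
    # Hypothetical layout: <bronze_root>/<source>/<YYYY-MM-DD>/<filename>
    target_dir = bronze_root / source / ingested_at.strftime("%Y-%m-%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Write the bytes unchanged; no data quality checks happen in this tier.
    target.write_bytes(payload)
    return target


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # A malformed row is stored as-is: bronze gives no quality guarantee.
        path = land_raw(Path(tmp), "vendor_a", "prices.csv",
                        b"id,price\n1,not-a-number\n",
                        datetime(2025, 11, 25, tzinfo=timezone.utc))
        print(path.relative_to(tmp))
```

Partitioning by source and ingestion date is only one possible choice; it keeps each raw download traceable to where and when it arrived, which helps when the silver tier later needs to reprocess it.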


Silver Tier:

The purpose of the silver tier is to allow data ingestion applications to sanitize the data and verify its quality.

  • Data quality is assured
  • Data may not be normalized. For example, one table may store a timestamp column in UTC while another table uses a timestamp without time zone. Column names are not standardized either.
  • Data format is uniform: data is usually stored in the format that best fits the downstream ETL process, for example Parquet.
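A minimal sketch of the bronze-to-silver step, assuming the bronze payload is a UTF-8 CSV file. The function name to_silver and the quality rules (required columns must be non-empty, listed numeric columns must parse as floats) are hypothetical examples of "sanitize and verify quality", not rules defined by this design. Writing the accepted rows in a uniform format such as Parquet (for example with pyarrow) is left out to keep the sketch dependency-free.

```python
import csv
import io
from typing import Dict, List, Tuple


def to_silver(raw_csv: bytes, required: List[str], numeric: List[str]
              ) -> Tuple[List[Dict[str, object]], List[Dict[str, str]]]:
    """Sanitize a bronze CSV payload and split it into accepted/rejected rows.

    Hypothetical quality rules: every column in `required` is non-empty and
    every column in `numeric` parses as a float.
    """
    reader = csv.DictReader(io.StringIO(raw_csv.decode("utf-8")))
    header = reader.fieldnames or []
    missing = [c for c in list(required) + list(numeric) if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    accepted: List[Dict[str, object]] = []
    rejected: List[Dict[str, str]] = []
    for raw_row in reader:
        # Sanitize: keep only header columns and trim surrounding whitespace.
        row = {k: (raw_row.get(k) or "").strip() for k in header}
        if any(not row[c] for c in required):
            rejected.append(row)
            continue
        try:
            typed = {c: float(row[c]) for c in numeric}
        except ValueError:
            rejected.append(row)
            continue
        # Column names and timestamp values pass through unchanged:
        # the silver tier does not normalize them.
        accepted.append({**row, **typed})
    return accepted, rejected


if __name__ == "__main__":
    raw = (b"id,price,updated_at\n"
           b"1, 9.5 ,2025-11-25 03:09:00\n"
           b"2,not-a-number,2025-11-25 03:10:00\n"
           b",1.0,2025-11-25 03:11:00\n")
    good, bad = to_silver(raw, required=["id"], numeric=["price"])
    print(good)
    print(len(bad), "rejected")
```

In this sketch updated_at stays a timestamp string without time zone, which matches the point above: quality is checked, but time zones and column names are not standardized at this tier.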

Models

Dataset

Data Unit

Data Location

Data Type