Dl/glossary: Difference between revisions

From stonehomewiki
<div class="mw-collapsible-content">
A URI that uniquely identifies an asset, for example:
* <code>asset://s3/bucket_name/foo.parquet</code> -- represents a Parquet file stored in AWS S3
* <code>asset://mysql/myserver/mydb/foo</code> -- represents a MySQL table named foo, in database mydb, on server myserver
* <code>asset://mysql/myserver/mydb/foo/?batch_id=1&</code> -- represents the same MySQL table (server myserver, database mydb, table foo) with a filter: only rows whose batch_id column equals 1
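The URI layout above can be parsed mechanically. The following Python sketch assumes the layout shown in the examples: the storage type sits in the host position, the remaining path parts are storage-specific, and the query string carries column filters. `AssetPath`, `parse_asset_path` and `mysql_location` are hypothetical names for illustration, not part of any existing library.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit


@dataclass
class AssetPath:
    storage: str               # e.g. "s3" or "mysql" (the URI's host part)
    segments: Tuple[str, ...]  # storage-specific path parts
    filters: Dict[str, str]    # query-string filters; values stay strings


def parse_asset_path(uri: str) -> AssetPath:
    """Split an asset:// URI into storage type, path segments and filters."""
    parts = urlsplit(uri)
    if parts.scheme != "asset":
        raise ValueError(f"not an asset URI: {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing storage type in {uri!r}")
    # Drop empty segments so a trailing "/" (as in ".../foo/?...") is ignored.
    segments = tuple(s for s in parts.path.split("/") if s)
    # parse_qsl skips empty pairs, so a trailing "&" is harmless.
    filters = dict(parse_qsl(parts.query))
    return AssetPath(parts.netloc, segments, filters)


def mysql_location(path: AssetPath) -> Dict[str, str]:
    """Hypothetical helper: read mysql segments as server / database / table."""
    if path.storage != "mysql" or len(path.segments) != 3:
        raise ValueError("expected asset://mysql/<server>/<db>/<table>")
    server, database, table = path.segments
    return {"server": server, "database": database, "table": table}
```

For example, parsing <code>asset://mysql/myserver/mydb/foo/?batch_id=1&</code> yields storage "mysql", segments ("myserver", "mydb", "foo") and filters {"batch_id": "1"}. Filter values are kept as strings because the URI itself carries no type information.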
</div>
</div>
</div>
<div class="mw-collapsible-preview">Dataset</div>
<div class="mw-collapsible-content">
<div class="mw-collapsible-content">
A dataset is a set of dataframes that share a common schema.

* A dataset name is not unique on its own, but the combination name + major_version + minor_version is unique.
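The uniqueness rule above amounts to using the triple (name, major_version, minor_version) as a key. A minimal Python sketch, assuming a schema can be represented as an ordered tuple of (column, type) pairs; `DatasetKey`, `Dataset` and `DatasetRegistry` are hypothetical names, and each dataframe is represented only by its schema so the example runs without pandas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema representation: ordered (column name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class DatasetKey:
    """The name alone may repeat; the full triple must be unique."""
    name: str
    major_version: int
    minor_version: int


@dataclass
class Dataset:
    key: DatasetKey
    schema: Schema
    # Stand-in for the dataframes: only their schemas are tracked here.
    frame_schemas: List[Schema] = field(default_factory=list)

    def add_frame(self, frame_schema: Schema) -> None:
        # Every dataframe in a dataset must share the dataset's schema.
        if frame_schema != self.schema:
            raise ValueError(f"schema mismatch for {self.key}")
        self.frame_schemas.append(frame_schema)


class DatasetRegistry:
    def __init__(self) -> None:
        self._by_key: Dict[DatasetKey, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        # Frozen dataclasses hash on all fields, so the triple acts as the key.
        if dataset.key in self._by_key:
            raise ValueError(f"duplicate dataset {dataset.key}")
        self._by_key[dataset.key] = dataset

    def versions_of(self, name: str) -> List[Tuple[int, int]]:
        # One name can map to several (major, minor) versions.
        return sorted(
            (k.major_version, k.minor_version) for k in self._by_key if k.name == name
        )
```

Using a frozen dataclass as the dictionary key keeps the uniqueness check to a single membership test, while a lookup by name alone can still return several versions.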

Revision as of 23:19, 5 March 2023

Data Lake Knowledge Center