Latest revision as of 04:32, 23 August 2023
Introduction
Definition
A DataLocation object captures the location of the data and the format of the data.
Fields
id: UUID
Primary key
url: str
Specifies the location of the data.
For example, "s3://mubucket/stock_quotes/2023-08-20.jsonl" is a valid url: it represents a data object in AWS S3 whose bucket name is mubucket and whose object key is stock_quotes/2023-08-20.jsonl.
format: str
A string representing the format of the data, for example "JSONL", "CSV", or "PARQUET".
Examples:
{
"id": "617b2e86-9698-4b99-8956-57ce99d8de39",
"url": "s3://mubucket/stock_quotes/2023-08-20.jsonl",
"format": "JSONL"
}
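The three fields above could be modelled as follows. This is only a minimal sketch, not the project's actual implementation: the dataclass layout and the helper name split_s3_url are assumptions made for illustration. It shows how the bucket name and object key described for the url field can be recovered with the standard library.

```python
import uuid
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class DataLocation:
    """Illustrative model of the three fields described above."""
    id: uuid.UUID   # primary key
    url: str        # where the data lives
    format: str     # e.g. "JSONL", "CSV", "PARQUET"


def split_s3_url(url: str) -> tuple[str, str]:
    """Hypothetical helper: return (bucket, object key) for an s3:// url."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 url: {url!r}")
    # The bucket is the authority part; the key is the path without its leading slash.
    return parts.netloc, parts.path.lstrip("/")


location = DataLocation(
    id=uuid.UUID("617b2e86-9698-4b99-8956-57ce99d8de39"),
    url="s3://mubucket/stock_quotes/2023-08-20.jsonl",
    format="JSONL",
)
bucket, key = split_s3_url(location.url)
```

With the example record, bucket is "mubucket" and key is "stock_quotes/2023-08-20.jsonl".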
Considerations
url should have enough information to locate the data
The url field should contain enough information for a user to locate the data. For example, "s3://mubucket/stock_quotes/2023-08-20.jsonl" is a good url if your datalake lives in only one AWS region. If your datalake spans multiple AWS regions, you should put the region ID in the url so you know which region the bucket belongs to.