Dl/DataLocation: Difference between revisions
From stonehomewiki
Stonezhong (talk | contribs) |
<div class="mw-collapsible-preview">Definition</div>
<div class="mw-collapsible-content">
<big><b>A DataLocation object captures the location of the data and the format of the data.</b></big>
<b>Fields</b>
<pre><nowiki>
id: UUID
...
A string representing the format of the data, for example, "JSONL", "CSV", "PARQUET", etc.
</nowiki></pre>
<b>Examples</b>:
<pre><nowiki>
{
    id: "617b2e86-9698-4b99-8956-57ce99d8de39",
    url: "s3://mubucket/stock_quotes/2023-08-20.jsonl",
    format: "JSONL"
}
</nowiki></pre>
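The same record can be sketched in Python. This is only an illustration, assuming the three fields shown in the example above (<code>id</code>, <code>url</code>, <code>format</code>); the class layout and the choice of a frozen dataclass are assumptions, not part of the definition.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class DataLocation:
    """Illustrative model: where the data lives and how it is encoded."""
    id: UUID      # unique identifier of this DataLocation
    url: str      # location of the data
    format: str   # format of the data, e.g. "JSONL", "CSV", "PARQUET"


# The example record from above, expressed as a Python object.
loc = DataLocation(
    id=UUID("617b2e86-9698-4b99-8956-57ce99d8de39"),
    url="s3://mubucket/stock_quotes/2023-08-20.jsonl",
    format="JSONL",
)
```

Making the dataclass frozen is a design choice for this sketch: a location record is treated as an immutable value that can be compared and hashed.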
</div>
</div>
<p></p>
= Considerations =
<div class="toccolours mw-collapsible mw-collapsed expandable">
<div class="mw-collapsible-preview">url should have enough information to locate the data</div>
<div class="mw-collapsible-content">
The url field should contain enough information for a user to locate the data. For example, <code>"s3://mubucket/stock_quotes/2023-08-20.jsonl"</code> is a good url if your datalake lives in only one AWS region. If your datalake spans multiple AWS regions, you should put the region ID in the url so you know which region the bucket belongs to.
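One possible way to carry the region in the url is a query parameter. The sketch below is an assumption for illustration only: the <code>?region=</code> convention, the helper name <code>region_of</code>, and the <code>default_region</code> fallback are hypothetical and not something this page prescribes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def region_of(url: str, default_region: Optional[str] = None) -> Optional[str]:
    """Return the region encoded in a url, or default_region if none is given.

    Assumes a hypothetical convention where the region is carried as a
    "region" query parameter, e.g. s3://bucket/key?region=us-west-2.
    """
    query = urlsplit(url).query
    regions = parse_qs(query).get("region")
    return regions[0] if regions else default_region


# Multi-region datalake: the url itself says where the bucket lives.
multi = region_of("s3://mubucket/stock_quotes/2023-08-20.jsonl?region=us-west-2")

# Single-region datalake: no region in the url, fall back to the known one.
single = region_of("s3://mubucket/stock_quotes/2023-08-20.jsonl", "us-east-1")
```

Whatever convention you choose, the point is that the url alone (plus at most one datalake-wide default) is enough to find the data.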
</div>
</div>
<p></p>