MVP of [1603.45.16]
/"Ontologia"."United Nations"."P"/@eng-Latn
#2
The https://drive.google.com/file/d/1jRshR0Mywd_w8r6W2njUFWv7oDVLgKQi/view?usp=sharing archive is not a new version (it is from around 8 months ago, so things are likely to be more consistent by now), yet its sheet names are already inconsistent, so it is not possible to simply pipeline the zip output, because some files would replace others. However, the names likely still follow patterns. Since each dataset's metadata can (and often will) be upgraded over time, ingestion of a more centralized version would need to normalize more than one format at the same time, such as "Admin", "Adm", "adm", and their combinations with a country prefix. I will copy the preview here, since the history of https://github.com/EticaAI/ndata is likely to be wiped several times to save space.
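To make the normalization problem concrete, here is a minimal Python sketch of parsing sheet names such as "Admin1", "adm2" or "moz_adm3" into a country prefix and an admin level. The function name `parse_sheet_name`, the regular expression, and the sample sheet names are all hypothetical; the real archive may use patterns this does not cover.

```python
import re

# Hypothetical pattern (an assumption, not taken from the real archive):
# optional 2-3 letter country prefix, then "adm" or "admin", then a level digit.
_SHEET_RE = re.compile(
    r"^(?:(?P<prefix>[a-z]{2,3})[\s_-]*)?adm(?:in)?[\s_-]*(?P<level>[0-6])$",
    re.IGNORECASE,
)


def parse_sheet_name(name):
    """Return (country_prefix_or_None, admin_level), or None if unrecognized."""
    match = _SHEET_RE.match(name.strip())
    if match is None:
        return None
    prefix = match.group("prefix")
    return (prefix.upper() if prefix else None, int(match.group("level")))


if __name__ == "__main__":
    for sheet in ["Admin1", "adm2", "moz_adm3", "BR Admin 0", "Sheet1"]:
        print(sheet, "->", parse_sheet_name(sheet))
```

Returning `None` for unknown names keeps unexpected sheets visible instead of silently overwriting another file.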
|
I think a pure POSIX-shell function could make a quick-and-dirty conversion from these headings to HXL, without needing more complex features. meta-de-caput.uniq.txt Maybe we will not need a full table of languages to generate the terms, so in the worst-case scenario they can be hardcoded. |
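As a rough illustration of such a quick-and-dirty conversion, here is a sketch in Python rather than POSIX shell. The heading shapes (`admin1Pcode`, `admin0Name_en`) and the resulting hashtags are assumptions for illustration only; they are not taken from meta-de-caput.uniq.txt, and the hashtag choices are a guess.

```python
import re

# Assumed heading shape: adm/admin + level + "Pcode" or "Name" + optional language.
_HEADING_RE = re.compile(
    r"^adm(?:in)?(?P<level>[0-6])_?(?P<kind>pcode|name)(?:_(?P<lang>[a-z]{2,3}))?$",
    re.IGNORECASE,
)


def heading_to_hxl(heading):
    """Map a raw heading to a guessed HXL hashtag; '' means 'leave untagged'."""
    match = _HEADING_RE.match(heading.strip())
    if match is None:
        return ""
    level = int(match.group("level"))
    # Assumption: level 0 is tagged as #country, deeper levels as #admN.
    tag = "#country" if level == 0 else f"#adm{level}"
    if match.group("kind").lower() == "pcode":
        return tag + "+code"
    hxl = tag + "+name"
    lang = match.group("lang")
    if lang:
        hxl += "+i_" + lang.lower()  # language attribute, hardcoded style
    return hxl


if __name__ == "__main__":
    for raw in ["admin1Pcode", "admin0Name_en", "ADM2_PCODE", "Shape_Area"]:
        print(raw, "->", heading_to_hxl(raw) or "(untagged)")
```

Because the language suffix is copied straight from the heading, no full table of languages is needed for this step.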
An MVP of the HXLated result already exists. Notes: I'm not 100% sure about the HXL hashtags for raw headers. |
$ wc -l 1603/45/16/999/1603_45_16_1_15828996298662.hxl.csv
432262 1603/45/16/999/1603_45_16_1_15828996298662.hxl.csv
$ ls -lha 999999/1603/45/16/hxl/ | wc -l
408
$ ls -lha 999999/1603/45/16/hxl/*_0* | wc -l
114
$ ls -lha 999999/1603/45/16/hxl/* | wc -l
404
$ ls -lha 999999/1603/45/16/hxl/*0* | wc -l
114
$ ls -lha 999999/1603/45/16/hxl/*1* | wc -l
114
$ ls -lha 999999/1603/45/16/hxl/*2* | wc -l
110
$ ls -lha 999999/1603/45/16/hxl/*3* | wc -l
53
$ ls -lha 999999/1603/45/16/hxl/*4* | wc -l
13
$ ls -lha 999999/1603/45/16/hxl/*5* | wc -l
0
Trivia: there are at least 432,262 published place codes worldwide (admin levels 0 to 4; levels 5 and 6 are not attested). A minimal uncompressed CSV with every code would be around 13 MB. How the codes are flattened would also make a difference in size. The good thing is that we are far below GitHub's ideal maximum of 50 MB (the hard limit is 100 MB).
Conventions on how to use UN M49 private namespaces as reference for compiled results
On this topic
For aggregated datasets related to world places, I believe we should start using private namespaces and document the logic. This saves a lot of upfront drama with scripting.
On logic about "population statistics"
I think aggregate population statistics are a different issue, but this type of statistics was the sole major reason for the classical 1970s UN M49 (https://unstats.un.org/unsd/publication/SeriesM/Series_M49_(1970)_en-fr.pdf). Wikipedia says this is no longer used, but it makes total sense for us here. However, I think population statistics are not a priority. I know there is more than one dataset (and they are more automated), so at least at adm0 (country level) this would not be hard to automate. But we're already going for more detailed data, at least for countries such as Brazil, for which we may have additional sources. Another priority would be to start mapping the P-Codes to Wikidata. Then things are going to be relevant. |
…strative Level 0 (country/terrotiries)
…DATA / HXL_ATTRIBUTES_AD_WIKIDATA mappings draft
…lementation (based on dictionary) of COD-AB like data to RDF+HXL
…rk the original CSV/HXL/HXLTM exporter also save upper levels, so it make easier for make RDF relationship from the most detailed administrative region availible
…atio_identitas_numerodinatio() started
…ries, local only (time: 30m28,338s); before RDF relations
…coded list will make it work for common cases at sort term
…9999_54872.py --objectivum-formato=_temp_hxl_meta_in_json
Rationale behind
|
Humm... we will need some documented way to
the current drafts already work for HXL / HXLTM, but without this change, it would need extra hardcoded logic. Another issue is that the current use of Edit:
|
…local numeric identifiers (brute force creation of IDs based on P-Codes may fail since some places have letters in the middle of P-Codes)
Quick links
[1603:??:1603]
/HXL/; focus on pre-compile replacement maps #5
This issue is about a minimum viable product that encodes all publicly available P-Codes in numerordinatio. The scripts may need a cron job or manual upgrades over time, but this issue is mostly about having at least a first version.
Replacing ISO 3166-1 alpha-2 with UN M49
P-Codes are prefixed with two-letter codes, which have the advantage of dealing with leading zeros. So, for P-Codes, leading letters make sense, and they also allow pure P-Codes to be used as programming variables. However, given how numerordinatio works, we can go fully numeric.
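A minimal sketch of the prefix replacement, assuming a small hardcoded lookup table: the two-letter prefix is swapped for the UN M49 numeric code, and the remainder is kept as a string so leading zeros survive. The names `ISO2_TO_M49`, `split_pcode` and `is_numeric_suffix` are hypothetical, the table is a tiny subset, and the sample P-Codes are made up.

```python
# Tiny illustrative subset of ISO 3166-1 alpha-2 -> UN M49; a real table
# would need to cover every country or territory with published P-Codes.
ISO2_TO_M49 = {"AF": 4, "BR": 76, "HT": 332, "MZ": 508}


def split_pcode(pcode):
    """Split a P-Code into (UN M49 country number, remaining local part)."""
    prefix, rest = pcode[:2].upper(), pcode[2:]
    if prefix not in ISO2_TO_M49:
        raise KeyError(f"no UN M49 mapping for prefix {prefix!r}")
    # The local part stays a string so that leading zeros are preserved.
    return ISO2_TO_M49[prefix], rest


def is_numeric_suffix(rest):
    """True only if the local part is non-empty and made of ASCII digits."""
    return rest.isascii() and rest.isdigit()


if __name__ == "__main__":
    for sample in ["BR31", "mz01", "HT0A1"]:  # made-up sample codes
        m49, rest = split_pcode(sample)
        print(sample, "->", m49, rest, "numeric" if is_numeric_suffix(rest) else "needs mapping")
```

A local part that contains letters cannot be converted by brute force, so it would need its own lookup to a local numeric identifier.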
[1603.45.16] vs [1603.45.49]
In theory, [1603.45.16] could be a more specific version of [1603.45.49] (https://unstats.un.org/unsd/methodology/m49/) instead of having its own base namespace. This may change later.
Another point is that, depending on how numerordinatio is done, the codes could have aliases.
Changes
[1603.45.15] renamed to [1603.45.16] (in the US-ASCII alphabet, which includes K, P is the 16th letter, not the 15th).
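The arithmetic behind the rename can be checked with a short sketch. The helper `letter_position` is hypothetical and only illustrates the 1-based letter counting described here.

```python
import string


def letter_position(letter, alphabet=string.ascii_uppercase):
    """1-based position of a letter in the given alphabet."""
    return alphabet.index(letter.upper()) + 1


if __name__ == "__main__":
    # Full US-ASCII alphabet (K included): P is the 16th letter.
    print(letter_position("P"))
    # Same alphabet with K dropped: P moves to the 15th position.
    print(letter_position("P", string.ascii_uppercase.replace("K", "")))
```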