Work documentation 2021-03-30
Red Links and Data Fixes
Broken redirects
- A list of all broken redirects on OpenResearch is available here: https://www.openresearch.org/mediawiki/index.php?title=Special:BrokenRedirects&limit=500&offset=0
Broken file links
- A list of all pages with broken file links (links that point to a file path where the file does not exist) is available here: https://www.openresearch.org/wiki/Category:Pages_with_broken_file_links
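Both lists can also be pulled as JSON through the standard MediaWiki API (`list=querypage` for special pages such as BrokenRedirects, `list=categorymembers` for tracking categories). The sketch below assumes the API endpoint sits next to `index.php` at `/mediawiki/api.php`; that path, and the helper names, are assumptions rather than something confirmed by this page. It fetches only the first result page and does not follow `continue` tokens.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: api.php lives in the same directory as index.php.
API_URL = "https://www.openresearch.org/mediawiki/api.php"


def broken_redirects_url(limit: int = 500) -> str:
    """API query equivalent to Special:BrokenRedirects."""
    params = {"action": "query", "list": "querypage", "qppage": "BrokenRedirects",
              "qplimit": limit, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def broken_file_links_url(limit: int = 500) -> str:
    """API query listing the members of Category:Pages_with_broken_file_links."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": "Category:Pages_with_broken_file_links",
              "cmlimit": limit, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(data: dict) -> list:
    """Extract page titles from a querypage or a categorymembers response."""
    query = data.get("query", {})
    if "querypage" in query:
        entries = query["querypage"]["results"]
    else:
        entries = query.get("categorymembers", [])
    return [entry["title"] for entry in entries]


def fetch_titles(url: str) -> list:
    """Download one page of results (no continuation handling)."""
    with urlopen(url) as response:
        return titles_from_response(json.load(response))
```

For example, `fetch_titles(broken_redirects_url())` would return the titles shown on the special page linked above, assuming the endpoint path is correct.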
Broken properties and errors
- The special property "Property:Has improper value for" records every invalid value, i.e. every property whose value was rejected on a page.
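The same query condition used in the wikiedit call for ordinals, `[[Has improper value for::Ordinal]]`, can be sent to the Semantic MediaWiki `ask` API action to list affected pages for any property. This is a minimal sketch; the `api.php` location and the helper names are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: api.php lives next to index.php on openresearch.
API_URL = "https://www.openresearch.org/mediawiki/api.php"


def improper_value_url(prop: str, limit: int = 500) -> str:
    """Build an SMW ask query for pages whose value of `prop` was rejected."""
    query = f"[[Has improper value for::{prop}]]|limit={limit}"
    return f"{API_URL}?{urlencode({'action': 'ask', 'query': query, 'format': 'json'})}"


def pages_from_ask(data: dict) -> list:
    """SMW returns results keyed by page title (an empty list when nothing matches)."""
    results = data.get("query", {}).get("results") or {}
    return list(results)


def fetch_improper_pages(prop: str) -> list:
    with urlopen(improper_value_url(prop)) as response:
        return pages_from_ask(json.load(response))
```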
Ordinals
- The following approaches can be used to fix ordinals:
- The first approach finds all events with improper ordinals and fixes them in place:
wikiedit -t wikiId -q "[[Has improper value for::Ordinal]]" --search "(\|Ordinal=[0-9]+)(?:st|nd|rd|th)\b" --replace "\1"
- Alternatively, a code snippet can be combined with wikibackup and bash tools to edit specific pages: Code Snippet
- Pipeline usage:
grep Ordinal /path/to/backup -l -r | python ordinal_to_cardinal.py -stdin -d '../dictionary.yaml' -ro -f | wikirestore -t ormk -stdinp -ui
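The core transformation behind both approaches can be sketched in Python: strip the suffix from numeric ordinals with the same regular expression as the wikiedit call, and map spelled-out ordinals to numbers. The `WORD_ORDINALS` table is a hypothetical stand-in for `../dictionary.yaml`, whose contents are not shown here, and the function name is illustrative rather than the API of `ordinal_to_cardinal.py`.

```python
import re

# Same pattern as the wikiedit call: "|Ordinal=3rd" -> "|Ordinal=3".
NUMERIC_ORDINAL = re.compile(r"(\|Ordinal=[0-9]+)(?:st|nd|rd|th)\b")

# Hypothetical stand-in for ../dictionary.yaml: spelled-out ordinals -> numbers.
WORD_ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
                 "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10}
WORD_ORDINAL = re.compile(r"(\|Ordinal=)([A-Za-z]+)\b")


def ordinal_to_cardinal(wikitext: str) -> str:
    """Rewrite Ordinal values such as '17th' or 'Second' to plain numbers."""
    text = NUMERIC_ORDINAL.sub(r"\1", wikitext)

    def replace_word(match: re.Match) -> str:
        number = WORD_ORDINALS.get(match.group(2).lower())
        # Unknown words are left untouched so they can be fixed manually.
        return match.group(0) if number is None else f"{match.group(1)}{number}"

    return WORD_ORDINAL.sub(replace_word, text)
```

Values that are already plain numbers pass through unchanged, so running the function twice on a page is harmless.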
Improper Null values for Has person
- The property Has person used "some person" as a null value. This was applied incorrectly: the free text of some events contained "some person" while the WikiSon-formatted data contained the actual person name.
- The first way to fix this is to remove the free text altogether, using a Python snippet combined with the bash utility grep. Usage:
grep 'some person' -r '/path/to/backup' -l | python scripts/Data_Fixes.py -stdin -ro -rmf | wikirestore -t ormk -stdinp -ui
Example result: ICFHR 2020
- The second way is to remove only the 'some person' entry from the wiki free text, again using the Python snippet with the bash utility grep. Usage:
grep 'some person' -r '/path/to/backup' -l | python scripts/Data_Fixes.py -stdin -ro -rdf | wikirestore -t ormk -stdinp -ui
Example result: ACCV_2020
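The two options (`-rmf` and `-rdf` in the commands above) can be sketched as two small text transformations. This assumes that the WikiSon data is the template call (e.g. `{{Event ...}}`) at the start of the page, that the free text follows it, and that each 'some person' entry sits on its own line; the function names are hypothetical and not the interface of `Data_Fixes.py`.

```python
NULL_PERSON = "some person"


def split_wikison(page: str) -> tuple:
    """Split a page into its leading {{...}} WikiSon block and the free text after it.

    Nested templates are handled by counting brace depth.
    """
    depth = 0
    i = 0
    while i < len(page) - 1:
        pair = page[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                return page[:i], page[i:]
        else:
            i += 1
    # No complete template found: conservatively treat the whole page as WikiSon.
    return page, ""


def remove_free_text(page: str) -> str:
    """First approach: keep only the WikiSon block."""
    wikison, _ = split_wikison(page)
    return wikison.rstrip("\n") + "\n"


def remove_null_person(page: str) -> str:
    """Second approach: drop only free-text lines that mention 'some person'."""
    wikison, free_text = split_wikison(page)
    kept = [line for line in free_text.splitlines(keepends=True)
            if NULL_PERSON not in line]
    return wikison + "".join(kept)
```

The first option is simpler but discards any other useful free text; the second keeps it at the cost of relying on the one-entry-per-line assumption.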
Dates
- End dates, and dates in general, are filled with strings instead of valid date values.
- Open decision: remove these fields altogether or fix them?
- Fixing them requires manual intervention.
- Removing a field altogether can be done with a small code snippet.
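A small snippet for the removal option might look like the sketch below. It assumes each WikiSon parameter sits on its own line in the form `|End date=...` (the same layout the ordinal regex relies on); the function name and the `only_invalid` switch are illustrative, with validity judged by Python's `date.fromisoformat`.

```python
import re
from datetime import date


def is_iso_date(value: str) -> bool:
    """True if the value parses with date.fromisoformat (e.g. 2020-09-10)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def remove_field(wikitext: str, field: str, only_invalid: bool = True) -> str:
    """Remove '|<field>=<value>' lines from WikiSon markup.

    With only_invalid=True, lines holding a valid ISO date are kept.
    """
    pattern = re.compile(r"^\|" + re.escape(field) + r"=(.*)\n?", re.MULTILINE)

    def replace(match: re.Match) -> str:
        if only_invalid and is_iso_date(match.group(1).strip()):
            return match.group(0)
        return ""

    return pattern.sub(replace, wikitext)
```

Keeping valid dates by default means the snippet only removes what would otherwise need manual fixing.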
Acceptance Rate Issue
- Statistics on missing values for the fields "Submitted papers" and "Accepted papers":
- Using bash utilities, the following counts were found:
- Number of pages that have the field "Submitted papers" : 1716
- Number of pages that have the field "Accepted papers" : 1965
- Using a small Python code snippet, the following counts were found:
- Number of pages that have the field "Submitted papers" but no field of "Accepted papers" : Approximately 63
- Number of pages that have the field "Accepted papers" but no field of "Submitted papers" : Approximately 302
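The two "one field without the other" counts, and the acceptance rate they block, can be computed along these lines. The sketch assumes the wikibackup directory holds one `*.wiki` file per page with `|Submitted papers=` and `|Accepted papers=` on their own lines, and it treats an empty value as missing; the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

FIELD = re.compile(r"^\|(Submitted papers|Accepted papers)=(.*)$", re.MULTILINE)


def fields_of(wikitext: str) -> dict:
    """Return the non-empty Submitted/Accepted papers values of one page."""
    return {m.group(1): m.group(2).strip()
            for m in FIELD.finditer(wikitext) if m.group(2).strip()}


def classify(fields: dict) -> str:
    has_submitted = "Submitted papers" in fields
    has_accepted = "Accepted papers" in fields
    if has_submitted and has_accepted:
        return "both"
    if has_submitted:
        return "submitted_only"
    if has_accepted:
        return "accepted_only"
    return "neither"


def missing_field_counts(backup_dir: str) -> Counter:
    """Count pages per category over all *.wiki files in a backup directory."""
    return Counter(classify(fields_of(path.read_text(encoding="utf-8")))
                   for path in Path(backup_dir).rglob("*.wiki"))


def acceptance_rate(fields: dict):
    """Accepted / submitted, or None when a value is missing or not a number."""
    try:
        return int(fields["Accepted papers"]) / int(fields["Submitted papers"])
    except (KeyError, ValueError, ZeroDivisionError):
        return None
```

Because empty values count as missing here, the totals may differ slightly from a plain grep for the field name.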