Workdocumentation 2021-03-30

From OPENRESEARCH mk copy Wiki

Red Links and Data Fixations

Broken redirects

Broken file links

Broken properties and errors

  • The property Property:Has improper value for records, for each page, which properties hold an invalid value.

Ordinals

  • The following approaches can be used to fix ordinals:
    • Find all events flagged with an improper Ordinal value and strip the ordinal suffix (st, nd, rd, th) in place:
wikiedit -t wikiId -q "[[Has improper value for::Ordinal]]" --search "(\|Ordinal=[0-9]+)(?:st|nd|rd|th)\b" --replace "\1"
  • A code snippet can be combined with wikibackup and bash tools to edit specific pages: Code Snippet
  • Pipeline usage:
grep Ordinal /path/to/backup -l -r | python ordinal_to_cardinal.py -stdin -d '../dictionary.yaml' -ro -f | wikirestore -t ormk -stdinp -ui
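
The search-and-replace pattern from the wikiedit call above can be tried locally before running it against the wiki. The following is a minimal sketch that reuses the same regular expression with Python's re module; the function name and the sample page text are assumptions made for illustration and are not part of wikiedit.

```python
import re

# Same pattern as in the wikiedit call: capture "|Ordinal=<digits>" and
# drop a trailing st/nd/rd/th suffix.
ORDINAL_SUFFIX = re.compile(r"(\|Ordinal=[0-9]+)(?:st|nd|rd|th)\b")


def strip_ordinal_suffix(wiki_text: str) -> str:
    """Turn '|Ordinal=12th' into '|Ordinal=12'; leave other text untouched."""
    return ORDINAL_SUFFIX.sub(r"\1", wiki_text)


# Hypothetical page text, only for illustration.
sample = "{{Event\n|Acronym=EXAMPLE 2020\n|Ordinal=12th\n}}"
print(strip_ordinal_suffix(sample))
```

Values that are written out as words (for example "third") do not match this pattern and are left unchanged, which is presumably why the dictionary-based ordinal_to_cardinal.py pipeline exists as a second approach.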

Improper Null values for Has person

  • Has person used "some person" as a null value. In some events this was used incorrectly: the free text contained "some person" while the Wikison format data contained the actual person name.
  1. The first option is to remove the free text altogether. A code snippet was used together with the bash utility grep. Usage:
grep 'some person' -r '/path/to/backup' -l | python scripts/Data_Fixes.py -stdin -ro -rmf | wikirestore -t ormk -stdinp -ui

Output result: ICFHR 2020

  2. The second option is to remove only the 'some person' entry from the wiki free text. A Python snippet is used together with the bash utility grep. Usage:
grep 'some person' -r '/path/to/backup' -l | python scripts/Data_Fixes.py -stdin -ro -rdf | wikirestore -t ormk -stdinp -ui

Output result: ACCV_2020

  • Usage Statistics:
    • The null value 'some person' was used 223 times.
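
The two options above can be sketched in Python as follows. This is not the code of scripts/Data_Fixes.py. It assumes a hypothetical page layout in which the template ends at the first line that is exactly "}}" and everything after that line is free text; the function names and the sample page are also assumptions.

```python
NULL_PERSON = "some person"


def split_page(page_text: str) -> tuple[str, str]:
    """Split a page into (template, free_text).

    Assumption: the template ends at the first line that is exactly '}}'.
    """
    lines = page_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "}}":
            return "\n".join(lines[: i + 1]), "\n".join(lines[i + 1:])
    return page_text, ""


def remove_free_text(page_text: str) -> str:
    """Option 1: keep only the template and drop all free text."""
    template, _ = split_page(page_text)
    return template


def remove_null_person_lines(page_text: str) -> str:
    """Option 2: drop only the free-text lines that mention the null value."""
    template, free_text = split_page(page_text)
    kept = [ln for ln in free_text.splitlines() if NULL_PERSON not in ln.lower()]
    return template if not kept else template + "\n" + "\n".join(kept)


# Hypothetical page, only for illustration.
page = (
    "{{Event\n|Acronym=X 2020\n|Has person=Jane Doe\n}}\n"
    "Chairs: some person\nTopics: vision"
)
print(remove_free_text(page))
print(remove_null_person_lines(page))
```

Option 1 is simpler but discards any useful free text; option 2 keeps the remaining free text at the cost of a line-based heuristic that may need adjusting to the real page layout.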

Dates

  • 153 improper start date entries.
  • 6 improper end date entries.
  • Reference page
  • End dates, and date fields in general, are filled with free-form strings.
  • Open decision: remove the field altogether or fix the values?
    • Fixing requires manual intervention.
    • Removing a field altogether can be done with a small code snippet.
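
As an illustration of the removal option, the sketch below flags date values that do not parse and drops the corresponding field lines. The list of accepted formats, the one-field-per-line layout, and the function names are assumptions; the wiki's actual date convention may differ.

```python
import re
from datetime import datetime

# Hypothetical accepted formats; adjust to the wiki's real convention.
ACCEPTED_FORMATS = ("%Y/%m/%d", "%Y-%m-%d")

# Assumes one field per line, e.g. "|End date=2020/09/07".
DATE_FIELD = re.compile(r"^\|(Start date|End date)=(.*)$")


def is_proper_date(value: str) -> bool:
    """Return True if the value parses under one of the accepted formats."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def drop_improper_dates(page_text: str) -> str:
    """Remove Start date / End date lines whose value is not a proper date."""
    kept = []
    for line in page_text.splitlines():
        match = DATE_FIELD.match(line.strip())
        if match and not is_proper_date(match.group(2)):
            continue  # skip the improper date field
        kept.append(line)
    return "\n".join(kept)


# Hypothetical page text, only for illustration.
page = "{{Event\n|Start date=2020/09/07\n|End date=early September\n}}"
print(drop_improper_dates(page))
```

Running is_proper_date over a backup first gives a list of affected pages, which can then either be fixed by hand or passed to drop_improper_dates.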

Acceptance Rate Issue

  • Statistics for missing "Submitted papers" and "Accepted papers" values:
  • The following counts can be found with bash utilities:
    • Number of pages that have the field "Submitted papers": 1716
    • Number of pages that have the field "Accepted papers": 1965
  • The following counts can be found with a small Python code snippet:
    • Number of pages that have the field "Submitted papers" but no "Accepted papers" field: approximately 63
    • Number of pages that have the field "Accepted papers" but no "Submitted papers" field: approximately 302
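
A count of this kind could be produced along the following lines. This is a minimal sketch, not the snippet that produced the numbers above; it assumes one field per line in the form "|Field name=value", and the function names and sample pages are hypothetical.

```python
import re


def has_field(page_text: str, field: str) -> bool:
    """Return True if the page contains a line of the form '|<field>=...'."""
    pattern = r"^\s*\|\s*" + re.escape(field) + r"\s*="
    return re.search(pattern, page_text, flags=re.MULTILINE) is not None


def count_mismatches(pages: list[str]) -> tuple[int, int]:
    """Count pages with only 'Submitted papers' and only 'Accepted papers'."""
    only_submitted = 0
    only_accepted = 0
    for text in pages:
        submitted = has_field(text, "Submitted papers")
        accepted = has_field(text, "Accepted papers")
        if submitted and not accepted:
            only_submitted += 1
        elif accepted and not submitted:
            only_accepted += 1
    return only_submitted, only_accepted


# Hypothetical page texts, only for illustration.
pages = [
    "{{Event\n|Submitted papers=100\n|Accepted papers=25\n}}",
    "{{Event\n|Submitted papers=80\n}}",
    "{{Event\n|Accepted papers=30\n}}",
    "{{Event\n|Accepted papers=12\n}}",
]
print(count_mismatches(pages))
```

In practice the page texts would be read from the files in the wiki backup directory rather than from an in-memory list.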