
Data Literacy 

1. Knowing Data

Data Quantity 

You can be confident in the results of an analysis only if there is sufficient data. For example, could you determine whether a baseball player is a good hitter after watching only a single at-bat? What if you knew whether the player swung at each pitch but had no information on the result?
  • Are there enough observations?
  • Were any data removed? If so, why?
  • Do the data contain the right fields?
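The three questions above can be turned into a quick check before any analysis begins. The following is a minimal sketch in Python, not a definitive method: the at-bat records, the field names (`player`, `swung`, `result`), and the threshold of 100 observations are all assumptions made up for illustration.

```python
# Hypothetical at-bat records; the field names are illustrative assumptions.
at_bats = [
    {"player": "A", "swung": True, "result": "OUT"},
    {"player": "A", "swung": False, "result": "BALL"},
    {"player": "A", "swung": True},  # the result was never recorded
]

REQUIRED_FIELDS = {"player", "swung", "result"}
MIN_OBSERVATIONS = 100  # arbitrary threshold, chosen only for illustration


def quantity_report(records, required_fields, min_observations):
    """Count the records, count those with every required field,
    and compare the usable count against a minimum."""
    complete = [r for r in records if required_fields.issubset(r)]
    return {
        "observations": len(records),
        "complete": len(complete),
        "incomplete": len(records) - len(complete),
        "enough": len(complete) >= min_observations,
    }


report = quantity_report(at_bats, REQUIRED_FIELDS, MIN_OBSERVATIONS)
print(report)
```

Here three records are far too few, and one of them lacks the `result` field, so the report flags the data set as insufficient. What counts as "enough" depends on the analysis; the threshold is something you choose, not something the code can tell you.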

Data Quality 

There is an old mantra in data analysis: garbage in, garbage out. Even the best analytic tools cannot produce good results from bad data. A large quantity of data means little if the data are not accurate, complete, and consistent.

Could you determine whether a baseball player is a good hitter if the data on half of their at-bats were corrupted? What if the capital letters “O” and “I” were transcribed as the numerals “0” and “1”, respectively? A computer views “OUT” and “0UT” as being distinct strings!
  • What are the possible sources of error?
  • What is the error rate?
  • Are the data authoritative, i.e. are they used as the basis for other analyses and decision-making?
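The transcription example can be made concrete. The sketch below, in Python, shows that the two strings really are different to a computer, estimates an error rate against a list of accepted values, and repairs the digit-for-letter mistakes. The sample values and the set of valid outcomes are assumptions for illustration, and the repair is only safe for a field that should never contain digits.

```python
# Hypothetical transcribed outcomes; "0UT" and "H1T" contain digits
# where letters were intended.
raw_results = ["OUT", "0UT", "HIT", "H1T", "OUT", "WALK"]

# Assumed vocabulary of acceptable outcomes.
VALID_RESULTS = {"OUT", "HIT", "WALK"}

# The letter "O" and the numeral "0" are different characters.
print("OUT" == "0UT")  # False

# Translation table that maps the numerals back to the intended letters.
DIGIT_TO_LETTER = str.maketrans({"0": "O", "1": "I"})


def error_rate(values, valid):
    """Fraction of values that are not in the accepted vocabulary."""
    if not values:
        return 0.0
    invalid = [v for v in values if v not in valid]
    return len(invalid) / len(values)


repaired = [r.translate(DIGIT_TO_LETTER) for r in raw_results]

print(error_rate(raw_results, VALID_RESULTS))  # 2 of 6 values are invalid
print(error_rate(repaired, VALID_RESULTS))     # 0.0 after the repair
```

Measuring the error rate before and after a repair is a simple way to confirm that the fix addressed the source of error you identified, rather than hiding it.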

Data Sanity 

  • Are the data appropriate for a particular analysis?
  • Are there authoritative rules to identify “good” and “bad” data?
  • Are the hardware and software systems that acquire, process, and store the data well understood?
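Rules for identifying "good" and "bad" data are easiest to apply when they are written down explicitly. The following Python sketch shows one possible way to do that: each rule is a named check, and a record is reported along with the rules it fails. The rules themselves, and the `result` and `pitches` fields, are hypothetical; real rules should come from whoever is the authority on the data.

```python
# Hypothetical sanity rules: each maps a description to a check
# that returns True when a record passes.
RULES = {
    "result is a known outcome":
        lambda r: r.get("result") in {"OUT", "HIT", "WALK"},
    "pitches is a positive integer":
        lambda r: isinstance(r.get("pitches"), int) and r["pitches"] >= 1,
}


def failed_rules(record):
    """Return the descriptions of every rule the record fails."""
    return [name for name, check in RULES.items() if not check(record)]


records = [
    {"result": "OUT", "pitches": 4},   # passes both rules
    {"result": "0UT", "pitches": 0},   # fails both rules
    {"result": "HIT"},                 # pitch count is missing
]

for record in records:
    print(record, "->", failed_rules(record) or "ok")
```

Keeping the rules in one named table makes it clear which criteria were applied, so that a reader of the analysis can judge whether they were appropriate.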