As a follow-on to last month's post about courses at the British Library, I asked some people who are teaching digital humanists where their goalposts are, i.e., what they think the minimum someone in the humanities should know about working with computers and digital data. The brief responses were interesting:
- Do they understand how to navigate their filesystem from the command line?
- Can they count how often a given word appears across multiple documents?
- Can they count n-grams across multiple documents?
- Can they extract keywords-in-context?
- Can they visualize these counts with bar or line plots?
- Can they use regular expressions?
- Do they know how to find and navigate places like Stack Overflow, and do they know the social norms of those communities?
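To make the text-processing items on that list concrete, here is a minimal sketch of word counts, n-gram counts, and keywords-in-context in plain Python. It is only an illustration, not what any of the respondents actually teach: the sample documents and the function names (`tokenize`, `count_word`, `count_ngrams`, `kwic`) are made up for this example, and the regular-expression tokenizer is deliberately crude.

```python
import re
from collections import Counter

# Hypothetical in-memory "documents"; in practice these would be read from files.
docs = {
    "a.txt": "The cat sat on the mat. The cat slept.",
    "b.txt": "A dog sat on the log; the cat watched.",
}

def tokenize(text):
    """Lowercase the text and pull out word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())

def count_word(docs, word):
    """Count how often `word` appears in each document."""
    return {name: tokenize(text).count(word.lower()) for name, text in docs.items()}

def count_ngrams(docs, n):
    """Count n-grams (tuples of n consecutive tokens) across all documents."""
    counts = Counter()
    for text in docs.values():
        tokens = tokenize(text)
        # zip over n staggered copies of the token list to form n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

def kwic(docs, keyword, width=2):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for name, text in docs.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{name}: {left} [{tok}] {right}")
    return lines

if __name__ == "__main__":
    print(count_word(docs, "cat"))
    print(count_ngrams(docs, 2).most_common(3))
    for line in kwic(docs, "cat"):
        print(line)
```

Someone who can read, run, and adapt something of this size has probably cleared most of the bar described above; plotting the resulting counts would need a library such as matplotlib on top.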
A longer response from Fiona Tweedie at the University of Melbourne is worth quoting in full. She first suggested seeing if the person in question could import and export bibliographic data from a management tool like Zotero, then later added:
My thought when I suggested the Zotero task...was...a stab at the sort of question to ask someone at the beginning of the discussion to work out their level and where their needs are, like the tests that language schools give you to work out which class to put you in, or the sorts of questions you ask someone to gauge where to pitch your explanation of your research (You've heard of Romans, right? The Republic? Julius Caesar? Tiberius Gracchus? etc).
I've been talking with the library here at UniMelb about doing an introductory 'data literacy' module... Greg says he thinks that term isn't helpful as it can mean anything, but I mean really helping humanities researchers (and I'm thinking old-fashioned historians here, because that's what I am) to make the leap to 'thinking data'. At the end of this module, I'd hope researchers could all:
- Understand that they are working with data (even if a lot of it is secondary literature), recognise clean vs messy data and think actively about how data structures work and how they might structure their data.
- Understand how data management and workflows can support their research.
- Use an appropriate reference manager.
- Be sufficiently familiar with different types of tools available to make an appropriate selection (spreadsheet vs Omeka vs full relational database).
- Know where to go for further help (Google, Stack Overflow, Your Librarian may all be valid answers).
I would then like to build out to more sophisticated material; I agree that overcoming fear of the command-line and being able to navigate a file structure and move files around is important. I'm also definitely with you on web-scraping... Key-words in context and topic modelling are great things to be able to do. And everyone loved making graphs when I trialled the NLTK materials, so definitely include some basic visualisation.
There's more detail here, but it's already clear that the content of a "Humanities Carpentry" workshop would be quite different from what we teach right now. We have a lot of other things to do before we can build one, but it is definitely on our list.
This post originally appeared in the Software Carpentry blog.