I’m working on a project for the AP syllabus and am trying to create vocab frequency lists. I knew going in that I needed a spreadsheet to do it, but I wasn’t sure how to go from text file to spreadsheet without a lot of manual labor on my part, i.e. copying and pasting every non-first-word into the same column. No thanks.
Excel has an import function. I figured that was the place to start. And I was correct. But not yet successful. I need a .txt file as opposed to my .docx file. But that’s doable. Save as and off we go. So here are the initial (relatively straightforward steps).
- Copy the desired text. I took mine from thelatinlibrary.com.
- Paste into a Word (or other word processor) document.
- Save as a .txt file (using the Save As… option).
This doesn’t do much, however, beyond giving you an import-able file. If you import the .txt file as is, the process looks like this (towards the end, I skipped a few screen shots in which nothing changes):
The text above is not particularly useful. There is a lot of that manual labor I mentioned above to get it into a usable form. The goal is to get each word into the left column. But how to accomplish that goal? Back to the .txt file and Word.
The Replace function (the subset of the Find function) can find and replace-with non character features of the text. So ‘Find’ a space and ‘Replace’ with a <return>.
The image above is what you end up with. And this should be workable with Excel. The steps remain the same (as above) but the result is much more helpful:
At this point you select all, sort the data alphabetically, and end up here:
There is still some clean up to do. All of those words with quotes before them or brackets or parentheses need to be edited, but that is a relatively low number. Once those edits are done, resort, and you have your alphabetical list.
At that point, I manually count. There are too many variables in forms to let Excel do it. In the column next to the word I put the number of occurrences and in the column next to that the vocab header word. And I put those in each row / occurrence of the word rather than just the top or the bottom occurrence, so that at the end I can sort by that middle column (the number) and have all occurrences of the word next to each other. If nothing else, this is important for ambiguous forms, e.g. adeo: with all of the forms of adeo together, I can determine whether it is the verb or the adverb. I also, when I find a form of a word elsewhere in the list, e.g. fero vs. tuli, adjust both occurrences, so that if I have 3 in the fs and 2 in the ts, I make them both 5, again so that the final list will sort correctly.
And with that you can generate, with some work, but considerably less than otherwise, workable vocab frequency lists for large blocks of text.