As a data scientist, I have done many NLP projects at work. I mostly work in a Jupyter Notebook Python environment and use common NLP libraries like NLTK and spaCy. My usual NLP project consists mostly of cleaning and structuring data (tokenization, lemmatization, stop-word removal, n-grams, etc.) and running clustering models on it (usually LDA, Latent Dirichlet Allocation).
I had not heard of these tools before and am excited to try them out, as they are very different from my regular workflow. I explored a few of them but will share my experience with two in particular below.
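For contrast with the point-and-click tools below, here is a minimal sketch of the kind of preprocessing my regular workflow involves. It is not code from any of my work projects: it uses only the Python standard library, the stop-word list is a tiny made-up one (NLTK and spaCy ship much fuller lists), and the function names are just for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real project would use a fuller list from NLTK or spaCy.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "it"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "The quick brown fox jumps over the lazy dog and the quick brown cat"
tokens = remove_stop_words(tokenize(sample))
bigram_counts = Counter(ngrams(tokens, 2))

print(tokens)
print(bigram_counts.most_common(1))
```

Lemmatization and LDA are left out on purpose, since they need a real library (for example spaCy for lemmas and gensim or scikit-learn for LDA) rather than a few lines of plain Python.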
Voyant
Back in 2019, I started a personal NLP (Natural Language Processing) project but never finished it. As a quick recap, I was just looking to explore a lyrics dataset. I picked Lana Del Rey because I had been listening to her frequently back then. I tested gathering lyrics by web scraping and by API (with AZLyrics and Musixmatch respectively). After many attempts I was able to query lyrics from Musixmatch’s API, only to find out that the “Free API” version offers just 30% (or the first 30 lines?) of the lyrics per song. For this praxis, I was hoping to use that old dataset to explore the tools mentioned in the guidelines. Unfortunately, I didn’t save the text anywhere, and the code I wrote is outdated, so refactoring it would take a lot of effort.
In the end, I decided to use Taylor Swift’s latest album, Midnights (3am Edition), instead. I have been listening to this album recently, so I am familiar with the lyrics. I ended up copying and pasting the lyrics from a site manually, as that was the most straightforward approach.
I pasted the lyrics into the web interface and explored the web tool quite a bit. I didn’t find the output particularly helpful. I believe this is due both to the lack of preprocessing (e.g., data cleaning) and to the nature of this text corpus. Here’s a screenshot of what I am seeing. I was unable to draw any insights. However, I was impressed by how easy it was to just paste in text and have all these features generated automatically.
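To make the "lack of preprocessing" point concrete, here is a rough sketch of a cleaning step that could be run on pasted lyrics before handing them to a tool like Voyant. I did not actually do this for the praxis. It assumes the copied text contains bracketed section labels such as [Chorus] and that repeated lines are noise rather than signal, which may not be true for every analysis. The sample text is a placeholder, not real lyrics, and `clean_lyrics` is a hypothetical helper name.

```python
import re


def clean_lyrics(raw):
    """Strip bracketed section labels, blank lines, and repeated lines."""
    cleaned = []
    seen = set()
    for line in raw.splitlines():
        # Remove labels like [Verse 1] or [Chorus] (assumed format).
        line = re.sub(r"\[.*?\]", "", line).strip()
        if not line:
            continue
        key = line.lower()
        if key in seen:
            # Skip lines already seen, e.g. a chorus sung several times.
            continue
        seen.add(key)
        cleaned.append(line)
    return "\n".join(cleaned)


raw = """[Verse 1]
First placeholder line
Second placeholder line

[Chorus]
Repeated placeholder hook
Repeated placeholder hook
"""

cleaned_text = clean_lyrics(raw)
print(cleaned_text)
```

Dropping repeated lines is a judgment call: it keeps a chorus from dominating the word counts, but it also hides how repetitive a song is, which could itself be interesting.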
Google Books Ngram Viewer
I had never heard of this tool before. From the name “n-gram”, I had the wrong assumption about what this tool does. It appears to be similar to Google Trends, but based on Google Books content, which I find helpful. I have always been interested in gender disparity in many aspects of life, so I explored different keywords. I am sharing two comparisons below:
I have to look into those tools. They sound like the exact tools that I need for my other class. Really interesting tools for analysis work.