PyNLPVIS – A low-code library for Natural language processing and data visualization | Introduction to Digital Humanities Fall 2022

Abstract

Python is a great programming language and one of the fastest-growing programming languages in the world due to its versatility. Python can be used for mobile app development, web development, and data analytics because of supporting a vast number of modules and packages. Although Python is known for its user-friendliness and readability, it inherits all the hardships that involves in learning any programming language. This is especially true for people from various backgrounds, such as Digital Humanities scholars. The goal of this project proposal is to change that. We propose PyNLPVIS which is a low-code library for Natural language processing and data visualization. PyNLPVIS will be a series of cohesive, reusable functions that will streamline text analysis and data visualization. The library will be hosted as a PyPi project so that, it can be easily accessible not only to humanists but also to the masses.

Methods

The project will be written in python as a module and will be shared as a PyPi project to enhance accessibility. Below is the simplest code for parts of speech tagging with NLTK barring that the textual data is already clean.

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from collections import Counter
 
text = "It is necessary for any Data Scientist to understand Natural Language Processing"
text = text.lower()
tokens = nltk.word_tokenize(text)
pos = nltk.pos_tag(tokens)
the_count = Counter(tag for _, tag in pos)
print(the_count)

PyNLPVis will create an abstraction by taking care of data processing from Line of Code(loc) text = text.lower() till the end in the background. To accomplish this the library will be equipped with a sample function like the following.

def pos_tagger(text):
 text = text.lower()
 tokens = nltk.word_tokenize(text)
 pos = nltk.pos_tag(tokens)
 the_count = Counter(tag for _, tag in pos)
 print(the_count)

Once we set up the library as a PyPi project, the user will simply integrate the library in their python script with the following loc.

import pos_tagger from PyNLPVis

Then the user needs to simply pass the text data using the following command to get a result from POS tagging.

pos_tagger(/*User defined text*/)

The low code nature of PyNLPVis has reduced the necessity of writing several locs to just a couple of locs for this instance.

Also visualizing the output of the pos_tagger for instance as a Word Cloud, the user has to simply do the following

import word_cloud from PyNLPVis
pos = pos_tagger(/*User defined text*/)
word_cloud(pos)

Work Plan

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Introduction to Digital Humanities Fall 2022

DHUM 70000 CUNY Graduate Center

PyNLPVIS – A low-code library for Natural language processing and data visualization

Abstract

Methods

Need help with the Commons?