Humanities and the Web Post-Workshop Informational!

I recently had the opportunity to attend the Internet Archive’s workshop on web archiving in Los Angeles. Firstly, before I get into a rundown of what we learned, I just wanna say it was AWESOME. I met some great people there, including fellow DHers! This was their first workshop, and there will be more forthcoming, so don’t be sad if you missed this one! Please feel free to ask me questions in the comments or in an email if you want more information or clarification on any of these points. I’d recommend reading this post chronologically, as it goes in order from basics to advanced topics. Now onto the good stuff…

What is the Internet Archive?

  • Contains primarily, though not exclusively, 20th-21st century records of human interaction across all possible mediums (newspapers to fiction to gov. info to art etc).
  • Constant change and capture.
  • Every country in the world included.
  • Fit for both macro and micro level research questions.
  • Fit to archive both hundreds or millions of documents.
  • Known for the Wayback Machine, which takes snapshots of websites at different points in time, shows you those snapshots, as well as information about when snapshots are taken.

What is a web archive?

  • A web archive is a collection of archived URLs that contains as much original web content as possible, documents change over time, and attempts to preserve the experience a user would’ve had of the site on the day it was archived.

Challenges of web archiving

  • Trying to hit the balance between access to billions of bits of information and actual usability of that information.
  • Content is relational and self-describing.
  • Difficult to subset relevant collections, storing and computing all of it.
  • So many methods and tools to choose from.

Glossary

  • Crawler – software that collects data from the web.
  • Seed – a unique item within the archive.
  • Seed URL – the starting point/access point for the crawler.
  • Document – any file with a unique URL.
  • Scope – how much stuff the crawler will collect.
  • WARC – file type for downloaded archived websites.
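To make the WARC entry a little more concrete, here is a minimal sketch of what one record looks like and how its headers could be read with only Python's standard library. The sample record is hand-written for illustration, and `parse_warc_record` is a hypothetical helper, not part of any Internet Archive tool. It assumes a single uncompressed record; real WARC files are usually gzipped and hold many records, so a dedicated library is the better choice for actual work.

```python
# A hand-written, simplified WARC record (illustrative only, not real capture data).
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-Date: 2022-11-01T12:00:00Z\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, world!"
    b"\r\n\r\n"
)


def parse_warc_record(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split one uncompressed WARC record into version, headers, and payload."""
    # Headers end at the first blank line; the payload follows it.
    header_block, _, rest = raw.partition(b"\r\n\r\n")
    lines = header_block.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length says how many payload bytes belong to this record.
    length = int(headers["Content-Length"])
    payload = rest[:length]
    return version, headers, payload


version, headers, payload = parse_warc_record(SAMPLE_RECORD)
print(version, headers["WARC-Type"], headers["WARC-Target-URI"], payload)
```

The point of the sketch is just that a WARC record pairs metadata about a capture (what URL, when, what kind of record) with the captured bytes themselves.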

Examples of steps to archive from a project on COVID response in the Niagara Falls region

  • Close reading with Solrwayback – searchable, individual items examinable in the collection.
  • Distant reading with Google Colab – sentiment analysis, summary statistics, data visualization.
  • Data subsetting with ARCH – full-text dataset extraction from the Internet Archive’s collections.
  • As an outcome, helped the City of Niagara Falls formulate a better FAQ for common questions they weren’t answering.

Other methods

  • Web scraping – creating a program that takes data from websites directly.
  • Topic modeling – assess recurring concepts at scale (understanding word strings together to create a topic).
  • Network analysis – computationally assessing URL linking patterns to determine relationships between websites.
  • Image visualization – extracting thousands of images and grouping them by features.
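As a rough illustration of the web scraping and network analysis items above, the sketch below pulls links out of HTML and counts how often one domain links to another. The pages are made-up strings standing in for scraped or archived HTML, and `LinkCollector` and `domain_edges` are hypothetical names chosen for this example; it is a sketch of the general idea, not the method used by any of the projects mentioned.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def domain_edges(pages: dict[str, str]) -> Counter:
    """Count links from each source domain to each *other* target domain."""
    edges = Counter()
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        source = urlparse(page_url).netloc
        for link in collector.links:
            target = urlparse(link).netloc
            if target and target != source:
                edges[(source, target)] += 1
    return edges


# Hypothetical pages (assumption: in practice these would come from a crawl).
PAGES = {
    "http://a.example/index.html": '<a href="http://b.example/x">B</a> <a href="/about">About</a>',
    "http://b.example/x": '<a href="http://a.example/">A</a> <a href="http://c.example/">C</a>',
    "http://c.example/": '<a href="http://a.example/news">A</a>',
}

edges = domain_edges(PAGES)
for (source, target), count in sorted(edges.items()):
    print(f"{source} -> {target}: {count}")
```

The resulting (source, target, count) triples are the kind of edge list that a network visualization tool can take as input.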

Web archiving tools

  • Conifer (Rhizome)
  • Webrecorder
  • DocNow
  • Web Curator Tool
  • NetArchive suite
  • HTTrack
  • Wayback Machine – access tool for viewing pages, surf web as it was.
  • Archive-It
  • WARC – ISO standard for storing web archives.
  • Heritrix – web crawler that captures web pages and creates WARC files.
  • Brozzler – web crawler plus browser-based recording technology.
  • Elasticsearch & Solr – full-text search indexing & metadata search engine software.

Intro to Web Archiving

  • The average web page only lasts ~90-100 days before changing, moving, or disappearing.
  • Often used to document subject areas or events, capture and preserve web history as mandated, take one-time snapshots, and support research use.

Particular challenges

  • Social media is always changing policies, UI, and content.
  • Dynamic content, stuff that changes a lot.
  • Databases and forms that require user interaction; alternatives include sitemaps or direct links to content.
  • Password protected and paywalled content.
  • Archive-It can only crawl the public web, unless you have your own credentials.
  • Some sites, like Facebook, explicitly block crawlers. Instagram blocks them but has workarounds.

How to Use the Internet Archive (It’s SO EASY)

  • Browse to web.archive.org/save – enter URL of the site you want to archive, creates an instance. Boom!
  • You can also go to: archive-it.org – create a collection (of sites), add seeds (URLs).
    • Two types of seeds: with the trailing / (forward slash) and without it. Without the slash, it adds all subdomains. E.g., if I entered my Commons blog as noveldrawl.commons.gc.cuny.edu, it’d give me ALL the Commons blogs, i.e. everything before the ‘.commons’. If I enter noveldrawl.commons.gc.cuny.edu/, it’ll give me just the stuff on my blog AFTER the slash, like noveldrawl.commons.gc.cuny.edu/coursework.
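For anyone who would rather script the Wayback Machine step than click through it, here is a small sketch that only builds the relevant URLs and makes no network requests. The function names are hypothetical. The `/save/` and `/web/<timestamp>/` URL patterns and the `archive.org/wayback/available` endpoint reflect how the Wayback Machine is commonly addressed, but treat them as assumptions and check the Internet Archive's current documentation before relying on them.

```python
from urllib.parse import urlencode

WAYBACK = "https://web.archive.org"


def save_url(page_url: str) -> str:
    """URL that asks the Wayback Machine to capture page_url."""
    return f"{WAYBACK}/save/{page_url}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """URL of the capture closest to timestamp (digits of YYYYMMDDhhmmss)."""
    return f"{WAYBACK}/web/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: str = "") -> str:
    """Query URL asking whether a capture of page_url exists (returns JSON)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


blog = "noveldrawl.commons.gc.cuny.edu"
print(save_url(f"https://{blog}/"))
print(snapshot_url(blog, "20221101"))
print(availability_query(blog, "20221101"))
```

Opening the first printed URL in a browser should have the same effect as typing the address into the save form by hand.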

I archived the website data… now what do I do with it?!… Some Tools to use with your WARC files:

  • Palladio – create virtual galleries
  • Voyant – explore text links
  • RawGraphs – create graphs

ARCH (Archives Research Compute Hub)

  • ARCH is not publicly available until Q1 2023; workshop participants are being given beta access and can publish experiment results using it.
  • Currently can only use existing Archive-It collections, however after release user-uploaded collections will be supported.
  • It uses existing collections in Archive-It, which you do need a membership to use.
  • Non-profit owned, and the Internet Archive is decentralized and isn’t a government or corporate tool.
  • Supports computational analysis of collections, eliminating the need for the technical knowledge to analyze sites, and allows for analysis of complex collections on a large scale.
  • Integrates with the Internet Archive, and has the same interface as Archive-It.
  • Can extract domain frequency (relationship between websites), plain text, text file info, audio files, images, PDFs, and PPTs. It can also create graphs of these relationships in the browser. There’s even more it can do than this; if you need it, it can probably do it. All data is downloadable and can be previewed before download.
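To give a sense of what a domain frequency dataset boils down to, here is a minimal sketch that counts how many captured URLs fall under each domain. This is not ARCH's implementation, the URL list is invented for illustration, and `domain_frequency` is a hypothetical helper; ARCH does this for you at collection scale without any code.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_frequency(urls: list[str]) -> list[tuple[str, int]]:
    """Return (domain, count) pairs, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Treat www.example.org and example.org as the same domain.
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts.most_common()


# Hypothetical captured URLs (assumption: a real list would come from a collection).
CAPTURED = [
    "https://www.example.org/a",
    "https://example.org/b",
    "https://news.example.com/1",
    "https://example.org/c",
    "https://news.example.com/2",
    "https://blog.example.net/",
]

frequencies = domain_frequency(CAPTURED)
for domain, count in frequencies:
    print(f"{domain},{count}")
```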

Observations

  • I noticed that the majority of those present had faced some sort of cultural erasure, threatened or realized, modern or archaic, that has brought them to their interest in archiving.
  • From experience using these tools, I’d say Wayback is great if you need to just archive one site, perhaps for personal use, whereas Archive-It is great if you have many sites in a particular research area that you’re trying to archive and keep all in one place.

Links of interest

  • https://archive-it.org/collections/11913 – Schomburg Center for Research in Black Culture, Black Collecting, and Literary Initiatives (67 GB, 23+ websites since March 2019; contains blog posts, articles, bookstore lists, etc)
  • https://archive-it.org/collections/19848 – Ivy Plus Libraries Confederation, LGBTQ+ Communities of the Former Soviet Union & Eastern Europe (30 GB, 70+ websites, since Aug ’22; contains news, events, issues, etc)

Further Resources

Connected Pedagogy Praxis Assignment Blog

I left two annotations asking students questions related to the concept of “industry,” social media celebrities, and self-presentations online in the four-dimensional world. The two annotations are steps in a process for them to construct a theoretical framework of self and performance in the digital age.

What I annotated 1:

I asked the question:

How do you understand the “industry” here? What’s your thought on the promotion and presentation of self by social media celebrities? Do you agree that the authenticity of self must diminish if one enters this “industry” in four dimensions?

I aim to encourage students to find a connection between their own lives and the reading, so I chose social media celebrities as a perspective to examine the mysterious mix of cyberspace and the real world in which we are all living. Scott’s idea of the fourth dimension concerns our sense of ourselves that has been twisted in the blurring of boundaries between our social and private lives. I asked the question to encourage students to consider the idea of “industry” and “celebrity capital” in the digital era.

I further explore the idea of “self” in my following annotation:

I asked the question and provided them with an additional resource:

Check out this book, a very good one about how we present ourselves and our activities to others in social interactions. Goffman, Erving. The Presentation of Self in Everyday Life. Edinburgh: University of Edinburgh Social Sciences Research Centre, 1956. You could also watch this video, a short introduction to this book. https://www.youtube.com/watch?v=6Z0XS-QLDWM

In the four-dimensional world, do we still have a back/front stage difference, as discussed by Goffman? Where is the stage, and how is communication managed in a digital age? For example, can an online narrative of a person disappear or die? What does “death” mean in the four-dimensional world? Consider examples like “Get Ready with Me” and “Room Tour” YouTube videos.

I introduced a book written by Erving Goffman and gave students a link to a YouTube video summarizing this book. Goffman’s book is a great one analyzing a theatrical model of everyday self-performance. I would like to help students construct a theoretical framework of the stage, self, performance, social interactions, and digital age by connecting Scott’s book and Goffman’s book, but I also realized that this book might be too long to digest as supplementary material. So I listed a YouTube link here to a short introduction video. To help students connect their own lives and the abstract content they are reading, I raised the examples of “Get Ready with Me” and “Room Tour” YouTube videos.

Tech support & more at the GC: the New Media Lab

Hi Class,

I just wanted to ping you all a brief message on the GC New Media Lab (NML). Yesterday afternoon I attended their General Meeting and wanted to share a couple of takeaways. First, it is a fantastic place: Marco Battistella and his team are extremely competent and have a lot to share, plus they are really keen to hear students’ ideas and projects in order to support them through their expertise combined with the vast technology available at the Lab. The Lab has web hosting capacity and several packages available for students, including Drupal, Final Cut Pro X, WordPress, Omeka, Adobe’s suite, and more.

For this meeting, the focus was on Omeka, an open-source tool enabling creators to build online digital collections and exhibits; so, if you are thinking about a library or archive for your final project, you should definitely check Omeka out. Further points were also discussed such as how the Lab can help students develop their digital projects.

To get access to the Lab resources and know-how, Master’s students are required to have a recommendation letter from the faculty (typically, this would be provided by the Programme’s advisor), after which they can submit their application and schedule their first one-to-one with the Lab staff, who will support them through project planning and project development/implementation.

DEFCON Speaker Series: Dr. Anelise Shrout

Institutional Critiques with Student Collaborators 

On November 13 I attended an online lecture hosted by DefCon featuring Dr. Anelise Shrout. Dr. Shrout discussed her pedagogical approach to her DH class focused on investigating the history of Bates College. She provided an overall framework and design of the course as well as specific examples of class activities that supported her goals.

Dr. Shrout channelled her expertise in the 19th-century origins of philanthropy to design a course that investigates Bates’s namesake, whose donations enabled the founding of the institution. Bates has traditionally offered an origin story that centers on the abolitionist roots of its founders, yet no one had investigated the origins of the wealth that Bates used to fund the school. Through a variety of hands-on activities related to research, data capture, coding, and collaboration, Dr. Shrout empowers her students to investigate their own institution while gaining valuable insight and skills. Her ultimate goal is that students walk away from the class with a deeper understanding of the strengths and weaknesses of data and refined collaboration skills which can be used in any work they pursue.

Her course included the following activities and pedagogical approaches: 

  • Transcription of 19th Century Records
    (Skills: research and critical understanding of data (how data is constructed, the implications of how it is structured, and how this can hinder and help analyses))

    Lessons in reading and writing 19th-century handwriting, including practicing with fountain pens. In this case, students capture data as recorded in 19th-century donation rosters of Bates College as well as purchase orders of the Bates textile mill, which fabricated cloth using cotton purchased from Southern slave plantations. Students are able to engage with the original data sources before they are scrubbed and tidied, helping them understand the complexity and imperfection of the resulting data. When digitally capturing historical data, many educated guesses are baked into what might eventually appear as “clean data.” The fountain pen exercise helped students connect in a more visceral way to the period the data originated from, and made their own educated guesses…more educated.
  • Maneuvering the Data
    (Skills: coding, collaboration, data visualization and analysis with an awareness of the subjectivity of the data)

    This phase in the course includes learning basic coding through exercises developed in Saturn Notebook, watching videos, and discussing in groups. Pairs then apply the specific functions learned to the data that was collected earlier in the semester, enabling students to put into practice what they had just learned. Professor Shrout seeks feedback in each pairing session to understand if there are any snags, and invites students to consciously consider what is working, what is not working, and what they could do better to ensure successful collaborations. Professor Shrout emphasizes trying and failing over perfection to remove any barriers to familiarizing oneself with the coding environment.
  • Asking Questions of the Data
    (Skills: Critical Thinking (Data and History), understanding different approaches to the study of data)

    Students are asked to consider what questions can be asked related to the data and beyond and then propose their own personal projects related to their inquiries and findings. In this case, students learned that cotton is sold in bales, each equal to 500 pounds; they may then ask what that means in relation to the actual physical labor needed to produce that much cotton. They may ask who else shows up in the data besides Bates (it turns out many donors had ties to cotton, some of whom have streets on campus named after them).
  • Final Projects
    (Skills: Collaboration, Critical thinking, aiming work at informed civic action in pursuit of social justice)

    Students then design projects that encapsulate their findings and bring them to life in ways that they deem appropriate. Some amazing examples were infographics related to the founding donors posted around the campus, and a student survey investigating whether students would still have applied had they known of the ties to Southern cotton (they would have, and in fact claim they would be more eager if the institution were forthcoming with that information; these findings were shared with the school’s leadership). Shrout also shared that a student approached her a year after taking the course to voice interest in using the skills she had developed in the class to investigate a sensation of feeling less safe on campus. This included discussing with a critical lens what types of data could be used to this end and their shortcomings (e.g. campus police records).

Throughout the course Shrout also offers readings that intersect with the issues that arise related to DH and its intersection with feminism and white supremacy. Some of the readings she mentioned include Katie Rawson’s and Trevor Muñoz’s Against Cleaning, D’Ignazio’s and Klein’s Data Feminism, Marisa Fuentes’s Dispossessed Lives, Jessica Marie Johnson’s Markup Bodies, and Craig Steven Wilder’s Ebony and Ivy: Race, Slavery, and the Troubled History of America’s Universities.

Shrout also generously offered her class outline, which she adapts to her specific intentions as the class evolves. She emphasized that she at times had skeptics in the classes: students initially more interested in coding and computer science, and those who were intimidated by it. She is careful to emphasize critical thinking skills as the hero of the course, both to appeal to coders who think they won’t learn much and to allay the fears of the coding-shy.

The lecture was inspiring and inspired, and I fully regret not broadcasting it to our cohort as it aligned perfectly with our current readings and class discussion. Professor Shrout really brings DH to life for her students by putting it to work on a subject that directly impacts them and in a way that empowers them to find their own critical and creative voice within it.

Connected Pedagogy Praxis Assignment

After spending a large portion of this semester discussing, analyzing, and creating annotations in my Doing Things with Novels course, I approached this assignment with a developed understanding of the approaches to scholarly marginalia that I have most benefitted from as both a reader and as a producer. As I worked through Laurence Scott’s The Four-Dimensional Human: Ways of Being in the Digital World, I sought to provide examples of my typical approach to annotation and its focus on the creation of connections and the expansion of the text’s scope through the inclusion of relevant resources and references to theoretical approaches through which the reader might reinterpret their reading of the annotated text.

My first annotation responded to Scott’s statement:

“Increasingly, the moments of our lives audition for digitisation. A view from the window, a meeting with friends, a thought, an instance of leisure or exasperation – they are all candidates, contestants even, for a dimensional upgrade.”

My annotation:
I’m reminded of Byung-Chul Han’s recent work Non-things: Upheaval in the Lifeworld (2022), specifically his discussions regarding the notion of se produire, or to “play to the gallery” in relation to the production of digital identity through the production and staging of information online (14). He goes on to suggest that “digital images transform the world into available information,” intensifying the production of a world enframed purely through curated imagery and amplifying a sense of the Baudrillardian hyperreal. Not the most novel of thoughts but I love Han’s text because his approach can get a little woo-woo at times with statements like, “The decorative and the ornamental are characteristic of things. They are life’s way of telling us that life is about more than mere functioning. In the baroque age, the ornamental was theatrum dei, the theater of the gods. If we submit life fully to functionality and information, we drive the divine out of life. The smartphone is the symbol of our time… it is not embellished in any way” (23). I imagine his work would be a fun read if one was looking to develop a comparative analysis between a theoretical thinker broaching similar subject matter and Scott’s work in Four-Dimensional Human.

My intention with this annotation was to create connections for the reader and to expand their scope of inquiry as they scrutinize this pivotal point in Scott’s piece. Ideally, by pointing them to texts, concepts, and quotes that allowed me to create connections between Scott’s piece and the greater conversations dealing with technology, the production of identity, and time-and-space as impacted by both the digital and capital, it will encourage them to explore, respond, and even disagree with my connections. Any of these would be a fantastic result, as long as it triggers some exploration of the text through searching outside of the text.

My second annotation responded to Scott’s statement:

“Social media, for example, makes a moment four-dimensional by scaffolding it with simultaneity, such that it exists in multiple places at once.”

My annotation:

Having read a decent amount about David Harvey’s notion of time-space compression (or, the rupturing of our experience of time and space as the flow of capital accelerates) this semester, it would be a fun exercise (though maybe redundant) to dig into Scott’s idea of reality “scaffolded with simultaneity” via social media through the theoretical lens of Harvey’s work (+ maybe Virilio’s idea of speed-space or Han’s work in The Art of Lingering).

Though the two annotations are stylistically similar, I’d like to focus on a different portion of my approach that exists in each. In both, I tried to provide the reader with a prompt of inquiry. In the first annotation, this prompt was advancing the idea that a comparative analysis between Han’s and Scott’s work might act as an interesting rabbit hole to wander down. In this annotation, I suggest that a worthwhile exercise might be exploring Scott’s suggestion of simultaneity in conjunction with analyses of time and space as affected by capitalism through the work of David Harvey, Paul Virilio, and Byung-Chul Han. My intention here is to provide not just possible connections but a prompt for them to explore if such connections actually exist as suggested and whether or not they are worth investigating further.

Essentially, my primary goal for the reader (and for myself) is to encourage a more comprehensive consideration of the text and to enjoy the search that comes with creating and developing intertextual connections.

Thought Experiment: Fashioning a DH Course as a Program-Product-Project

This is a short thought experiment I’m cross-posting from my Digital Pedagogy class for consideration. Context: assessment/evaluation of digital products in the humanities.

Within nonprofit and other sectors, an organization generally has a set of programs based on its mission and for which it’s funded. Within each of these programs are products to be delivered. Each of these products involves one or more projects. These programs, products, and projects are based on meeting the needs of the people they serve. Identifying products to build includes data analysis and engaging a range of research methods, such as a needs assessment or a human-centered design outreach effort; the products are then evaluated later on using outcome and impact metrics. How about if we fashion Digital Humanities curriculum similarly? The course is the program, the product(s) to be delivered are identified in the syllabus, and each week represents a project to achieve the goal of delivering the product(s). This ensures that every class serves as a milestone in the development of a stated goal: building a product. So, instead of seemingly disparate readings and topics from class to class, there’s a roadmap with transparency into the process, and students learn critical project management skills along the way. The final project could be the evaluation of the program, product, and/or projects.

Pedagogy Readings & Class Discussion

This week’s reading and discussion introduced some interesting thought-starters for me. Highlighting a few to share here:

  1. Per “How Not to Teach Digital Humanities”, DH is not suitable for undergrad students due to the theoretical component. I thought this was interesting and was wondering if I were to take this class in undergrad, what would that be like? I am still navigating my perception of the DH field as this is my first DH class. It is indeed more theoretical, and less technical than I expected. I do enjoy the critical components of it but do crave more resolutions/ applications.
  2. During the last class, it was discussed that “humanities studies don’t solve problems, but rather pose questions to those problems”. This jumped out at me and definitely got me thinking. I studied BBA (business) during my undergrad studies and have always been in a “problem-solving” role since I joined the workforce (management consulting, corporate finance, data science, etc). This was definitely a shocking and liberating revelation that I was not expecting. In the professional context, I operate from a problem-driven mindset (as opposed to a solution-driven mindset). While it is still a problem-space operating mindset, the goal is simply to ensure we are working on the right problem before we dive into solution-ing. “Posing/asking questions without the intent of solving the problem” is definitely something I have never considered before. I am not sure how I feel about it yet, but this is definitely an interesting mental experiment.

Creating Community Through Collaborative Projects | Pedagogy

The readings related to pedagogy allowed me to reflect and think about the work I do with students at LaGuardia Community College.

Context

First-Year Seminar and Student Success Mentors

At LaGuardia Community College, I oversee the Student Success Mentor (SSM) Program. We hire 15-20 students each semester. The SSMs in the program facilitate the studio hour lab session that is part of a discipline-specific First Year Seminar Course, which is taught by faculty in the discipline and mentored by one of our SSMs. For example, if a first-year student is majoring in business, they will take the Business and Technology First Year Seminar Course. Currently, we have 16 discipline-specific courses. Our program serves approximately 2000 students who are enrolled in the First Year Seminar.

Learning Digital Communication Ability

Before students graduate and/or transfer from our institution, they should learn Digital Communication, Written Communication, and Oral Communication as part of the Core Competencies and Abilities. Students begin their development in Digital Communication Ability by creating their first Core ePortfolio on Digication. This ePortfolio goes on a journey with the student from the first semester at LaGuardia until their capstone course.

The Student Success Mentors play an instrumental role in helping students develop their first ePortfolio. Therefore, new SSMs undergo 30 hours of training (5 weeks) before facilitating their first Studio Hour. We cover several topics, including the following:

  • Mentorship
  • DEI
  • Digital Tools
  • Class management
  • Empathy
  • Facilitation
  • Digital Communication
  • College Resources, etc.

New and Returning SSMs also participate in Professional Development every semester to go over any updates related to technology or topics that need to be addressed.

Community Building Through Collaborative Digital Blog Projects

For the last eight years, the Student Success Mentor Community has been very strong. They support each other. Therefore, it’s important for us to create opportunities for SSMs to work together on different projects, beginning with the blog that New SSMs and Returning SSMs create together.

At the culmination of their New SSM Training, the program takes the New SSMs and Returning SSMs on a trip to a cultural site. The aim of this trip is to help the New SSMs and Returning SSMs connect with each other, but also connect the training topics to the artifacts they observe and analyze at a cultural site. We have visited the MET, Ellis Island, and the Museum of the City of New York. We provide SSMs with prompts to help them think about what they learned during training and how it’s connected to the artifact. As a team, they come together to answer these questions. They are instructed to take videos, pictures, and notes. For example, at the MET, we asked teams to select one artifact they saw at an assigned exhibition and explain how this artifact represents one of the SSM Core Values and how it relates back to the work they will be doing with students once the semester begins.

What to do with media? Computer Literacy Skills

As a former student of graphic design and new media arts, I know firsthand how important it is to organize, name, categorize, and back up media. One of the techniques I emphasize before we begin our exploration at the exhibit is to keep the media on one platform, such as Google Drive, so team members can upload and share the media along with any notes they make as they document their experience.

Putting it together!

In the following session, after they have gone to the cultural site, they meet in the lab. During this session, SSMs begin their work by creating a blog on ePortfolio with the information, videos, and photos they have gathered. As they are creating this blog in the lab, they are running from one computer to another. They are talking, laughing, and overall, helping each other build one blog. Once they have created their blog, they present their section of the blog. During the presentation, you can see the work that went into creating this blog and how they all supported the work. They practice the digital communication skills and the different topics they learned during training, all while creating a lasting community of mentors.

Connected Pedagogy Assignment: The Reverse Peephole

I made several annotations on the reading, as listed below. However, I would like to focus on three quotes. For each quote, I have provided a reason for annotating this quote. I aim to help students relate concepts or themes to their lives. Generally, when students relate the content to their experiences, I think they can better grasp and understand the concept.

  • Quote: “Queen Victoria transformed into King Edward, ‘the fourth dimension’ became an everyday concept.”
  • Annotation: I am not familiar with this concept of ‘the fourth dimension’ that emerged a century ago. I need to research it since I don’t know what transformation happened during this time.
  • Quote: “The modem’s faithful churn made it seem as if it were tunneling through to somewhere else, opening up a space for us to inhabit.”
  • Annotation: What does “tunneling through to somewhere else” look like in the digital world? Does that tunnel still exist?
  • Quote: “With the prospect of this fourth digital dimension, a moment can feel strangely flat if it exists solely in itself.”
  • Annotation: If you post an accomplishment on social media, does it feel “more real” than if you didn’t post it? Why?

1) Quote: “but our physical homes have also been digitized. We can identify a common fitting on a 4D house by traveling back in time to the unlikely world of Seinfeld‘s last season”

Annotation: Video Reference YouTube: https://youtu.be/jzAvEkbn3lA

Annotation Reason:

I am not sure what the age group for this class is; however, I am assuming that some might not understand the reference to Seinfeld. Therefore, I provided a short video that I found on YouTube related to this specific scene.

2) Quote: “physical homes have also been digitized.”

Annotation: During the pandemic, when work and school took place online, we were forced to digitally invite our colleagues, bosses, classmates, interviewers, and other people we usually don’t invite into our homes. How did this make you feel? What more did you learn from this experience?

Annotation Reason:

This book was written in 2016; however, four years later, the whole world was forced to change due to the pandemic. Therefore, these students can understand the impact it had on their lives. This question aims to help them reflect on their own experience, give them a moment to think about their feelings, and give them the power to create their narrative, as this was a traumatic experience for many.

3) Quote: “What happens to the nervous system when it is exposed to the delights and pressures and weird sorrows of networked life?”

Annotation: What happens to the development of children who are born into using mobile devices?

Annotation Reason:

This is a rhetorical question. I don’t have the answers. However, I’d like students to consider the digital world’s implications on children.