Sefaria in Gephi: More Visualization than Judaism, & That’s Okay – A Very DH Project.

A/N: This is less a critique and more praise, frankly, and you know what? I’m okay with that, this project is awesome.

Sefaria is a massive and widely-used online repository for Jewish texts, from Torah to Talmud. It contains all of our texts in several languages, and also includes user-generated commentaries and collections. It is often called a “living library” of Judaism. Gephi is open-source software for visualizing networks. So when I came across Sefaria in Gephi, I said “oh my gosh, I’d been wondering about this exact thing!”

Sefaria in Gephi is a project by the folks at Ludic Analytics, a small group of colleagues who all work on literature visualization as both a research and a pedagogical tool. It’s a little dated at this point, since the posts were finished in 2014, when Sefaria was only a year old, but there is still a massive amount of value in these graphs. The main author, Dr. Liz Shayne, says she started this project mostly out of curiosity about “how do the formal attributes of digital adaptations affect the positions we take towards texts? And how do they reorganize the way we perceive, think about and feel for/with/about texts?” This is, in my opinion, a very DH question to ask: how does the way we visualize data change the way we perceive the results and how we feel about them? It is also truly DH in sitting at the intersection of math and literature; it hits a very necessary cross-section.

Quick sidebar about Dr. Shayne: since this series was published on WordPress, she actually ended up working with Sefaria, and she is now a director at Yeshivat Maharat, a women’s university for Jewish studies (in goyim terms), which is actually part of my little niche community of Open Orthodoxy! Very proud to have her as one of us.

So about Sefaria: Sefaria creates literal links between texts that all reference the same thing. If you highlight Genesis 1:1, it’ll show you all the other texts that mention Gen. 1:1. It makes it very easy to see the (not so literal) links between the texts.

Over 87k connections were made across 100k nodes. Dr. Shayne notes that it’s important to realize these connections are less an indication of over 2000 years of texts and more an indication of the incredible crowdsourcing Sefaria has been able to accomplish.

Here is the first example she gives of what Gephi did with Sefaria, using the OpenOrd layout plugin, which is designed for laying out large graphs.

The figure above represents the following:

“Blue – Biblical texts and commentaries on them (with the exception of Rashi). Each node is a verse or the commentary by one author on that verse.

Green – Rashi’s commentaries. Each node is a single comment on a section

Pink – The Gemara. Each node is a single section of a page.

(Note – these first 3 make up 87% of the nodes in this graph. Rashi actually has the highest number of nodes, but none of them have very many connections)

Red – Codes of Law. Each node is a single sub-section.

Purple – The Mishnah. Each node is a single Mishnah.

Orange – Other (Mysticism, Mussar, etc.)”

Don’t worry if you’re not Jewish or don’t know what these things mean; just know that they’re all Jewish texts. “Size also corresponds to degree,” says Shayne: “the more connections a single node has, the larger it is.” The largest blue node is the first verse of Genesis. From this graph we can also see that most connections are made by the Gemara referencing the Torah and the Gemara referencing itself. Shayne notes, however, that this graph is very hard to read and misses a lot of important information like proximity: there’s nothing linear or otherwise sequential in this graph.
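For anyone curious what “size corresponds to degree” means in practice, here is a minimal sketch in Python. To be clear, this is my own illustration and not Dr. Shayne’s code or Gephi’s internals: the edge list is a made-up toy example (not real Sefaria link data), and the function names and the 5 to 50 size range are assumptions chosen for readability.

```python
from collections import Counter

# Hypothetical toy edge list: each pair stands for one link between two texts.
# These labels are placeholders, not actual links from Sefaria's dataset.
edges = [
    ("Genesis 1:1", "Commentary A"),
    ("Genesis 1:1", "Talmud passage B"),
    ("Genesis 1:1", "Midrash C"),
    ("Talmud passage B", "Mishnah D"),
]


def degrees(edge_list):
    """Count how many links touch each node (its degree)."""
    counts = Counter()
    for a, b in edge_list:
        counts[a] += 1
        counts[b] += 1
    return counts


def node_sizes(deg, min_size=5.0, max_size=50.0):
    """Linearly rescale degrees into a display-size range (assumed values)."""
    lo, hi = min(deg.values()), max(deg.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all degrees match
    return {
        node: min_size + (d - lo) / span * (max_size - min_size)
        for node, d in deg.items()
    }


deg = degrees(edges)
sizes = node_sizes(deg)

# Print nodes from most to least connected.
for node, d in deg.most_common():
    print(f"{node}: degree {d}, size {sizes[node]:.1f}")
```

In this toy graph the Genesis 1:1 node has the most links, so it gets the largest size, which is the same effect that makes it the biggest blue node in the real graph.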

Dr. Shayne experiments with several different methods of visualizing this data, and is quite good at critiquing her own methods based on how they change the way you think about a text. The second article in the series is entirely about the limits of the project and of the medium, and specifically about how to make limits work in your favor, which is something I think a lot of DH projects could use and are trying to learn. She also experiments with what data she’s trying to visualize, going from small concepts like connections between individual statements to much broader ones like connections between entire books. Over the course of her project, Sefaria gets bigger and adds more links, and what she’s able to do with this data changes, like tracking allusions to other texts rather than tracking the texts themselves. There’s a point where she actually graphs Sefaria getting larger, which accidentally gives her insight into how Sefaria was built up.

Overall, I think this project has a lot to offer in terms of what it allows us to see about Jewish commentary through different lenses. However, this project is ultimately much less about Judaism and more about reliable and creative graphing. We learn that what is helpful visually may not always be what is helpful statistically. This project also deserves some credit for Sefaria’s popularity: it was written about in Wired, which gave it some traction. Unrelatedly, she is also quite witty: in potentially the funniest line I’ve ever read in an academic article, she writes “statistically speaking, Genesis 1:1 is the Kevin Bacon of Sefaria. You are more likely to be within 6 degrees of it than anything else.”

Dr. Shayne is, according to her WordPress, still working on this project, though I can’t find much online about in what capacity (perhaps that was her work with Sefaria directly). You can access the data from her project here on GitHub (like any good DH project), but be warned that it is extremely hefty. She ends this exploration with two questions:

“1. How does this kind of work – making visualizations and thinking about networked Jewish text – enhance the traditional experience of studying Jewish texts in a Jewish environment?
2. How can an academic researcher make use of these visualizations and to what degree does she need to become an expert in network theory to do so?”

And to these I say:

  1. Making these sorts of connections is innate to Jewish study; it is why we study the commentaries in addition to the Tanach. These images don’t take the legwork out of making those connections, but rather serve as a memory aid.
  2. I’m less able to answer this question, but I can say that you definitely don’t need to be an expert to use these. This is the Digital Humanities at work in serving the public knowledge base; these illustrations are incredibly accessible in their formatting.
