Graph walks are the first topic we will dive into. Graphs are an abstract way to represent many data sources:

- A knowledge graph is a graph where entities (think Wikipedia entries) are nodes, and relations such as “married-to” or “born-in” are edges (e.g. Freebase , DBpedia , Yago , WikiData )
- A web graph is a graph where web pages are nodes, and hyperlinks from one page to another are edges.
- A Wikipedia hypertext graph is a graph where each Wikipedia page is a node, and when one page links to another, they have an edge
- An entity link graph is a graph where both texts and Wikipedia entities are nodes, and a text with an entity link to a Wikipedia entity, is represented by an edge. (Entity linking tools are: TagMe! , Dexter , etc – more on the project resource page)
- A social friend network (e.g., Facebook) is a graph where users are nodes, and there is an edge, if they are friends
- A social follower network (e.g., Twitter) is a graph were users are nodes, and there is an edge when a user follows another user (i.e., subscribes to their content)
- A citation graph is a graph where each piece of scientific literature is a node, and a citation is an edge in the graph
- A word co-occurrence graph is a graph, where words are nodes, and words that occur in the same text have an edge between them
- A word-net graph is a graph, where words are nodes, and they have an edge if they are in a syntactic relationship (e.g., synonym, hyponym, homonym, …)
- A word2vec graph is a graph were words are nodes, and all nodes have an edge between each other. However, the strength of the edge is the similarity of both word’s word vectors.

In the programming homework you are going to be working with a CAR-Hypertext graph like this:

Every CAR page is a node, and every paragraph represents an edge between all CAR pages it links to.

Graph walk algorithms are based on an thought experiment: Say a random surfer would pick a node at random, and would hop from node to node along edges for all eternity. Which node would it visit most often?

Graph walk algorithms can be used to rank nodes, by placing the node that is visited most often first, the second-most visited second, etc. This is the rough idea behind the PageRank algorithm, which made the Google search engine famous in the late 90’ies.

The purpose of this reading assignment is for you to become familiar with different graph walk algorithms and their variation, their underlying theory and applications.

(everyone has to read this)

- Haveliwala, Taher H. “Topic-sensitive pagerank.” Proceedings of the 11th international conference on World Wide Web. ACM, 2002. http://ilpubs.stanford.edu:8090/573/1/2002-6.pdf

(mandatory if you choose this as your expertise topic)

- Lempel, Ronny, and Shlomo Moran. “SALSA: the stochastic approach for link-structure analysis.” ACM Transactions on Information Systems (TOIS) 19.2 (2001): 131-160. http://delab.csd.auth.gr/~manolopo/oikonomiko/salsa.pdf
- Erkan, Günes, and Dragomir R. Radev. “Lexrank: Graph-based lexical centrality as salience in text summarization.” Journal of Artificial Intelligence Research 22 (2004): 457-479. http://www.jair.org/media/1523/live-1523-2354-jair.pdf
- Richardson, Matthew, and Pedro Domingos. “The intelligent surfer: Probabilistic combination of link and content information in pagerank.” Advances in neural information processing systems. 2002. http://papers.nips.cc/paper/2047-the-intelligent-surfer-probabilistic-combination-of-link-and-content-information-in-pagerank.pdf
- Berkhin, Pavel. “A survey on pagerank computing.” Internet Mathematics 2.1 (2005): 73-120. https://projecteuclid.org/download/pdf_1/euclid.im/1128530802 – This is a very long paper. I don’t expect you to read all of it. But choose a few sections into which you are diving deeper.

(if you are completely lost, start here!)

- Page, Lawrence, et al. The PageRank citation ranking: Bringing order to the web. Stanford InfoLab, 1999. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
- Kleinberg, Jon M. “Authoritative sources in a hyperlinked environment.” Journal of the ACM (JACM) 46.5 (1999): 604-632. https://www.cs.bgu.ac.il/~snean151/wiki.files/7-AuthoritativeSourcesinaHyperlinkedEnvironment.pdf
- Chapter 21 in Book Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Vol. 1. No. 1. Cambridge: Cambridge university press, 2008. https://nlp.stanford.edu/IR-book/pdf/21link.pdf

(if you want to know more, or read these if the other papers were to easy)

- Chakrabarti, Soumen, and Alekh Agarwal. “Learning parameters in entity relationship graphs from ranking preferences.” European Conference on Principles of Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg, 2006. https://pdfs.semanticscholar.org/289f/e60cb5ebd0c45308c7660a2fa3aca1c3f997.pdf
- Borgatti, Stephen P., and Martin G. Everett. “A graph-theoretic perspective on centrality.” Social networks 28.4 (2006): 466-484. https://pdfs.semanticscholar.org/553d/6d22a9850304d822915515cbe26eb86dea45.pdf
- Newman, Mark EJ. “Analysis of weighted networks.” Physical review E 70.5 (2004): 056131. https://arxiv.org/pdf/cond-mat/0407503.pdf
- Haveliwala, Taher, Sepandar Kamvar, and Glen Jeh. An analytical comparison of approaches to personalizing pagerank. Stanford, 2003. http://ilpubs.stanford.edu:8090/596/1/2003-35.pdf