Reading List for Graph Walk
Graph walks are the first topic we will dive into. Graphs are an abstract way to represent many data sources:
- A knowledge graph is a graph where entities (think Wikipedia entries) are nodes, and relations such as “married-to” or “born-in” are edges (e.g. Freebase , DBpedia , Yago , WikiData )
- A web graph is a graph where web pages are nodes, and hyperlinks from one page to another are edges.
- A Wikipedia hypertext graph is a graph where each Wikipedia page is a node, and when one page links to another, they have an edge
- An entity link graph is a graph where both texts and Wikipedia entities are nodes, and a text with an entity link to a Wikipedia entity, is represented by an edge. (Entity linking tools are: TagMe! , Dexter , etc – more on the project resource page)
- A social friend network (e.g., Facebook) is a graph where users are nodes, and there is an edge, if they are friends
- A social follower network (e.g., Twitter) is a graph were users are nodes, and there is an edge when a user follows another user (i.e., subscribes to their content)
- A citation graph is a graph where each piece of scientific literature is a node, and a citation is an edge in the graph
- A word co-occurrence graph is a graph, where words are nodes, and words that occur in the same text have an edge between them
- A word-net graph is a graph, where words are nodes, and they have an edge if they are in a syntactic relationship (e.g., synonym, hyponym, homonym, …)
- A word2vec graph is a graph were words are nodes, and all nodes have an edge between each other. However, the strength of the edge is the similarity of both word’s word vectors.
In the programming homework you are going to be working with a CAR-Hypertext graph like this:
Every CAR page is a node, and every paragraph represents an edge between all CAR pages it links to.
Graph walk algorithms are based on an thought experiment: Say a random surfer would pick a node at random, and would hop from node to node along edges for all eternity. Which node would it visit most often?
Graph walk algorithms can be used to rank nodes, by placing the node that is visited most often first, the second-most visited second, etc. This is the rough idea behind the PageRank algorithm, which made the Google search engine famous in the late 90’ies.
The purpose of this reading assignment is for you to become familiar with different graph walk algorithms and their variation, their underlying theory and applications.
(everyone has to read this)
(mandatory if you choose this as your expertise topic)
(if you are completely lost, start here!)
Further reading beyond this assignment
(if you want to know more, or read these if the other papers were to easy)
- Chakrabarti, Soumen, and Alekh Agarwal. “Learning parameters in entity relationship graphs from ranking preferences.” European Conference on Principles of Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg, 2006. https://pdfs.semanticscholar.org/289f/e60cb5ebd0c45308c7660a2fa3aca1c3f997.pdf
- Borgatti, Stephen P., and Martin G. Everett. “A graph-theoretic perspective on centrality.” Social networks 28.4 (2006): 466-484. https://pdfs.semanticscholar.org/553d/6d22a9850304d822915515cbe26eb86dea45.pdf
- Newman, Mark EJ. “Analysis of weighted networks.” Physical review E 70.5 (2004): 056131. https://arxiv.org/pdf/cond-mat/0407503.pdf
- Haveliwala, Taher, Sepandar Kamvar, and Glen Jeh. An analytical comparison of approaches to personalizing pagerank. Stanford, 2003. http://ilpubs.stanford.edu:8090/596/1/2003-35.pdf