RESEARCH PROJECTS


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: This paper proposes a general smoothing framework for graph kernels, inspired by state-of-the-art smoothing techniques used in natural language processing (NLP). However, unlike NLP applications, which primarily deal with strings, we show how one can apply smoothing to a richer class of inter-dependent sub-structures that naturally arise in graphs. Moreover, we discuss extensions of the Pitman-Yor process that can be adapted to smooth structured objects, thereby leading to novel graph kernels.
    Download: Pdf, Data, Code
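    Sketch: To illustrate why smoothing helps graph kernels, here is a minimal example that interpolates each graph's sub-structure counts with a shared corpus-level base distribution (simple Dirichlet-style additive smoothing, swapped in for the Pitman-Yor-based scheme described above) and then takes a linear kernel. The sub-structure ids, the helper names, and the concentration parameter alpha are hypothetical.

```python
from collections import Counter


def smooth(counts, base, alpha=1.0):
    """Dirichlet-style smoothing: p(s) = (c(s) + alpha * base(s)) / (N + alpha)."""
    total = sum(counts.values())
    support = set(counts) | set(base)
    return {s: (counts.get(s, 0) + alpha * base.get(s, 0.0)) / (total + alpha)
            for s in support}


def normalize(counts):
    """Maximum-likelihood (unsmoothed) distribution over sub-structures."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def linear_kernel(p, q):
    """Dot product between two sub-structure distributions."""
    return sum(v * q.get(s, 0.0) for s, v in p.items())


# Hypothetical sub-structure counts (e.g. graphlet ids) for two graphs.
g1 = Counter({"a": 3, "b": 1})
g2 = Counter({"b": 2, "c": 2})

# Shared base distribution estimated from all graphs in the (toy) corpus.
corpus = g1 + g2
n = sum(corpus.values())
base = {s: c / n for s, c in corpus.items()}

raw = linear_kernel(normalize(g1), normalize(g2))
smoothed = linear_kernel(smooth(g1, base), smooth(g2, base))
print(f"raw={raw:.4f} smoothed={smoothed:.5f}")
```

    Sub-structure "a" never occurs in the second graph, yet after smoothing it still contributes to the similarity; this is the sparsity problem that smoothing is meant to soften.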


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: In this paper, we present Deep Graph Kernels, a unified framework that learns latent representations of graph sub-structures. Our framework is inspired by recent advances in language modeling and deep learning, and leverages the dependency information between sub-structures. We use our framework to learn graph-based representations of different communities on Reddit, and to detect which community (i.e., subreddit) a discussion thread belongs to by considering only the communication patterns between users.
    Download: Pdf, Data, Code
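    Sketch: One way to read the idea is that the plain dot product of sub-structure count vectors is replaced by K(G, G') = phi(G)^T M phi(G'), where M encodes similarity between sub-structures. The minimal example below assumes M = V V^T for a matrix V of sub-structure embeddings; the 2-d vectors are hypothetical stand-ins for embeddings that would be learned with a language-modeling objective.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def deep_graph_kernel(phi1, phi2, emb):
    """K(G, G') = phi(G)^T M phi(G'), with M[i][j] = <emb[i], emb[j]>."""
    return sum(c1 * dot(emb[i], emb[j]) * c2
               for i, c1 in phi1.items()
               for j, c2 in phi2.items())


def graph_vector(phi, emb):
    """Equivalent factorized form: sum over sub-structures of count * embedding."""
    dim = len(next(iter(emb.values())))
    out = [0.0] * dim
    for s, c in phi.items():
        for k in range(dim):
            out[k] += c * emb[s][k]
    return out


# Hypothetical 2-d embeddings: "a" and "b" are related sub-structures, "c" is not.
emb = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
g1 = {"a": 2}
g2 = {"b": 1, "c": 1}

k = deep_graph_kernel(g1, g2, emb)
# Because M = V V^T, the kernel equals a dot product of per-graph vectors.
assert abs(k - dot(graph_vector(g1, emb), graph_vector(g2, emb))) < 1e-12
print(k)
```

    With M set to the identity, the two graphs share no sub-structure and the kernel would be 0; the embeddings let the related pair ("a", "b") contribute. Factoring M as V V^T also means each graph can be collapsed into a single vector once, as graph_vector does.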


  • Authors: Shihao Ji, Hyokun Yun, Pinar Yanardag, Shin Matsushima, SVN Vishwanathan
    Abstract: Embedding words in a vector space has received considerable attention in recent years. While state-of-the-art methods compute word similarities efficiently via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem, given the ranking nature of its evaluation metrics. Based on this insight, we propose WordRank, a novel framework that efficiently estimates word representations via robust ranking, in which an attention mechanism and robustness to noise are readily achieved through DCG-like ranking losses.
    Download: Pdf
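    Sketch: The ranking view can be illustrated as follows, assuming the rank of a context c for word w is upper-bounded by a hinge surrogate and then passed through a concave, DCG-like transform rho(x) = log2(1 + x). The paper's scaling parameters and training procedure are omitted, and the function names and vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank_upper_bound(u_w, target, contexts):
    """Hinge upper bound on the rank of `target` among `contexts` for word vector u_w.

    The true rank counts contexts c' with <u_w, v_c> - <u_w, v_c'> <= 0; each such
    indicator is bounded above by the hinge max(0, 1 - margin).
    """
    score = dot(u_w, contexts[target])
    return sum(max(0.0, 1.0 - (score - dot(u_w, v)))
               for name, v in contexts.items() if name != target)


def dcg_like_loss(u_w, target, contexts):
    """Concave transform rho(x) = log2(1 + x): errors near the top of the ranking cost most."""
    return math.log2(1.0 + rank_upper_bound(u_w, target, contexts))


# Hypothetical word vector and context vectors.
u_w = [1.0, 0.0]
contexts = {"c1": [2.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 0.0]}

print(dcg_like_loss(u_w, "c1", contexts))  # c1 already ranks first
print(dcg_like_loss(u_w, "c2", contexts))  # c2 is ranked below c1 and c3
```

    Because rho is concave, moving a context from rank 0 to rank 1 is penalized more than moving it from rank 10 to rank 11, which is how a DCG-like loss concentrates on the top of the ranking.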


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: In this paper, we present a submodular framework for graph comparison. Our framework extends the sampling scheme of the graphlet kernel to encourage diversity while avoiding redundancy when selecting sub-graphs. Our experiments on several benchmark datasets show that our framework outperforms the graphlet kernel in classification accuracy while using 50% fewer samples. As future work, we plan to formulate this problem as a graph summarization task and to summarize large-scale networks such as Facebook and Twitter.
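    Sketch: Diversity-aware sampling can be illustrated by greedily maximizing a concave function of graphlet-type counts, f(S) = sum over types t of sqrt(count_t(S)). This f is monotone submodular: a second sample of an already-seen graphlet type adds less than a first sample of a new type. The objective and names are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import math
from collections import Counter


def diversity(types):
    """f(S) = sum over graphlet types t of sqrt(count of t in S): monotone submodular."""
    return sum(math.sqrt(c) for c in Counter(types).values())


def greedy_select(samples, k):
    """Greedily pick k sampled sub-graphs (given by graphlet-type ids) maximizing diversity."""
    chosen = []
    remaining = list(range(len(samples)))
    for _ in range(min(k, len(samples))):
        current = diversity(samples[i] for i in chosen)
        best = max(remaining,
                   key=lambda i: diversity([samples[j] for j in chosen] + [samples[i]]) - current)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical candidate samples labeled by graphlet type.
samples = ["triangle", "triangle", "triangle", "path", "star"]
print(greedy_select(samples, 3))  # -> [0, 3, 4]
```

    Uniformly sampling three of these candidates would often return mostly triangles; the greedy choice covers three distinct graphlet types.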


  • Authors: Pinar Yanardag*, Mariheida Cordova* (*equal contribution)
    Abstract: The use of micro-blogging in classrooms is an emerging trend in computer-aided education. Micro-blogs offer an effective means of communication in large classrooms and engage students in meaningful discussions. However, in a micro-blogging setting, relevant content might be overwhelmed by posts that are irrelevant to the lecture, which could jeopardize effective learning. Moreover, students might post questions similar to one another's, generating redundant content and substantial information overload. In this paper, we present a principled approach for picking a set of posts that promotes relevant and diverse content while effectively filtering out the noise created by redundant posts.
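    Sketch: A minimal relevance-plus-diversity selector, assuming each post has a relevance score to the lecture and that redundancy is measured by word-set (Jaccard) overlap. The objective lam * (total relevance) + sum over posts p of max over selected s of sim(s, p) combines a modular relevance term with a facility-location coverage term; it is an illustrative stand-in, not the paper's exact objective, and the posts, scores, and lam are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two posts, used here as a redundancy measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def objective(selected, posts, relevance, lam):
    """lam * total relevance + facility-location coverage of all posts by the selection."""
    coverage = sum(max((jaccard(posts[i], p) for i in selected), default=0.0)
                   for p in posts)
    return lam * sum(relevance[i] for i in selected) + coverage


def pick_posts(posts, relevance, k, lam=1.0):
    """Greedy selection: add the post with the largest marginal gain, k times."""
    selected = []
    for _ in range(min(k, len(posts))):
        base = objective(selected, posts, relevance, lam)
        candidates = [i for i in range(len(posts)) if i not in selected]
        best = max(candidates,
                   key=lambda i: objective(selected + [i], posts, relevance, lam) - base)
        selected.append(best)
    return selected


# Hypothetical posts with hypothetical relevance-to-lecture scores.
posts = [
    "when is the midterm exam",
    "when is the midterm exam scheduled",
    "how does gradient descent converge",
    "anyone want pizza tonight",
]
relevance = [0.9, 0.85, 0.8, 0.0]
print(pick_posts(posts, relevance, 2))  # -> [0, 2]
```

    The near-duplicate midterm question is skipped because the first one already covers it, and the off-topic post is skipped because it carries no relevance.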


  • Authors: Pinar Yanardag, Rean Griffith, Anne Holler, K. Shankari, Xiaoyun Zhu, Ravi Soundararajan, Adarsh Jagadeeshwaran, Pradeep Padala
    Abstract: Using a population of VMware Virtual Center Virtual Appliances (VCVA) and their respective workloads, we describe techniques for constructing a model of their resource consumption and performance, including memory requirements and average operation latency, by mining application (VCVA) performance logs. We use our model to provide sizing recommendations for the virtual appliance and to identify features that can provide rough estimates of expected memory consumption. We describe modeling techniques from statistical machine learning that are amenable to representing complex, non-linear systems.
    Note: This project was done during an internship at VMware.


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: Microblogging is a form of blogging in which posts typically consist of short content such as quick comments, phrases, URLs, or media such as images and videos. Because of the fast and compact nature of microblogs, users have adopted them for novel purposes, including sharing personal updates, spreading breaking news, promoting political views, marketing, and tracking real-time events. Thus, finding relevant information sources within the rapidly growing content is an essential task. In this paper, we study the problem of understanding and analyzing microblogs. We present a novel two-stage framework that finds potentially relevant content by extracting topics from tweets, and then uses a submodular framework to cope with the information overload problem.


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: Reddit is one of the largest community-driven content aggregation websites. Its immense popularity can be attributed to its ability to surface the freshest trends and content on the web, and it is commonly referred to as "the front page of the Internet." With over 800K communities devoted to various topics, a natural information overload problem arises. Given that thousands of new threads and discussion topics are generated hourly, providing the best coverage of content relevant to users' interests becomes a critical task. In this paper, we propose a framework that tailors a personalized front page of the Internet. We formulate our framework as a submodular optimization problem, for which we can efficiently provide a near-optimal solution. For example, our algorithm discovered a set of communities targeted towards female users.


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: TED is a non-profit organization devoted to "Ideas Worth Spreading" that invites thinkers and doers from all over the world to give short, inspiring talks. As of today, more than 2,000 TED Talks are available on TED.com, representing over 2,000 ideas worth spreading. However, it is not easy for users to watch every TED video and extract the useful ideas. In this paper, we propose a submodular framework that summarizes TED talks to extract inspiring ideas worth spreading. Our framework incorporates audience feedback, in the form of the comments associated with each talk, to promote the important aspects of the talks.


  • Authors: Pinar Yanardag, SVN Vishwanathan
    Abstract: In this paper, we consider the problem of providing personalized restaurant recommendations on Yelp. We first analyze the Yelp review corpus and build a topic model. Then, we build personalized profiles for individual users based on the topics and cuisines they are interested in. Using these profiles, we design a recommender system that balances variety and relevance to generate a set of new restaurants to explore. We formulate our problem as a submodular objective function for which we can guarantee a solution within a factor of (1-1/e) of the optimum.
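    Sketch: The (1-1/e) factor is the classical guarantee for greedily maximizing a monotone submodular function under a cardinality constraint. A minimal illustration, assuming a user profile of topic weights and restaurants described by per-topic coverage probabilities: the probabilistic-coverage objective f(S) = sum over topics t of w_t * (1 - product over r in S of (1 - p_rt)) rewards covering many of the user's interests (variety) weighted by how much the user cares about each (relevance). The objective, names, and numbers are hypothetical.

```python
def coverage(selected, profile, restaurants):
    """Probabilistic topic coverage: monotone submodular in the selected set."""
    total = 0.0
    for topic, weight in profile.items():
        miss = 1.0  # probability that no selected restaurant covers this topic
        for r in selected:
            miss *= 1.0 - restaurants[r].get(topic, 0.0)
        total += weight * (1.0 - miss)
    return total


def recommend(profile, restaurants, k):
    """Greedy maximization; achieves at least (1 - 1/e) of the best k-set."""
    selected = []
    candidates = set(restaurants)
    for _ in range(min(k, len(restaurants))):
        base = coverage(selected, profile, restaurants)
        best = max(sorted(candidates),
                   key=lambda r: coverage(selected + [r], profile, restaurants) - base)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical user profile (topic weights) and restaurant topic coverage.
profile = {"sushi": 0.6, "vegan": 0.3, "brunch": 0.1}
restaurants = {
    "SushiA": {"sushi": 0.9},
    "SushiB": {"sushi": 0.8},
    "GreenBowl": {"vegan": 0.9, "brunch": 0.5},
}
print(recommend(profile, restaurants, 2))  # -> ['SushiA', 'GreenBowl']
```

    A second sushi place adds little once the sushi interest is already covered, so the greedy step prefers a restaurant that serves a different part of the profile.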


  • Authors: Pinar Yanardag (Independent project)
    Abstract: What makes authors different? Can we compare authors by investigating how they interpret different concepts? For instance, how is the concept of love interpreted by Kafka vs. Sylvia Plath? To explore this, I processed 40K Gutenberg books and built word embedding models for individual authors. Word embedding models embed words into a latent space and allow us to investigate the words surrounding a given concept. Preliminary results are promising. For instance, a comparison of the concept "power" shows that while Balzac interprets this concept as related to science, development, and nature, Mark Twain interprets it as spiritual, divine, and god-related. As a quantitative evaluation, I was able to detect the gender of the authors with 75% accuracy on a balanced dataset by considering only how authors embed a common set of 50 concepts into the latent space.
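    Sketch: Comparing how two authors interpret a concept can be done by comparing the concept's nearest neighbors in each author's embedding space. A minimal illustration, assuming per-author embeddings are available as word-to-vector dictionaries (the tiny vectors below are made-up placeholders, not trained models): neighbors are ranked by cosine similarity, and the two neighbor sets are compared by Jaccard overlap.

```python
import math


def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0


def neighbors(emb, concept, k):
    """The k words closest to `concept` in one author's embedding space."""
    target = emb[concept]
    others = [w for w in emb if w != concept]
    return sorted(others, key=lambda w: cosine(target, emb[w]), reverse=True)[:k]


def neighbor_overlap(emb_a, emb_b, concept, k):
    """Jaccard overlap of the two neighbor sets (1.0 = same neighbors)."""
    a, b = set(neighbors(emb_a, concept, k)), set(neighbors(emb_b, concept, k))
    return len(a & b) / len(a | b)


# Made-up 2-d vectors standing in for two authors' trained embeddings.
author_x = {"power": [1.0, 0.1], "science": [0.9, 0.2], "nature": [0.8, 0.1],
            "god": [0.1, 1.0], "divine": [0.2, 0.9]}
author_y = dict(author_x, power=[0.1, 1.0])

print(neighbors(author_x, "power", 2))
print(neighbors(author_y, "power", 2))
print(neighbor_overlap(author_x, author_y, "power", 2))
```

    Neighbor lists are compared instead of raw vectors because embedding spaces trained separately for each author are not aligned, whereas the words near a concept remain comparable across spaces.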