Seminars & Colloquia
Chengkai Li
Department of Computer Science and Engineering, University of Texas at Arlington
"Tackling Usability Challenges in Querying Massive, Ultra-heterogeneous Graphs"
Tuesday, February 9, 2016, 10:00 AM
Location: 3211, EBII NCSU Centennial Campus
This talk is part of the Taming the Data Seminar series
There is a pressing need to tackle the usability challenges of querying massive, ultra-heterogeneous entity graphs that use thousands of node and edge types to record millions to billions of entities (persons, products, organizations) and their relationships. Widely known instances of such graphs include Freebase, DBpedia and YAGO. Applications in a variety of domains are tapping into these graphs for richer semantics and better intelligence. Both data workers and application developers are often overwhelmed by the daunting task of understanding and querying such data, given its sheer size and complexity. To retrieve data from graph databases, the norm is to use structured query languages such as SQL, SPARQL, and the like. However, writing structured queries requires extensive experience with both the query language and the data model. The database community has long recognized the importance of graphical query interfaces to the usability of data management systems, yet relatively little has been done. Existing visual query builders allow users to build queries by drawing query graphs, but they do not suggest which nodes and edges to include. At every step of query formulation, a user may be inundated with hundreds of options or more.
Orion is a visual query interface designed to improve the usability of graph query systems: it iteratively assists users in constructing query graphs by making suggestions using machine learning methods. In its active mode, Orion suggests the top-k edges to add to a query graph without being triggered by any user action. In its passive mode, the user adds a new edge manually, and Orion suggests a ranked list of labels for that edge. Orion's edge ranking algorithm, Random Correlation Paths (RCP), uses a query log to rank candidate edges by how likely they are to match the user's query intent. Extensive user studies on Freebase showed that Orion users had a 70% success rate in constructing complex query graphs, a significant improvement over the 58% success rate of users of a baseline system resembling existing visual query builders. Furthermore, in experiments using active mode only, RCP was compared with several methods adapted from other machine learning algorithms, such as random forests and naive Bayes classifiers, as well as with recommendation systems based on singular value decomposition. On average, RCP required only 40 suggestions to reach the target query graph, while the other methods required 2 to 4 times as many.
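The abstract does not describe how RCP works internally, so the sketch below does not implement it. It is a deliberately simple stand-in, assuming that each logged query graph can be reduced to the set of edge labels it uses, that ranks candidate labels for a partial query graph by how often they co-occurred with its existing labels in a query log (roughly the passive-mode setting). The names `build_cooccurrence` and `suggest_top_k` and the toy log are hypothetical and illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical simplification: a logged query graph is reduced to the set of
# edge labels it uses; node identities and graph structure are ignored.
QueryLog = Iterable[FrozenSet[str]]


def build_cooccurrence(log: QueryLog) -> Tuple[Counter, Counter]:
    """Count how often each label, and each unordered label pair, appears in logged queries."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for query in log:
        single.update(query)
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(query), 2):
            pair[(a, b)] += 1
    return single, pair


def suggest_top_k(partial: FrozenSet[str], candidates: Iterable[str],
                  single: Counter, pair: Counter, k: int) -> List[str]:
    """Rank candidate labels by summed co-occurrence with labels already in the partial query."""
    def score(label: str) -> Tuple[int, int, str]:
        co = sum(pair[tuple(sorted((label, e)))] for e in partial)
        # Higher co-occurrence first; ties broken by overall popularity,
        # then alphabetically so the ranking is deterministic.
        return (-co, -single[label], label)

    ranked = sorted((c for c in candidates if c not in partial), key=score)
    return ranked[:k]


if __name__ == "__main__":
    # Toy query log (illustrative data, not from any real system).
    log = [
        frozenset({"director", "starring", "genre"}),
        frozenset({"director", "starring"}),
        frozenset({"director", "release_date"}),
        frozenset({"author", "publisher"}),
    ]
    single, pair = build_cooccurrence(log)
    candidates = ["starring", "genre", "release_date", "author", "publisher"]
    print(suggest_top_k(frozenset({"director"}), candidates, single, pair, k=2))
    # ['starring', 'genre']
```

Pairwise co-occurrence only looks at labels two at a time, which is one reason a method that exploits longer correlations in the log, as RCP's name suggests, could plausibly need fewer suggestions; that comparison is beyond what this sketch can show.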
Dr. Chengkai Li is an Associate Professor and the Director of the Innovative Database and Information Systems Research Laboratory (IDIR) in the Department of Computer Science and Engineering at the University of Texas at Arlington. His research interests span several areas related to big data and data science, including databases, data mining, Web data management, and information retrieval. His current research focuses on building large-scale human-assisting and human-assisted data and information systems with high usability, low cost, and applications for social good. His research projects include computational journalism, crowdsourcing and human computation, data exploration by ranking (top-k), skyline and preference queries, database testing, entity queries, and usability challenges in querying graph data. Dr. Li received his Ph.D. in computer science from the University of Illinois at Urbana-Champaign, and his M.Eng. and B.S. degrees in Computer Science from Nanjing University. He received HP Labs Innovation Research Awards in 2011 and 2012.
Host: Rada Chirkova, CSC