CSC News

February 20, 2018

Best Paper Award Presented at IEEE Big Data 2017

Congratulations to NC State Computer Science PhD student HyeongSik Kim, recent PhD graduate Padmashree Ravindra, and Dr. Kemafor Anyanwu Ogan, associate professor of computer science at NC State, for winning the Best Paper Award at the 2017 IEEE International Conference on Big Data (IEEE Big Data 2017), held in Boston, MA, on December 11-14, 2017.


The winning paper is “A Semantics-Aware Storage Framework for Scalable Processing of Knowledge Graphs on Hadoop,” published in the Proceedings of IEEE Big Data 2017, pages 193-202. The abstract follows:


Knowledge graphs are graph-based data models which employ named nodes and edges to capture differentiation among entities and relationships in richly diverse data collections such as in the biomedical domain. The flexibility of knowledge graphs allows for heterogeneous collections to be linked and integrated in precise ways. However, resulting data models often have irregular structures which are not easy to manage using platforms for structured, schema-first data models like the relational model. To facilitate exchange, inter-operability and reuse of data, standards such as Resource Description Framework (RDF) have been increasingly adopted for representation. Domains such as the biomedical now have large collections of publicly available RDF graphs as well as benchmark workloads. To achieve scalability in data processing, some efforts are being made to build on distributed processing platforms such as Hadoop and Spark. However, while some distributed graph platforms have emerged for certain classes of mining workloads for non-semantic graphs (without typed edges and nodes), knowledge graph processing, which often involves ontological inferencing, continues to be plagued by scalability and efficiency challenges. In this paper, we present the design of a Hadoop-based storage architecture for knowledge graphs that overcomes some of the challenges of big RDF data processing. The rationale of the design strategy is to go beyond the traditional approach of exploiting structural properties of graphs while storing to include exploitation of semantic properties of knowledge graphs. Our system SemStorm is a Hadoop-based indexed, polymorphic, signatured file organization that supports efficient storage of data collections with significant data heterogeneity. Naive storage models for such data place more demands for meta-data management than traditional systems can support. 
The polymorphic file organization is further coupled with a nested, column-oriented file format to enable discriminatory data access based on queries. A major hallmark of SemStorm is the enabling of semantic-awareness in the storage framework. The idea is to exploit the knowledge represented in ontologies that accompany data for optimizing data storage models such as identifying and managing data (sometimes implicit) redundancies. Another major advantage of SemStorm is that it derives optimized storage models for data autonomically, i.e., without user input. Extensive experiments conducted on real-world and synthetic benchmark datasets show that SemStorm is up to 10X faster than existing approaches.

IEEE Big Data 2017 provided a leading forum for disseminating the latest results in Big Data research, development, and applications.
