October 14, 2010
Proceeds to Benefit the NC State Department of Computer Science
Students in Dr. Nagiza Samatova’s CSC 422/522 course co-authored a textbook titled “Practical Graph Mining with R.” The book, to be published by Chapman & Hall/CRC Press under the Data Mining and Knowledge Discovery Series, has a final delivery date of January 31, 2011.
The book was written entirely by students in the class. PhD students John Jenkins and Stephen Ware and undergraduate Neil Shah wrote three “umbrella” chapters on “Introduction to R” (Shah), “Introduction to Graph Kernels” (Jenkins) and “Introduction to Graph Theory” (Ware). Computer Science PhD students Arpan Chakraborty, Brent Harrison, William Hendrix, Srinath Ravindran and Kevin Wilson will serve as co-editors of the book.
This book is designed as a primary textbook for advanced undergraduates, graduate students, and researchers focused on computer, information, and computational science. It also provides a handy resource for data analytics practitioners. The book is self-contained, requiring no prerequisite knowledge of data mining, and may serve as a standalone textbook for graph mining or as a supplement to a standard data mining textbook.
All proceeds from the sale of the textbook will go to the NC State Department of Computer Science.
Description of the book: Graph data is ubiquitous in real-world science, government, and industry domains. Examples include social network graphs, Web graphs, cybersecurity networks, power grid networks, and protein-protein interaction networks. Discovery of useful knowledge from graph data is a challenging data mining task. To address this challenge, effective and efficient graph-based algorithms have emerged from the data mining research community.
“Practical Graph Mining with R” provides a practical, “do-it-yourself” approach to extracting interesting patterns from graph data. This book is a synthesis of two important aspects of computer science: graph-based modeling and data mining techniques. Whereas graphs depict the structure of and relationships among entities, data mining is the process of extracting meaningful information from large data sets. Merging the two yields a means of discovering useful properties of structured data. The book covers many basic and advanced techniques for graph mining, including techniques for:
- Identifying anomalous nodes, edges, or substructures in a graph;
- Identifying frequently recurring patterns in a graph;
- Clustering vertices or graphs into related groups;
- Assessing and categorizing the nature of links in a graph;
- Measuring the proximity between vertices in a graph or the degree of similarity among multiple graphs;
- Classifying graphs and assigning graph node labels using graph kernels;
- Reducing the intrinsic high-dimensionality of graph datasets;
- Dividing up graph mining tasks so that they can be tackled by multiple processors working in parallel—an important general strategy for making large graph mining problems more manageable.
Each chapter of the book presents three representative computational techniques to guide and motivate the reader’s study of the topic and culminates in a demonstration of how such techniques can be used to solve real-world application problems using real data sets. Applications include network intrusion detection, tumor cell diagnostics, face recognition, predictive toxicology, mining metabolic and protein-protein interaction networks, community detection in social networks, and others. These representative techniques and applications were chosen based on the availability of open-source software and real data, as the book provides several libraries for the R statistical computing environment to walk through the real use cases. The presented techniques are covered in sufficient mathematical depth, and the chapters also include many explanatory examples. This makes the abstract principles of graph mining accessible to people of varying levels of expertise, while still providing a rigorous theoretical foundation that is grounded in reality. Though not every available technique is covered in depth, each chapter includes a brief survey of bibliographic references for those interested in further reading. By balancing depth and breadth in each chapter, with an ultimate focus on “hands-on” practical application, the book has something to offer to the student, the researcher, and the practitioner of graph data mining.