Seminars & Colloquia
Institute of Network Computing and Information Systems, Peking University, China
"On the SSS of WWW"
Thursday October 02, 2008 11:00 AM
Location: 3211, EB II NCSU Centennial Campus
Abstract: Much like a black hole, the Web has, since its birth, been absorbing all sorts of data (information) from around the globe, generated throughout the course of human civilization. At the same time, the digitized and networked (webbed) nature of web data, which generally means "easy to access", invites much imagination about rediscovering, re-engineering, and reusing this ocean of information. There are unlimited directions to take in Web-related research. In this talk, I'll address issues related to the "SSS of WWW", namely the size, shape, and search of the World Wide Web. Starting from how people estimate the size of the Web, we provide a law of growth based on empirical study and statistical data for the Chinese Web. In terms of shape, we discuss a result based on a crawl of 800 million web pages, which shows evidence of a "Tea Pot" instead of the "Bow Tie" that people generally assume. For search, I'll first introduce Web InfoMall (http://www.infomall.cn), the Chinese web archive we have been constructing since 2001. One can easily envision a step beyond the web archive itself, namely searching and, more accurately, mining based on Web InfoMall, to make use of the archived data. With a web archive and its associated capabilities, "web mining" here takes on a somewhat different meaning, spanning from structural analysis of the web to named entity and relation extraction, and from spatial information discovery (if we consider URLs as a space) to temporal information exhibition. I'll show some unique examples along this line, including HisTrace (a search facility based on Web InfoMall) and Tianwang Digest for Olympics 2008 (an event analysis system with the Beijing Olympics as an instance).
Short Bio: LI Xiaoming received his Ph.D. in Computer Science from Stevens Institute of Technology (USA) in 1986 and has since taught at Harbin Institute of Technology and Peking University. He founded the Chinese web archive Web InfoMall (http://www.infomall.cn), the search engine Tianwang (http://e.pku.edu.cn), the peer-to-peer file sharing network Maze (http://maze.pku.edu.cn), and other popular web channels. He is a member of Eta Kappa Nu, a senior member of IEEE, and currently a Vice President of the China Computer Federation, International Editor of Concurrency (John Wiley), Associate Editor of the Journal of Web Engineering (Rinton), and an editor of Electronics Letters (Chinese Edition, IET, UK). He has published over 100 papers, authored Search Engine Principle, Technology, and Systems (Science Press, 2005), and received numerous achievement awards from the Ministry of Science and Technology, the Ministry of Education, the Beijing Municipal Government, and other agencies.
Host: Tao Xie, Computer Science