Scalable Clustering for Data Mining Applications

Dr. Joydeep Ghosh
Monday, May 22, 2006
3:00PM - CSB-232

Abstract


Clustering or segmentation has been widely studied across multiple disciplines for over 40 years, leading to a vast array of techniques. However, several data mining applications involve complex data characteristics or domain constraints that severely challenge such classical techniques. I will describe two next-generation clustering approaches that are able to address some of these challenges. First, we shall see how a graph-partitioning based "discriminative" approach to clustering is able to do market-basket analysis of very large retail datasets. Second, we look at a versatile "generative" approach to clustering that shows how the venerable k-means algorithm, which is based on the squared loss, can be generalized to deal with a very large class of loss functions without losing its simplicity or scalability. This makes the generalized approach applicable to a very large variety of data types and characteristics. Finally, I shall introduce the powerful idea of co-clustering or biclustering (simultaneous clustering of features and objects) and show results on gene expression (microarray) data to illustrate its power.

Short Bio


Joydeep Ghosh joined the UT-Austin faculty in 1988 after being educated at IIT Kanpur (B. Tech '83) and The University of Southern California (MS, Ph.D). He is currently the Cullen Professor in Engineering, and a Fellow of the IEEE. He is the founder-director of IDEAL (Intelligent Data Exploration and Analysis Lab). His research interests lie primarily in the theory of adaptive multi-learner systems, intelligent data analysis, data mining and web mining, and their applications to a wide variety of complex engineering and AI problems.

Dr. Ghosh has published more than 200 refereed papers and co-edited 16 books. He received the 2005 Best Research Paper Award from the UT Co-op Society and the 1992 Darlington Award given by the IEEE Circuits and Systems Society for the Best Paper in the areas of CAS/CAD, besides nine other "best paper" awards over the years. He was the Conference Co-Chair of Artificial Neural Networks in Engineering (ANNIE) '93 to '96 and '99 to '03 and Program Co-Chair for The SIAM Int'l Conf. on Data Mining (SDM'06). He also serves on the program committee of several top conferences on data mining, neural networks, pattern recognition, and web analytics every year. Dr. Ghosh has been a plenary/keynote speaker on several occasions such as ANNIE'97 and MCS 2002, and has lectured widely on intelligent analysis of large-scale data. He has co-organized workshops on high dimensional clustering (ICDM 2003; SDM 2005), Web Analytics (with SIAM Int'l Conf. on Data Mining, SDM2002), Web Mining (with SDM 2001), and on Parallel and Distributed Knowledge Discovery (with KDD-2000).

Dr. Ghosh has served as a consultant or advisor to a variety of companies, from successful startups such as Neonyoyo and Knowledge Discovery One, to large corporations such as IBM, Motorola and Vinson & Elkins. His research group has been supported by the NSF, Google, ONR, ARO, AFOSR, Intel, IBM, Motorola, TRW, Schlumberger and Dell, among others. At UT, Dr. Ghosh teaches graduate courses on data mining, artificial neural networks, and web analytics. He was voted the Best Professor by the Software Engineering Executive Education Class.