PODS Invited Talks
Computational Thinking, Inferential Thinking and "Big Data"
Michael I. Jordan (University of California, Berkeley, USA)
The phenomenon of "Big Data" is creating a need for research perspectives that blend computational thinking (with its focus on, e.g., abstractions, algorithms and scalability) with inferential thinking (with its focus on, e.g., underlying populations, sampling patterns, error bars and predictions). Database researchers and statistical machine learning researchers are centrally involved in the creation of this blend, and research that incorporates perspectives from both databases and machine learning will be of particular value in the bigger picture. This is true both for methodology and for theory. I present highlights of several research initiatives that draw jointly on database and statistical foundations, including work on concurrency control and distributed inference, subsampling, time/data tradeoffs and inference/privacy tradeoffs.
Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. He received his master's degree in Mathematics from Arizona State University, and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998. His research interests bridge the computational, statistical, cognitive and biological sciences. Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM.
Compact summaries over large datasets
Graham Cormode (University of Warwick, UK)
It is often useful to build a compact summary data structure for a large data set that allows certain queries to be answered approximately. The database community has a rich history of building such summaries, in the form of histograms, wavelets and samples. However, it can be challenging to build such summaries when the data in question is large and distributed, and it is not convenient to sort or perform intensive processing of the data, even offline. A class of summaries for a variety of queries has emerged, drawing on algorithms originally developed in the streaming setting. These summaries can be quite flexible: they can be computed for subsets of the data independently, then merged together to obtain a summary for the whole dataset. They allow approximate answering of fundamental queries such as selectivity and frequency estimation, and newer summaries support more complex functions from matrix and graph algorithms. This tutorial will introduce the concepts behind these summary methods, and outline several examples.
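The mergeability property described above can be illustrated with the Count-Min sketch, one well-known summary for frequency estimation. The sketch below is an illustrative toy implementation, not code from the tutorial: two sketches built with the same parameters, each over one partition of the data, combine by cell-wise addition into a summary of the union.

```python
import random

class CountMinSketch:
    """Toy Count-Min sketch: a compact summary supporting approximate
    frequency queries. Estimates never undercount the true frequency."""

    def __init__(self, width=256, depth=4, seed=42):
        self.width, self.depth = width, depth
        rng = random.Random(seed)
        # One hash seed per row; sketches sharing (width, depth, seeds)
        # are mergeable because corresponding cells count the same events.
        self.seeds = [rng.randrange(1 << 31) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, row, item):
        return hash((self.seeds[row], item)) % self.width

    def update(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._col(r, item)] += count

    def estimate(self, item):
        # Each row overestimates (due to collisions); take the minimum.
        return min(self.table[r][self._col(r, item)]
                   for r in range(self.depth))

    def merge(self, other):
        # Summaries of disjoint subsets merge by cell-wise addition.
        assert (self.width, self.depth, self.seeds) == \
               (other.width, other.depth, other.seeds)
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]
```

For example, a sketch of partition A with three occurrences of an item, merged with a sketch of partition B holding two more, estimates at least five occurrences in total, exactly as a single sketch of A and B together would.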
Graham Cormode is a Professor in Computer Science at the University of Warwick in the UK. He works on research topics in data management, privacy and big data analysis. Previously, he was a principal member of technical staff at AT&T Labs-Research from 2006 to 2013, and before this he was at Bell Labs and Rutgers University. In 2013, he was recognized as a 'Distinguished Scientist' by the Association for Computing Machinery (ACM). His work has appeared in over 90 conference papers and 30 journal papers, and he has been awarded 25 US patents. His work has received two best paper awards, and a ten-year "Test of Time" award for his work on sketching algorithms. He has edited two books on applications of algorithms to different areas, and coauthored a third. He currently serves as an associate editor for the ACM Transactions on Database Systems (TODS).
LogiQL, a Declarative Language for Enterprise Applications
Todd J. Green (LogicBlox, USA)
We give an overview of LogiQL, a declarative, Datalog-based language for data management and analytics, along with techniques for efficient evaluation of LogiQL programs, emphasizing theoretical foundations when possible. These techniques include: leapfrog triejoin and its associated incremental maintenance algorithm, which we measure against appropriate optimality criteria; purely-functional data structures, which provide elegant versioning and branching capabilities that are indispensable for LogiQL; and transaction repair, a lock-free concurrency control scheme that uses LogiQL, incremental maintenance, and purely-functional data structures as essential ingredients.
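The core step of leapfrog triejoin can be conveyed by its unary case: intersecting several sorted relations by having each iterator repeatedly "leapfrog" (seek) past the current maximum key. The following is a simplified sketch of that step under the assumption of sorted, duplicate-free lists; it is not LogicBlox's implementation, which works trie-level by trie-level over multiway joins.

```python
import bisect

def leapfrog_intersect(sorted_lists):
    """Unary leapfrog join: intersect sorted, duplicate-free lists.
    Each lagging iterator seeks forward to the current maximum key,
    so work is proportional to the smallest list's movement pattern."""
    if not sorted_lists or any(not lst for lst in sorted_lists):
        return []
    positions = [0] * len(sorted_lists)
    out = []
    while True:
        keys = [lst[p] for lst, p in zip(sorted_lists, positions)]
        hi = max(keys)
        if all(k == hi for k in keys):
            # All iterators agree: emit the key, advance every iterator.
            out.append(hi)
            for i, lst in enumerate(sorted_lists):
                positions[i] += 1
                if positions[i] == len(lst):
                    return out  # any exhausted iterator ends the join
        else:
            # Leapfrog: seek each lagging iterator to the first key >= hi.
            for i, lst in enumerate(sorted_lists):
                if lst[positions[i]] < hi:
                    p = bisect.bisect_left(lst, hi, positions[i])
                    if p == len(lst):
                        return out
                    positions[i] = p
```

For instance, intersecting [1, 3, 4, 7, 9], [1, 2, 4, 9, 10] and [4, 5, 9] yields [4, 9]: the iterators seek directly from 1 to 4 and from 5/7 to 9, skipping keys that cannot participate in the result.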
T.J. Green is a Computer Scientist at LogicBlox. He received his Ph.D. in Computer and Information Science from the University of Pennsylvania in 2009. His awards include Best Student Paper at ICDT 2009, the Morris and Dorothy Rubinoff Award in 2010 (awarded to the outstanding computer science dissertation from the University of Pennsylvania), an honorable mention for the 2011 Jim Gray SIGMOD dissertation award, an NSF CAREER award in 2010, and Best Paper Runner-Up at ICDE 2012. Prior to LogicBlox, he was an Assistant Professor at UC Davis (2009-2011), an engineer at Xyleme (2001-2003), and a lead developer at Microsoft (1997-2001).