Our goal is to bridge the difference between theoretical and practical approaches to answering queries over databases with nulls, and to suggest ways in which SQL's handling of nulls can be modified to provide correctness guarantees. Theoretical research has long ago identified the notion of correctness of query answering over incomplete data: one needs to find certain answers, which are true regardless of how incomplete information is interpreted. In practice, on the other hand, query answering must be very efficient, and to achieve this, SQL uses three-valued logic for evaluating queries on databases with nulls. Due to the complexity mismatch, the two approaches cannot coincide, but perhaps they are related in some way?
For instance, does SQL always produce answers we can be certain about?
This is not so: SQL's and certain answers semantics could be totally unrelated, and SQL can produce both false negatives (i.e., miss some tuples that are in the answer) and even worse, false positives (i.e., return tuples that are completely wrong). However, it is possible to produce translations of SQL queries that eliminate false positives and
return only certain answers. In fact there are several such translations and all of them guarantee good complexity bounds, at least in theory. In practice, one of them performs quite well. Since one modifies queries to ensure correctness, one in general expects some slowdown; the hope is that it is negligible. This is what we observe in the majority of our experiments. Others, where the behavior is different, reveal issues with commercial query optimizers that go well beyond the problem of handling incomplete data.
Leonid Libkin is Professor of Foundations of Data Management in the School of Informatics at the University of Edinburgh. He was previously a Professor at the University of Toronto and a member of research staff at Bell Laboratories in Murray Hill. His main research interests are in the areas of data management and applications of logic in computer science. He has written five books and over 200 technical papers. His awards include a Marie Curie Chair Award and five Best Paper Awards. He has chaired programme committees of major database conferences (ACM PODS, ICDT) and was the conference chair of the 2010 Federated Logic Conference. He has given many invited conference talks and has served on multiple program committees and editorial boards. He is an ACM fellow, a fellow of the Royal Society
of Edinburgh, and a member of Academia Europaea.
The talk will be given at Postgres Professional seminar at 9 August 2016 14:00. To register for participation, please send a message to firstname.lastname@example.org.