Alexandra Meliou
(University of Washington) "Why and How: A Reverse Perspective on Data Management"
Data is growing larger, more intertwined, and more diverse as more and more users contribute to and use it. This trend has given rise to the need to support richer data analysis tasks. Such tasks involve determining the causes of observations, finding and correcting the sources of error in query results, and modifying the data to make it conform to complex desirable properties.
In this talk I will discuss three challenges: (a) providing explanations through support for causal queries ("Why"), (b) tracing
and correcting errors at their source (post-factum data cleaning), and (c) integrating database systems with
constrained optimization capabilities ("How"). First, I will show how to apply causal reasoning to tuple provenance
in order to determine the causes of query results and their degrees of responsibility. I will present an extensive analysis
of the data complexity for the case of conjunctive queries, and focus on a complete dichotomy between
NP-hard and PTIME cases for the problem of computing responsibility. This concrete characterization of PTIME
cases is crucial in scaling up to the challenges of Big Data. Second, I will demonstrate the applicability
of the causality framework in a practical setting. I will use a mobile sensing application to show that
ranking provenance tuples by their degrees of responsibility identifies errors more effectively than other schemes.
Finally, I will present the Tiresias system, the first how-to query engine, which seamlessly integrates database
systems with constrained problem solving capabilities. The contributions of the system are threefold:
(a) a declarative interface for defining how-to queries over a database, (b) translation rules from the
declarative statements to the constrained problem specification, and (c) a suite of data-specific optimizations
that allow scaling to large data sizes. Initial results of our prototype system implementation
show order-of-magnitude speedups over state-of-the-art solver runtimes, which indicates that there are
significant gains in pushing this functionality into the database engine.
I will conclude with a summary of my contributions and discuss my future steps with the Tiresias system
and the bigger vision of reverse data management.
Alexandra is a faculty candidate.
Time: | Monday, 27 February 2012, 10:30 |
---|---|
Location: | Building 49, Room 206 |