Re: pgsql: Allow pushdown of WHERE quals into subqueries with window functi - Mailing list pgsql-hackers

From Noah Misch
Subject Re: pgsql: Allow pushdown of WHERE quals into subqueries with window functi
Date
Msg-id 20141220061958.GB1838835@tornado.leadboat.com
Whole thread Raw
List pgsql-hackers
On Sat, Jun 28, 2014 at 06:08:05AM +0000, Tom Lane wrote:
> Allow pushdown of WHERE quals into subqueries with window functions.
> 
> We can allow this even without any specific knowledge of the semantics
> of the window function, so long as pushed-down quals will either accept
> every row in a given window partition, or reject every such row.  Because
> window functions act only within a partition, such a case can't result
> in changing the window functions' outputs for any surviving row.
> Eliminating entire partitions in this way obviously can reduce the cost
> of the window-function computations substantially.
> 
> The fly in the ointment is that it's hard to be entirely sure whether
> this is true for an arbitrary qual condition.  This patch allows pushdown
> if (a) the qual references only partitioning columns, and (b) the qual
> contains no volatile functions.  We are at risk of incorrect results if
> the qual can produce different answers for values that the partitioning
> equality operator sees as equal.  While it's not hard to invent cases
> for which that can happen, it seems to seldom be a problem in practice,
> since no one has complained about a similar assumption that we've had
> for many years with respect to DISTINCT.  The potential performance
> gains seem to be worth the risk.

SQL leaves room for our DISTINCT optimization but not for this.  Where
pushdown of a qual beneath DISTINCT changes query results, it does so by
changing which one of various equal, distinguishable values survives to
represent a group.  SQL says:
   If the <set quantifier> DISTINCT is specified, and the most specific type   of a result column is character string,
datetimewith time zone or a   user-defined type, then the precise values retained in that column after   eliminating
redundantduplicates is implementation-dependent.
 

PostgreSQL conforms, making qual pushdown one of the factors determining which
value to retain.  This influences queries having a pushed-down qual that
distinguishes equal values.  An optimization fence changes the result:

SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x) OFFSET 0) t0 WHERE length(x::text) <> 4; x  
-----1.1
(1 row)

SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x)) t0 WHERE length(x::text) <> 4; x  
-----1.01.1
(2 rows)

Type "timetamptz" is unaffected in PostgreSQL.  Character strings are affected
less than SQL contemplates, because PostgreSQL strings use binary equality.  I
don't know why SQL omits "numeric" from the list of affected types.

Here is the corresponding scenario for window functions:

SELECT x, y FROM ( SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y FROM (VALUES
(1.00),(1.0))t1(x) OFFSET 0
 
) t0 WHERE length(x::text) = 4; x   |  y  
------+-----1.00 | 1.0
(1 row)

SELECT x, y FROM ( SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y FROM (VALUES
(1.00),(1.0))t1(x)
 
) t0 WHERE length(x::text) = 4; x   |  y   
------+------1.00 | 1.00
(1 row)

Only the first result is correct.  SQL does not give us the wiggle room it
offered for DISTINCT.  The conformance sacrifice is perhaps worthwhile
nonetheless.  I didn't see this highlighted in the original discussion, so I
wished to ensure this compromise is known explicitly.

If the system catalogs recorded whether each equality operator is also an
identity operator, we could restrict the optimization to types where the
equality operator of the partitioning opclass is so marked.  That eliminates
the conformance problem.  Such an annotation would facilitate other planner
advances; for example, predtest.c could attempt to refute f(x) given x = k.

Thanks,
nm



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Next
From: Mark Kirkwood
Date:
Subject: Re: Commitfest problems