Re: Reference to parent query from ANY sublink - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Reference to parent query from ANY sublink
Date
Msg-id 10795.1386796541@sss.pgh.pa.us
Whole thread Raw
In response to Re: Reference to parent query from ANY sublink  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Reference to parent query from ANY sublink
Re: Reference to parent query from ANY sublink
List pgsql-hackers
Kevin Grittner <kgrittn@ymail.com> writes:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> I applied it to master and ran the regression tests, and one of
>> the subselect tests failed.
>> 
>> This query:
>> 
>> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second
>> Field"
>> �� FROM SUBSELECT_TBL upper
>> �� WHERE f1 IN
>> ���� (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);

>> [ ... ] during the `make check` or `make install-check` [ ... ]
>> is missing the last two rows.� Oddly, if I go into the database
>> later and try it, the rows show up.� It's not immediately
>> apparent to me what's wrong.

> Using the v2 patch, with the default statistics from table
> creation, the query modified with an alias of "lower" for the
> second reference, just for clarity, yields a plan which generates
> incorrect results:

> �Hash Join� (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 rows=3 loops=1)
> �� Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
> �� ->� Seq Scan on subselect_tbl upper� (cost=0.00..27.70 rows=1770 width=16) (actual time=0.006..0.007 rows=8
loops=1)
> �� ->� Hash� (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 rows=5 loops=1)
> �������� Buckets: 1024� Batches: 1� Memory Usage: 1kB
> �������� ->� HashAggregate� (cost=32.12..34.12 rows=200 width=12) (actual time=0.014..0.018 rows=6 loops=1)
> �������������� ->� Seq Scan on subselect_tbl lower� (cost=0.00..27.70 rows=1770 width=12) (actual time=0.002..0.004
rows=8loops=1)
 
> �Total runtime: 0.111 ms

FWIW, that plan isn't obviously wrong; if it is broken, most likely the
reason is that the HashAggregate is incorrectly unique-ifying the lower
table.  (Unfortunately, EXPLAIN doesn't show enough about the HashAgg
to know what it's doing exactly.)  The given query is, I think, in
principle equivalent to
SELECT ... FROM SUBSELECT_TBL upper WHERE (f1, f2::float) IN   (SELECT f2, f3 FROM SUBSELECT_TBL);

and if you ask unmodified HEAD to plan that you get
Hash Join  (cost=41.55..84.83 rows=442 width=16)  Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double
precision= subselect_tbl.f3))  ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16)  ->  Hash
(cost=38.55..38.55rows=200 width=12)        ->  HashAggregate  (cost=36.55..38.55 rows=200 width=12)              ->
SeqScan on subselect_tbl  (cost=0.00..27.70 rows=1770 width=12)
 

which is the same thing at the visible level of detail ... but this
version computes the correct result.  The cost of the HashAggregate is
estimated higher, though, which suggests that maybe it's distinct'ing on
two columns where the bogus plan only does one.

Not sure about where Antonin's patch is going off the rails.  I suspect
it's too simple somehow, but it's also possible that it's OK and the
real issue is some previously undetected bug in LATERAL processing.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: autovacuum_work_mem
Next
From: Gavin Flower
Date:
Subject: Re: ANALYZE sampling is too good