Re: BUG #14107: Major query planner bug regarding subqueries and indices - Mailing list pgsql-bugs

From Mathias Kunter
Subject Re: BUG #14107: Major query planner bug regarding subqueries and indices
Date
Msg-id e2090738-62b9-7697-886a-b2a01a8c9482@gmail.com
Whole thread Raw
In response to Re: BUG #14107: Major query planner bug regarding subqueries and indices  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
> If you know that the
> sub-select isn't going to return very many rows, you could do
>
> SELECT ... FROM a WHERE a.x = ? OR a.y = ANY(ARRAY(SELECT ...));

Isn't the planner already doing something like this, since the following
query is using the index as expected:

SELECT ... FROM a WHERE a.x IN (SELECT ...);

> but this would blow up rather badly with a large sub-select result,
> so I'm not sure I want to try to make the planner transform it that
> way automatically.

Wouldn't it be possible then to use this optimization based on the
estimated result size of the subquery? I think this would almost always
be faster than a sequential scan anyway. I observed that using the
ANY(ARRAY(SELECT...)) syntax on my small test tables (100 K rows)
already improves the query time by a factor of more than 100, and it
will be even more when tables are large. Please consider implementing
this optimization!

> I don't actually see any way to do very much with your second example at
> all:
>
>> SELECT ... FROM a JOIN b ON (...) WHERE a.x = ? OR b.y = ?;

Assuming both joined tables contain a PK (or another unique column),
then it should be possible by replanning the query as:

SELECT ... FROM a JOIN b ON ((join_cond AND a.x = ?) OR (join_cond AND
b.y = ?));

(Let "join_cond" denote the original join condition here.) Now, the JOIN
implementation must be smart enough to handle OR conditions: First,
obtain the rows satisfying

join_cond AND a.x = ?

by using the index as usual. For each matching row, create the tuple
(a.id, b.id) and insert it into a search tree (or hash or whatever).
Then, obtain the rows satisfying

join_cond AND b.y = ?

For each matching row, query the search tree whether it already contains
the tuple (a.id, b.id), and only add the current row to the final result
if it doesn't.

pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #14109: pg_rewind fails to update target control file in one scenario
Next
From: m.giraldo@stt-telefonia.it
Date:
Subject: BUG #14121: Constraint UNIQUE