Thread: Intersect/Union X AND/OR

Intersect/Union X AND/OR

From
Thiago Godoi
Date:
Hi all,

I found this presentation from B.  Momjian:

http://momjian.us/main/writings/pgsql/performance.pdf

I'm interested in what he said about " Intersect/Union X AND/OR " , Can I find a transcription or a video of this presentation? Can anyone explain it to me?

Thanks,

Thiago Godoi


 

Re: Intersect/Union X AND/OR

From
Bruce Momjian
Date:
Thiago Godoi wrote:
> Hi all,
>
> I found this presentation from B.  Momjian:
>
> http://momjian.us/main/writings/pgsql/performance.pdf
>
> I'm interested in what he said about " Intersect/Union X AND/OR " , Can I
> find a transcription or a video of this presentation? Can anyone explain it
> to me?

Well, there is a recording of the webcast on the EnterpriseDB web site,
but I am afraid they only allow viewing of 3+ hour webcasts by
EnterpriseDB customers.

The idea is that a query that uses an OR can be rewritten as two SELECTs
with a UNION between them.  I have seen rare cases where this is a win,
so I mentioned it in that talk.  Intersection is similarly possible for
AND in WHERE clauses.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Intersect/Union X AND/OR

From
Merlin Moncure
Date:
On Fri, Dec 2, 2011 at 1:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Thiago Godoi wrote:
>> Hi all,
>>
>> I found this presentation from B.  Momjian:
>>
>> http://momjian.us/main/writings/pgsql/performance.pdf
>>
>> I'm interested in what he said about " Intersect/Union X AND/OR " , Can I
>> find a transcription or a video of this presentation? Can anyone explain it
>> to me?
>
> Well, there is a recording of the webcast on the EnterpriseDB web site,
> but I am afraid they only allow viewing of 3+ hour webcasts by
> EnterpriseDB customers.
>
> The idea is that a query that uses an OR can be rewritten as two SELECTs
> with a UNION between them.  I have seen rare cases where this is a win,
> so I mentioned it in that talk.  Intersection is similarly possible for
> AND in WHERE clauses.

I've seen this as well.  Also boolean set EXCEPT is useful as well in
the occasional oddball case.

merlin

Re: Intersect/Union X AND/OR

From
Thiago Godoi
Date:
Thanks for the answers.

I found one of these cases , but I'm trying to understand this. Why the performance is better? The number of tuples is making the difference?

My original query :

select table1.id
from table1, (select function(12345) id) table2
where table1.kind = 1234
and table1.id = table2.id

"Nested Loop  (cost=0.00..6.68 rows=1 width=12)"
"  Join Filter: ()"
"  ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)"
"        Filter: (id = 616)"
"  ->  Result  (cost=0.00..0.26 rows=1 width=0)"


-- function() returns a resultset

I tryed with explicit join and "in" , but the plan is the same.

When I changed the query to use intersect :


(select table1.id from table1 where table1.kind = 1234)
Intersect
(select function(12345) id)

The new plan is :

"HashSetOp Intersect  (cost=0.00..6.67 rows=1 width=80)"
"  ->  Append  (cost=0.00..6.67 rows=2 width=80)"
"        ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..6.40 rows=1 width=159)"
"              ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)"
"                    Filter: (id = 616)"
"        ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.27 rows=1 width=0)"
"              ->  Result  (cost=0.00..0.26 rows=1 width=0)"

The second plan is about 10 times faster than the first one.




2011/12/2 Merlin Moncure <mmoncure@gmail.com>
On Fri, Dec 2, 2011 at 1:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Thiago Godoi wrote:
>> Hi all,
>>
>> I found this presentation from B.  Momjian:
>>
>> http://momjian.us/main/writings/pgsql/performance.pdf
>>
>> I'm interested in what he said about " Intersect/Union X AND/OR " , Can I
>> find a transcription or a video of this presentation? Can anyone explain it
>> to me?
>
> Well, there is a recording of the webcast on the EnterpriseDB web site,
> but I am afraid they only allow viewing of 3+ hour webcasts by
> EnterpriseDB customers.
>
> The idea is that a query that uses an OR can be rewritten as two SELECTs
> with a UNION between them.  I have seen rare cases where this is a win,
> so I mentioned it in that talk.  Intersection is similarly possible for
> AND in WHERE clauses.

I've seen this as well.  Also boolean set EXCEPT is useful as well in
the occasional oddball case.

merlin



--
Thiago Godoi


 

Re: Intersect/Union X AND/OR

From
Bruce Momjian
Date:
Thiago Godoi wrote:
> Thanks for the answers.
>
> I found one of these cases , but I'm trying to understand this. Why the
> performance is better? The number of tuples is making the difference?
>
> My original query :
>
> select table1.id
> from table1, (select function(12345) id) table2
> where table1.kind = 1234
> and table1.id = table2.id
>
> "Nested Loop  (cost=0.00..6.68 rows=1 width=12)"
> "  Join Filter: ()"
> "  ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)"
> "        Filter: (id = 616)"
> "  ->  Result  (cost=0.00..0.26 rows=1 width=0)"
>
>
> -- function() returns a resultset
>
> I tryed with explicit join and "in" , but the plan is the same.
>
> When I changed the query to use intersect :
>
>
> (select table1.id from table1 where table1.kind = 1234)
> Intersect
> (select function(12345) id)
>
> The new plan is :
>
> "HashSetOp Intersect  (cost=0.00..6.67 rows=1 width=80)"
> "  ->  Append  (cost=0.00..6.67 rows=2 width=80)"
> "        ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..6.40 rows=1
> width=159)"
> "              ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)"
> "                    Filter: (id = 616)"
> "        ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.27 rows=1
> width=0)"
> "              ->  Result  (cost=0.00..0.26 rows=1 width=0)"
>
> The second plan is about 10 times faster than the first one.

Well, there are usually several ways to execute a query internally,
intsersect is using a different, and faster, method.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Intersect/Union X AND/OR

From
Marti Raudsepp
Date:
On Mon, Dec 5, 2011 at 14:14, Thiago Godoi <thiagogodoi10@gmail.com> wrote:
> My original query :
>
> select table1.id
> from table1, (select function(12345) id) table2
> where table1.kind = 1234
> and table1.id = table2.id
>
> "Nested Loop  (cost=0.00..6.68 rows=1 width=12)"
> "  Join Filter: ()"
> "  ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)"
> "        Filter: (id = 616)"
> "  ->  Result  (cost=0.00..0.26 rows=1 width=0)"

Note that this EXPLAIN output is quite different from your query.
Intead of a "kind=1234" clause there's "id=616". Also, please post
EXPLAIN ANALYZE results instead whenever possible.

> When I changed the query to use intersect :
[...]
> The second plan is about 10 times faster than the first one.

Judging by these plans, the 1st one should not be slower.

Note that just running the query once and comparing times is often
misleading, especially for short queries, since noise often dominates
the query time -- depending on how busy the server was at the moment,
what kind of data was cached, CPU power management/frequency scaling,
etc. ESPECIALLY don't compare pgAdmin timings since those also include
network variance, the time taken to render results on your screen and
who knows what else.

A simple way to benchmark is with pgbench. Just write the query to a
text file (it needs to be a single line and not more than ~4000
characters).
Then run 'pgbench -n -f pgbench_script -T 5' to run it for 5 seconds.
These results  are still not entirely reliable, but much better than
pgAdmin timings.

Regards,
Marti