Multiple Uniques - Mailing list pgsql-performance

From Markus Schaber
Subject Multiple Uniques
Date
Msg-id 20040902154340.5b1a669a@kingfisher.intern.logi-track.com
Whole thread Raw
Responses Re: Multiple Uniques  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Multiple Uniques  (Greg Stark <gsstark@mit.edu>)
List pgsql-performance
Hello,

Today, we stumbled about the following query plan on PostGreSQL 7.4.1:

logigis=# explain select count(id) from (select distinct id from (select distinct ref_in_id as id from streets union
selectdistinct nref_in_id as id from streets) as blubb) as blabb;  
                                                              QUERY PLAN
              

--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33246086.32..33246086.32 rows=1 width=8)
   ->  Subquery Scan blabb  (cost=32970061.50..33246085.82 rows=200 width=8)
         ->  Unique  (cost=32970061.50..33246083.82 rows=200 width=8)
               ->  Sort  (cost=32970061.50..33108072.66 rows=55204464 width=8)
                     Sort Key: id
                     ->  Subquery Scan blubb  (cost=23697404.79..24525471.75 rows=55204464 width=8)
                           ->  Unique  (cost=23697404.79..23973427.11 rows=55204464 width=8)
                                 ->  Sort  (cost=23697404.79..23835415.95 rows=55204464 width=8)
                                       Sort Key: id
                                       ->  Append  (cost=7212374.04..15252815.03 rows=55204464 width=8)
                                             ->  Subquery Scan "*SELECT* 1"  (cost=7212374.04..7626407.52 rows=27602232
width=8)
                                                   ->  Unique  (cost=7212374.04..7350385.20 rows=27602232 width=8)
                                                         ->  Sort  (cost=7212374.04..7281379.62 rows=27602232 width=8)
                                                               Sort Key: ref_in_id
                                                               ->  Seq Scan on streets  (cost=0.00..3129090.32
rows=27602232width=8) 
                                             ->  Subquery Scan "*SELECT* 2"  (cost=7212374.04..7626407.52 rows=27602232
width=8)
                                                   ->  Unique  (cost=7212374.04..7350385.20 rows=27602232 width=8)
                                                         ->  Sort  (cost=7212374.04..7281379.62 rows=27602232 width=8)
                                                               Sort Key: nref_in_id
                                                               ->  Seq Scan on streets  (cost=0.00..3129090.32
rows=27602232width=8) 
(20 rows)

I might have to add that this query is not an every-day query, it's
merely part of some statistics that a colleague needs for some
estimations as he has to create a tool that works on this data.

Both ref_in_id and nref_in_id are non-indexed columns with type int8.

I was somehow irritated by the fact that the Query Plan contains 4 Uniques.

Then, I tried the following query:

logigis=# explain select count(id) from (select distinct id from (select  ref_in_id as id from streets union select
nref_in_idas id from streets) as blubb) as blabb; 
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=24803496.57..24803496.57 rows=1 width=8)
   ->  Subquery Scan blabb  (cost=24527471.75..24803496.07 rows=200 width=8)
         ->  Unique  (cost=24527471.75..24803494.07 rows=200 width=8)
               ->  Sort  (cost=24527471.75..24665482.91 rows=55204464 width=8)
                     Sort Key: id
                     ->  Subquery Scan blubb  (cost=15254815.03..16082881.99 rows=55204464 width=8)
                           ->  Unique  (cost=15254815.03..15530837.35 rows=55204464 width=8)
                                 ->  Sort  (cost=15254815.03..15392826.19 rows=55204464 width=8)
                                       Sort Key: id
                                       ->  Append  (cost=0.00..6810225.28 rows=55204464 width=8)
                                             ->  Subquery Scan "*SELECT* 1"  (cost=0.00..3405112.64 rows=27602232
width=8)
                                                   ->  Seq Scan on streets  (cost=0.00..3129090.32 rows=27602232
width=8)
                                             ->  Subquery Scan "*SELECT* 2"  (cost=0.00..3405112.64 rows=27602232
width=8)
                                                   ->  Seq Scan on streets  (cost=0.00..3129090.32 rows=27602232
width=8)
(14 rows)

And after re-parsing the documentation about UNION, the following one:

logigis=# explain select count(id) from (select  ref_in_id as id from streets union select  nref_in_id as id from
streets)as blubb; 
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16220893.16..16220893.16 rows=1 width=8)
   ->  Subquery Scan blubb  (cost=15254815.03..16082881.99 rows=55204464 width=8)
         ->  Unique  (cost=15254815.03..15530837.35 rows=55204464 width=8)
               ->  Sort  (cost=15254815.03..15392826.19 rows=55204464 width=8)
                     Sort Key: id
                     ->  Append  (cost=0.00..6810225.28 rows=55204464 width=8)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..3405112.64 rows=27602232 width=8)
                                 ->  Seq Scan on streets  (cost=0.00..3129090.32 rows=27602232 width=8)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..3405112.64 rows=27602232 width=8)
                                 ->  Seq Scan on streets  (cost=0.00..3129090.32 rows=27602232 width=8)
(10 rows)

Those queries should give the same result.

So, now my question is, why does the query optimizer not recognize that
it can throw away those "non-unique" Sort/Unique passes?

Is PostGreSQL 8 capable of this optimization?


Thanks,
Markus


--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com

pgsql-performance by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: seqscan instead of index scan
Next
From: Tom Lane
Date:
Subject: Re: Multiple Uniques