Thread: Speed difference between select ... union select ... and select from partitioned_table

Hi List!

I executed 2 equivalents queries. The first one uses a union structure.
The second uses a partitioned table. The tables are the same with 30
millions of rows each one and the returned rows are the same.

But the union query perform faster than the partitioned query.

My question is: why? :)

[pabloa@igor testeo]$ cat query-union.sql
select e, p,  sum( c) as c
from (
        select e, p, count( *) as c
        from tt_00003
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00006
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00009
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00012
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00015
        group by e, p
) as t
group by e, p
order by e, p desc;



[pabloa@igor testeo]$ cat query-heritage.sql
select e, p,  count( *) as c
from tt
group by e, p
order by e, p desc;


The server is a Athlon 64x2 6000+ 2 Gb RAM PostreSQL 8.2.5

The structure tables are:

CREATE TABLE tt_00003
(
-- Inherited:   idtt bigint NOT NULL,
-- Inherited:   idttp bigint NOT NULL,
-- Inherited:   e integer NOT NULL,
-- Inherited:   dmodi timestamp without time zone NOT NULL DEFAULT now(),
-- Inherited:   p integer NOT NULL DEFAULT 0,
-- Inherited:   m text NOT NULL,
  CONSTRAINT tt_00003_pkey PRIMARY KEY (idtt),
  CONSTRAINT tt_00003_idtt_check CHECK (idtt >= 1::bigint AND idtt <=
30000000::bigint)
) INHERITS (tt)
WITHOUT OIDS;
ALTER TABLE tt_00003 ;

CREATE INDEX tt_00003_e
  ON tt_00003
  USING btree
  (e);






On Fri, 2007-10-26 at 16:37 -0400, Pablo Alcaraz wrote:
> Hi List!
>
> I executed 2 equivalents queries. The first one uses a union structure.
> The second uses a partitioned table. The tables are the same with 30
> millions of rows each one and the returned rows are the same.
>
> But the union query perform faster than the partitioned query.
>

I think you mean to use UNION ALL here. UNION forces a DISTINCT, which
results in a sort operation. What surprises me is that the UNION is
actually faster than the partitioning using inheritance.

I suspect it has something to do with the GROUP BYs, but we won't know
until you post EXPLAIN ANALYZE results.

    Regards,
    Jeff Davis


I forgot to post the times:

query-union: 21:59
query-heritage: 1:31:24

Regards

Pablo

Pablo Alcaraz wrote:
> Hi List!
>
> I executed 2 equivalents queries. The first one uses a union
> structure. The second uses a partitioned table. The tables are the
> same with 30 millions of rows each one and the returned rows are the
> same.
>
> But the union query perform faster than the partitioned query.
>
> My question is: why? :)
>
> [pabloa@igor testeo]$ cat query-union.sql
> select e, p,  sum( c) as c
> from (
>        select e, p, count( *) as c
>        from tt_00003
>        group by e, p
>        union
>        select e, p, count( *) as c
>        from tt_00006
>        group by e, p
>        union
>        select e, p, count( *) as c
>        from tt_00009
>        group by e, p
>        union
>        select e, p, count( *) as c
>        from tt_00012
>        group by e, p
>        union
>        select e, p, count( *) as c
>        from tt_00015
>        group by e, p
> ) as t
> group by e, p
> order by e, p desc;
>
>
>
> [pabloa@igor testeo]$ cat query-heritage.sql
> select e, p,  count( *) as c
> from tt
> group by e, p
> order by e, p desc;
>
>
> The server is a Athlon 64x2 6000+ 2 Gb RAM PostreSQL 8.2.5
>
> The structure tables are:
>
> CREATE TABLE tt_00003
> (
> -- Inherited:   idtt bigint NOT NULL,
> -- Inherited:   idttp bigint NOT NULL,
> -- Inherited:   e integer NOT NULL,
> -- Inherited:   dmodi timestamp without time zone NOT NULL DEFAULT now(),
> -- Inherited:   p integer NOT NULL DEFAULT 0,
> -- Inherited:   m text NOT NULL,
>  CONSTRAINT tt_00003_pkey PRIMARY KEY (idtt),
>  CONSTRAINT tt_00003_idtt_check CHECK (idtt >= 1::bigint AND idtt <=
> 30000000::bigint)
> ) INHERITS (tt)
> WITHOUT OIDS;
> ALTER TABLE tt_00003 ;
>
> CREATE INDEX tt_00003_e
>  ON tt_00003
>  USING btree
>  (e);
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                http://www.postgresql.org/about/donate
>


"Pablo Alcaraz" <pabloa@laotraesquina.com.ar> writes:

> Hi List!
>
> I executed 2 equivalents queries. The first one uses a union structure. The
> second uses a partitioned table. The tables are the same with 30 millions of
> rows each one and the returned rows are the same.
>
> But the union query perform faster than the partitioned query.
>
> My question is: why? :)
>
> [pabloa@igor testeo]$ cat query-union.sql
> select e, p,  sum( c) as c
> from (
>        select e, p, count( *) as c
>        from tt_00003
>        group by e, p
>        union
>        select e, p, count( *) as c
>        from tt_00006
>        group by e, p
>        union
...


You should send along the "explain analyze" results for both queries,
otherwise we're just guessing.

Also, you should consider using UNION ALL instead of plain UNION.

Finally you should consider removing all the intermediate GROUP BYs and just
group the entire result. In theory it should be faster but in practice I'm not
sure it works out that way.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

These are the EXPLAIN ANALIZE:

I ran both queries on a CLUSTER and ANALYZEd tables:

UNION QUERY
explain analyze
select e, p,  sum( c) as c
from (
        select e, p, count( *) as c
        from tt_00003
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00006
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00009
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00012
        group by e, p
        union
        select e, p, count( *) as c
        from tt_00015
        group by e, p
) as t
group by e, p
order by e, p desc;

"Sort  (cost=2549202.87..2549203.37 rows=200 width=16) (actual time=263593.182..263593.429 rows=207 loops=1)"
"  Sort Key: t.e, t.p"
"  ->  HashAggregate  (cost=2549192.73..2549195.23 rows=200 width=16) (actual time=263592.469..263592.763 rows=207 loops=1)"
"        ->  Unique  (cost=2549172.54..2549179.88 rows=734 width=8) (actual time=263590.481..263591.764 rows=356 loops=1)"
"              ->  Sort  (cost=2549172.54..2549174.38 rows=734 width=8) (actual time=263590.479..263590.891 rows=356 loops=1)"
"                    Sort Key: e, p, c"
"                    ->  Append  (cost=1307131.88..2549137.60 rows=734 width=8) (actual time=132862.176..263589.774 rows=356 loops=1)"
"                          ->  HashAggregate  (cost=1307131.88..1307133.03 rows=92 width=8) (actual time=132862.173..132862.483 rows=200 loops=1)"
"                                ->  Seq Scan on tt_00003  (cost=0.00..1081550.36 rows=30077536 width=8) (actual time=10.135..83957.424 rows=30000000 loops=1)"
"                          ->  HashAggregate  (cost=1241915.64..1241916.16 rows=42 width=8) (actual time=130726.219..130726.457 rows=156 loops=1)"
"                                ->  Seq Scan on tt_00006  (cost=0.00..1028793.22 rows=28416322 width=8) (actual time=11.389..87338.730 rows=28351293 loops=1)"
"                          ->  HashAggregate  (cost=24.53..27.03 rows=200 width=8) (actual time=0.005..0.005 rows=0 loops=1)"
"                                ->  Seq Scan on tt_00009  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"                          ->  HashAggregate  (cost=24.53..27.03 rows=200 width=8) (actual time=0.004..0.004 rows=0 loops=1)"
"                                ->  Seq Scan on tt_00012  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"                          ->  HashAggregate  (cost=24.53..27.03 rows=200 width=8) (actual time=0.005..0.005 rows=0 loops=1)"
"                                ->  Seq Scan on tt_00015  (cost=0.00..18.30 rows=830 width=8) (actual time=0.001..0.001 rows=0 loops=1)"
"Total runtime: 263594.381 ms"


PARTITIONED QUERY

explain analyze
select e, p,  count( *) as c
from tt
group by e, p
order by e, p desc;

"GroupAggregate  (cost=13256958.67..13842471.95 rows=40000 width=8) (actual time=899391.384..1065585.531 rows=207 loops=1)"
"  ->  Sort  (cost=13256958.67..13403211.99 rows=58501328 width=8) (actual time=899391.364..989749.914 rows=58351293 loops=1)"
"        Sort Key: public.tt.e, public.tt.p"
"        ->  Append  (cost=0.00..2110508.28 rows=58501328 width=8) (actual time=14.031..485211.466 rows=58351293 loops=1)"
"              ->  Seq Scan on tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"              ->  Seq Scan on tt_00003 tt  (cost=0.00..1081550.36 rows=30077536 width=8) (actual time=14.024..178657.738 rows=30000000 loops=1)"
"              ->  Seq Scan on tt_00006 tt  (cost=0.00..1028793.22 rows=28416322 width=8) (actual time=39.852..168307.030 rows=28351293 loops=1)"
"              ->  Seq Scan on tt_00009 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.001..0.001 rows=0 loops=1)"
"              ->  Seq Scan on tt_00012 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.001..0.001 rows=0 loops=1)"
"              ->  Seq Scan on tt_00015 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.001..0.001 rows=0 loops=1)"
"              ->  Seq Scan on tt_00018 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.001..0.001 rows=0 loops=1)"
"              ->  Seq Scan on tt_00021 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"              ->  Seq Scan on tt_00024 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"              ->  Seq Scan on tt_00027 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"              ->  Seq Scan on tt_00030 tt  (cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"Total runtime: 1066301.084 ms"

Any idea?

Regards

Pablo


Jeff Davis wrote:
On Fri, 2007-10-26 at 16:37 -0400, Pablo Alcaraz wrote: 
Hi List!

I executed 2 equivalents queries. The first one uses a union structure. 
The second uses a partitioned table. The tables are the same with 30 
millions of rows each one and the returned rows are the same.

But the union query perform faster than the partitioned query.
   
I think you mean to use UNION ALL here. UNION forces a DISTINCT, which
results in a sort operation. What surprises me is that the UNION is
actually faster than the partitioning using inheritance.

I suspect it has something to do with the GROUP BYs, but we won't know
until you post EXPLAIN ANALYZE results.
   Regards,Jeff Davis


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
 

Pablo Alcaraz <pabloa@laotraesquina.com.ar> writes:
> These are the EXPLAIN ANALIZE:

If you raise work_mem enough to let the second query use a hash
aggregate (probably a few MB would do it), I think it'll be about
the same speed as the first one.

The reason it's not picking that on its own is the overestimate
of the number of resulting groups.  This is because
get_variable_numdistinct is not smart about append relations.
We should try to fix that sometime...

            regards, tom lane

On Fri, 2007-10-26 at 16:37 -0400, Pablo Alcaraz wrote:

> I executed 2 equivalents queries. The first one uses a union structure.
> The second uses a partitioned table. The tables are the same with 30
> millions of rows each one and the returned rows are the same.
>
> But the union query perform faster than the partitioned query.
>
> My question is: why? :)

The two queries are equivalent but they have different execution plans.

The UNION query has explicit GROUP BY operations within it. We do not
currently perform a push-down operation onto the individual partitions.
This results in more data copying as well as requiring a single very
large sort, rather than lots of small ones. That is probably enough to
allow it to perform the sort in memory rather than on-disk, thus
allowing a considerable speed-up.

This is on my list of requirements for further partitioning improvements
in 8.4 or beyond.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


Pablo Alcaraz wrote:
These are the EXPLAIN ANALIZE:


If you raise work_mem enough to let the second query use a hash
aggregate (probably a few MB would do it), I think it'll be about
the same speed as the first one.

The reason it's not picking that on its own is the overestimate
of the number of resulting groups.  This is because
get_variable_numdistinct is not smart about append relations.
We should try to fix that sometime...


I re run the partitioned-query. it completed in 15996 seconds. It builded a BIG temp file:

[root@igor xxx]# ls -lh pgsql-data/data/16386/pgsql_tmp/
total 2.2G
-rw------- 1 postgres postgres 1.0G Oct 27 15:35 pgsql_tmp7004.0
-rw------- 1 postgres postgres 1.0G Oct 27 15:35 pgsql_tmp7004.1
-rw------- 1 postgres postgres 175M Oct 27 15:35 pgsql_tmp7004.2

work_mem=1Mb. How much do I need to raise work_mem variable? 2.2G?

Regards

Pablo