Thread: ORDER BY costs

ORDER BY costs

From
Carlos Benkendorf
Date:
Hi,
 
We´ve a SELECT that even without ORDER BY is returning the rows in the order that we liked but when we add the ORDER BY clause the runtime and costs are much bigger.
 
We have to use ORDER BY otherwise in some future postgresql version probably it will not return in the correct order anymore.
 
But if we use ORDER BY it´s too much expensive... is there a way to have the same costs and runtime but with the ORDER BY clause?
 
Why is not the planner using the access plan builded for the "without order by" select  even if we use the order by clause? The results are both the same...
 
Postgresql version: 8.0.3
 
Without order by:
explain analyze
SELECT * FROM iparq.ARRIPT
where
(ANOCALC =  2005
and CADASTRO =  19
and CODVENCTO =  00
a nd PARCELA >=  00 )
or
(ANOCALC =  2005
and CADASTRO =  19
and CODVENCTO >  00 )
or
(ANOCALC =  2005
and CADASTRO >  19 )
or
(ANOCALC >  2005 );
 Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..122255.35 rows=146602 width=897) (actual time=9.303..1609.987 rows=167710 loops=1)
   Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 1712.456 ms
(3 rows)
 
 
With order by:
explain analyze
SELECT * FROM iparq.ARRIPT
where
(ANOCALC =  2005
and CADASTRO =  19
and COD VENCTO =  00
and PARCELA >=  00 )
or
(ANOCALC =  2005
and CADASTRO =  19
and CODVENCTO >  00 )
or
(ANOCALC =  2005
and CADASTRO >  19 )
or
(ANOCALC >  2005 )
order by ANOCALC asc, CADASTRO asc, CODVENCTO asc, PARCELA asc;
 Sort  (cost=201296.59..201663.10 rows=146602 width=897) (actual time=9752.555..10342.363 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..122255.35 rows=146602 width=897) (actual time=0.402..1425.085 rows=167710 loops=1)
         Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 10568.290 ms
(5 rows)

Table definition:
                 Table "iparq.arript"
      Column       |         Type          | Modifiers
-------------------+-----------------------+-----------
 anocalc           | numeric(4,0)          | not null
 cadastro          | numeric(8,0)          | not null
 codvencto         | numeric(2,0)          | not null
 parcela           | numeric(2,0)          | not null
 inscimob          | character varying(18) | not null
 codvencto2        | numeric(2,0)          | not null
 parcela2          | numeric(2,0)          | not null
 codpropr          | numeric(10,0)         | not null
 dtaven            | numeric(8,0)          | not null
 anocalc2          | numeric(4,0)          |
...
...
Indexes:
    "pk_arript" PRIMARY KEY, btree (anocalc, cadastro, codvencto, parcela)
    "iarchave04" UNIQUE, btree (cadastro, anocalc, codvencto, parcela)
    "iarchave02" btree (inscimob, anocalc, codvencto2, parcela2)
    "iarchave03" btree (codpropr, dtaven)
    "iarchave05" btree (anocalc, inscimob, codvencto2, parcela2)
 
Best regards and thank you very much in advance,
 
Carlos Benkendorf


Yahoo! doce lar. Faça do Yahoo! sua homepage.

Re: ORDER BY costs

From
Tom Lane
Date:
Carlos Benkendorf <carlosbenkendorf@yahoo.com.br> writes:
>                  Table "iparq.arript"
>       Column       |         Type          | Modifiers
> -------------------+-----------------------+-----------
>  anocalc           | numeric(4,0)          | not null
>  cadastro          | numeric(8,0)          | not null
>  codvencto         | numeric(2,0)          | not null
>  parcela           | numeric(2,0)          | not null
>  inscimob          | character varying(18) | not null
>  codvencto2        | numeric(2,0)          | not null
>  parcela2          | numeric(2,0)          | not null
>  codpropr          | numeric(10,0)         | not null
>  dtaven            | numeric(8,0)          | not null
>  anocalc2          | numeric(4,0)          |

I suspect you'd find a significant performance improvement from changing
the NUMERIC columns to int or bigint as needed.  Numeric comparisons are
pretty slow.

            regards, tom lane

Re: ORDER BY costs

From
Carlos Benkendorf
Date:
I restored the table in another database and repeated the analyze again with original column definitions (numeric):
 
With order by:
Sort  (cost=212634.30..213032.73 rows=159374 width=897) (actual time=9286.817..9865.030 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..126604.64 rows=159374 width=897) (actual time=0.152..1062.664 rows=167710 loops=1)
         Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 10086.884 ms
(5 rows)
 
Without order by:
 Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..126604.64 rows=159374 width=897) (actual time=0.154..809.566 rows=167710 loops=1)
   Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 894.218 ms
(3 rows)


Then I recreated the table and changed the primary key column type definitions to smallint, integer and bigint.
 
CREATE TABLE arript (
    anocalc smallint     NOT NULL,
    cadastro integer      NOT NULL,
    codvencto smallint     NOT NULL,
    parcela smallint     NOT NULL,
    inscimob character varying(18) NOT NULL,
    codvencto2 smallint     NOT NULL,
    parcela2 smallint     NOT NULL,
    codpropr bigint        NOT NULL,
    dtaven integer      NOT NULL,
    anocalc2 smallint,
    dtabase integer,
    vvt numeric(14,2),
    vvp numeric(14,2),
...
...
 
Now the new analyze:
 
With order by:
 Sort  (cost=180430.98..180775.10 rows=137649 width=826) (actual time=4461.524..5000.707 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->&n bsp; Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..111126.93 rows=137649 width=826) (actual time=0.142..763.255 rows=167710 loops=1)
         Index Cond: (((anocalc = 2005) AND (cadastro = 19) AND (codvencto = 0) AND (parcela >= 0)) OR ((anocalc = 2005) AND (cadastro = 19) AND (codvencto > 0)) OR ((anocalc = 2005) AND (cadastro > 19)) OR (anocalc > 2005))
 Total runtime: 5222.729 ms
(5 rows)
 
 
Without order by:
 Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..111126.93 rows=137649 width=826) (actual time=0.135..505.250 rows=167710 loops=1)
   Index Cond: (((anocalc = 2005) AND (cadastro = 19) AND (codvencto = 0) AND (parcela >= 0)) OR ((anocalc = 2005) AND (cadastro = 19) AND (codvencto > 0)) OR ((anocalc = 2005) AND (cadastro > 19)) OR (anocalc > 2005))
 Total runtime: 589.528 ms
(3 rows)
Total runtime summary:
Primary key columns defined with integer/smallint/bigint and select with order by: 5222.729 ms
Primary key columns defined with integer/smallint/bigint and select without order by: 589.528 ms
Primary key columns defined with numeric and select with order by: 10086.884 ms
Primary key columns defined with numeric and select without order by: 894.218 ms
 
Using order by and integer/smallint/bigint (5222.729) is almost half total runtime of select over numeric columns (10086.884) but is still 6 x more from the numbers of the original select (without order by and number columns=894.218).
 
Is there something more that could be done? Planner cost constants?
 
Thanks very much in advance!< /DIV>
 
Benkendorf
 
 


Yahoo! doce lar. Faça do Yahoo! sua homepage.

Re: ORDER BY costs

From
Carlos Benkendorf
Date:
I´m not sure but I think the extra runtime of the select statement that has the ORDER BY clause is because the planner decided to sort the result set.
 
Is the sort really necessary? Why not only scanning the primary key index pages and retrieving the rows like the select without the order by clause?
 
Aren´t not the rows retrieved from the index in a odered form?
 
Thanks in advance!
 
Benkendorf
 
 
 


Yahoo! doce lar. Faça do Yahoo! sua homepage.

Re: ORDER BY costs

From
Carlos Benkendorf
Date:

Jan Dittmer <jdittmer@sfhq.hn.org> escreveu:
What is your work_mem setting? I think the default is 1MB which is
probably too low as your trying to sort roughly 150000*100Bytes = 15MB.

Jan
I think you would like to say 150000*896Bytes... Am I right? My default work_mem is 2048 and I changed to 200000... and pgsql_tmp directory is not used any more...but...
 
Now the new numbers:
 
Sort  (cost=132929.22..133300.97 rows=148701 width=896) (actual time=3949.663..4029.618 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..120154.28 rows=148701 width=896) (actual time=0.166..829.260 rows=1677 10 loops=1)
         Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 4184.723 ms
(5 rows)
 
It is less than with work_mem set to 2000 but is it worthly? I´m afraind of swapping... are not those settings applied for all backends?
 
Benkendorf
 
 


Yahoo! doce lar. Faça do Yahoo! sua homepage.