Thread: Problem with large query

Problem with large query

From
Marc Cousin
Date:
Hi. I hope I'm not asking a too trivial question here...

I'm having trouble with a (quite big) query, and can't find a way to make it
faster.

Here is the information :

Tables :
============================
sces_vte -> 2753539 rows
sces_art -> 602327
sces_fsf -> 8126
sces_frc -> 7763
sces_tps -> 38
sces_gtr -> 35


Query :
===========================
SELECT
sces_gtr_art.gtr_cod,
sces_gtr_art.gtr_lib,
sces_frc_art.fou_cod,
sces_frc_art.fou_lib,
sces_tps.tps_annee_mois,
TO_NUMBER('200401','999999'),
TO_NUMBER('200405','999999'),
sces_tps.tps_libc,
sum(sces_vte.vte_mnt),
sum(sces_vte.vte_qte),
sum(sces_vte.vte_ton),
sces_famille.fsf_codfam,
sces_famille.fsf_lib,
sces_s_famille.fsf_codsfm,
sces_s_famille.fsf_lib
FROM
sces_vte,
sces_art,
sces_fsf  sces_famille,
sces_fsf  sces_s_famille,
sces_frc  sces_frc_art,
sces_tps,
sces_gtr  sces_gtr_art
WHERE
( sces_famille.fsf_codfam=sces_s_famille.fsf_codfam  )
AND  ( sces_famille.fsf_codseg= 0 and sces_famille.fsf_codsfm = 0  )
AND  ( sces_vte.tps_annee_mois=sces_tps.tps_annee_mois  )
AND  ( sces_vte.art_cod=sces_art.art_cod and
sces_vte.dos_cod=sces_art.dos_cod  )
AND  ( sces_gtr_art.gtr_cod=sces_frc_art.gtr_cod  )
AND  ( sces_frc_art.gtr_cod=sces_art.gtr_cod and
sces_frc_art.fou_cod=sces_art.fou_cod )
AND  ( sces_s_famille.fsf_codfam=sces_art.fsf_codfam and
sces_s_famille.fsf_codsfm=sces_art.fsf_codsfm  )
AND  ( sces_s_famille.fsf_codseg = 0  )
AND  (
( ( ( sces_tps.tps_annee_mois ) >= ( TO_NUMBER('200401','999999') ) and
( sces_tps.tps_annee_mois ) <= (
TO_NUMBER('200405','999999') )
)
OR
(
( sces_tps.tps_annee_mois ) >= ( TO_NUMBER('200401','999999') )-100 and
( sces_tps.tps_annee_mois ) <= (
TO_NUMBER('200405','999999') )-100
)  )
AND  ( sces_gtr_art.gtr_cod  in  (2))
)
GROUP BY
sces_gtr_art.gtr_cod,
sces_gtr_art.gtr_lib,
sces_frc_art.fou_cod,
sces_frc_art.fou_lib,
sces_tps.tps_annee_mois,
TO_NUMBER('200401','999999'),
TO_NUMBER('200405','999999'),
sces_tps.tps_libc,
sces_famille.fsf_codfam,
sces_famille.fsf_lib,
sces_s_famille.fsf_codsfm,
sces_s_famille.fsf_lib

Explain Analyze Plan :
====================================
 GroupAggregate  (cost=27161.91..27938.72 rows=16354 width=280) (actual time=484509.210..544436.148 rows=4115 loops=1)
   ->  Sort  (cost=27161.91..27202.79 rows=16354 width=280) (actual time=484496.188..485334.151 rows=799758 loops=1)
         Sort Key: sces_gtr_art.gtr_cod, sces_gtr_art.gtr_lib, sces_frc_art.fou_cod, sces_frc_art.fou_lib,
sces_tps.tps_annee_mois,200401::numeric, 200405::numeric, sces_tps.tps_libc, sces_famille.fsf_codfam,
sces_famille.fsf_lib,sces_s_famille.fsf_codsfm, sces_s_famille.fsf_lib 
         ->  Merge Join  (cost=25727.79..26017.34 rows=16354 width=280) (actual time=58945.821..69321.146 rows=799758
loops=1)
               Merge Cond: (("outer".fsf_codfam = "inner".fsf_codfam) AND ("outer".fsf_codsfm = "inner".fsf_codsfm))
               ->  Sort  (cost=301.36..304.60 rows=1298 width=83) (actual time=27.926..28.256 rows=332 loops=1)
                     Sort Key: sces_s_famille.fsf_codfam, sces_s_famille.fsf_codsfm
                     ->  Seq Scan on sces_fsf sces_s_famille  (cost=0.00..234.24 rows=1298 width=83) (actual
time=0.042..19.124rows=1341 loops=1) 
                           Filter: (fsf_codseg = 0::numeric)
               ->  Sort  (cost=25426.43..25448.05 rows=8646 width=225) (actual time=58917.106..59693.810 rows=799758
loops=1)
                     Sort Key: sces_art.fsf_codfam, sces_art.fsf_codsfm
                     ->  Merge Join  (cost=24726.32..24861.08 rows=8646 width=225) (actual time=19036.709..29404.943
rows=799758loops=1) 
                           Merge Cond: ("outer".tps_annee_mois = "inner".tps_annee_mois)
                           ->  Sort  (cost=2.49..2.53 rows=17 width=23) (actual time=0.401..0.428 rows=20 loops=1)
                                 Sort Key: sces_tps.tps_annee_mois
                                 ->  Seq Scan on sces_tps  (cost=0.00..2.14 rows=17 width=23) (actual time=0.068..0.333
rows=20loops=1) 
                                       Filter: (((tps_annee_mois >= 200301::numeric) OR (tps_annee_mois >=
200401::numeric))AND ((tps_annee_mois <= 200305::numeric) OR (tps_annee_mois >= 200401::numeric)) AND ((tps_annee_mois
>=200301::numeric) OR (tps_annee_mois <= 200405::numeric)) AND ((tps_annee_mois <= 200305::numeric) OR (tps_annee_mois
<=200405::numeric))) 
                           ->  Sort  (cost=24723.83..24747.97 rows=9656 width=214) (actual time=19036.223..19917.214
rows=799757loops=1) 
                                 Sort Key: sces_vte.tps_annee_mois
                                 ->  Nested Loop  (cost=21825.09..24084.74 rows=9656 width=214) (actual
time=417.603..8644.294rows=399879 loops=1) 
                                       ->  Nested Loop  (cost=21825.09..21837.50 rows=373 width=195) (actual
time=417.444..672.741rows=14158 loops=1) 
                                             ->  Seq Scan on sces_gtr sces_gtr_art  (cost=0.00..1.44 rows=1 width=40)
(actualtime=0.026..0.085 rows=1 loops=1) 
                                                   Filter: (gtr_cod = 2::numeric)
                                             ->  Merge Join  (cost=21825.09..21832.34 rows=373 width=165) (actual
time=417.400..568.247rows=14158 loops=1) 
                                                   Merge Cond: ("outer".fsf_codfam = "inner".fsf_codfam)
                                                   ->  Sort  (cost=255.24..255.30 rows=24 width=74) (actual
time=16.597..16.692rows=106 loops=1) 
                                                         Sort Key: sces_famille.fsf_codfam
                                                         ->  Seq Scan on sces_fsf sces_famille  (cost=0.00..254.69
rows=24width=74) (actual time=0.029..15.971 rows=155 loops=1) 
                                                               Filter: ((fsf_codseg = 0::numeric) AND (fsf_codsfm =
0::numeric))
                                                   ->  Sort  (cost=21569.85..21571.64 rows=715 width=91) (actual
time=400.631..416.871rows=14162 loops=1) 
                                                         Sort Key: sces_art.fsf_codfam
                                                         ->  Nested Loop  (cost=0.00..21535.95 rows=715 width=91)
(actualtime=1.320..230.975 rows=14162 loops=1) 
                                                               ->  Seq Scan on sces_frc sces_frc_art
(cost=0.00..182.75rows=728 width=51) (actual time=1.195..14.316 rows=761 loops=1) 
                                                                     Filter: (2::numeric = gtr_cod)
                                                               ->  Index Scan using ind_art_02 on sces_art
(cost=0.00..29.24rows=7 width=61) (actual time=0.040..0.160 rows=19 loops=761) 
                                                                     Index Cond: ((2::numeric = sces_art.gtr_cod) AND
("outer".fou_cod= sces_art.fou_cod)) 
                                       ->  Index Scan using idx_vte_02 on sces_vte  (cost=0.00..6.01 rows=1 width=62)
(actualtime=0.037..0.259 rows=28 loops=14158) 
                                             Index Cond: ((sces_vte.art_cod = "outer".art_cod) AND (sces_vte.dos_cod =
"outer".dos_cod))
 Total runtime: 545435.989 ms



From what I understand from the plan, the worst part of it is the sort. Is there a way I can improve this query ?
(Obviously, as it has many rows, it will still be a slow query, but here it's too slow for us...).

I allready extended the sort_mem (up to 500 MB to be sure, the server has plenty of RAM),
the query has become faster, but I don't know what else to do, to speed up the sort.

BTW the query was generated, not written. We allready are trying to write something better, but we are still facing a
gigantic sort at the end (we need the group by, and there are many lines from the main table (sces_vte) to be retrieved
andaggregated)... 

Thanks in advance...

Re: Problem with large query

From
Tom Lane
Date:
Marc Cousin <mcousin@sigma.fr> writes:
> I'm having trouble with a (quite big) query, and can't find a way to make it
> faster.

Seems like it might help if the thing could use a HashAggregate instead
of sort/group.  Numeric is not hashable, so having those TO_NUMBER
constants in GROUP BY destroys this option instantly ... but why in the
world are you grouping by constants anyway?  You didn't say what the
datatypes of the other columns were...

            regards, tom lane

Re: Problem with large query

From
Marc Cousin
Date:
The query has been generated by business objects ... i'ill try to suggest to the developpers to remove this constant
(ifthey can)... 
The fields used by the sort are of type numeric(6,0) or (10,0) ...
Could it be better if the fields were integer or anything else ?


On Wednesday 08 September 2004 16:40, you wrote:
> Marc Cousin <mcousin@sigma.fr> writes:
> > I'm having trouble with a (quite big) query, and can't find a way to make it
> > faster.
>
> Seems like it might help if the thing could use a HashAggregate instead
> of sort/group.  Numeric is not hashable, so having those TO_NUMBER
> constants in GROUP BY destroys this option instantly ... but why in the
> world are you grouping by constants anyway?  You didn't say what the
> datatypes of the other columns were...
>
>    regards, tom lane
>

Re: Problem with large query

From
Tom Lane
Date:
Marc Cousin <mcousin@sigma.fr> writes:
> The query has been generated by business objects ... i'ill try to suggest to the developpers to remove this constant
(ifthey can)... 
> The fields used by the sort are of type numeric(6,0) or (10,0) ...
> Could it be better if the fields were integer or anything else ?

integer or bigint would be a WHOLE lot faster.  I'd venture that
comparing two numerics is order of a hundred times slower than
comparing two integers.

Even if you don't want to change the fields on-disk, you might think
about casting them all to int/bigint in the query.

Another thing that might or might not be easy is to change the order of
the GROUP BY items so that the fields with the largest number of
distinct values are listed first.  If two rows are distinct at the first
column, the sorting comparison doesn't even have to look at the
remaining columns ...

            regards, tom lane

Re: Problem with large query

From
Marc Cousin
Date:
On Wednesday 08 September 2004 16:56, you wrote:
> Marc Cousin <mcousin@sigma.fr> writes:
> > The query has been generated by business objects ... i'ill try to suggest to the developpers to remove this
constant(if they can)... 
> > The fields used by the sort are of type numeric(6,0) or (10,0) ...
> > Could it be better if the fields were integer or anything else ?
>
> integer or bigint would be a WHOLE lot faster.  I'd venture that
> comparing two numerics is order of a hundred times slower than
> comparing two integers.
>
> Even if you don't want to change the fields on-disk, you might think
> about casting them all to int/bigint in the query.
>
> Another thing that might or might not be easy is to change the order of
> the GROUP BY items so that the fields with the largest number of
> distinct values are listed first.  If two rows are distinct at the first
> column, the sorting comparison doesn't even have to look at the
> remaining columns ...
>
>    regards, tom lane
>
Thanks. I've just had confirmation that they can remove the two constants (allready won 100 seconds thanks to that)
I've tried the cast, and got down to 72 seconds.
So now we're going to try to convert the fields to int or bigint.

Thanks a lot for your help and time.

Re: Problem with large query

From
Adam Sah
Date:
by the way, this reminds me: I just ran a performance study at a company doing
    an oracle-to-postgres conversion, and FYI converting from numeric and decimal
    to integer/bigint/real saved roughly 3x on space and 2x on performance.
    Obviously, YMMV.

adam


Tom Lane wrote:

> Marc Cousin <mcousin@sigma.fr> writes:
>
>>I'm having trouble with a (quite big) query, and can't find a way to make it
>>faster.
>
>
> Seems like it might help if the thing could use a HashAggregate instead
> of sort/group.  Numeric is not hashable, so having those TO_NUMBER
> constants in GROUP BY destroys this option instantly ... but why in the
> world are you grouping by constants anyway?  You didn't say what the
> datatypes of the other columns were...
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
>