Thread: Increasing GROUP BY CHAR columns speed

Increasing GROUP BY CHAR columns speed

From
"Andrus"
Date:
Group by using CHAR columns takes abnormally big time.

How to speed it ?

Andrus.

8.1.4, cluster locale is en-us, db encoding is utf-8

set search_path to firma2,public;
explain analyze SELECT
 CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
 bilkaib.DB,
 CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt,
 CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END::
CHAR(10) AS db2objekt,
 CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END::
CHAR(10) AS db3objekt,
 CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END::
CHAR(10) AS db4objekt,
 CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END::
CHAR(10) AS db5objekt,
 CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END::
CHAR(10) AS db6objekt,
 CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END::
CHAR(10) AS db7objekt,
 CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END::
CHAR(10) AS db8objekt,
 CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END::
CHAR(10) AS db9objekt,
 bilkaib.CR,
 CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END::
CHAR(10) AS crobjekt,
 CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END::
CHAR(10) AS cr2objekt,
 CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END::
CHAR(10) AS cr3objekt,
 CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END::
CHAR(10) AS cr4objekt,
 CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END::
CHAR(10) AS cr5objekt,
 CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END::
CHAR(10) AS cr6objekt,
 CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END::
CHAR(10) AS cr7objekt,
 CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END::
CHAR(10) AS cr8objekt,
 CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END::
CHAR(10) AS cr9objekt,
 bilkaib.RAHA,
 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN
('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN  bilkaib.KLIENT ELSE NULL END AS klient,

 bilkaib.EXCHRATE,

 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa
   OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN
   klient.nimi ELSE NULL END AS kliendinim,  -- 24.

 CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa
   OR (bilkaib.cr<>'00' AND crkonto.tyyp='K')
   OR (bilkaib.db<>'00' AND dbkonto.tyyp='K')
 THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr

 ,bilkaib.ratediffer
 ,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END
AS kuupaev

 ,SUM(bilkaib.summa)::numeric(14,2) AS summa
   from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
   where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'
  GROUP BY
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28

"GroupAggregate  (cost=52316.23..61434.48 rows=41923 width=838) (actual
time=10771.337..11372.135 rows=577 loops=1)"
"  ->  Sort  (cost=52316.23..52421.03 rows=41923 width=838) (actual
time=10770.529..11012.651 rows=52156 loops=1)"
"        Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE
bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN
bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN
bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN
bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN
bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN
bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE
WHEN (crkonto.objekt1 = '+'::bpchar) THEN bilkaib.crobjekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt2 = '+'::bpchar) THEN
bilkaib.cr2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt3 = '+'::bpchar) THEN bilkaib.cr3objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt4 = '+'::bpchar) THEN
bilkaib.cr4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt5 = '+'::bpchar) THEN bilkaib.cr5objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt6 = '+'::bpchar) THEN
bilkaib.cr6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt7 = '+'::bpchar) THEN bilkaib.cr7objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt8 = '+'::bpchar) THEN
bilkaib.cr8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt9 = '+'::bpchar) THEN bilkaib.cr9objekt ELSE NULL::bpchar
END)::character(10), bilkaib.raha, CASE WHEN ((crkonto.klienkaupa)::boolean
OR (dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN bilkaib.klient ELSE NULL::bpchar END,
bilkaib.exchrate, CASE WHEN ((crkonto.klienkaupa)::boolean OR
(dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN klient.nimi ELSE NULL::bpchar END, (CASE
WHEN ((crkonto.arvekaupa)::boolean OR (dbkonto.arvekaupa)::boolean OR
((bilkaib.cr <> '00'::bpchar) AND (crkonto.tyyp = 'K'::bpchar)) OR
((bilkaib.db <> '00'::bpchar) AND (dbkonto.tyyp = 'K'::bpchar))) THEN
bilkaib.doknr ELSE NULL::bpchar END)::character(25), bilkaib.ratediffer,
CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN '2007-01-01'::date ELSE
bilkaib.kuupaev END"
"        ->  Hash Left Join  (cost=936.48..40184.64 rows=41923 width=838)
(actual time=46.000..2820.944 rows=52156 loops=1)"
"              Hash Cond: ("outer".klient = "inner".kood)"
"              ->  Hash Join  (cost=785.35..34086.74 rows=41923 width=764)
(actual time=34.547..1563.790 rows=52156 loops=1)"
"                    Hash Cond: ("outer".cr = "inner".kontonr)"
"                    ->  Hash Join  (cost=764.26..33403.76 rows=48533
width=712) (actual time=32.069..1082.505 rows=52156 loops=1)"
"                          Hash Cond: ("outer".db = "inner".kontonr)"
"                          ->  Bitmap Heap Scan on bilkaib
(cost=743.17..32616.41 rows=56185 width=660) (actual time=29.652..518.289
rows=52156 loops=1)"
"                                Recheck Cond: ((cr ~~ '112%'::text) OR (db
~~ '112%'::text))"
"                                Filter: (((cr ~~ '112%'::text) OR (db ~~
'112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <=
'2008-11-26'::date))"
"                                ->  BitmapOr  (cost=743.17..743.17
rows=65862 width=0) (actual time=26.539..26.539 rows=0 loops=1)"
"                                      ->  Bitmap Index Scan on
bilkaib_cr_pattern_idx  (cost=0.00..236.63 rows=20939 width=0) (actual
time=8.510..8.510 rows=21028 loops=1)"
"                                            Index Cond: ((cr ~>=~
'112'::bpchar) AND (cr ~<~ '113'::bpchar))"
"                                      ->  Bitmap Index Scan on
bilkaib_db_pattern_idx  (cost=0.00..506.54 rows=44923 width=0) (actual
time=18.013..18.013 rows=45426 loops=1)"
"                                            Index Cond: ((db ~>=~
'112'::bpchar) AND (db ~<~ '113'::bpchar))"
"                          ->  Hash  (cost=20.49..20.49 rows=241 width=66)
(actual time=2.375..2.375 rows=241 loops=1)"
"                                ->  Seq Scan on konto dbkonto
(cost=0.00..20.49 rows=241 width=66) (actual time=0.011..1.207 rows=241
loops=1)"
"                                      Filter: (iseloom = 'A'::bpchar)"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=66) (actual
time=2.451..2.451 rows=241 loops=1)"
"                          ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=66) (actual time=0.022..1.259 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=147.90..147.90 rows=1290 width=90) (actual
time=11.371..11.371 rows=1290 loops=1)"
"                    ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=90) (actual time=0.009..5.587 rows=1290 loops=1)"
"Total runtime: 11380.437 ms"


If group by is removed same query runs 8 times (!) faster:

set search_path to firma2,public;
explain analyze SELECT
 SUM(bilkaib.summa)::numeric(14,2) AS summa
   from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
   where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'

"Aggregate  (cost=34944.27..34944.28 rows=1 width=11) (actual
time=1781.456..1781.460 rows=1 loops=1)"
"  ->  Hash Left Join  (cost=936.48..34839.46 rows=41923 width=11) (actual
time=41.194..1545.105 rows=52156 loops=1)"
"        Hash Cond: ("outer".klient = "inner".kood)"
"        ->  Hash Join  (cost=785.35..34086.74 rows=41923 width=27) (actual
time=30.372..1120.431 rows=52156 loops=1)"
"              Hash Cond: ("outer".cr = "inner".kontonr)"
"              ->  Hash Join  (cost=764.26..33403.76 rows=48533 width=41)
(actual time=28.168..710.336 rows=52156 loops=1)"
"                    Hash Cond: ("outer".db = "inner".kontonr)"
"                    ->  Bitmap Heap Scan on bilkaib  (cost=743.17..32616.41
rows=56185 width=55) (actual time=25.970..294.638 rows=52156 loops=1)"
"                          Recheck Cond: ((cr ~~ '112%'::text) OR (db ~~
'112%'::text))"
"                          Filter: (((cr ~~ '112%'::text) OR (db ~~
'112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <=
'2008-11-26'::date))"
"                          ->  BitmapOr  (cost=743.17..743.17 rows=65862
width=0) (actual time=23.056..23.056 rows=0 loops=1)"
"                                ->  Bitmap Index Scan on
bilkaib_cr_pattern_idx  (cost=0.00..236.63 rows=20939 width=0) (actual
time=7.414..7.414 rows=21028 loops=1)"
"                                      Index Cond: ((cr ~>=~ '112'::bpchar)
AND (cr ~<~ '113'::bpchar))"
"                                ->  Bitmap Index Scan on
bilkaib_db_pattern_idx  (cost=0.00..506.54 rows=44923 width=0) (actual
time=15.627..15.627 rows=45426 loops=1)"
"                                      Index Cond: ((db ~>=~ '112'::bpchar)
AND (db ~<~ '113'::bpchar))"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.164..2.164 rows=241 loops=1)"
"                          ->  Seq Scan on konto dbkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.012..1.205 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.177..2.177 rows=241 loops=1)"
"                    ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.019..1.203 rows=241 loops=1)"
"                          Filter: (iseloom = 'A'::bpchar)"
"        ->  Hash  (cost=147.90..147.90 rows=1290 width=16) (actual
time=10.782..10.782 rows=1290 loops=1)"
"              ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=16) (actual time=0.009..5.597 rows=1290 loops=1)"
"Total runtime: 1781.673 ms"


Re: Increasing GROUP BY CHAR columns speed

From
Scott Carey
Date:
The below query is spending most of its time in the sort, or perhaps the complicated check condition before it.
The explain has a 8 second gap in time between the 2.8 seconds after the Hash Left Join and before the Sort.  I'm
guessingits hidden in the sort. 

You can get the planner to switch from a sort to a hash aggregate with a large work_mem.  Try calling

SET work_mem = '100MB';

before this query first.

It may not help that much if the check time is as expensive as it looks in the plan below, but its very easy to try.
If it does help, you may want to temporarily increase that value only for this query rather than making it a default in
theconfig file. 
________________________________________
From: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org] On Behalf Of Andrus
[kobruleht2@hot.ee]
Sent: Friday, November 28, 2008 7:04 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Increasing GROUP BY CHAR columns speed

Group by using CHAR columns takes abnormally big time.

How to speed it ?

Andrus.

8.1.4, cluster locale is en-us, db encoding is utf-8

set search_path to firma2,public;
explain analyze SELECT
 CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
 bilkaib.DB,
 CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt,
 CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END::
CHAR(10) AS db2objekt,
 CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END::
CHAR(10) AS db3objekt,
 CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END::
CHAR(10) AS db4objekt,
 CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END::
CHAR(10) AS db5objekt,
 CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END::
CHAR(10) AS db6objekt,
 CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END::
CHAR(10) AS db7objekt,
 CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END::
CHAR(10) AS db8objekt,
 CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END::
CHAR(10) AS db9objekt,
 bilkaib.CR,
 CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END::
CHAR(10) AS crobjekt,
 CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END::
CHAR(10) AS cr2objekt,
 CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END::
CHAR(10) AS cr3objekt,
 CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END::
CHAR(10) AS cr4objekt,
 CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END::
CHAR(10) AS cr5objekt,
 CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END::
CHAR(10) AS cr6objekt,
 CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END::
CHAR(10) AS cr7objekt,
 CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END::
CHAR(10) AS cr8objekt,
 CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END::
CHAR(10) AS cr9objekt,
 bilkaib.RAHA,
 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN
('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN  bilkaib.KLIENT ELSE NULL END AS klient,

 bilkaib.EXCHRATE,

 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa
   OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN
   klient.nimi ELSE NULL END AS kliendinim,  -- 24.

 CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa
   OR (bilkaib.cr<>'00' AND crkonto.tyyp='K')
   OR (bilkaib.db<>'00' AND dbkonto.tyyp='K')
 THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr

 ,bilkaib.ratediffer
 ,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END
AS kuupaev

 ,SUM(bilkaib.summa)::numeric(14,2) AS summa
   from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
   where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'
  GROUP BY
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28

"GroupAggregate  (cost=52316.23..61434.48 rows=41923 width=838) (actual
time=10771.337..11372.135 rows=577 loops=1)"
"  ->  Sort  (cost=52316.23..52421.03 rows=41923 width=838) (actual
time=10770.529..11012.651 rows=52156 loops=1)"
"        Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE
bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN
bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN
bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN
bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN
bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN
bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE
WHEN (crkonto.objekt1 = '+'::bpchar) THEN bilkaib.crobjekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt2 = '+'::bpchar) THEN
bilkaib.cr2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt3 = '+'::bpchar) THEN bilkaib.cr3objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt4 = '+'::bpchar) THEN
bilkaib.cr4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt5 = '+'::bpchar) THEN bilkaib.cr5objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt6 = '+'::bpchar) THEN
bilkaib.cr6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt7 = '+'::bpchar) THEN bilkaib.cr7objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt8 = '+'::bpchar) THEN
bilkaib.cr8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt9 = '+'::bpchar) THEN bilkaib.cr9objekt ELSE NULL::bpchar
END)::character(10), bilkaib.raha, CASE WHEN ((crkonto.klienkaupa)::boolean
OR (dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN bilkaib.klient ELSE NULL::bpchar END,
bilkaib.exchrate, CASE WHEN ((crkonto.klienkaupa)::boolean OR
(dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN klient.nimi ELSE NULL::bpchar END, (CASE
WHEN ((crkonto.arvekaupa)::boolean OR (dbkonto.arvekaupa)::boolean OR
((bilkaib.cr <> '00'::bpchar) AND (crkonto.tyyp = 'K'::bpchar)) OR
((bilkaib.db <> '00'::bpchar) AND (dbkonto.tyyp = 'K'::bpchar))) THEN
bilkaib.doknr ELSE NULL::bpchar END)::character(25), bilkaib.ratediffer,
CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN '2007-01-01'::date ELSE
bilkaib.kuupaev END"
"        ->  Hash Left Join  (cost=936.48..40184.64 rows=41923 width=838)
(actual time=46.000..2820.944 rows=52156 loops=1)"
"              Hash Cond: ("outer".klient = "inner".kood)"
"              ->  Hash Join  (cost=785.35..34086.74 rows=41923 width=764)
(actual time=34.547..1563.790 rows=52156 loops=1)"
"                    Hash Cond: ("outer".cr = "inner".kontonr)"
"                    ->  Hash Join  (cost=764.26..33403.76 rows=48533
width=712) (actual time=32.069..1082.505 rows=52156 loops=1)"
"                          Hash Cond: ("outer".db = "inner".kontonr)"
"                          ->  Bitmap Heap Scan on bilkaib
(cost=743.17..32616.41 rows=56185 width=660) (actual time=29.652..518.289
rows=52156 loops=1)"
"                                Recheck Cond: ((cr ~~ '112%'::text) OR (db
~~ '112%'::text))"
"                                Filter: (((cr ~~ '112%'::text) OR (db ~~
'112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <=
'2008-11-26'::date))"
"                                ->  BitmapOr  (cost=743.17..743.17
rows=65862 width=0) (actual time=26.539..26.539 rows=0 loops=1)"
"                                      ->  Bitmap Index Scan on
bilkaib_cr_pattern_idx  (cost=0.00..236.63 rows=20939 width=0) (actual
time=8.510..8.510 rows=21028 loops=1)"
"                                            Index Cond: ((cr ~>=~
'112'::bpchar) AND (cr ~<~ '113'::bpchar))"
"                                      ->  Bitmap Index Scan on
bilkaib_db_pattern_idx  (cost=0.00..506.54 rows=44923 width=0) (actual
time=18.013..18.013 rows=45426 loops=1)"
"                                            Index Cond: ((db ~>=~
'112'::bpchar) AND (db ~<~ '113'::bpchar))"
"                          ->  Hash  (cost=20.49..20.49 rows=241 width=66)
(actual time=2.375..2.375 rows=241 loops=1)"
"                                ->  Seq Scan on konto dbkonto
(cost=0.00..20.49 rows=241 width=66) (actual time=0.011..1.207 rows=241
loops=1)"
"                                      Filter: (iseloom = 'A'::bpchar)"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=66) (actual
time=2.451..2.451 rows=241 loops=1)"
"                          ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=66) (actual time=0.022..1.259 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=147.90..147.90 rows=1290 width=90) (actual
time=11.371..11.371 rows=1290 loops=1)"
"                    ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=90) (actual time=0.009..5.587 rows=1290 loops=1)"
"Total runtime: 11380.437 ms"


If group by is removed same query runs 8 times (!) faster:

set search_path to firma2,public;
explain analyze SELECT
 SUM(bilkaib.summa)::numeric(14,2) AS summa
   from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
   where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'

"Aggregate  (cost=34944.27..34944.28 rows=1 width=11) (actual
time=1781.456..1781.460 rows=1 loops=1)"
"  ->  Hash Left Join  (cost=936.48..34839.46 rows=41923 width=11) (actual
time=41.194..1545.105 rows=52156 loops=1)"
"        Hash Cond: ("outer".klient = "inner".kood)"
"        ->  Hash Join  (cost=785.35..34086.74 rows=41923 width=27) (actual
time=30.372..1120.431 rows=52156 loops=1)"
"              Hash Cond: ("outer".cr = "inner".kontonr)"
"              ->  Hash Join  (cost=764.26..33403.76 rows=48533 width=41)
(actual time=28.168..710.336 rows=52156 loops=1)"
"                    Hash Cond: ("outer".db = "inner".kontonr)"
"                    ->  Bitmap Heap Scan on bilkaib  (cost=743.17..32616.41
rows=56185 width=55) (actual time=25.970..294.638 rows=52156 loops=1)"
"                          Recheck Cond: ((cr ~~ '112%'::text) OR (db ~~
'112%'::text))"
"                          Filter: (((cr ~~ '112%'::text) OR (db ~~
'112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <=
'2008-11-26'::date))"
"                          ->  BitmapOr  (cost=743.17..743.17 rows=65862
width=0) (actual time=23.056..23.056 rows=0 loops=1)"
"                                ->  Bitmap Index Scan on
bilkaib_cr_pattern_idx  (cost=0.00..236.63 rows=20939 width=0) (actual
time=7.414..7.414 rows=21028 loops=1)"
"                                      Index Cond: ((cr ~>=~ '112'::bpchar)
AND (cr ~<~ '113'::bpchar))"
"                                ->  Bitmap Index Scan on
bilkaib_db_pattern_idx  (cost=0.00..506.54 rows=44923 width=0) (actual
time=15.627..15.627 rows=45426 loops=1)"
"                                      Index Cond: ((db ~>=~ '112'::bpchar)
AND (db ~<~ '113'::bpchar))"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.164..2.164 rows=241 loops=1)"
"                          ->  Seq Scan on konto dbkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.012..1.205 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.177..2.177 rows=241 loops=1)"
"                    ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.019..1.203 rows=241 loops=1)"
"                          Filter: (iseloom = 'A'::bpchar)"
"        ->  Hash  (cost=147.90..147.90 rows=1290 width=16) (actual
time=10.782..10.782 rows=1290 loops=1)"
"              ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=16) (actual time=0.009..5.597 rows=1290 loops=1)"
"Total runtime: 1781.673 ms"


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Increasing GROUP BY CHAR columns speed

From
"Andrus"
Date:
Scott,

Thank you.

>The below query is spending most of its time in the sort, or perhaps the
>complicated check condition before it.
>The explain has a 8 second gap in time between the 2.8 seconds after the
>Hash Left Join and before the Sort.  I'm guessing its hidden in the sort.
>You can get the planner to switch from a sort to a hash aggregate with a
>large work_mem.  Try calling
>SET work_mem = '100MB';
>before this query first.
>It may not help that much if the check time is as expensive as it looks in
>the plan below, but its very easy to try.
>If it does help, you may want to temporarily increase that value only for
>this query rather than making it a default in the config file.

SET work_mem = 2097151  (this is max allowed value) or SET work_mem = 97151
decreases query time from 12 seconds to 9 seconds.

My application may ran in servers with  1 GB RAM only. I'm afraid than in
those servers 2097151  will cause error and abort query.

Is it reasonable to add

SET work_mem = 97151

before this query and

SET work_mem TO  DEFAULT

after this query ?
Or should I use max value in cases where there are much more data ? This
query may return a much more data for longer period and more accounts.

CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt

is ugly.  I tried to rewrite it using  NullIfNot(dbkonto.objekt1, '+') AS
dbobjekt, bot got error
ERROR:  function nullifnot(character, "unknown") does not exist

How to re-write this in nicer and faster way ?

For most of rows checks

WHEN objektn='+'

will fail: objektn values are usually rarely equal to '+':  they are empty
or null mostly.

Maybe this can be used to optimize the query.

Btw.
Tom Lane's reply from earlier discussion about this query speed (then there
were '' instead of NULL in group columns) some years ago:

"I think the problem is probably that you're sorting two dozen CHAR
columns, and that in many of the rows all these entries are '' forcing
the sort code to compare all two dozen columns (not so)?  So the sort
ends up doing lots and lots and lots of CHAR comparisons.  Which can
be slow, especially in non-C locales."

locale specific check is not  nessecary for those CHAR(10) columns. How to
force PostgreSql to use binary check for grouping ?
Some dbms allow to mark columns as C locale. I havent found this nor
chartobin() function in PostgreSql.
Will creating BinaryNullIfNot(dbkonto.objekt1, '+')   function solve this ?

Andrus.

New testcase:

set search_path to firma2,public;
SET work_mem = 2097151;  -- 9 seconds
-- SET work_mem = 1097151;  -- 9 seconds
--SET work_mem to default; -- 12 seconds

explain analyze SELECT
 CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
 bilkaib.DB,
 CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt,
 CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END::
CHAR(10) AS db2objekt,
 CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END::
CHAR(10) AS db3objekt,
 CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END::
CHAR(10) AS db4objekt,
 CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END::
CHAR(10) AS db5objekt,
 CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END::
CHAR(10) AS db6objekt,
 CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END::
CHAR(10) AS db7objekt,
 CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END::
CHAR(10) AS db8objekt,
 CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END::
CHAR(10) AS db9objekt,
 bilkaib.CR,
 CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END::
CHAR(10) AS crobjekt,
 CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END::
CHAR(10) AS cr2objekt,
 CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END::
CHAR(10) AS cr3objekt,
 CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END::
CHAR(10) AS cr4objekt,
 CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END::
CHAR(10) AS cr5objekt,
 CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END::
CHAR(10) AS cr6objekt,
 CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END::
CHAR(10) AS cr7objekt,
 CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END::
CHAR(10) AS cr8objekt,
 CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END::
CHAR(10) AS cr9objekt,
 bilkaib.RAHA,
 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN
('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN  bilkaib.KLIENT ELSE NULL END AS klient,

 bilkaib.EXCHRATE,

 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa
   OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN
   klient.nimi ELSE NULL END AS kliendinim,  -- 24.

 CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa
   OR (bilkaib.cr<>'00' AND crkonto.tyyp='K')
   OR (bilkaib.db<>'00' AND dbkonto.tyyp='K')
 THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr

 ,bilkaib.ratediffer
 ,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END
AS kuupaev

 ,SUM(bilkaib.summa)::numeric(14,2) AS summa
   from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
   where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'
  GROUP BY
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28

"GroupAggregate  (cost=43403.38..52521.63 rows=41923 width=838) (actual
time=8083.171..8620.908 rows=577 loops=1)"
"  ->  Sort  (cost=43403.38..43508.19 rows=41923 width=838) (actual
time=8082.456..8273.259 rows=52156 loops=1)"
"        Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE
bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN
bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN
bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN
bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN
bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN
bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE
WHEN (crkonto.objekt1 = '+'::bpchar) THEN bilkaib.crobjekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt2 = '+'::bpchar) THEN
bilkaib.cr2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt3 = '+'::bpchar) THEN bilkaib.cr3objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt4 = '+'::bpchar) THEN
bilkaib.cr4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt5 = '+'::bpchar) THEN bilkaib.cr5objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt6 = '+'::bpchar) THEN
bilkaib.cr6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt7 = '+'::bpchar) THEN bilkaib.cr7objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt8 = '+'::bpchar) THEN
bilkaib.cr8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt9 = '+'::bpchar) THEN bilkaib.cr9objekt ELSE NULL::bpchar
END)::character(10), bilkaib.raha, CASE WHEN ((crkonto.klienkaupa)::boolean
OR (dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN bilkaib.klient ELSE NULL::bpchar END,
bilkaib.exchrate, CASE WHEN ((crkonto.klienkaupa)::boolean OR
(dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN klient.nimi ELSE NULL::bpchar END, (CASE
WHEN ((crkonto.arvekaupa)::boolean OR (dbkonto.arvekaupa)::boolean OR
((bilkaib.cr <> '00'::bpchar) AND (crkonto.tyyp = 'K'::bpchar)) OR
((bilkaib.db <> '00'::bpchar) AND (dbkonto.tyyp = 'K'::bpchar))) THEN
bilkaib.doknr ELSE NULL::bpchar END)::character(25), bilkaib.ratediffer,
CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN '2007-01-01'::date ELSE
bilkaib.kuupaev END"
"        ->  Hash Left Join  (cost=936.48..40184.64 rows=41923 width=838)
(actual time=47.409..2427.059 rows=52156 loops=1)"
"              Hash Cond: ("outer".klient = "inner".kood)"
"              ->  Hash Join  (cost=785.35..34086.74 rows=41923 width=764)
(actual time=35.669..1414.794 rows=52156 loops=1)"
"                    Hash Cond: ("outer".cr = "inner".kontonr)"
"                    ->  Hash Join  (cost=764.26..33403.76 rows=48533
width=712) (actual time=32.839..954.784 rows=52156 loops=1)"
"                          Hash Cond: ("outer".db = "inner".kontonr)"
"                          ->  Bitmap Heap Scan on bilkaib
(cost=743.17..32616.41 rows=56185 width=660) (actual time=30.337..448.153
rows=52156 loops=1)"
"                                Recheck Cond: ((cr ~~ '112%'::text) OR (db
~~ '112%'::text))"
"                                Filter: (((cr ~~ '112%'::text) OR (db ~~
'112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <=
'2008-11-26'::date))"
"                                ->  BitmapOr  (cost=743.17..743.17
rows=65862 width=0) (actual time=27.194..27.194 rows=0 loops=1)"
"                                      ->  Bitmap Index Scan on
bilkaib_cr_pattern_idx  (cost=0.00..236.63 rows=20939 width=0) (actual
time=8.833..8.833 rows=21028 loops=1)"
"                                            Index Cond: ((cr ~>=~
'112'::bpchar) AND (cr ~<~ '113'::bpchar))"
"                                      ->  Bitmap Index Scan on
bilkaib_db_pattern_idx  (cost=0.00..506.54 rows=44923 width=0) (actual
time=18.345..18.345 rows=45426 loops=1)"
"                                            Index Cond: ((db ~>=~
'112'::bpchar) AND (db ~<~ '113'::bpchar))"
"                          ->  Hash  (cost=20.49..20.49 rows=241 width=66)
(actual time=2.450..2.450 rows=241 loops=1)"
"                                ->  Seq Scan on konto dbkonto
(cost=0.00..20.49 rows=241 width=66) (actual time=0.014..1.232 rows=241
loops=1)"
"                                      Filter: (iseloom = 'A'::bpchar)"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=66) (actual
time=2.799..2.799 rows=241 loops=1)"
"                          ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=66) (actual time=0.029..1.536 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=147.90..147.90 rows=1290 width=90) (actual
time=11.661..11.661 rows=1290 loops=1)"
"                    ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=90) (actual time=0.014..5.808 rows=1290 loops=1)"
"Total runtime: 8634.630 ms"


Re: Increasing GROUP BY CHAR columns speed

From
"Scott Marlowe"
Date:
On Fri, Nov 28, 2008 at 10:58 AM, Andrus <kobruleht2@hot.ee> wrote:
>
> SET work_mem = 2097151  (this is max allowed value) or SET work_mem = 97151
> decreases query time from 12 seconds to 9 seconds.

Setting work_mem so high that it allocates memory that isn't there
WILL slow down your queries, because the OS will then wind up swapping
out one part of the swap to make room for another part.  There are
values between 100M and 2G.  Run it with increasing work_mem from 100
to 500 or so Meg and see if that helps.  Keep an eye on vmstat 1 or
something to see if your machine starts swapping out while running the
query.  If it does you've overallocated memory and things will start
to slow down a LOT.

Re: Increasing GROUP BY CHAR columns speed

From
Scott Carey
Date:
I'm not sure that postgres allocates the whole work_mem each time, and in any event, the OS won't try and allocate to
RAMuntil the page is actually used.  My experience is that oversized work_mem doesn't hurt until it is actually used.
Else,my configuration with 1000 concurrent connections and work_mem = 1GB would have blown up.  I don't have that much
RAM+ SWAP * overcommit.  Of the 1000 connections, only 8 run queries that would ever need more than 2 or 3 MB of space
toexecute.  Of course, one has to be certain what the client connections do for it to be very over sized, so I would
notrecommend the above in general. 

----------
Back to this query:

In the query case shown, the explain analyze shows:
"GroupAggregate  (cost=43403.38..52521.63 rows=41923 width=838) (actual
time=8083.171..8620.908 rows=577 loops=1)"

Thus, the planner thought that it needed ~40K ish rows for results of ~800 bytes in size, hence an approximation of the
requiredhash space is 80M.  However, it returns only 577 rows, so the actual needed hash space is much smaller.  This
isa common problem I've seen -- the query planner has very conservative estimates for result row counts from any
non-trivialfilter condition / aggregate which leads to poor query plans. 

I'd be suprised if this query used more than 1MB total work_mem in reality for that last step if it used a hash.  As it
stands,sorting will actually use much more. 

I'm still not sure why the planner chose to sort rather than hash with oversized work_mem (is there an implied order in
thequery results I missed?).   My guess is that this query can still get much faster if a hash is possible on the last
part. It looks like the gain so far has more to do with sorting purely in memory which reduced the number of compares
required. But that is just a guess. 

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Friday, November 28, 2008 10:24 AM
To: Andrus
Cc: Scott Carey; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Increasing GROUP BY CHAR columns speed

On Fri, Nov 28, 2008 at 10:58 AM, Andrus <kobruleht2@hot.ee> wrote:
>
> SET work_mem = 2097151  (this is max allowed value) or SET work_mem = 97151
> decreases query time from 12 seconds to 9 seconds.

Setting work_mem so high that it allocates memory that isn't there
WILL slow down your queries, because the OS will then wind up swapping
out one part of the swap to make room for another part.  There are
values between 100M and 2G.  Run it with increasing work_mem from 100
to 500 or so Meg and see if that helps.  Keep an eye on vmstat 1 or
something to see if your machine starts swapping out while running the
query.  If it does you've overallocated memory and things will start
to slow down a LOT.

Re: Increasing GROUP BY CHAR columns speed

From
"Andrus"
Date:
Application should work with any server starting at 8.1 with any RAM size
(probably starting at 1 GB).

How to find safe value which does not cause error in SET work_mem command ?

If I use 2 GB maybe this can cause excaption when running in server with 1
GB RAM where this setting may be not allowed.

Andrus.


Re: Increasing GROUP BY CHAR columns speed

From
"Andrus"
Date:
I it seems that slowness is caused by grouping by column

exchrate numeric(13,8)

if this column is excluded, query takes 12 seconds
if this column in present, query takes 27 (!) seconds.
How to fix this ?

Andrus.

set search_path to firma2,public;
SET work_mem = 2097151 ;
explain analyze SELECT
 CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
 bilkaib.DB,
 bilkaib.CR,
   bilkaib.RAHA, -- 12 sek
 bilkaib.EXCHRATE, -- 27 sec
 SUM(bilkaib.summa)::numeric(14,2) AS summa
 from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
 WHERE --(bilkaib.cr LIKE ''||'%' OR bilkaib.db LIKE ''||'%')
  bilkaib.kuupaev BETWEEN '2008-01-01' AND '2008-12-31'
     GROUP BY 1,2,3,4,5

"GroupAggregate  (cost=71338.72..79761.05 rows=240638 width=58) (actual
time=24570.085..27382.022 rows=217 loops=1)"
"  ->  Sort  (cost=71338.72..71940.31 rows=240638 width=58) (actual
time=24566.700..25744.006 rows=322202 loops=1)"
"        Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE
bilkaib.id END, bilkaib.db, bilkaib.cr, bilkaib.raha, bilkaib.exchrate"
"        ->  Hash Left Join  (cost=193.31..49829.89 rows=240638 width=58)
(actual time=17.072..9901.578 rows=322202 loops=1)"
"              Hash Cond: ("outer".klient = "inner".kood)"
"              ->  Hash Join  (cost=42.18..45624.00 rows=240638 width=74)
(actual time=4.715..7151.111 rows=322202 loops=1)"
"                    Hash Cond: ("outer".cr = "inner".kontonr)"
"                    ->  Hash Join  (cost=21.09..41803.63 rows=278581
width=74) (actual time=2.306..4598.703 rows=322202 loops=1)"
"                          Hash Cond: ("outer".db = "inner".kontonr)"
"                          ->  Seq Scan on bilkaib  (cost=0.00..37384.19
rows=322507 width=74) (actual time=0.075..1895.027 rows=322202 loops=1)"
"                                Filter: ((kuupaev >= '2008-01-01'::date)
AND (kuupaev <= '2008-12-31'::date))"
"                          ->  Hash  (cost=20.49..20.49 rows=241 width=14)
(actual time=2.193..2.193 rows=241 loops=1)"
"                                ->  Seq Scan on konto dbkonto
(cost=0.00..20.49 rows=241 width=14) (actual time=0.011..1.189 rows=241
loops=1)"
"                                      Filter: (iseloom = 'A'::bpchar)"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.386..2.386 rows=241 loops=1)"
"                          ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.020..1.394 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=147.90..147.90 rows=1290 width=16) (actual
time=12.319..12.319 rows=1290 loops=1)"
"                    ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=16) (actual time=0.032..6.979 rows=1290 loops=1)"
"Total runtime: 27434.724 ms"


set search_path to firma2,public;
SET work_mem = 2097151 ;
explain analyze SELECT
 CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
 bilkaib.DB,
 bilkaib.CR,
   bilkaib.RAHA,
 SUM(bilkaib.summa)::numeric(14,2) AS summa
 from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
 WHERE
  bilkaib.kuupaev BETWEEN '2008-01-01' AND '2008-12-31'
     GROUP BY 1,2,3,4

"HashAggregate  (cost=52837.86..57049.03 rows=240638 width=50) (actual
time=11744.137..11745.578 rows=215 loops=1)"
"  ->  Hash Left Join  (cost=193.31..49829.89 rows=240638 width=50) (actual
time=17.330..9826.549 rows=322202 loops=1)"
"        Hash Cond: ("outer".klient = "inner".kood)"
"        ->  Hash Join  (cost=42.18..45624.00 rows=240638 width=66) (actual
time=4.804..7141.983 rows=322202 loops=1)"
"              Hash Cond: ("outer".cr = "inner".kontonr)"
"              ->  Hash Join  (cost=21.09..41803.63 rows=278581 width=66)
(actual time=2.343..4600.683 rows=322202 loops=1)"
"                    Hash Cond: ("outer".db = "inner".kontonr)"
"                    ->  Seq Scan on bilkaib  (cost=0.00..37384.19
rows=322507 width=66) (actual time=0.081..1939.376 rows=322202 loops=1)"
"                          Filter: ((kuupaev >= '2008-01-01'::date) AND
(kuupaev <= '2008-12-31'::date))"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.207..2.207 rows=241 loops=1)"
"                          ->  Seq Scan on konto dbkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.014..1.179 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual
time=2.426..2.426 rows=241 loops=1)"
"                    ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=14) (actual time=0.029..1.444 rows=241 loops=1)"
"                          Filter: (iseloom = 'A'::bpchar)"
"        ->  Hash  (cost=147.90..147.90 rows=1290 width=16) (actual
time=12.477..12.477 rows=1290 loops=1)"
"              ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=16) (actual time=0.034..7.081 rows=1290 loops=1)"
"Total runtime: 11748.066 ms"


Re: Increasing GROUP BY CHAR columns speed

From
"Scott Marlowe"
Date:
On Fri, Nov 28, 2008 at 12:51 PM, Scott Carey <scott@richrelevance.com> wrote:
> I'm not sure that postgres allocates the whole work_mem each time, and in any event, the OS won't try and allocate to
RAMuntil the page is actually used.  My experience is that oversized work_mem doesn't hurt until it is actually used.
Else,my configuration with 1000 concurrent connections and work_mem = 1GB would have blown up.  I don't have that much
RAM+ SWAP * overcommit.  Of the 1000 connections, only 8 run queries that would ever need more than 2 or 3 MB of space
toexecute.  Of course, one has to be certain what the client connections do for it to be very over sized, so I would
notrecommend the above in general. 

That's kinda like saying I have a revolver with 100 chambers and only
one bullet, and it seems perfectly safe to put it to my head and keep
pulling the trigger.

Of course pg doesn't allocate the whole amount every time.  It
allocates what it needs, up to the max you allow.  by setting it to 1G
it's quite possible that eventually enough queries will be running
that need a fair bit of work_mem and suddenly bang, your server is in
a swap storm and goes non-responsive.

It's far better to set it to something reasonable, like 4 or 8 Meg,
then for the individual queries that need more set it higher at run
time.

Re: Increasing GROUP BY CHAR columns speed

From
"Andrus"
Date:
>I it seems that slowness is caused by grouping by column
>
> exchrate numeric(13,8)

exchrate has different values in few rows.
It has usually value 0
In this sample query it is always 0.

I tried not change exchrate with

nullif( bilkaib.EXCHRATE,0)

but this does not up speed query, no idea why.

Andrus.

Re: Increasing GROUP BY CHAR columns speed

From
"Andrus"
Date:
>I'm still not sure why the planner chose to sort rather than hash with
>oversized work_mem (is there an implied order in the query results I
>missed?).

Group by contains decimal column exchrate. Maybe pg is not capable to use
hash with numeric datatype.

>   My guess is that this query can still get much faster if a hash is
> possible on the last part.  It looks like the gain so far has more to do
> with sorting purely in memory which reduced the number of compares
> required.  But that is just a guess.

I fixed this by adding cast to :::float

bilkaib.exchrate:::float

In this case query is much faster.
Hopefully this will not affect to result since numeric(13,8) can casted to
float without data loss.

Andrus.


Re: Increasing GROUP BY CHAR columns speed

From
"Greg Stark"
Date:
On Sat, Nov 29, 2008 at 6:43 PM, Andrus <kobruleht2@hot.ee> wrote:
>> I'm still not sure why the planner chose to sort rather than hash with
>> oversized work_mem (is there an implied order in the query results I
>> missed?).
>
> Group by contains decimal column exchrate. Maybe pg is not capable to use
> hash with numeric datatype.

It is in 8.3. I think sorting was improved dramatically since 8.1 as well.

> I fixed this by adding cast to :::float
>
> bilkaib.exchrate:::float
>
> In this case query is much faster.
> Hopefully this will not affect to result since numeric(13,8) can casted to
> float without data loss.

That's not true. Even pretty simple values like 1.1 cannot be
represented precisely in a float. It would display properly though
which might be all you're concerned with here. I'm not sure whether
that's true for all values in numeric(13,8) though

Do you really need to be grouping on so many columns? If they're
normally all the same perhaps you can do two queries, one which
fetches the common values without any group by, just a simple
aggregate, and a second which groups by all these columns but only for
the few exceptional records.

You could avoid the collation support on the char() columns by casting
them to bytea first. That might be a bit of a headache though.

--
greg