Re: Increasing GROUP BY CHAR columns speed - Mailing list pgsql-performance

From Andrus
Subject Re: Increasing GROUP BY CHAR columns speed
Date
Msg-id 3E5CE5640FFD4172956D223988D8D578@andrusnotebook
Whole thread Raw
In response to Re: Increasing GROUP BY CHAR columns speed  (Scott Carey <scott@richrelevance.com>)
Responses Re: Increasing GROUP BY CHAR columns speed
List pgsql-performance
Scott,

Thank you.

>The below query is spending most of its time in the sort, or perhaps the
>complicated check condition before it.
>The explain has a 8 second gap in time between the 2.8 seconds after the
>Hash Left Join and before the Sort.  I'm guessing its hidden in the sort.
>You can get the planner to switch from a sort to a hash aggregate with a
>large work_mem.  Try calling
>SET work_mem = '100MB';
>before this query first.
>It may not help that much if the check time is as expensive as it looks in
>the plan below, but its very easy to try.
>If it does help, you may want to temporarily increase that value only for
>this query rather than making it a default in the config file.

SET work_mem = 2097151  (this is max allowed value) or SET work_mem = 97151
decreases query time from 12 seconds to 9 seconds.

My application may ran in servers with  1 GB RAM only. I'm afraid than in
those servers 2097151  will cause error and abort query.

Is it reasonable to add

SET work_mem = 97151

before this query and

SET work_mem TO  DEFAULT

after this query ?
Or should I use max value in cases where there are much more data ? This
query may return a much more data for longer period and more accounts.

CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt

is ugly.  I tried to rewrite it using  NullIfNot(dbkonto.objekt1, '+') AS
dbobjekt, bot got error
ERROR:  function nullifnot(character, "unknown") does not exist

How to re-write this in nicer and faster way ?

For most of rows checks

WHEN objektn='+'

will fail: objektn values are usually rarely equal to '+':  they are empty
or null mostly.

Maybe this can be used to optimize the query.

Btw.
Tom Lane's reply from earlier discussion about this query speed (then there
were '' instead of NULL in group columns) some years ago:

"I think the problem is probably that you're sorting two dozen CHAR
columns, and that in many of the rows all these entries are '' forcing
the sort code to compare all two dozen columns (not so)?  So the sort
ends up doing lots and lots and lots of CHAR comparisons.  Which can
be slow, especially in non-C locales."

locale specific check is not  nessecary for those CHAR(10) columns. How to
force PostgreSql to use binary check for grouping ?
Some dbms allow to mark columns as C locale. I havent found this nor
chartobin() function in PostgreSql.
Will creating BinaryNullIfNot(dbkonto.objekt1, '+')   function solve this ?

Andrus.

New testcase:

set search_path to firma2,public;
SET work_mem = 2097151;  -- 9 seconds
-- SET work_mem = 1097151;  -- 9 seconds
--SET work_mem to default; -- 12 seconds

explain analyze SELECT
 CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
 bilkaib.DB,
 CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt,
 CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END::
CHAR(10) AS db2objekt,
 CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END::
CHAR(10) AS db3objekt,
 CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END::
CHAR(10) AS db4objekt,
 CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END::
CHAR(10) AS db5objekt,
 CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END::
CHAR(10) AS db6objekt,
 CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END::
CHAR(10) AS db7objekt,
 CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END::
CHAR(10) AS db8objekt,
 CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END::
CHAR(10) AS db9objekt,
 bilkaib.CR,
 CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END::
CHAR(10) AS crobjekt,
 CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END::
CHAR(10) AS cr2objekt,
 CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END::
CHAR(10) AS cr3objekt,
 CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END::
CHAR(10) AS cr4objekt,
 CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END::
CHAR(10) AS cr5objekt,
 CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END::
CHAR(10) AS cr6objekt,
 CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END::
CHAR(10) AS cr7objekt,
 CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END::
CHAR(10) AS cr8objekt,
 CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END::
CHAR(10) AS cr9objekt,
 bilkaib.RAHA,
 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN
('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN  bilkaib.KLIENT ELSE NULL END AS klient,

 bilkaib.EXCHRATE,

 CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa
   OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I')
 THEN
   klient.nimi ELSE NULL END AS kliendinim,  -- 24.

 CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa
   OR (bilkaib.cr<>'00' AND crkonto.tyyp='K')
   OR (bilkaib.db<>'00' AND dbkonto.tyyp='K')
 THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr

 ,bilkaib.ratediffer
 ,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END
AS kuupaev

 ,SUM(bilkaib.summa)::numeric(14,2) AS summa
   from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
       crkonto.iseloom='A'
     join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
       dbkonto.iseloom='A'
     left join klient on bilkaib.klient=klient.kood
   where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'
  GROUP BY
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28

"GroupAggregate  (cost=43403.38..52521.63 rows=41923 width=838) (actual
time=8083.171..8620.908 rows=577 loops=1)"
"  ->  Sort  (cost=43403.38..43508.19 rows=41923 width=838) (actual
time=8082.456..8273.259 rows=52156 loops=1)"
"        Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE
bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN
bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN
bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN
bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN
bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN
bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE
WHEN (crkonto.objekt1 = '+'::bpchar) THEN bilkaib.crobjekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt2 = '+'::bpchar) THEN
bilkaib.cr2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt3 = '+'::bpchar) THEN bilkaib.cr3objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt4 = '+'::bpchar) THEN
bilkaib.cr4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt5 = '+'::bpchar) THEN bilkaib.cr5objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt6 = '+'::bpchar) THEN
bilkaib.cr6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt7 = '+'::bpchar) THEN bilkaib.cr7objekt ELSE NULL::bpchar
END)::character(10), (CASE WHEN (crkonto.objekt8 = '+'::bpchar) THEN
bilkaib.cr8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN
(crkonto.objekt9 = '+'::bpchar) THEN bilkaib.cr9objekt ELSE NULL::bpchar
END)::character(10), bilkaib.raha, CASE WHEN ((crkonto.klienkaupa)::boolean
OR (dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN bilkaib.klient ELSE NULL::bpchar END,
bilkaib.exchrate, CASE WHEN ((crkonto.klienkaupa)::boolean OR
(dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR
(crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR
(dbkonto.tyyp = 'I'::bpchar)) THEN klient.nimi ELSE NULL::bpchar END, (CASE
WHEN ((crkonto.arvekaupa)::boolean OR (dbkonto.arvekaupa)::boolean OR
((bilkaib.cr <> '00'::bpchar) AND (crkonto.tyyp = 'K'::bpchar)) OR
((bilkaib.db <> '00'::bpchar) AND (dbkonto.tyyp = 'K'::bpchar))) THEN
bilkaib.doknr ELSE NULL::bpchar END)::character(25), bilkaib.ratediffer,
CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN '2007-01-01'::date ELSE
bilkaib.kuupaev END"
"        ->  Hash Left Join  (cost=936.48..40184.64 rows=41923 width=838)
(actual time=47.409..2427.059 rows=52156 loops=1)"
"              Hash Cond: ("outer".klient = "inner".kood)"
"              ->  Hash Join  (cost=785.35..34086.74 rows=41923 width=764)
(actual time=35.669..1414.794 rows=52156 loops=1)"
"                    Hash Cond: ("outer".cr = "inner".kontonr)"
"                    ->  Hash Join  (cost=764.26..33403.76 rows=48533
width=712) (actual time=32.839..954.784 rows=52156 loops=1)"
"                          Hash Cond: ("outer".db = "inner".kontonr)"
"                          ->  Bitmap Heap Scan on bilkaib
(cost=743.17..32616.41 rows=56185 width=660) (actual time=30.337..448.153
rows=52156 loops=1)"
"                                Recheck Cond: ((cr ~~ '112%'::text) OR (db
~~ '112%'::text))"
"                                Filter: (((cr ~~ '112%'::text) OR (db ~~
'112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <=
'2008-11-26'::date))"
"                                ->  BitmapOr  (cost=743.17..743.17
rows=65862 width=0) (actual time=27.194..27.194 rows=0 loops=1)"
"                                      ->  Bitmap Index Scan on
bilkaib_cr_pattern_idx  (cost=0.00..236.63 rows=20939 width=0) (actual
time=8.833..8.833 rows=21028 loops=1)"
"                                            Index Cond: ((cr ~>=~
'112'::bpchar) AND (cr ~<~ '113'::bpchar))"
"                                      ->  Bitmap Index Scan on
bilkaib_db_pattern_idx  (cost=0.00..506.54 rows=44923 width=0) (actual
time=18.345..18.345 rows=45426 loops=1)"
"                                            Index Cond: ((db ~>=~
'112'::bpchar) AND (db ~<~ '113'::bpchar))"
"                          ->  Hash  (cost=20.49..20.49 rows=241 width=66)
(actual time=2.450..2.450 rows=241 loops=1)"
"                                ->  Seq Scan on konto dbkonto
(cost=0.00..20.49 rows=241 width=66) (actual time=0.014..1.232 rows=241
loops=1)"
"                                      Filter: (iseloom = 'A'::bpchar)"
"                    ->  Hash  (cost=20.49..20.49 rows=241 width=66) (actual
time=2.799..2.799 rows=241 loops=1)"
"                          ->  Seq Scan on konto crkonto  (cost=0.00..20.49
rows=241 width=66) (actual time=0.029..1.536 rows=241 loops=1)"
"                                Filter: (iseloom = 'A'::bpchar)"
"              ->  Hash  (cost=147.90..147.90 rows=1290 width=90) (actual
time=11.661..11.661 rows=1290 loops=1)"
"                    ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290
width=90) (actual time=0.014..5.808 rows=1290 loops=1)"
"Total runtime: 8634.630 ms"


pgsql-performance by date:

Previous
From: Scott Carey
Date:
Subject: Re: Increasing GROUP BY CHAR columns speed
Next
From: "Scott Marlowe"
Date:
Subject: Re: Increasing GROUP BY CHAR columns speed