Re: Increasing GROUP BY CHAR columns speed - Mailing list pgsql-performance
From | Andrus |
---|---|
Subject | Re: Increasing GROUP BY CHAR columns speed |
Date | |
Msg-id | 3E5CE5640FFD4172956D223988D8D578@andrusnotebook Whole thread Raw |
In response to | Re: Increasing GROUP BY CHAR columns speed (Scott Carey <scott@richrelevance.com>) |
Responses |
Re: Increasing GROUP BY CHAR columns speed
|
List | pgsql-performance |
Scott, Thank you. >The below query is spending most of its time in the sort, or perhaps the >complicated check condition before it. >The explain has a 8 second gap in time between the 2.8 seconds after the >Hash Left Join and before the Sort. I'm guessing its hidden in the sort. >You can get the planner to switch from a sort to a hash aggregate with a >large work_mem. Try calling >SET work_mem = '100MB'; >before this query first. >It may not help that much if the check time is as expensive as it looks in >the plan below, but its very easy to try. >If it does help, you may want to temporarily increase that value only for >this query rather than making it a default in the config file. SET work_mem = 2097151 (this is max allowed value) or SET work_mem = 97151 decreases query time from 12 seconds to 9 seconds. My application may ran in servers with 1 GB RAM only. I'm afraid than in those servers 2097151 will cause error and abort query. Is it reasonable to add SET work_mem = 97151 before this query and SET work_mem TO DEFAULT after this query ? Or should I use max value in cases where there are much more data ? This query may return a much more data for longer period and more accounts. CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END:: CHAR(10) AS dbobjekt is ugly. I tried to rewrite it using NullIfNot(dbkonto.objekt1, '+') AS dbobjekt, bot got error ERROR: function nullifnot(character, "unknown") does not exist How to re-write this in nicer and faster way ? For most of rows checks WHEN objektn='+' will fail: objektn values are usually rarely equal to '+': they are empty or null mostly. Maybe this can be used to optimize the query. Btw. Tom Lane's reply from earlier discussion about this query speed (then there were '' instead of NULL in group columns) some years ago: "I think the problem is probably that you're sorting two dozen CHAR columns, and that in many of the rows all these entries are '' forcing the sort code to compare all two dozen columns (not so)? So the sort ends up doing lots and lots and lots of CHAR comparisons. Which can be slow, especially in non-C locales." locale specific check is not nessecary for those CHAR(10) columns. How to force PostgreSql to use binary check for grouping ? Some dbms allow to mark columns as C locale. I havent found this nor chartobin() function in PostgreSql. Will creating BinaryNullIfNot(dbkonto.objekt1, '+') function solve this ? Andrus. New testcase: set search_path to firma2,public; SET work_mem = 2097151; -- 9 seconds -- SET work_mem = 1097151; -- 9 seconds --SET work_mem to default; -- 12 seconds explain analyze SELECT CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END, bilkaib.DB, CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END:: CHAR(10) AS dbobjekt, CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END:: CHAR(10) AS db2objekt, CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END:: CHAR(10) AS db3objekt, CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END:: CHAR(10) AS db4objekt, CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END:: CHAR(10) AS db5objekt, CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END:: CHAR(10) AS db6objekt, CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END:: CHAR(10) AS db7objekt, CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END:: CHAR(10) AS db8objekt, CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END:: CHAR(10) AS db9objekt, bilkaib.CR, CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END:: CHAR(10) AS crobjekt, CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END:: CHAR(10) AS cr2objekt, CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END:: CHAR(10) AS cr3objekt, CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END:: CHAR(10) AS cr4objekt, CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END:: CHAR(10) AS cr5objekt, CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END:: CHAR(10) AS cr6objekt, CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END:: CHAR(10) AS cr7objekt, CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END:: CHAR(10) AS cr8objekt, CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END:: CHAR(10) AS cr9objekt, bilkaib.RAHA, CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I') THEN bilkaib.KLIENT ELSE NULL END AS klient, bilkaib.EXCHRATE, CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I') THEN klient.nimi ELSE NULL END AS kliendinim, -- 24. CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa OR (bilkaib.cr<>'00' AND crkonto.tyyp='K') OR (bilkaib.db<>'00' AND dbkonto.tyyp='K') THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr ,bilkaib.ratediffer ,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END AS kuupaev ,SUM(bilkaib.summa)::numeric(14,2) AS summa from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND crkonto.iseloom='A' join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND dbkonto.iseloom='A' left join klient on bilkaib.klient=klient.kood where ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26' GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 "GroupAggregate (cost=43403.38..52521.63 rows=41923 width=838) (actual time=8083.171..8620.908 rows=577 loops=1)" " -> Sort (cost=43403.38..43508.19 rows=41923 width=838) (actual time=8082.456..8273.259 rows=52156 loops=1)" " Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE WHEN (crkonto.objekt1 = '+'::bpchar) THEN bilkaib.crobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt2 = '+'::bpchar) THEN bilkaib.cr2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt3 = '+'::bpchar) THEN bilkaib.cr3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt4 = '+'::bpchar) THEN bilkaib.cr4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt5 = '+'::bpchar) THEN bilkaib.cr5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt6 = '+'::bpchar) THEN bilkaib.cr6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt7 = '+'::bpchar) THEN bilkaib.cr7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt8 = '+'::bpchar) THEN bilkaib.cr8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (crkonto.objekt9 = '+'::bpchar) THEN bilkaib.cr9objekt ELSE NULL::bpchar END)::character(10), bilkaib.raha, CASE WHEN ((crkonto.klienkaupa)::boolean OR (dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR (crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR (dbkonto.tyyp = 'I'::bpchar)) THEN bilkaib.klient ELSE NULL::bpchar END, bilkaib.exchrate, CASE WHEN ((crkonto.klienkaupa)::boolean OR (dbkonto.klienkaupa)::boolean OR (crkonto.tyyp = 'K'::bpchar) OR (crkonto.tyyp = 'I'::bpchar) OR (dbkonto.tyyp = 'K'::bpchar) OR (dbkonto.tyyp = 'I'::bpchar)) THEN klient.nimi ELSE NULL::bpchar END, (CASE WHEN ((crkonto.arvekaupa)::boolean OR (dbkonto.arvekaupa)::boolean OR ((bilkaib.cr <> '00'::bpchar) AND (crkonto.tyyp = 'K'::bpchar)) OR ((bilkaib.db <> '00'::bpchar) AND (dbkonto.tyyp = 'K'::bpchar))) THEN bilkaib.doknr ELSE NULL::bpchar END)::character(25), bilkaib.ratediffer, CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN '2007-01-01'::date ELSE bilkaib.kuupaev END" " -> Hash Left Join (cost=936.48..40184.64 rows=41923 width=838) (actual time=47.409..2427.059 rows=52156 loops=1)" " Hash Cond: ("outer".klient = "inner".kood)" " -> Hash Join (cost=785.35..34086.74 rows=41923 width=764) (actual time=35.669..1414.794 rows=52156 loops=1)" " Hash Cond: ("outer".cr = "inner".kontonr)" " -> Hash Join (cost=764.26..33403.76 rows=48533 width=712) (actual time=32.839..954.784 rows=52156 loops=1)" " Hash Cond: ("outer".db = "inner".kontonr)" " -> Bitmap Heap Scan on bilkaib (cost=743.17..32616.41 rows=56185 width=660) (actual time=30.337..448.153 rows=52156 loops=1)" " Recheck Cond: ((cr ~~ '112%'::text) OR (db ~~ '112%'::text))" " Filter: (((cr ~~ '112%'::text) OR (db ~~ '112%'::text)) AND (kuupaev >= '2007-01-01'::date) AND (kuupaev <= '2008-11-26'::date))" " -> BitmapOr (cost=743.17..743.17 rows=65862 width=0) (actual time=27.194..27.194 rows=0 loops=1)" " -> Bitmap Index Scan on bilkaib_cr_pattern_idx (cost=0.00..236.63 rows=20939 width=0) (actual time=8.833..8.833 rows=21028 loops=1)" " Index Cond: ((cr ~>=~ '112'::bpchar) AND (cr ~<~ '113'::bpchar))" " -> Bitmap Index Scan on bilkaib_db_pattern_idx (cost=0.00..506.54 rows=44923 width=0) (actual time=18.345..18.345 rows=45426 loops=1)" " Index Cond: ((db ~>=~ '112'::bpchar) AND (db ~<~ '113'::bpchar))" " -> Hash (cost=20.49..20.49 rows=241 width=66) (actual time=2.450..2.450 rows=241 loops=1)" " -> Seq Scan on konto dbkonto (cost=0.00..20.49 rows=241 width=66) (actual time=0.014..1.232 rows=241 loops=1)" " Filter: (iseloom = 'A'::bpchar)" " -> Hash (cost=20.49..20.49 rows=241 width=66) (actual time=2.799..2.799 rows=241 loops=1)" " -> Seq Scan on konto crkonto (cost=0.00..20.49 rows=241 width=66) (actual time=0.029..1.536 rows=241 loops=1)" " Filter: (iseloom = 'A'::bpchar)" " -> Hash (cost=147.90..147.90 rows=1290 width=90) (actual time=11.661..11.661 rows=1290 loops=1)" " -> Seq Scan on klient (cost=0.00..147.90 rows=1290 width=90) (actual time=0.014..5.808 rows=1290 loops=1)" "Total runtime: 8634.630 ms"
pgsql-performance by date: