Re: PostgreSQL performance problem -> tuning - Mailing list pgsql-performance
From | Yaroslav Mazurak
Subject | Re: PostgreSQL performance problem -> tuning
Msg-id | 3F30F7CF.6090900@lviv.bank.gov.ua
In response to | Re: PostgreSQL performance problem -> tuning (Richard Huxton <dev@archonet.com>)
List | pgsql-performance
Hi All!

First, thanks for the answers.

Richard Huxton wrote:

> On Wednesday 06 August 2003 08:34, Yaroslav Mazurak wrote:

> Version 7.3.4 is just out - probably worth upgrading as soon as it's
> convenient.

Does version 7.3.4 have a significant performance improvement over 7.3.2? I've downloaded version 7.3.4, but haven't installed it yet.

>> sort_mem = 131072

> This sort_mem value is *very* large - that's 131MB for *each sort* that gets
> done. I'd suggest trying something in the range 1,000-10,000. What's probably
> happening with the error above is that PG is allocating ridiculous amounts of
> memory, the machine's going into swap and everything eventually grinds to a
> halt.

What does "each sort" mean? Each query with a sort clause, or also internal sorts invisible to the user (an indexed search, or whatever else - I can't imagine)? I've reduced sort_mem to 16MB.

>> fsync = false

> I'd turn fsync back on - unless you don't mind losing your data after a crash.

This is a temporary performance measure - I want to get the SELECT query result first, but the current performance is too low.

>> enable_seqscan = false

> Don't tinker with these in a live system, they're only really for
> testing/debugging.

This is another strange behaviour of PostgreSQL - it doesn't use some of the created indexes (sequential scans only), even after ANALYZE. OK, I've turned this option back on.

>> effective_cache_size = 65536

> So you typically get about 256MB cache usage in top/free?

No, top shows 12-20MB. I've reduced effective_cache_size to 4K blocks (16MB?).

>> SELECT statement is:
>>
>> SELECT showcalc('B00204', dd, r020, t071) AS s04
>> FROM v_file02wide
>> WHERE a011 = 3
>>   AND inrepdate(data)
>>   AND SUBSTR(ncks, 2, 2) IN ('NL', 'NM')
>>   AND r030 = 980;

> Hmm - mostly views and function calls, OK - I'll read on.

My data are distributed across multiple tables to preserve integrity and avoid redundancy. During the SELECT query these data are rejoined to be presented in "human-readable" form. :)
The "SUBSTR" matches about 25 records; I'm too lazy to write out 25 numbers.
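To make the settings discussion above concrete, here is a sketch of the kind of postgresql.conf values Richard is pointing toward (the exact numbers are illustrative, not from the thread; on 7.3, sort_mem is in 1KB units and shared_buffers in 8KB blocks):

```
# postgresql.conf -- illustrative values only
sort_mem = 8192          # 8MB per sort operation; one query can run several
                         # sorts at once, so keep this modest
fsync = true             # keep crash safety on; disabling it risks data loss
enable_seqscan = true    # leave planner toggles on outside of debugging
shared_buffers = 2048    # 16MB in 8KB blocks
```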
:) I'm also worried about making errors.

>> (cost=174200202474.99..174200202474.99 rows=1 width=143) -> Hash Join

>        ^^^^^^^
> This is a BIG cost estimate and you've got lots more like them. I'm guessing
> it's because of the sort_mem / enable_seqscan settings you have. The numbers
> don't make sense to me - it sounds like you've pushed the cost estimator into
> a very strange corner.

I think the cost estimator pushed itself into that "very strange corner". :)

>> Function showcalc definition is:
>>
>> CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
>> NUMERIC(16)) RETURNS NUMERIC(16)
>> LANGUAGE SQL AS '
>> -- Parameters: code, dd, r020, t071
>>   SELECT COALESCE(
>>     (SELECT sc.koef * $4
>>       FROM showing AS s NATURAL JOIN showcomp AS sc
>>       WHERE s.kod LIKE $1
>>         AND NOT SUBSTR(acc_mask, 1, 1) LIKE ''[''
>>         AND SUBSTR(acc_mask, 1, 4) LIKE $3
>>         AND SUBSTR(acc_mask, 5, 1) LIKE SUBSTR($2, 1, 1)),

> Obviously, you could use = for these 3 rather than LIKE ^^^
> Same below too.

>>     (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
>> LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>>       FROM showing AS s NATURAL JOIN showcomp AS sc
>>       WHERE s.kod LIKE $1
>>         AND SUBSTR(acc_mask, 1, 1) LIKE ''[''),
>>     0) AS showing;
>> ';

OK, all unnecessary "LIKE"s have been replaced by "=", and the NATURAL JOIN removed too:

CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
NUMERIC(16)) RETURNS NUMERIC(16)
LANGUAGE SQL AS '
-- Parameters: code, dd, r020, t071
  SELECT COALESCE(
    (SELECT sc.koef * $4
      FROM showing AS s, showcomp AS sc
      WHERE sc.kod = s.kod
        AND s.kod LIKE $1
        AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
        AND SUBSTR(acc_mask, 1, 4) = $3
        AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
    (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
LENGTH(acc_mask) - 2), $2, $3, $4), 0))
      FROM showing AS s, showcomp AS sc
      WHERE sc.kod = s.kod
        AND s.kod = $1
        AND SUBSTR(acc_mask, 1, 1) = ''[''),
    0) AS showing;
';
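A side note on the SUBSTR() predicates in showcalc above: a plain index on acc_mask cannot serve them. A hedged sketch of an expression index matching one of the predicates (the index name is assumed; indexes on arbitrary expressions need PostgreSQL 7.4, while 7.3 only supports an index on a function applied directly to whole columns):

```sql
-- Hypothetical expression index (name assumed) matching the predicate
-- SUBSTR(acc_mask, 1, 4) = $3 in showcalc; the planner can use it only
-- when the query contains the same expression verbatim.
CREATE INDEX idx_showcomp_mask_head ON showcomp ((SUBSTR(acc_mask, 1, 4)));
ANALYZE showcomp;  -- refresh statistics so the planner considers the new index
```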
>> View v_file02wide is:
>>
>> CREATE VIEW v_file02wide AS
>> SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
>>   banx.box AS ncks, banx.nazva AS bank,
>>   epr.dd, r020, r030, a3, r030.nazva AS valuta, k041,
>>   -- Sum equivalent in national currency
>>   t071 * get_kurs(id_r030, data) AS t070,
>>   t071
>> FROM v_file02 AS vf02
>>   JOIN kod_obl AS obl USING(id_obl)
>>   JOIN (dov_bank NATURAL JOIN dov_tvbv) AS banx
>>     ON banx.id_v = vf02.id_v
>>       AND data BETWEEN COALESCE(banx.dataa, data)
>>         AND COALESCE(banx.datab, data)
>>   JOIN ek_pok_r AS epr USING(id_dd)
>>   JOIN kl_r020 USING(id_r020)
>>   JOIN kl_r030 AS r030 USING(id_r030)
>>   JOIN kl_k041 USING(id_k041);

> You might want to rewrite the view so it doesn't use explicit JOIN statements,
> i.e FROM a,b WHERE a.id=b.ref rather than FROM a JOIN b ON id=ref
> At the moment, this will force PG into making the joins in the order you write
> them (I think this is changed in v7.4)

I think this is an important remark. Can "JOIN" significantly reduce the performance of a SELECT statement relative to ", WHERE"?

OK, I've changed the views to this:

CREATE VIEW v_file02 AS
SELECT filenum, data, id_a011, id_v, id_obl, id_dd, id_r020,
  id_r030, id_k041, t071
FROM filexxr, file02
WHERE file02.id_r = filexxr.id_r;

CREATE VIEW v_file02wide AS
SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
  banx.box AS ncks, banx.nazva AS bank,
  epr.dd, r020, vf02.id_r030 AS r030, a3, kl_r030.nazva AS valuta, k041,
  -- Sum equivalent in national currency
  t071 * get_kurs(vf02.id_r030, data) AS t070,
  t071
FROM v_file02 AS vf02, kod_obl AS obl, v_banx AS banx,
  ek_pok_r AS epr, kl_r020, kl_r030, kl_k041
WHERE obl.id_obl = vf02.id_obl
  AND banx.id_v = vf02.id_v
  AND data BETWEEN COALESCE(banx.dataa, data)
    AND COALESCE(banx.datab, data)
  AND epr.id_dd = vf02.id_dd
  AND kl_r020.id_r020 = vf02.id_r020
  AND kl_r030.id_r030 = vf02.id_r030
  AND kl_k041.id_k041 = vf02.id_k041;

Now (with the configuration and view definitions changed) "SELECT COUNT(*) FROM v_file02wide;" takes about 6 minutes 45 seconds instead of the previous 30 seconds.
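For what it's worth, the 7.4 change Richard alludes to is governed by the join_collapse_limit setting introduced in that release: explicit JOIN syntax stops forcing join order as long as the number of joined items stays within the limit. A sketch (7.4 and later only, not applicable to the 7.3.x server in this thread):

```sql
-- PostgreSQL 7.4+: the planner may reorder explicit JOINs when the FROM
-- list has no more than join_collapse_limit items (the default is 8);
-- raising it trades planning time for potentially better join orders.
SET join_collapse_limit = 8;
```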
Another annoying "feature" is that it's impossible to write "SELECT * FROM ..." - you get a duplicate column names error. With NATURAL JOIN the joined columns are hidden automatically. :-|

>> Function inrepdate is:
>>
>> CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
>> LANGUAGE SQL AS '
>>   -- Returns true if given date is in repdate
>>   SELECT (SELECT COUNT(*) FROM repdate
>>     WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
>>       AND COALESCE(data2, CURRENT_DATE))
>>     > 0;

> You can probably replace this with:
> SELECT true FROM repdate WHERE $1 ...
> You'll need to look at where it's used though.

Hmm... the table repdate contains date intervals. For example:

  data1        data2
  2003-01-01   2003-01-10
  2003-05-07   2003-05-24
  ...

I need a single value (true or false) for the given date parameter - does the report include the given date or not. COUNT is used as the aggregate function for this. Can you write this function more simply? BTW, I prefer the SQL language over PL/pgSQL where possible. Is that perhaps a mistake?

>> Table has indexes almost for all selected fields.

> That's not going to help you for the SUBSTR(...) stuff, although you could use
> functional indexes (see manuals/list archives for details).

Yes, I'm using functional indexes, but not in this case... now in this case too! :)

> First thing is to get those two configuration settings somewhere sane, then we
> can tune properly. You might like the document at:
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

Thanks, it's interesting.
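One way to write inrepdate more simply, as asked above: EXISTS can stop at the first matching interval, whereas COUNT(*) counts every matching row before the comparison. A sketch, keeping the original SQL-language style:

```sql
CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
LANGUAGE SQL AS '
  -- Returns true if the given date falls into any interval in repdate;
  -- EXISTS stops at the first matching row, unlike COUNT(*) > 0
  SELECT EXISTS (
    SELECT 1 FROM repdate
    WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
                 AND COALESCE(data2, CURRENT_DATE));
';
```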
Current query plan:

                                   QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=188411.98..188411.98 rows=1 width=151)
   ->  Hash Join  (cost=186572.19..188398.39 rows=5437 width=151)
         Hash Cond: ("outer".id_obl = "inner".id_obl)
         ->  Hash Join  (cost=186570.65..188301.70 rows=5437 width=147)
               Hash Cond: ("outer".id_dd = "inner".id_dd)
               ->  Hash Join  (cost=186569.45..188205.34 rows=5437 width=137)
                     Hash Cond: ("outer".id_k041 = "inner".id_k041)
                     ->  Hash Join  (cost=186568.40..188109.14 rows=5437 width=133)
                           Hash Cond: ("outer".id_r020 = "inner".id_r020)
                           ->  Hash Join  (cost=186499.15..187944.74 rows=5437 width=122)
                                 Hash Cond: ("outer".id_r030 = "inner".id_r030)
                                 ->  Merge Join  (cost=186493.55..187843.99 rows=5437 width=118)
                                       Merge Cond: ("outer".id_v = "inner".id_v)
                                       Join Filter: (("outer".data >= CASE WHEN ("inner".dataa IS NOT NULL) THEN "inner".dataa WHEN ("outer".data IS NOT NULL) THEN "outer".data ELSE NULL::date END) AND ("outer".data <= CASE WHEN ("inner".datab IS NOT NULL) THEN "inner".datab WHEN ("outer".data IS NOT NULL) THEN "outer".data ELSE NULL::date END))
                                       ->  Sort  (cost=29324.30..29568.97 rows=97870 width=61)
                                             Sort Key: filexxr.id_v
                                             ->  Hash Join  (cost=632.67..21211.53 rows=97870 width=61)
                                                   Hash Cond: ("outer".id_r = "inner".id_r)
                                                   ->  Seq Scan on file02  (cost=0.00..16888.16 rows=493464 width=32)
                                                         Filter: (id_r030 = 980)
                                                   ->  Hash  (cost=615.41..615.41 rows=6903 width=29)
                                                         ->  Index Scan using index_filexxr_a011 on filexxr  (cost=0.00..615.41 rows=6903 width=29)
                                                               Index Cond: (id_a011 = 3)
                                                               Filter: inrepdate(data)
                                       ->  Sort  (cost=157169.25..157172.17 rows=1167 width=57)
                                             Sort Key: v.id_v
                                             ->  Hash Join  (cost=1.18..157109.80 rows=1167 width=57)
                                                   Hash Cond: ("outer".id_oz = "inner".id_oz)
                                                   ->  Merge Join  (cost=0.00..157088.20 rows=1167 width=53)
                                                         Merge Cond: ("outer".id_bnk = "inner".id_bnk)
                                                         ->  Index Scan using dov_bank_pkey on dov_bank b  (cost=0.00..261328.45 rows=1450 width=17)
                                                               Filter: (subplan)
                                                               SubPlan
                                                                 ->  Materialize  (cost=90.02..90.02 rows=29 width=11)
                                                                       ->  Seq Scan on dov_bank  (cost=0.00..90.02 rows=29 width=11)
                                                                             Filter: ((dov_bank_box_22(box) = 'NL'::character varying) OR (dov_bank_box_22(box) = 'NM'::character varying))
                                                         ->  Index Scan using index_dov_tvbv_bnk on dov_tvbv v  (cost=0.00..142.42 rows=2334 width=36)
                                                   ->  Hash  (cost=1.14..1.14 rows=14 width=4)
                                                         ->  Seq Scan on ozkb o  (cost=0.00..1.14 rows=14 width=4)
                                 ->  Hash  (cost=5.08..5.08 rows=208 width=4)
                                       ->  Seq Scan on kl_r030  (cost=0.00..5.08 rows=208 width=4)
                           ->  Hash  (cost=64.00..64.00 rows=2100 width=11)
                                 ->  Seq Scan on kl_r020  (cost=0.00..64.00 rows=2100 width=11)
                     ->  Hash  (cost=1.04..1.04 rows=4 width=4)
                           ->  Seq Scan on kl_k041  (cost=0.00..1.04 rows=4 width=4)
               ->  Hash  (cost=1.16..1.16 rows=16 width=10)
                     ->  Seq Scan on ek_pok_r epr  (cost=0.00..1.16 rows=16 width=10)
         ->  Hash  (cost=1.43..1.43 rows=43 width=4)
               ->  Seq Scan on kod_obl obl  (cost=0.00..1.43 rows=43 width=4)
(49 rows)

Now (with 2K shared_buffers blocks, 16K effective_cache_size blocks, and 16MB sort_mem) PostgreSQL uses much less memory, about 64MB... but that's not good - I want to use all available RAM if possible, since PostgreSQL is the main task on this PC. Would setting effective_cache_size to 192MB (48K blocks) be better? I don't understand exactly: does effective_cache_size tell PostgreSQL about the OS cache size or about available free RAM?

With best regards, Yaroslav Mazurak.
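On that last question: effective_cache_size allocates nothing - it is only a hint telling the planner how much of the database the kernel is likely to be caching (roughly the "cached" figure from free/top, plus shared_buffers). A sketch of the arithmetic, assuming the default 8KB block size (note that 192MB then comes to 24576 blocks, not 48K - 48K blocks would describe 384MB):

```sql
-- effective_cache_size is measured in blocks of BLCKSZ (8KB by default),
-- so 192MB of OS cache is 192 * 1024 / 8 = 24576 blocks.
SELECT 192 * 1024 / 8 AS effective_cache_size_blocks;
```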