Re: PostgreSQL performance problem -> tuning - Mailing list pgsql-performance

From Yaroslav Mazurak
Subject Re: PostgreSQL performance problem -> tuning
Date
Msg-id 3F30F7CF.6090900@lviv.bank.gov.ua
In response to Re: PostgreSQL performance problem -> tuning  (Richard Huxton <dev@archonet.com>)
List pgsql-performance
Hi All!

    First, thanks for answers.

Richard Huxton wrote:

> On Wednesday 06 August 2003 08:34, Yaroslav Mazurak wrote:

> Version 7.3.4 is just out - probably worth upgrading as soon as it's
> convenient.

    Does version 7.3.4 have a significant performance improvement over 7.3.2?
I've downloaded version 7.3.4, but haven't installed it yet.

>>sort_mem = 131072

> This sort_mem value is *very* large - that's 131MB for *each sort* that gets
> done. I'd suggest trying something in the range 1,000-10,000. What's probably
> happening with the error above is that PG is allocating ridiculous amounts of
> memory, the machines going into swap and everything eventually grinds to a
> halt.

    What does "each sort" mean? Each query with an ORDER BY clause, or some
internal (invisible to the user) sorts too (I can't imagine: indexed search or
whatever else)?
    I've reduced sort_mem to 16MB.

>>fsync = false

> I'd turn fsync back on - unless you don't mind losing your data after a crash.

    This is a temporary performance measure - I want to get the SELECT query
result first, but the current performance is too low.

>>enable_seqscan = false

> Don't tinker with these in a live system, they're only really for
> testing/debugging.

    This is another strange behaviour of PostgreSQL - it doesn't use some of
the created indexes (sequential scans only), even after ANALYZE. OK, I've
turned this option back on.

>>effective_cache_size = 65536

> So you typically get about 256MB cache usage in top/free?

    No, top shows 12-20MB.
    I've reduced effective_cache_size to 4K blocks (16M?).
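
    For reference, the postgresql.conf fragment with the settings discussed in
this thread might now look roughly like this (values illustrative, not tuned;
block-based settings assume the stock 8KB block size):

```
# postgresql.conf - illustrative fragment, per this thread
shared_buffers = 2048           # blocks actually allocated in shared memory
sort_mem = 16384                # KB available to *each* sort/hash operation
effective_cache_size = 4096     # planner hint: expected OS disk-cache size, in blocks
fsync = true                    # false risks losing data after a crash
enable_seqscan = true           # leave planner toggles on outside of testing
```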

>>    SELECT statement is:
>>
>>SELECT    showcalc('B00204', dd, r020, t071) AS s04
>>FROM    v_file02wide
>>WHERE    a011 = 3
>>    AND inrepdate(data)
>>    AND SUBSTR(ncks, 2, 2) IN ('NL', 'NM')
>>    AND r030 = 980;

> Hmm - mostly views and function calls, OK - I'll read on.

    My data are distributed across multiple tables for integrity and to avoid
redundancy. During the SELECT query these data are rejoined to present them in
"human-readable" form. :)
    The "SUBSTR" matches about 25 records; I'm too lazy to write out 25 codes.
:) I'm also worried about making errors.

>>(cost=174200202474.99..174200202474.99 rows=1 width=143) ->  Hash Join

>          ^^^^^^^
> This is a BIG cost estimate and you've got lots more like them. I'm guessing
> it's because of the sort_mem / enable_seqscan settings you have. The numbers
> don't make sense to me - it sounds like you've pushed the cost estimator into
> a very strange corner.

    I think the cost estimator pushed itself into that very strange corner.

>>    Function showcalc definition is:

>>CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
>>NUMERIC(16)) RETURNS NUMERIC(16)
>>LANGUAGE SQL AS '
>>-- Parameters: code, dd, r020, t071
>>    SELECT COALESCE(
>>        (SELECT sc.koef * $4
>>            FROM showing AS s NATURAL JOIN showcomp AS sc
>>            WHERE s.kod LIKE $1
>>                AND NOT SUBSTR(acc_mask, 1, 1) LIKE ''[''
>>                AND SUBSTR(acc_mask, 1, 4) LIKE $3
>>                AND SUBSTR(acc_mask, 5, 1) LIKE SUBSTR($2, 1, 1)),

> Obviously, you could use = for these 3 rather than LIKE    ^^^
> Same below too.

>>        (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
>>LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>>            FROM showing AS s NATURAL JOIN showcomp AS sc
>>            WHERE s.kod LIKE $1
>>                AND SUBSTR(acc_mask, 1, 1) LIKE ''[''),
>>        0) AS showing;
>>';

    OK, all the unnecessary "LIKE"s are replaced by "=", and the NATURAL JOIN is gone too:
CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
NUMERIC(16)) RETURNS NUMERIC(16)
LANGUAGE SQL AS '
-- Parameters: code, dd, r020, t071
    SELECT COALESCE(
        (SELECT sc.koef * $4
            FROM showing AS s, showcomp AS sc
            WHERE sc.kod = s.kod
                AND s.kod = $1
                AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
                AND SUBSTR(acc_mask, 1, 4) = $3
                AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
        (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
LENGTH(acc_mask) - 2), $2, $3, $4), 0))
            FROM showing AS s, showcomp AS sc
            WHERE sc.kod = s.kod
                AND s.kod = $1
                AND SUBSTR(acc_mask, 1, 1) = ''[''),
        0) AS showing;
';
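
    A quick sanity check of the rewritten function (the dd/r020/t071 argument
values here are hypothetical - only 'B00204' comes from the query above):

```sql
-- Hypothetical arguments; substitute real values from showing/showcomp.
SELECT showcalc('B00204', '10', '6203', 1000.00);
```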

>>    View v_file02wide is:

>>CREATE VIEW v_file02wide AS
>>SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
>>banx.box AS ncks, banx.nazva AS bank,
>>    epr.dd, r020, r030, a3, r030.nazva AS valuta, k041,
>>    -- Sum equivalent in national currency
>>    t071 * get_kurs(id_r030, data) AS t070,
>>    t071
>>FROM v_file02 AS vf02
>>    JOIN kod_obl AS obl USING(id_obl)
>>    JOIN (dov_bank NATURAL JOIN dov_tvbv) AS banx
>>        ON banx.id_v = vf02.id_v
>>            AND data BETWEEN COALESCE(banx.dataa, data)
>>                AND COALESCE(banx.datab, data)
>>    JOIN ek_pok_r AS epr USING(id_dd)
>>    JOIN kl_r020 USING(id_r020)
>>    JOIN kl_r030 AS r030 USING(id_r030)
>>    JOIN kl_k041 USING(id_k041);

> You might want to rewrite the view so it doesn't use explicit JOIN statements,
> i.e FROM a,b WHERE a.id=b.ref rather than FROM a JOIN b ON id=ref
> At the moment, this will force PG into making the joins in the order you write
> them (I think this is changed in v7.4)

    I think this is an important remark. Can "JOIN" significantly reduce the
performance of a SELECT statement relative to ", WHERE"?
    OK, I've changed the VIEW to this:

CREATE VIEW v_file02 AS
SELECT filenum, data, id_a011, id_v, id_obl, id_dd, id_r020, id_r030,
id_k041, t071
FROM filexxr, file02
WHERE file02.id_r = filexxr.id_r;

CREATE VIEW v_file02wide AS
SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
banx.box AS ncks, banx.nazva AS bank,
    epr.dd, r020, vf02.id_r030 AS r030, a3, kl_r030.nazva AS valuta, k041,
    -- Sum equivalent in national currency
    t071 * get_kurs(vf02.id_r030, data) AS t070, t071
FROM v_file02 AS vf02, kod_obl AS obl, v_banx AS banx,
    ek_pok_r AS epr, kl_r020, kl_r030, kl_k041
WHERE obl.id_obl = vf02.id_obl
    AND banx.id_v = vf02.id_v
    AND data BETWEEN COALESCE(banx.dataa, data)
        AND COALESCE(banx.datab, data)
    AND epr.id_dd = vf02.id_dd
    AND kl_r020.id_r020 = vf02.id_r020
    AND kl_r030.id_r030 = vf02.id_r030
    AND kl_k041.id_k041 = vf02.id_k041;

    Now (with the configuration and view definition changed) "SELECT COUNT(*)
FROM v_file02wide;" executes in about 6 minutes 45 seconds instead of the
previous 30 seconds.
    Another annoying "feature" is that it's impossible to write "SELECT * FROM
..." - duplicate column name errors. With NATURAL JOIN the joined columns are
hidden automatically. :-|

>>    Function inrepdate is:

>>CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
>>LANGUAGE SQL AS '
>>    -- Returns true if given date is in repdate
>>    SELECT (SELECT COUNT(*) FROM repdate
>>        WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
>>            AND COALESCE(data2, CURRENT_DATE))
>>
>>        > 0;

> You can probably replace this with:
>   SELECT true FROM repdate WHERE $1 ...
> You'll need to look at where it's used though.

    Hmm... table repdate contains date intervals. For example:
    data1        data2
    2003-01-01    2003-01-10
    2003-05-07    2003-05-24
    ...
    I need a single value (true or false) for the given date parameter -
whether the report includes the given date or not. COUNT is used as the
aggregate for this. Can you write this function more simply?
    BTW, I prefer the SQL language to PL/pgSQL where possible. Is this a mistake?
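
    A sketch along the lines of Richard's suggestion - an EXISTS can stop at
the first matching interval instead of counting every matching row (untested
here, but it keeps the single-boolean result):

```sql
CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
LANGUAGE SQL AS '
    -- True if the given date falls inside any repdate interval.
    SELECT EXISTS (SELECT 1 FROM repdate
        WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
            AND COALESCE(data2, CURRENT_DATE));
';
```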

>>    Table has indexes almost for all selected fields.

> That's not going to help you for the SUBSTR(...) stuff, although you could use
> functional indexes (see manuals/list archives for details).

    Yes, I'm using functional indexes, but not in this case... now in this
case too! :)
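
    For example - judging by the query plan below, a function dov_bank_box_22
already exists, so in 7.3 it can be indexed directly, provided it is declared
IMMUTABLE (the index name here is made up):

```sql
-- Hypothetical functional index matching the plan's filter
-- dov_bank_box_22(box) = 'NL' / 'NM' on dov_bank.
CREATE INDEX index_dov_bank_box_22 ON dov_bank (dov_bank_box_22(box));
```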

> First thing is to get those two configuration settings somewhere sane, then we
> can tune properly. You might like the document at:

> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

    Thanks, it's interesting.

    Current query plan:


                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=188411.98..188411.98 rows=1 width=151)
   ->  Hash Join  (cost=186572.19..188398.39 rows=5437 width=151)
         Hash Cond: ("outer".id_obl = "inner".id_obl)
         ->  Hash Join  (cost=186570.65..188301.70 rows=5437 width=147)
               Hash Cond: ("outer".id_dd = "inner".id_dd)
               ->  Hash Join  (cost=186569.45..188205.34 rows=5437 width=137)
                     Hash Cond: ("outer".id_k041 = "inner".id_k041)
                     ->  Hash Join  (cost=186568.40..188109.14 rows=5437 width=133)
                           Hash Cond: ("outer".id_r020 = "inner".id_r020)
                           ->  Hash Join  (cost=186499.15..187944.74 rows=5437 width=122)
                                 Hash Cond: ("outer".id_r030 = "inner".id_r030)
                                 ->  Merge Join  (cost=186493.55..187843.99 rows=5437 width=118)
                                       Merge Cond: ("outer".id_v = "inner".id_v)
                                       Join Filter: (("outer".data >= CASE WHEN ("inner".dataa IS NOT NULL) THEN "inner".dataa WHEN ("outer".data IS NOT NULL) THEN "outer".data ELSE NULL::date END) AND ("outer".data <= CASE WHEN ("inner".datab IS NOT NULL) THEN "inner".datab WHEN ("outer".data IS NOT NULL) THEN "outer".data ELSE NULL::date END))
                                       ->  Sort  (cost=29324.30..29568.97 rows=97870 width=61)
                                             Sort Key: filexxr.id_v
                                             ->  Hash Join  (cost=632.67..21211.53 rows=97870 width=61)
                                                   Hash Cond: ("outer".id_r = "inner".id_r)
                                                   ->  Seq Scan on file02  (cost=0.00..16888.16 rows=493464 width=32)
                                                         Filter: (id_r030 = 980)
                                                   ->  Hash  (cost=615.41..615.41 rows=6903 width=29)
                                                         ->  Index Scan using index_filexxr_a011 on filexxr  (cost=0.00..615.41 rows=6903 width=29)
                                                               Index Cond: (id_a011 = 3)
                                                               Filter: inrepdate(data)
                                       ->  Sort  (cost=157169.25..157172.17 rows=1167 width=57)
                                             Sort Key: v.id_v
                                             ->  Hash Join  (cost=1.18..157109.80 rows=1167 width=57)
                                                   Hash Cond: ("outer".id_oz = "inner".id_oz)
                                                   ->  Merge Join  (cost=0.00..157088.20 rows=1167 width=53)
                                                         Merge Cond: ("outer".id_bnk = "inner".id_bnk)
                                                         ->  Index Scan using dov_bank_pkey on dov_bank b  (cost=0.00..261328.45 rows=1450 width=17)
                                                               Filter: (subplan)
                                                               SubPlan
                                                                 ->  Materialize  (cost=90.02..90.02 rows=29 width=11)
                                                                       ->  Seq Scan on dov_bank  (cost=0.00..90.02 rows=29 width=11)
                                                                             Filter: ((dov_bank_box_22(box) = 'NL'::character varying) OR (dov_bank_box_22(box) = 'NM'::character varying))
                                                         ->  Index Scan using index_dov_tvbv_bnk on dov_tvbv v  (cost=0.00..142.42 rows=2334 width=36)
                                                   ->  Hash  (cost=1.14..1.14 rows=14 width=4)
                                                         ->  Seq Scan on ozkb o  (cost=0.00..1.14 rows=14 width=4)
                                 ->  Hash  (cost=5.08..5.08 rows=208 width=4)
                                       ->  Seq Scan on kl_r030  (cost=0.00..5.08 rows=208 width=4)
                           ->  Hash  (cost=64.00..64.00 rows=2100 width=11)
                                 ->  Seq Scan on kl_r020  (cost=0.00..64.00 rows=2100 width=11)
                     ->  Hash  (cost=1.04..1.04 rows=4 width=4)
                           ->  Seq Scan on kl_k041  (cost=0.00..1.04 rows=4 width=4)
               ->  Hash  (cost=1.16..1.16 rows=16 width=10)
                     ->  Seq Scan on ek_pok_r epr  (cost=0.00..1.16 rows=16 width=10)
         ->  Hash  (cost=1.43..1.43 rows=43 width=4)
               ->  Seq Scan on kod_obl obl  (cost=0.00..1.43 rows=43 width=4)
(49 rows)

    Now (2K shared_buffers blocks, 16K effective_cache_size blocks, 16MB
sort_mem) PostgreSQL uses much less memory, about 64MB... that's not good - I
want to use all available RAM if possible, since PostgreSQL is the main task
on this PC.
    Would setting effective_cache_size to 192MB (48K blocks) be better? I don't
understand exactly: does effective_cache_size tell PostgreSQL about the OS
cache size or about available free RAM?
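
    A sketch of the block arithmetic, for reference - effective_cache_size is
only a planner estimate of the OS disk cache; it does not allocate any memory
(the right block count depends on the compiled-in block size):

```
# effective_cache_size is measured in disk blocks (BLCKSZ, stock 8KB):
#   192MB / 8KB = 24576 blocks
#   192MB / 4KB = 49152 blocks (only if compiled with 4KB blocks)
effective_cache_size = 24576
```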


With best regards
    Yaroslav Mazurak.



