Re: PostgreSQL performance problem -> tuning - Mailing list pgsql-performance

From Richard Huxton
Subject Re: PostgreSQL performance problem -> tuning
Date
Msg-id 49274.192.168.1.32.1060184304.squirrel@mainbox.archonet.com
Whole thread Raw
In response to Re: PostgreSQL performance problem -> tuning  (Yaroslav Mazurak <yamazurak@Lviv.Bank.Gov.UA>)
Responses Re: PostgreSQL performance problem -> tuning  (Yaroslav Mazurak <yamazurak@Lviv.Bank.Gov.UA>)
List pgsql-performance
>> On Wednesday 06 August 2003 08:34, Yaroslav Mazurak wrote:
>
>> Version 7.3.4 is just out - probably worth upgrading as soon as it's
>> convenient.
>
>     Has version 7.3.4 significant performance upgrade relative to 7.3.2?
> I've downloaded version 7.3.4, but not installed yet.

No, but there are some bug fixes.

>>>sort_mem = 131072
>
>> This sort_mem value is *very* large - that's 131MB for *each sort* that

>     What mean "each sort"? Each query with SORT clause or some internal
> (invisible to user) sorts too (I can't imagine: indexed search or
> whatever else)?
>     I'm reduced sort_mem to 16M.

It means each sort - if you look at your query plan and see three "sort"
clauses that means that query might allocate 48MB to sorting. Now, that's
good because sorting items on disk is much slower. It's bad because that's
48MB less for everything else that's happening.

>>>fsync = false
>
>> I'd turn fsync back on - unless you don't mind losing your data after a
>> crash.
>
>     This is temporary performance solution - I want get SELECT query result
> first, but current performance is too low.
>
>>>enable_seqscan = false
>
>> Don't tinker with these in a live system, they're only really for
>> testing/debugging.
>
>     This is another strange behavior of PostgreSQL - he don't use some
> created indexes (seq_scan only) after ANALYZE too. OK, I'm turned on
> this option back.

Fair enough, we can work on those. With 7.3.x you can tell PG to examine
some tables more thouroughly to get better plans.

>>>effective_cache_size = 65536
>
>> So you typically get about 256MB cache usage in top/free?
>
>     No, top shows 12-20Mb.
>     I'm reduced effective_cache_size to 4K blocks (16M?).

Cache size is in blocks of 8KB (usually) - it's a way of telling PG what
the chances are of disk blocks being already cached by Linux.

>>>    SELECT statement is:
>>>
>>>SELECT    showcalc('B00204', dd, r020, t071) AS s04
>>>FROM    v_file02wide
>>>WHERE    a011 = 3
>>>    AND inrepdate(data)
>>>    AND SUBSTR(ncks, 2, 2) IN ('NL', 'NM')
>>>    AND r030 = 980;
>
>> Hmm - mostly views and function calls, OK - I'll read on.
>
>     My data are distributed accross multiple tables to integrity and avoid
> redundancy. During SELECT query these data rejoined to be presented in
> "human-readable" form. :)
>     "SUBSTR" returns about 25 records, I'm too lazy for write 25 numbers.
> :) I'm also worried for errors.

Sounds like good policy.

>
>>>(cost=174200202474.99..174200202474.99 rows=1 width=143) ->  Hash Join
>
>>          ^^^^^^^
>> This is a BIG cost estimate and you've got lots more like them. I'm
>> guessing
>> it's because of the sort_mem / enable_seqscan settings you have. The
>> numbers
>> don't make sense to me - it sounds like you've pushed the cost estimator
>> into
>> a very strange corner.
>
>     I think that cost estimator "pushed into very strange corner" by himself.
>
>>>    Function showcalc definition is:
>
>>>CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
>>>NUMERIC(16)) RETURNS NUMERIC(16)
>>>LANGUAGE SQL AS '
>>>-- Parameters: code, dd, r020, t071
>>>    SELECT COALESCE(
>>>        (SELECT sc.koef * $4
>>>            FROM showing AS s NATURAL JOIN showcomp AS sc
>>>            WHERE s.kod LIKE $1
>>>                AND NOT SUBSTR(acc_mask, 1, 1) LIKE ''[''
>>>                AND SUBSTR(acc_mask, 1, 4) LIKE $3
>>>                AND SUBSTR(acc_mask, 5, 1) LIKE SUBSTR($2, 1, 1)),
>
>> Obviously, you could use = for these 3 rather than LIKE    ^^^
>> Same below too.
>
>>>        (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
>>>LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>>>            FROM showing AS s NATURAL JOIN showcomp AS sc
>>>            WHERE s.kod LIKE $1
>>>                AND SUBSTR(acc_mask, 1, 1) LIKE ''[''),
>>>        0) AS showing;
>>>';
>
>     OK, all unnecessary "LIKEs" replaced by "=", JOIN removed too:
> CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
> NUMERIC(16)) RETURNS NUMERIC(16)
> LANGUAGE SQL AS '
> -- Parameters: code, dd, r020, t071
>     SELECT COALESCE(
>         (SELECT sc.koef * $4
>             FROM showing AS s, showcomp AS sc
>             WHERE sc.kod = s.kod
>                 AND s.kod LIKE $1
>                 AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
>                 AND SUBSTR(acc_mask, 1, 4) = $3
>                 AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
>         (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
> LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>             FROM showing AS s, showcomp AS sc
>             WHERE sc.kod = s.kod
>                 AND s.kod = $1
>                 AND SUBSTR(acc_mask, 1, 1) = ''[''),
>         0) AS showing;
> ';
>
>>>    View v_file02wide is:
>
>>>CREATE VIEW v_file02wide AS
>>>SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
>>>banx.box AS ncks, banx.nazva AS bank,
>>>    epr.dd, r020, r030, a3, r030.nazva AS valuta, k041,
>>>    -- Sum equivalent in national currency
>>>    t071 * get_kurs(id_r030, data) AS t070,
>>>    t071
>>>FROM v_file02 AS vf02
>>>    JOIN kod_obl AS obl USING(id_obl)
>>>    JOIN (dov_bank NATURAL JOIN dov_tvbv) AS banx
>>>        ON banx.id_v = vf02.id_v
>>>            AND data BETWEEN COALESCE(banx.dataa, data)
>>>                AND COALESCE(banx.datab, data)
>>>    JOIN ek_pok_r AS epr USING(id_dd)
>>>    JOIN kl_r020 USING(id_r020)
>>>    JOIN kl_r030 AS r030 USING(id_r030)
>>>    JOIN kl_k041 USING(id_k041);
>
>> You might want to rewrite the view so it doesn't use explicit JOIN
>> statements,
>> i.e FROM a,b WHERE a.id=b.ref rather than FROM a JOIN b ON id=ref
>> At the moment, this will force PG into making the joins in the order you
>> write
>> them (I think this is changed in v7.4)
>
>     I think this is a important remark. Can "JOIN" significantly reduce
> performance of SELECT statement relative to ", WHERE"?
>     OK, I'm changed VIEW to this text:

It can sometimes. What it means is that PG will follow whatever order you
write the joins in. If you know joining a to b to c is the best order,
that can be a good thing. Unfortunately, it means the planner can't make a
better guess based on its statistics.

> CREATE VIEW v_file02 AS
> SELECT filenum, data, id_a011, id_v, id_obl, id_dd, id_r020, id_r030,
> id_k041, t071
> FROM filexxr, file02
> WHERE file02.id_r = filexxr.id_r;
>
> CREATE VIEW v_file02wide AS
> SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
> banx.box AS ncks, banx.nazva AS bank,
>     epr.dd, r020, vf02.id_r030 AS r030, a3, kl_r030.nazva AS valuta, k041,
>     -- Sum equivalent in national currency
>     t071 * get_kurs(vf02.id_r030, data) AS t070, t071
> FROM v_file02 AS vf02, kod_obl AS obl, v_banx AS banx,
>     ek_pok_r AS epr, kl_r020, kl_r030, kl_k041
> WHERE obl.id_obl = vf02.id_obl
>     AND banx.id_v = vf02.id_v
>     AND data BETWEEN COALESCE(banx.dataa, data)
>         AND COALESCE(banx.datab, data)
>     AND epr.id_dd = vf02.id_dd
>     AND kl_r020.id_r020 = vf02.id_r020
>     AND kl_r030.id_r030 = vf02.id_r030
>     AND kl_k041.id_k041 = vf02.id_k041;
>
>     Now (with configuration and view definition changed) "SELECT COUNT(*)
> FROM v_file02wide;" executes about 6 minutes and 45 seconds instead of
> 30 seconds (previous).

OK - don't worry if it looks like we're going backwards, we should be able
to get everything running nicely soon.

>     Another annoying "feature" is impossibility writing "SELECT * FROM..."
> - duplicate column names error. In NATURAL JOIN joined columns hiding
> automatically. :-|
>
>>>    Function inrepdate is:
>
>>>CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
>>>LANGUAGE SQL AS '
>>>    -- Returns true if given date is in repdate
>>>    SELECT (SELECT COUNT(*) FROM repdate
>>>        WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
>>>            AND COALESCE(data2, CURRENT_DATE))
>>>
>>>        > 0;
>
>> You can probably replace this with:
>>   SELECT true FROM repdate WHERE $1 ...
>> You'll need to look at where it's used though.
>
>     Hmm... table repdate contain date intervals. For example:
>     data1        data2
>     2003-01-01    2003-01-10
>     2003-05-07    2003-05-24
>     ...
>     I need single value (true or false) about given date as parameter -
> report includes given date or not. COUNT used as aggregate function for
> this. Can you write this function more simpler?
>     BTW, I prefer SQL language if possible, then PL/pgSQL. This may be
> mistake?

No - not really. You can do things in plpgsql that you can't in sql, but I
use both depending on the situation.

>>>    Table has indexes almost for all selected fields.
>
>> That's not going to help you for the SUBSTR(...) stuff, although you
>> could use
>> functional indexes (see manuals/list archives for details).
>
>     Yes, I'm using functional indexes, but not in this case... now in this
> case too! :)
>
>> First thing is to get those two configuration settings somewhere sane,
>> then we
>> can tune properly. You might like the document at:
>
>> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
>
>     Thanks, it's interesting.
>
>     Current query plan:
>
>
>                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Aggregate  (cost=188411.98..188411.98 rows=1 width=151)
>     ->  Hash Join  (cost=186572.19..188398.39 rows=5437 width=151)
>           Hash Cond: ("outer".id_obl = "inner".id_obl)
>           ->  Hash Join  (cost=186570.65..188301.70 rows=5437 width=147)
[snip]
Well the cost estimates look much more plausible. You couldn't post
EXPLAIN ANALYSE could you? That actually runs the query.

>     Now (2K shared_buffers blocks, 16K effective_cache_size blocks, 16Mb
> sort_mem) PostgreSQL uses much less memory, about 64M... it's not good,
> I want using all available RAM if possible - PostgreSQL is the main task
> on this PC.

Don't forget that any memory PG is using the operating-system can't. The
OS will cache frequently accessed disk blocks for you, so it's a question
of finding the right balance.

>     May set effective_cache_size to 192M (48K blocks) be better? I don't
> understand exactly: effective_cache_size tells PostgreSQL about OS cache
> size or about available free RAM?

It needs to reflect how much cache the system is using - try the "free"
command to see figures.

If you could post the output of EXPLAIN ANALYSE rather than EXPLAIN, I'll
take a look at it this evening (London time). There's also plenty of other
people on this list who can help too.

HTH

- Richard Huxton

pgsql-performance by date:

Previous
From: "Wilson A. Galafassi Jr."
Date:
Subject: Postgresql slow on XEON 2.4ghz/1gb ram
Next
From: "Wilson A. Galafassi Jr."
Date:
Subject: PostgreSql under Linux