Home > mailing lists

Re: small database huge planning time - Mailing list pgsql-performance

From	Andrei Lepikhov
Subject	Re: small database huge planning time
Date	January 25 01:12:54
Msg-id	65e0c8ac-a77e-41e9-ba18-bf0829c6a640@gmail.com Whole thread Raw
In response to	small database huge planning time (Alexander Kulikov <a-kulikov@hotmail.com>)
List	pgsql-performance

Tree view

On 13/1/26 10:16, Alexander Kulikov wrote:
> Changing parameters: from_collapse_limit, join_collapse_limit, geqo, jit, work_mem and many 
> others does not help at all. I attach 1. additional setting in the 
> postgresql.status.conf. 2. querry itself in query.sql. 3. zql plan in 
> query.sqlplan 4. additioanal information about os, tables etc. would you 
> please help me -Alexander Kulikov

I have analysed your case (query-explain-1c.txt), thanks for the data 
provided off-list.
In short, the main issue is the extensive use of MCV statistics on 
(compressed) bytea columns.

The flamegraph of the planning process (see flamegraph-4999-1c.svg) 
shows little, except that we spend a lot of time in the byteaeq routine.

Your tables contain many variable-length columns.
Look at the statistics on your tables (pgstats-1c.txt). There are 
columns that exist containing 500 MCV elements and Histogram bins.

But how much do we actually touch these statistics and detoast these 
massive arrays? Statistics on pg_statistic usage show my extension 
pg_index_stats [1] - be careful, it is for research purposes for now! It 
has been shown (see stat_used-1c.res) that some statistics were used 200 
times or more during the planning of this query.

Column analysis exposes that _inforg10621::_fld10622rref and 
_inforg8199::_fld8200_rrref are used in join clauses more than 6 times. 
The _inforg10621::_fld15131rref is used in equality comparisons 7 times. 
Remember, how many indexes your tables have, and you can imagine that 
different combinations of parameterised index scan might add extra 
clauses too.

Simple test: nullify the MCV of the two most used columns:

UPDATE pg_statistic
SET
     stanumbers1 = CASE WHEN stakind1 = 1 THEN NULL ELSE stanumbers1 END,
     stavalues1  = CASE WHEN stakind1 = 1 THEN NULL ELSE stavalues1 END,
     stakind1    = CASE WHEN stakind1 = 1 THEN 0 ELSE stakind1 END,
     stanumbers2 = CASE WHEN stakind2 = 1 THEN NULL ELSE stanumbers2 END,
     stavalues2  = CASE WHEN stakind2 = 1 THEN NULL ELSE stavalues2 END,
     stakind2    = CASE WHEN stakind2 = 1 THEN 0 ELSE stakind2 END
WHERE (starelid = '_inforg10621'::regclass AND staattnum = (
   SELECT attnum FROM pg_attribute
   WHERE (attrelid = '_inforg10621'::regclass AND attname = 
'_fld10622rref')))
   OR (starelid = '_inforg8199'::regclass AND staattnum = (
     SELECT attnum FROM pg_attribute
     WHERE (attrelid = '_inforg8199'::regclass AND attname = 
'_fld8201rref')));

and now we see:
Planning:
    Buffers: shared hit=5
    Memory: used=4030kB  allocated=4096kB
  Planning Time: 31.347 ms
  Execution Time: 0.237 ms

That's much better. Let's make the second check and launch this query on 
the alternative Postgres fork (see query-explain-pgpro.txt) - same 
version of Postgres (17.5). It touched almost 1500 buffers compared to 5 
in the first case, but planning time is close to 0! Why?
Flamegraph seems to be the same (flamegraph-4999-pgpro.svg). Statistics 
are exactly the same (pgstats-pgpro.txt). But pg_index_stats show that 
it rarely ends up in the statistic slot (stats_used-pgpro). I suppose 
here we have two factors: toast reading optimisation and (more 
importantly) technique of caching recent statistic slots (in detoasted 
and decompressed state).

What can you do? Not much, but still:
1. Reduce the statistical target on tables causing the problem, or just 
nullify MCV statistics as I did in the example.
2. Complain to the vendor and force them to cache statistics.

[1] https://github.com/danolivo/pg_index_stats

-- 
regards, Andrei Lepikhov,
pgEdge

Attachment

pgsql-performance by date:

From: Andrei Lepikhov
Date: 18 January, 12:24:56
Subject: Re: Slow queries on simple index

From: Dirk Krautschick
Date: 03 February, 04:02:52
Subject: performance impact on read replicas during heavy vacuums on primary

Re: small database huge planning time - Mailing list pgsql-performance

Attachment

Previous

Next