Thread: REINDEXdb performance degrading gradually PG13.4

REINDEXdb performance degrading gradually PG13.4

From
Praneel Devisetty
Date:

Hi,

We are trying to reindex 600k tables in a single database  of size 2.7TB
using reindexdb utility in a shell script
reindexdb -v -d $dbname -h $hostname -U tkcsowner --concurrently -j $parallel -S $schema

our config is as below
              name              | setting
--------------------------------+---------
 auto_explain.log_buffers       | off
 autovacuum_work_mem            | 524288
 dbms_pipe.total_message_buffer | 30
 dynamic_shared_memory_type     | posix
 hash_mem_multiplier            | 1
 logical_decoding_work_mem      | 65536
 maintenance_work_mem           | 2097152
 shared_buffers                 | 4194304
 shared_memory_type             | mmap
 temp_buffers                   | 1024
 wal_buffers                    | 2048
 work_mem                       | 16384

Memory:
 free -h
              total        used        free      shared  buff/cache   available
Mem:           125G         38G        1.1G         93M         85G         86G
Swap:           74G        188M         74G

 nproc
16

Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.

we see stats collector in top -c continuously active
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

 3730 ******  20   0  520928 233844   1244 R  61.8  0.2 650:31.36 postgres: stats collector


postgres=# SELECT date_trunc('second', current_timestamp - pg_postmaster_start_time()) as uptime;
     uptime
----------------
 1 day 04:07:18

top - 13:08:22 up 1 day,  5:45,  2 users,  load average: 1.65, 1.65, 1.56
Tasks: 303 total,   3 running, 300 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.6 us,  3.4 sy,  0.0 ni, 86.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13185940+total,   992560 free, 40571300 used, 90295552 buff/cache
KiB Swap: 78643200 total, 78450376 free,   192820 used. 90327376 avail Mem

iostat -mxy 5
Linux 3.10.0-1160.53.1.el7.x86_64 (***************************************)     05/31/2022      _x86_64_      (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.22    0.00    3.23    0.06    0.00   88.49

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.60     0.00     0.00    16.00     0.00    2.67    0.00    2.67   3.33   0.20
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00   26.80     0.00     0.16    11.94     0.01    0.37    0.00    0.37   0.69   1.86
sde               0.00     0.00    3.80   26.80     0.04     0.43    31.27     0.03    0.96    0.63    1.01   0.40   1.22

DB version
PostgreSQL 13.4  

Os
bash-4.2$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

 What could be the possible bottleneck ?

Best Regards
Praneel


REINDEXdb performance degrading gradually PG13.4

From
"David G. Johnston"
Date:
On Tuesday, May 31, 2022, Praneel Devisetty <devisettypraneel@gmail.com> wrote:

Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.

That seems like a fairly problematic metric given the general vast disparities in size tables have.

Building indexes is so IO heavy that the non-IO bottlenecks that exists likely have minimal impact on the overall times this rebuild everything will take.  That said, I’ve never done anything at this scale before.  I wouldn’t be too surprised if per-session cache effects are coming into play given the number of objects involved and the assumption that each session used for parallelism is persistent.  I’m not sure how the parallelism works for managing the work queue though as it isn’t documented and I haven’t inspected the source code.

Re: REINDEXdb performance degrading gradually PG13.4

From
Michael Paquier
Date:
On Tue, May 31, 2022 at 08:42:06AM -0700, David G. Johnston wrote:
> Building indexes is so IO heavy that the non-IO bottlenecks that exists
> likely have minimal impact on the overall times this rebuild everything
> will take.  That said, I’ve never done anything at this scale before.  I
> wouldn’t be too surprised if per-session cache effects are coming into play
> given the number of objects involved and the assumption that each session
> used for parallelism is persistent.  I’m not sure how the parallelism works
> for managing the work queue though as it isn’t documented and I haven’t
> inspected the source code.

get_parallel_object_list() in reindexdb.c would give the idea, where
the list of tables to rebuild are ordered based on an "ORDER BY
c.relpages DESC", then the table queue is processed with its own
command, moving on to the next item once we are done with an item in
the list.
--
Michael

Attachment

Re: REINDEXdb performance degrading gradually PG13.4

From
Praneel Devisetty
Date:


On Tue, May 31, 2022 at 9:12 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, May 31, 2022, Praneel Devisetty <devisettypraneel@gmail.com> wrote:

Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.

That seems like a fairly problematic metric given the general vast disparities in size tables have.

Building indexes is so IO heavy that the non-IO bottlenecks that exists likely have minimal impact on the overall times this rebuild everything will take.  That said, I’ve never done anything at this scale before.  I wouldn’t be too surprised if per-session cache effects are coming into play given the number of objects involved and the assumption that each session used for parallelism is persistent.  I’m not sure how the parallelism works for managing the work queue though as it isn’t documented and I haven’t inspected the source code.

could you please share more about   per-session cache effects /Point me to link with more info .

Re: REINDEXdb performance degrading gradually PG13.4

From
Jeff Janes
Date:


On Tue, May 31, 2022 at 11:14 AM Praneel Devisetty <devisettypraneel@gmail.com> wrote:

Hi,

We are trying to reindex 600k tables in a single database  of size 2.7TB
using reindexdb utility in a shell script
reindexdb -v -d $dbname -h $hostname -U tkcsowner --concurrently -j $parallel -S $schema


What is the value of $parallel?  Are all the tables in the same schema?
 
Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.

I can't even get remotely close to 1000 per minute with those options, even with only 100000 single-index tables with all of them being empty.  Are you sure that isn't 1000 per hour?

Using --concurrently really hits the stats system hard (I'm not sure why).   Could you just omit that?  If it is running at 1000 per minute or even per hour, does it really matter if the table is locked for as long as it takes to reindex?

Cheers,

Jeff