Thread: REINDEXdb performance degrading gradually PG13.4
Hi,

We are trying to reindex 600k tables in a single database of size 2.7TB, using the reindexdb utility in a shell script:

    reindexdb -v -d $dbname -h $hostname -U tkcsowner --concurrently -j $parallel -S $schema

Our config is as below:

                 name              | setting
    -------------------------------+---------
    auto_explain.log_buffers       | off
    autovacuum_work_mem            | 524288
    dbms_pipe.total_message_buffer | 30
    dynamic_shared_memory_type     | posix
    hash_mem_multiplier            | 1
    logical_decoding_work_mem      | 65536
    maintenance_work_mem           | 2097152
    shared_buffers                 | 4194304
    shared_memory_type             | mmap
    temp_buffers                   | 1024
    wal_buffers                    | 2048
    work_mem                       | 16384

Memory:

    $ free -h
                  total        used        free      shared  buff/cache   available
    Mem:           125G         38G        1.1G         93M         85G         86G
    Swap:           74G        188M         74G

    $ nproc
    16

Initially it was processing 1000 tables per minute. Performance is gradually dropping, and now, after 24 hours, it is processing 90 tables per minute.

We see the stats collector continuously active in top -c:

     PID USER       PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
    3730 ******     20   0  520928 233844  1244 R  61.8  0.2 650:31.36 postgres: stats collector

    postgres=# SELECT date_trunc('second', current_timestamp - pg_postmaster_start_time()) as uptime;
         uptime
    ----------------
     1 day 04:07:18

    top - 13:08:22 up 1 day,  5:45,  2 users,  load average: 1.65, 1.65, 1.56
    Tasks: 303 total,   3 running, 300 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  9.6 us,  3.4 sy,  0.0 ni, 86.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 13185940+total,   992560 free, 40571300 used, 90295552 buff/cache
    KiB Swap: 78643200 total, 78450376 free,   192820 used. 90327376 avail Mem

    $ iostat -mxy 5
    Linux 3.10.0-1160.53.1.el7.x86_64 (***************************************)  05/31/2022  _x86_64_  (16 CPU)

    avg-cpu:  %user  %nice %system %iowait  %steal  %idle
               8.22   0.00    3.23    0.06    0.00  88.49

    Device: rrqm/s wrqm/s  r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sda       0.00   0.00 0.00  0.60  0.00  0.00    16.00     0.00  2.67    0.00    2.67  3.33  0.20
    sdb       0.00   0.00 0.00  0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
    sdc       0.00   0.00 0.00 26.80  0.00  0.16    11.94     0.01  0.37    0.00    0.37  0.69  1.86
    sde       0.00   0.00 3.80 26.80  0.04  0.43    31.27     0.03  0.96    0.63    1.01  0.40  1.22

DB version: PostgreSQL 13.4

OS:

    bash-4.2$ cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)

What could be the possible bottleneck?

Best Regards,
Praneel
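[Editor's note: not part of the thread, just an illustration.] One way to rule out per-connection buildup in a long run like this is to drive reindexdb in fixed-size batches, so that each batch starts with fresh backend connections. A minimal dry-run sketch — the batch size, table names, and database name are made up, and `echo` stands in for the real invocation:

```shell
#!/bin/sh
# Sketch: split a table list into fixed-size batches and invoke reindexdb
# once per batch, recycling connections between batches. Dry run: echo
# prints the command instead of executing it.
BATCH_SIZE=3
tables="t1 t2 t3 t4 t5 t6 t7"   # stand-in for the real 600k-table list

batch=""
count=0
for t in $tables; do
    batch="$batch -t $t"
    count=$((count + 1))
    if [ "$count" -eq "$BATCH_SIZE" ]; then
        echo reindexdb --concurrently -j 4 -d mydb$batch
        batch=""
        count=0
    fi
done
# Flush the final partial batch, if any.
if [ -n "$batch" ]; then
    echo reindexdb --concurrently -j 4 -d mydb$batch
fi
```

With a real run, replacing `echo` with the actual command (and `mydb` with `$dbname` etc.), each batch's sessions are torn down before the next starts, which bounds any per-session cache growth.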
On Tuesday, May 31, 2022, Praneel Devisetty <devisettypraneel@gmail.com> wrote:
> Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.
That seems like a fairly problematic metric, given the vast disparities in the sizes tables have.
Building indexes is so IO-heavy that the non-IO bottlenecks that exist likely have minimal impact on the overall time this rebuild-everything run will take. That said, I’ve never done anything at this scale before. I wouldn’t be too surprised if per-session cache effects are coming into play, given the number of objects involved and the assumption that each session used for parallelism is persistent. I’m not sure how the parallelism works for managing the work queue, though, as it isn’t documented and I haven’t inspected the source code.
On Tue, May 31, 2022 at 08:42:06AM -0700, David G. Johnston wrote:
> Building indexes is so IO heavy that the non-IO bottlenecks that exist
> likely have minimal impact on the overall time this rebuild-everything
> run will take. That said, I’ve never done anything at this scale before.
> I wouldn’t be too surprised if per-session cache effects are coming into
> play given the number of objects involved and the assumption that each
> session used for parallelism is persistent. I’m not sure how the
> parallelism works for managing the work queue though as it isn’t
> documented and I haven’t inspected the source code.

get_parallel_object_list() in reindexdb.c would give the idea: the list of tables to rebuild is ordered based on an "ORDER BY c.relpages DESC", then the table queue is processed with its own command, moving on to the next item once we are done with an item in the list.
--
Michael
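[Editor's note.] A toy model of the queue Michael describes — sort by size descending, then hand items out to workers. The table names, page counts, and the round-robin hand-off are all simplifications (the real code dispatches each table to whichever connection frees up first, and reads sizes from pg_class.relpages):

```shell
# Toy model of reindexdb's parallel queue: order tables by relpages
# descending, then hand each one to the next worker in turn.
# Names and sizes are made up.
printf '%s\n' "t_small 10" "t_big 5000" "t_mid 300" "t_tiny 1" |
    sort -k2,2nr |
    awk '{ print "worker" ((NR - 1) % 2) ": REINDEX TABLE " $1 }'
```

The point of the descending sort is to start the biggest tables first, so a huge table picked up late cannot leave one worker running alone at the end.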
On Tue, May 31, 2022 at 9:12 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Tuesday, May 31, 2022, Praneel Devisetty <devisettypraneel@gmail.com> wrote:
>> Initially it was processing 1000 tables per minute. Performance is
>> gradually dropping and now after 24 hr it was processing 90 tables per
>> minute.
>
> That seems like a fairly problematic metric given the general vast
> disparities in size tables have.
>
> Building indexes is so IO heavy that the non-IO bottlenecks that exist
> likely have minimal impact on the overall time this rebuild-everything
> run will take. That said, I’ve never done anything at this scale before.
> I wouldn’t be too surprised if per-session cache effects are coming into
> play given the number of objects involved and the assumption that each
> session used for parallelism is persistent. I’m not sure how the
> parallelism works for managing the work queue though as it isn’t
> documented and I haven’t inspected the source code.
Could you please share more about per-session cache effects, or point me to a link with more info?
On Tue, May 31, 2022 at 11:14 AM Praneel Devisetty <devisettypraneel@gmail.com> wrote:
> Hi,
>
> We are trying to reindex 600k tables in a single database of size 2.7TB
> using the reindexdb utility in a shell script:
>
>     reindexdb -v -d $dbname -h $hostname -U tkcsowner --concurrently -j $parallel -S $schema
What is the value of $parallel? Are all the tables in the same schema?
> Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.
I can't even get remotely close to 1000 per minute with those options, even with only 100,000 single-index tables, all of them empty. Are you sure that isn't 1000 per hour?
Using --concurrently really hits the stats system hard (I'm not sure why). Could you just omit that? If it is running at 1000 per minute or even per hour, does it really matter if the table is locked for as long as it takes to reindex?
Cheers,
Jeff
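[Editor's note.] Jeff's question about the measured rate can be settled from the wrapper's own log. A sketch, assuming the wrapper prefixes each finished table with an epoch-seconds timestamp — that log format is an assumption, since reindexdb -v does not print timestamps itself:

```shell
#!/bin/sh
# Sketch: bucket a timestamped log by minute and count tables finished
# per bucket. The three sample lines stand in for a real wrapper log.
log='60 REINDEX t1
70 REINDEX t2
130 REINDEX t3'
rates=$(printf '%s\n' "$log" |
    awk '{ count[int($1 / 60)]++ } END { for (m in count) print m, count[m] }' |
    sort -n)
printf '%s\n' "$rates"   # columns: minute-bucket  tables-finished
```

Plotting those per-minute counts over the 24-hour run would show whether the slowdown is gradual or hits a cliff, which points at different causes (cache growth vs. reaching the small-table tail of the relpages-ordered queue).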