Thread: REINDEXdb performance degrading gradually PG13.4
Hi,

We are trying to reindex 600k tables in a single database of size 2.7TB, using the reindexdb utility in a shell script:

    reindexdb -v -d $dbname -h $hostname -U tkcsowner --concurrently -j $parallel -S $schema

Our config is as below:

                 name              | setting
    -------------------------------+---------
    auto_explain.log_buffers       | off
    autovacuum_work_mem            | 524288
    dbms_pipe.total_message_buffer | 30
    dynamic_shared_memory_type     | posix
    hash_mem_multiplier            | 1
    logical_decoding_work_mem      | 65536
    maintenance_work_mem           | 2097152
    shared_buffers                 | 4194304
    shared_memory_type             | mmap
    temp_buffers                   | 1024
    wal_buffers                    | 2048
    work_mem                       | 16384

Memory:

    $ free -h
                  total        used        free      shared  buff/cache   available
    Mem:           125G         38G        1.1G         93M         85G         86G
    Swap:           74G        188M         74G

    $ nproc
    16

Initially it was processing 1000 tables per minute. Performance is gradually dropping, and now, after 24 hours, it is processing 90 tables per minute.

We see the stats collector continuously active in top -c:

     PID USER       PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
    3730 ******     20   0  520928 233844  1244 R  61.8  0.2 650:31.36 postgres: stats collector

    postgres=# SELECT date_trunc('second', current_timestamp - pg_postmaster_start_time()) as uptime;
         uptime
    ----------------
     1 day 04:07:18

    top - 13:08:22 up 1 day,  5:45,  2 users,  load average: 1.65, 1.65, 1.56
    Tasks: 303 total,   3 running, 300 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  9.6 us,  3.4 sy,  0.0 ni, 86.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 13185940+total,   992560 free, 40571300 used, 90295552 buff/cache
    KiB Swap: 78643200 total, 78450376 free,   192820 used. 90327376 avail Mem

    $ iostat -mxy 5
    Linux 3.10.0-1160.53.1.el7.x86_64 (***************************************)  05/31/2022  _x86_64_  (16 CPU)

    avg-cpu:  %user  %nice %system %iowait  %steal  %idle
               8.22   0.00    3.23    0.06    0.00  88.49

    Device: rrqm/s wrqm/s  r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sda       0.00   0.00 0.00  0.60  0.00  0.00    16.00     0.00  2.67    0.00    2.67  3.33  0.20
    sdb       0.00   0.00 0.00  0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
    sdc       0.00   0.00 0.00 26.80  0.00  0.16    11.94     0.01  0.37    0.00    0.37  0.69  1.86
    sde       0.00   0.00 3.80 26.80  0.04  0.43    31.27     0.03  0.96    0.63    1.01  0.40  1.22

DB version: PostgreSQL 13.4

OS:

    bash-4.2$ cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)

What could be the possible bottleneck?

Best Regards,
Praneel
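[Editor's note: not part of the thread, just an illustration.] One way to rule out per-connection buildup in a long run like this is to drive reindexdb in fixed-size batches, so that each batch starts with fresh backend connections. A minimal dry-run sketch — the batch size, table names, and database name are made up, and `echo` stands in for the real invocation:

```shell
#!/bin/sh
# Sketch: split a table list into fixed-size batches and invoke reindexdb
# once per batch, recycling connections between batches. Dry run: echo
# prints the command instead of executing it.
BATCH_SIZE=3
tables="t1 t2 t3 t4 t5 t6 t7"   # stand-in for the real 600k-table list

batch=""
count=0
for t in $tables; do
    batch="$batch -t $t"
    count=$((count + 1))
    if [ "$count" -eq "$BATCH_SIZE" ]; then
        echo reindexdb --concurrently -j 4 -d mydb$batch
        batch=""
        count=0
    fi
done
# Flush the final partial batch, if any.
if [ -n "$batch" ]; then
    echo reindexdb --concurrently -j 4 -d mydb$batch
fi
```

With a real run, replacing `echo` with the actual command (and `mydb` with `$dbname` etc.), each batch's sessions are torn down before the next starts, which bounds any per-session cache growth.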
On Tuesday, May 31, 2022, Praneel Devisetty <devisettypraneel@gmail.com> wrote:
> Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.
That seems like a fairly problematic metric, given the vast disparities in the sizes tables have.
Building indexes is so IO-heavy that the non-IO bottlenecks that exist likely have minimal impact on the overall time this rebuild-everything run will take. That said, I’ve never done anything at this scale before. I wouldn’t be too surprised if per-session cache effects are coming into play, given the number of objects involved and the assumption that each session used for parallelism is persistent. I’m not sure how the parallelism works for managing the work queue, though, as it isn’t documented and I haven’t inspected the source code.
On Tue, May 31, 2022 at 08:42:06AM -0700, David G. Johnston wrote:
> Building indexes is so IO heavy that the non-IO bottlenecks that exist
> likely have minimal impact on the overall time this rebuild-everything
> run will take. That said, I’ve never done anything at this scale before.
> I wouldn’t be too surprised if per-session cache effects are coming into
> play given the number of objects involved and the assumption that each
> session used for parallelism is persistent. I’m not sure how the
> parallelism works for managing the work queue though as it isn’t
> documented and I haven’t inspected the source code.

get_parallel_object_list() in reindexdb.c would give the idea: the list of tables to rebuild is ordered based on an "ORDER BY c.relpages DESC", then the table queue is processed with its own command, moving on to the next item once we are done with an item in the list.
--
Michael
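[Editor's note.] A toy model of the queue Michael describes — sort by size descending, then hand items out to workers. The table names, page counts, and the round-robin hand-off are all simplifications (the real code dispatches each table to whichever connection frees up first, and reads sizes from pg_class.relpages):

```shell
# Toy model of reindexdb's parallel queue: order tables by relpages
# descending, then hand each one to the next worker in turn.
# Names and sizes are made up.
printf '%s\n' "t_small 10" "t_big 5000" "t_mid 300" "t_tiny 1" |
    sort -k2,2nr |
    awk '{ print "worker" ((NR - 1) % 2) ": REINDEX TABLE " $1 }'
```

The point of the descending sort is to start the biggest tables first, so a huge table picked up late cannot leave one worker running alone at the end.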
On Tue, May 31, 2022 at 9:12 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Tuesday, May 31, 2022, Praneel Devisetty <devisettypraneel@gmail.com> wrote:
>> Initially it was processing 1000 tables per minute. Performance is
>> gradually dropping and now after 24 hr it was processing 90 tables per
>> minute.
>
> That seems like a fairly problematic metric given the general vast
> disparities in size tables have.
>
> Building indexes is so IO heavy that the non-IO bottlenecks that exist
> likely have minimal impact on the overall time this rebuild-everything
> run will take. That said, I’ve never done anything at this scale before.
> I wouldn’t be too surprised if per-session cache effects are coming into
> play given the number of objects involved and the assumption that each
> session used for parallelism is persistent. I’m not sure how the
> parallelism works for managing the work queue though as it isn’t
> documented and I haven’t inspected the source code.
Could you please share more about per-session cache effects, or point me to a link with more info?
On Tue, May 31, 2022 at 11:14 AM Praneel Devisetty <devisettypraneel@gmail.com> wrote:
> Hi,
>
> We are trying to reindex 600k tables in a single database of size 2.7TB
> using the reindexdb utility in a shell script:
>
>     reindexdb -v -d $dbname -h $hostname -U tkcsowner --concurrently -j $parallel -S $schema
What is the value of $parallel? Are all the tables in the same schema?
> Initially it was processing 1000 tables per minute. Performance is gradually dropping and now after 24 hr it was processing 90 tables per minute.
I can't even get remotely close to 1000 per minute with those options, even with only 100,000 single-index tables, all of them empty. Are you sure that isn't 1000 per hour?
Using --concurrently really hits the stats system hard (I'm not sure why). Could you just omit that? If it is running at 1000 per minute or even per hour, does it really matter if the table is locked for as long as it takes to reindex?
Cheers,
Jeff
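[Editor's note.] Jeff's question about the measured rate can be settled from the wrapper's own log. A sketch, assuming the wrapper prefixes each finished table with an epoch-seconds timestamp — that log format is an assumption, since reindexdb -v does not print timestamps itself:

```shell
#!/bin/sh
# Sketch: bucket a timestamped log by minute and count tables finished
# per bucket. The three sample lines stand in for a real wrapper log.
log='60 REINDEX t1
70 REINDEX t2
130 REINDEX t3'
rates=$(printf '%s\n' "$log" |
    awk '{ count[int($1 / 60)]++ } END { for (m in count) print m, count[m] }' |
    sort -n)
printf '%s\n' "$rates"   # columns: minute-bucket  tables-finished
```

Plotting those per-minute counts over the 24-hour run would show whether the slowdown is gradual or hits a cliff, which points at different causes (cache growth vs. reaching the small-table tail of the relpages-ordered queue).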