Performance degradation on concurrent COPY into a single relation in PG16. - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Performance degradation on concurrent COPY into a single relation in PG16.
Date:
Msg-id: CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com
Responses:
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Re: Performance degradation on concurrent COPY into a single relation in PG16.
List: pgsql-hackers
Hi all,

While testing PG16, I observed a big performance degradation in concurrent COPY into a single relation with 2 - 16 clients in my environment. I've attached a test script that measures the execution time of COPYing 5GB of data in total into a single relation while changing the number of concurrent clients, on PG16 and PG15. Here are the results in my environment (EC2 instance, RHEL 8.6, 128 vCPUs, 512GB RAM):

* PG15 (4b15868b69)

PG15: nclients = 1, execution time = 14.181
PG15: nclients = 2, execution time = 9.319
PG15: nclients = 4, execution time = 5.872
PG15: nclients = 8, execution time = 3.773
PG15: nclients = 16, execution time = 3.202
PG15: nclients = 32, execution time = 3.023
PG15: nclients = 64, execution time = 3.829
PG15: nclients = 128, execution time = 4.111
PG15: nclients = 256, execution time = 4.158

* PG16 (c24e9ef330)

PG16: nclients = 1, execution time = 17.112
PG16: nclients = 2, execution time = 14.084
PG16: nclients = 4, execution time = 27.997
PG16: nclients = 8, execution time = 10.554
PG16: nclients = 16, execution time = 7.074
PG16: nclients = 32, execution time = 4.607
PG16: nclients = 64, execution time = 2.093
PG16: nclients = 128, execution time = 2.141
PG16: nclients = 256, execution time = 2.202

PG16 scales better at high client counts (64 and more), but it takes much longer than PG15, especially at 1 - 16 clients. The relevant commit is 00d1e02be2 "hio: Use ExtendBufferedRelBy() to extend tables more efficiently".
With commit 1cbbee0338 (the commit preceding 00d1e02be2), I got better numbers, although the improved scalability at high client counts was gone:

PG16: nclients = 1, execution time = 17.444
PG16: nclients = 2, execution time = 10.690
PG16: nclients = 4, execution time = 7.010
PG16: nclients = 8, execution time = 4.282
PG16: nclients = 16, execution time = 3.373
PG16: nclients = 32, execution time = 3.205
PG16: nclients = 64, execution time = 3.705
PG16: nclients = 128, execution time = 4.196
PG16: nclients = 256, execution time = 4.201

While investigating the cause, I found an interesting fact: in mdzeroextend(), if I use only either FileFallocate() or FileZero(), we get better numbers. For example, if I always use FileZero() with the following change:

@@ -574,7 +574,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
      * that decision should be made though? For now just use a cutoff of
      * 8, anything between 4 and 8 worked OK in some local testing.
      */
-    if (numblocks > 8)
+    if (false)
     {
         int         ret;

I got:

PG16: nclients = 1, execution time = 16.898
PG16: nclients = 2, execution time = 8.740
PG16: nclients = 4, execution time = 4.656
PG16: nclients = 8, execution time = 2.733
PG16: nclients = 16, execution time = 2.021
PG16: nclients = 32, execution time = 1.693
PG16: nclients = 64, execution time = 1.742
PG16: nclients = 128, execution time = 2.180
PG16: nclients = 256, execution time = 2.296

After further investigation, I found that the performance degradation comes from calling posix_fallocate() (via FileFallocate()) and pwritev() (via FileZero()) alternately, depending on how many blocks we extend by. It happens only on the xfs filesystem.

Does anyone observe a similar performance issue with the attached benchmark script?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com