Thread: Re: Using COPY FREEZE in pgbench
> I have looked in the code of PQprotocolVersion. The only case in which > it returns 0 is there's no connection. Yes, you are right. Once the > connection has been successfuly established, there's no chance it > fails. So I agree with you. Attached v3 patch addresses this. >> The "g" item in the section describing initialization steps >> (i.e. option -I). I'd suggest just to replace "COPY" with "COPY >> FREEZE" in the sentence. > > Ok. The section is needed to be modified. This is also addressed in the patch. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 50cf22ba6b..9badafbc1f 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -220,6 +220,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d data is generated in <command>pgbench</command> client and then sent to the server. This uses the client/server bandwidth extensively through a <command>COPY</command>. + <command>pgbench</command> uses FREEZE option with 14 or later + version of <productname>PostgreSQL</productname> to speed up + subsequent <command>VACUUM</command>. Using <literal>g</literal> causes logging to print one message every 100,000 rows while generating data for the <structname>pgbench_accounts</structname> table. diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index e69d43b26b..a842b59188 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3976,6 +3976,7 @@ initGenerateDataClientSide(PGconn *con) PGresult *res; int i; int64 k; + int server_version; /* used to track elapsed time and estimate of the remaining time */ pg_time_usec_t start; @@ -4022,7 +4023,18 @@ initGenerateDataClientSide(PGconn *con) /* * accounts is big enough to be worth using COPY and tracking runtime */ - res = PQexec(con, "copy pgbench_accounts from stdin"); + + /* + * If server version is 14.0 or later, we can take account of freeze + * option of copy. + */ + server_version = PQserverVersion(con); + + if (server_version >= 140000) + res = PQexec(con, "copy pgbench_accounts from stdin with (freeze on)"); + else + res = PQexec(con, "copy pgbench_accounts from stdin"); + if (PQresultStatus(res) != PGRES_COPY_IN) { pg_log_fatal("unexpected copy in result: %s", PQerrorMessage(con));
Hello Tatsuo-san, >> I have looked in the code of PQprotocolVersion. The only case in which >> it returns 0 is there's no connection. Yes, you are right. Once the >> connection has been successfuly established, there's no chance it >> fails. So I agree with you. > > Attached v3 patch addresses this. > >>> The "g" item in the section describing initialization steps >>> (i.e. option -I). I'd suggest just to replace "COPY" with "COPY >>> FREEZE" in the sentence. >> >> Ok. The section is needed to be modified. > > This is also addressed in the patch. V3 works for me and looks ok. I changed it to ready in the CF app. -- Fabien.
Hi Fabien, > Hello Tatsuo-san, > >>> I have looked in the code of PQprotocolVersion. The only case in which >>> it returns 0 is there's no connection. Yes, you are right. Once the >>> connection has been successfuly established, there's no chance it >>> fails. So I agree with you. >> >> Attached v3 patch addresses this. >> >>>> The "g" item in the section describing initialization steps >>>> (i.e. option -I). I'd suggest just to replace "COPY" with "COPY >>>> FREEZE" in the sentence. >>> >>> Ok. The section is needed to be modified. >> >> This is also addressed in the patch. > > V3 works for me and looks ok. I changed it to ready in the CF app. Thank you for your review! -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> V3 works for me and looks ok. I changed it to ready in the CF app. > > Thank you for your review! Unfortunately it seems cfbot is not happy with the patch. # Failed test 'pgbench scale 1 initialization status (got 1 vs expected 0)' # at t/001_pgbench_with_server.pl line 116. # Failed test 'pgbench scale 1 initialization stderr /(?^:creating foreign keys)/' # at t/001_pgbench_with_server.pl line 116. # 'dropping old tables... # creating tables... # creating 2 partitions... # creating primary keys... # vacuuming... # generating data (client-side)... # 100000 of 100000 tuples (100%) done (elapsed 0.02 s, remaining 0.00 s) # ERROR: cannot perform COPY FREEZE on a partitioned table # pgbench: fatal: PQendcopy failed I think pgbench needs to check whether partitioning option is specified or not. If specified, pgbench should not use COPY FREEZE. Attached v4 patch does this. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 50cf22ba6b..0f6a6babc2 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -220,6 +220,10 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d data is generated in <command>pgbench</command> client and then sent to the server. This uses the client/server bandwidth extensively through a <command>COPY</command>. + <command>pgbench</command> uses FREEZE option with 14 or later + version of <productname>PostgreSQL</productname> to speed up + subsequent <command>VACUUM</command> if portioning option is not + specified. Using <literal>g</literal> causes logging to print one message every 100,000 rows while generating data for the <structname>pgbench_accounts</structname> table. diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index e69d43b26b..44d39ad55d 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3976,6 +3976,7 @@ initGenerateDataClientSide(PGconn *con) PGresult *res; int i; int64 k; + char *copy_statement; /* used to track elapsed time and estimate of the remaining time */ pg_time_usec_t start; @@ -4022,7 +4023,18 @@ initGenerateDataClientSide(PGconn *con) /* * accounts is big enough to be worth using COPY and tracking runtime */ - res = PQexec(con, "copy pgbench_accounts from stdin"); + + /* + * If partitioning is not enabled and server version is 14.0 or later, we + * can take account of freeze option of copy. + */ + if (partitions == 0 && PQserverVersion(con) >= 140000) + copy_statement = "copy pgbench_accounts from stdin with (freeze on)"; + else + copy_statement = "copy pgbench_accounts from stdin"; + + res = PQexec(con, copy_statement); + if (PQresultStatus(res) != PGRES_COPY_IN) { pg_log_fatal("unexpected copy in result: %s", PQerrorMessage(con));
>>> V3 works for me and looks ok. I changed it to ready in the CF app. >> >> Thank you for your review! > > Unfortunately it seems cfbot is not happy with the patch. Argh. Indeed, I did not thought of testing on a partitioned table:-( ISTM I did "make check" in pgbench to trigger tap tests, but possibly it was in a dream:-( The feature is a little disappointing because if someone has partition tables then probably they have a lot of data and probably they would like freeze to work there. Maybe freeze would work on table partitions themselves, but I do not think it is worth the effort to do that. About v4 there is a typo in the doc "portioning": <command>pgbench</command> uses FREEZE option with 14 or later version of <productname>PostgreSQL</productname> to speed up subsequent <command>VACUUM</command> if portioning option is not specified. I'd suggest: <command>pgbench</command> uses the FREEZE option with 14 or later version of <productname>PostgreSQL</productname> to speed up subsequent <command>VACUUM</command>, unless partitions are enabled. -- Fabien.
> The feature is a little disappointing because if someone has partition > tables then probably they have a lot of data and probably they would > like freeze to work there. Maybe freeze would work on table partitions > themselves, but I do not think it is worth the effort to do that. Agreed. > About v4 there is a typo in the doc "portioning": > > <command>pgbench</command> uses FREEZE option with 14 or later > version of <productname>PostgreSQL</productname> to speed up > subsequent <command>VACUUM</command> if portioning option is not > specified. > > I'd suggest: > > <command>pgbench</command> uses the FREEZE option with 14 or later > version of <productname>PostgreSQL</productname> to speed up > subsequent <command>VACUUM</command>, unless partitions are enabled. Thanks for pointing it out. Also I think that in "with 14 or later version", "version" should be "versions". Attached is the v5 patch. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 50cf22ba6b..f0b3a2a624 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -220,6 +220,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d data is generated in <command>pgbench</command> client and then sent to the server. This uses the client/server bandwidth extensively through a <command>COPY</command>. + <command>pgbench</command> uses the FREEZE option with 14 or later + versions of <productname>PostgreSQL</productname> to speed up + subsequent <command>VACUUM</command>, unless partitions are enabled. Using <literal>g</literal> causes logging to print one message every 100,000 rows while generating data for the <structname>pgbench_accounts</structname> table. diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index e69d43b26b..44d39ad55d 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3976,6 +3976,7 @@ initGenerateDataClientSide(PGconn *con) PGresult *res; int i; int64 k; + char *copy_statement; /* used to track elapsed time and estimate of the remaining time */ pg_time_usec_t start; @@ -4022,7 +4023,18 @@ initGenerateDataClientSide(PGconn *con) /* * accounts is big enough to be worth using COPY and tracking runtime */ - res = PQexec(con, "copy pgbench_accounts from stdin"); + + /* + * If partitioning is not enabled and server version is 14.0 or later, we + * can take account of freeze option of copy. + */ + if (partitions == 0 && PQserverVersion(con) >= 140000) + copy_statement = "copy pgbench_accounts from stdin with (freeze on)"; + else + copy_statement = "copy pgbench_accounts from stdin"; + + res = PQexec(con, copy_statement); + if (PQresultStatus(res) != PGRES_COPY_IN) { pg_log_fatal("unexpected copy in result: %s", PQerrorMessage(con));
> Attached is the v5 patch. About v5: doc gen ok, global and local make check ok. I did a few tests on my laptop. Is seems that copying takes a little more time, say about 10%, but vacuum is indeed very significantly reduced, so that the total time for copying and vacuuming is reduced by 10% on overall. So it is okay for me. -- Fabien.
> I did a few tests on my laptop. Is seems that copying takes a little > more time, say about 10%, but vacuum is indeed very significantly > reduced, so that the total time for copying and vacuuming is reduced > by 10% on overall. > > So it is okay for me. Thanks for the test. I wrote: > Curent master: > pgbench -i -s 100 > : > : > done in 70.78 s (drop tables 0.21 s, create tables 0.02 s, client-side generate 12.42 s, vacuum 51.11 s, primary keys 7.02s). > > Using FREEZE: > done in 16.86 s (drop tables 0.20 s, create tables 0.01 s, client-side generate 11.86 s, vacuum 0.25 s, primary keys 4.53s). > > As you can see total time drops from 70.78 seconds to 16.86 seconds, > that is 4.1 times faster. This is mainly because vacuum takes only > 0.25 seconds after COPY FREEZE while unpatched pgbench takes 51.11 > seconds, which is 204 times slower. I did same test again. 13.2 pgbench + master branch server: done in 15.47 s (drop tables 0.19 s, create tables 0.01 s, client-side generate 9.07 s, vacuum 2.07 s, primary keys 4.13s). With patch on master branch: done in 13.38 s (drop tables 0.19 s, create tables 0.01 s, client-side generate 9.68 s, vacuum 0.23 s, primary keys 3.27s). This time current pgbench performs much faster than I wrote (15.47 s vs. 70.78 s). I don't why. Anyway, this time total pgbench time is reduced by 14% over all here. I hope people agree that the patch is worth the gain. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello Tatsuo-san, > 13.2 pgbench + master branch server: > done in 15.47 s (drop tables 0.19 s, create tables 0.01 s, client-side generate 9.07 s, vacuum 2.07 s, primary keys 4.13s). > > With patch on master branch: > done in 13.38 s (drop tables 0.19 s, create tables 0.01 s, client-side generate 9.68 s, vacuum 0.23 s, primary keys 3.27s). Yes, this is the kind of figures I got on my laptop. > This time current pgbench performs much faster than I wrote (15.47 s vs. > 70.78 s). I don't why. Duno. > Anyway, this time total pgbench time is reduced by 14% over all > here. I hope people agree that the patch is worth the gain. Yes, because (1) why not take +10% and (2) it exercises an option. -- Fabien.
On Sun, Mar 21, 2021 at 5:23 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > Anyway, this time total pgbench time is reduced by 14% over all > here. I hope people agree that the patch is worth the gain. Most of the time when I run pgbench I use my own shell script, which does this: PGOPTIONS='-c vacuum_freeze_min_age=0 -c wal_compression=off' pgbench -i -s $SCALE Have you considered this case? In other words, have you considered the benefits of this patch for users that currently deliberately force freezing by VACUUM, just because it matters to their benchmark? (BTW you might be surprised how much wal_compression=off matters here.) -- Peter Geoghegan
> Most of the time when I run pgbench I use my own shell script, which does this: > > PGOPTIONS='-c vacuum_freeze_min_age=0 -c wal_compression=off' pgbench > -i -s $SCALE > > Have you considered this case? In other words, have you considered the > benefits of this patch for users that currently deliberately force > freezing by VACUUM, just because it matters to their benchmark? I am not sure how many people use this kind of options while running pgbench -i but we could add yet another switch to fall back to none FREEZE COPY if you want. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Fri, Apr 2, 2021 at 4:58 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > I am not sure how many people use this kind of options while running > pgbench -i but we could add yet another switch to fall back to none > FREEZE COPY if you want. I was unclear. What I meant was that your patch isn't just useful because it speeds up "pgbench -i" for everybody. It's also useful because having all of the tuples already frozen after bulk loading seems like a good benchmarking practice, at least most of the time. The patch changes the initial state of the database with "pgbench -i", I think. But that's good. -- Peter Geoghegan
> I was unclear. What I meant was that your patch isn't just useful > because it speeds up "pgbench -i" for everybody. It's also useful > because having all of the tuples already frozen after bulk loading > seems like a good benchmarking practice, at least most of the time. > > The patch changes the initial state of the database with "pgbench -i", > I think. But that's good. Oh, ok. Thanks for the explanation! -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
After this commit: https://git.postgresql.org/pg/commitdiff/8e03eb92e9ad54e2f1ed8b5a73617601f6262f81 I was worried about that the benefit of COPY FREEZE patch is somewhat reduced or gone. So I ran a pgbench test again. Current master: $ pgbench -i -s 100 test : : done in 20.23 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 13.54 s, vacuum 2.34 s, primary keys 4.33s). With v5 patch: done in 16.92 s (drop tables 0.21 s, create tables 0.01 s, client-side generate 12.68 s, vacuum 0.24 s, primary keys 3.77s). So overall gain by the patch is around 15%, whereas the last test before the commit was 14%. It seems the patch is still beneficial after the commit. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello Tatsuo-san, > So overall gain by the patch is around 15%, whereas the last test before > the commit was 14%. It seems the patch is still beneficial after the > commit. Yes, that's good! I had a quick look again, and about the comment: /* * If partitioning is not enabled and server version is 14.0 or later, we * can take account of freeze option of copy. */ I'd suggest instead the shorter: /* use COPY with FREEZE on v14 and later without partioning */ Or maybe even to fully drop the comment, because the code is clear enough and the doc already says it. -- Fabien.
Hi fabien, >> So overall gain by the patch is around 15%, whereas the last test >> before the commit was 14%. It seems the patch is still beneficial >> after the commit. > > Yes, that's good! Yeah! > I had a quick look again, and about the comment: > > /* > * If partitioning is not enabled and server version is 14.0 or later, we > * can take account of freeze option of copy. > */ > > I'd suggest instead the shorter: > > /* use COPY with FREEZE on v14 and later without partioning */ > > Or maybe even to fully drop the comment, because the code is clear > enough and the doc already says it. I'd prefer to leave a comment. People (including me) tend to forget things in the future, that are obvious now:-) Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 0c60077e1f..0f432767c2 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -220,6 +220,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d data is generated in <command>pgbench</command> client and then sent to the server. This uses the client/server bandwidth extensively through a <command>COPY</command>. + <command>pgbench</command> uses the FREEZE option with 14 or later + versions of <productname>PostgreSQL</productname> to speed up + subsequent <command>VACUUM</command>, unless partitions are enabled. Using <literal>g</literal> causes logging to print one message every 100,000 rows while generating data for the <structname>pgbench_accounts</structname> table. diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 4aeccd93af..54d993075f 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -4119,6 +4119,7 @@ initGenerateDataClientSide(PGconn *con) PGresult *res; int i; int64 k; + char *copy_statement; /* used to track elapsed time and estimate of the remaining time */ pg_time_usec_t start; @@ -4165,7 +4166,15 @@ initGenerateDataClientSide(PGconn *con) /* * accounts is big enough to be worth using COPY and tracking runtime */ - res = PQexec(con, "copy pgbench_accounts from stdin"); + + /* use COPY with FREEZE on v14 and later without partioning */ + if (partitions == 0 && PQserverVersion(con) >= 140000) + copy_statement = "copy pgbench_accounts from stdin with (freeze on)"; + else + copy_statement = "copy pgbench_accounts from stdin"; + + res = PQexec(con, copy_statement); + if (PQresultStatus(res) != PGRES_COPY_IN) { pg_log_fatal("unexpected copy in result: %s", PQerrorMessage(con));
On Sun, 4 Jul 2021 at 09:32, Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > >> So overall gain by the patch is around 15%, whereas the last test > >> before the commit was 14%. It seems the patch is still beneficial > >> after the commit. > > > > Yes, that's good! > > Yeah! > I tested this with -s100 and got similar results, but with -s1000 it was more like 1.5x faster: master: done in 111.33 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 52.45 s, vacuum 32.30 s, primary keys 26.58 s) patch: done in 74.04 s (drop tables 0.46 s, create tables 0.04 s, client-side generate 51.81 s, vacuum 2.11 s, primary keys 19.63 s) Nice! Regards, Dean
> I tested this with -s100 and got similar results, but with -s1000 it > was more like 1.5x faster: > > master: > done in 111.33 s (drop tables 0.00 s, create tables 0.01 s, > client-side generate 52.45 s, vacuum 32.30 s, primary keys 26.58 s) > > patch: > done in 74.04 s (drop tables 0.46 s, create tables 0.04 s, client-side > generate 51.81 s, vacuum 2.11 s, primary keys 19.63 s) > > Nice! > > Regards, > Dean If there's no objection, I am going to commit/push to master branch in early September. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> I tested this with -s100 and got similar results, but with -s1000 it >> was more like 1.5x faster: >> >> master: >> done in 111.33 s (drop tables 0.00 s, create tables 0.01 s, >> client-side generate 52.45 s, vacuum 32.30 s, primary keys 26.58 s) >> >> patch: >> done in 74.04 s (drop tables 0.46 s, create tables 0.04 s, client-side >> generate 51.81 s, vacuum 2.11 s, primary keys 19.63 s) >> >> Nice! >> >> Regards, >> Dean > > If there's no objection, I am going to commit/push to master branch in > early September. I have pushed the patch to master branch. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=06ba4a63b85e5aa47b325c3235c16c05a0b58b96 Thank you for those who gave me the valuable reviews! Reviewed-by: Fabien COELHO, Laurenz Albe, Peter Geoghegan, Dean Rasheed -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp