Thread: Extend pgbench partitioning to pgbench_history
Hi there,
While benchmarking a new feature involving tablespace support in CloudNativePG (Kubernetes operator), I wanted to try out the partitioning feature of pgbench. I saw it supporting both range and hash partitioning, but limited to pgbench_accounts.
With the attached patch, I extend the partitioning capability to the pgbench_history table too.
I have been thinking of adding an option to control this, but I preferred to ask in this list whether it really makes sense or not (I struggle indeed to see use cases where accounts is partitioned and history is not).
Please let me know what you think.
Thanks,
Attachment
Please discard the previous patch and use this one (it had a leftover comment from an initial attempt to limit this to hash case).
Thanks,
Gabriele
On Thu, 30 Nov 2023 at 11:29, Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> wrote:
Hi there,While benchmarking a new feature involving tablespace support in CloudNativePG (Kubernetes operator), I wanted to try out the partitioning feature of pgbench. I saw it supporting both range and hash partitioning, but limited to pgbench_accounts.With the attached patch, I extend the partitioning capability to the pgbench_history table too.I have been thinking of adding an option to control this, but I preferred to ask in this list whether it really makes sense or not (I struggle indeed to see use cases where accounts is partitioned and history is not).Please let me know what you think.Thanks,
Attachment
Hi, There are some test failures reported by Cfbot at [1]: [09:15:01.794] 192/276 postgresql:pgbench / pgbench/001_pgbench_with_server ERROR 7.48s exit status 3 [09:15:01.794] >>> INITDB_TEMPLATE=/tmp/cirrus-ci-build/build/tmp_install/initdb-template LD_LIBRARY_PATH=/tmp/cirrus-ci-build/build/tmp_install//usr/local/pgsql/lib REGRESS_SHLIB=/tmp/cirrus-ci-build/build/src/test/regress/regress.so PATH=/tmp/cirrus-ci-build/build/tmp_install//usr/local/pgsql/bin:/tmp/cirrus-ci-build/build/src/bin/pgbench:/tmp/cirrus-ci-build/build/src/bin/pgbench/test:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin PG_REGRESS=/tmp/cirrus-ci-build/build/src/test/regress/pg_regress MALLOC_PERTURB_=67 /usr/local/bin/python3 /tmp/cirrus-ci-build/build/../src/tools/testwrap --basedir /tmp/cirrus-ci-build/build --srcdir /tmp/cirrus-ci-build/src/bin/pgbench --testgroup pgbench --testname 001_pgbench_with_server -- /usr/local/bin/perl -I /tmp/cirrus-ci-build/src/test/perl -I /tmp/cirrus-ci-build/src/bin/pgbench /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl [09:15:01.794] ――――――――――――――――――――――――――――――――――――― ✀ ――――――――――――――――――――――――――――――――――――― [09:15:01.794] stderr: [09:15:01.794] # Failed test 'transaction format for 001_pgbench_log_2' [09:15:01.794] # at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line 1247. [09:15:01.794] # Failed test 'transaction count for /tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193 (11)' [09:15:01.794] # at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line 1257. [09:15:01.794] # Failed test 'transaction format for 001_pgbench_log_3' [09:15:01.794] # at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line 1257. [09:15:01.794] # Looks like you failed 3 tests of 439. [09:15:01.794] [09:15:01.794] (test program exited with status code 3) [1] - https://cirrus-ci.com/task/5139049757802496 Thanks and regards Shlok Kyal
Sorry, I did not intend to send this message for this email. I by mistake sent this mail. Please ignore this mail On Wed, 10 Jan 2024 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > Hi, > > There are some test failures reported by Cfbot at [1]: > > [09:15:01.794] 192/276 postgresql:pgbench / > pgbench/001_pgbench_with_server ERROR 7.48s exit status 3 > [09:15:01.794] >>> > INITDB_TEMPLATE=/tmp/cirrus-ci-build/build/tmp_install/initdb-template > LD_LIBRARY_PATH=/tmp/cirrus-ci-build/build/tmp_install//usr/local/pgsql/lib > REGRESS_SHLIB=/tmp/cirrus-ci-build/build/src/test/regress/regress.so > PATH=/tmp/cirrus-ci-build/build/tmp_install//usr/local/pgsql/bin:/tmp/cirrus-ci-build/build/src/bin/pgbench:/tmp/cirrus-ci-build/build/src/bin/pgbench/test:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin > PG_REGRESS=/tmp/cirrus-ci-build/build/src/test/regress/pg_regress > MALLOC_PERTURB_=67 /usr/local/bin/python3 > /tmp/cirrus-ci-build/build/../src/tools/testwrap --basedir > /tmp/cirrus-ci-build/build --srcdir > /tmp/cirrus-ci-build/src/bin/pgbench --testgroup pgbench --testname > 001_pgbench_with_server -- /usr/local/bin/perl -I > /tmp/cirrus-ci-build/src/test/perl -I > /tmp/cirrus-ci-build/src/bin/pgbench > /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl > [09:15:01.794] ――――――――――――――――――――――――――――――――――――― ✀ > ――――――――――――――――――――――――――――――――――――― > [09:15:01.794] stderr: > [09:15:01.794] # Failed test 'transaction format for 001_pgbench_log_2' > [09:15:01.794] # at > /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line > 1247. > [09:15:01.794] # Failed test 'transaction count for > /tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193 > (11)' > [09:15:01.794] # at > /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line > 1257. > [09:15:01.794] # Failed test 'transaction format for 001_pgbench_log_3' > [09:15:01.794] # at > /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line > 1257. > [09:15:01.794] # Looks like you failed 3 tests of 439. > [09:15:01.794] > [09:15:01.794] (test program exited with status code 3) > > [1] - https://cirrus-ci.com/task/5139049757802496 > > Thanks and regards > Shlok Kyal
At 2023-11-30 11:29:15 +0100, gabriele.bartolini@enterprisedb.com wrote: > > With the attached patch, I extend the partitioning capability to the > pgbench_history table too. > > I have been thinking of adding an option to control this, but I preferred > to ask in this list whether it really makes sense or not (I struggle indeed > to see use cases where accounts is partitioned and history is not). I don't have a strong opinion about this, but I also can't think of a reason to want to create partitions for pgbench_accounts but leave out pgbench_history. > From ba8f507b126a9c5bd22dd40bb8ce0c1f0c43ac59 Mon Sep 17 00:00:00 2001 > From: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> > Date: Thu, 30 Nov 2023 11:02:39 +0100 > Subject: [PATCH] Include pgbench_history in partitioning method for pgbench > > In case partitioning, make sure that pgbench_history is also partitioned with > the same criteria. I think "If partitioning" or "If we're creating partitions" would read better here. Also, same criteria as what? Maybe you could just add "as pgbench_accounts" to the end. > diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml > index 05d3f81619..4c02d2a61d 100644 > --- a/doc/src/sgml/ref/pgbench.sgml > +++ b/doc/src/sgml/ref/pgbench.sgml > […] > @@ -378,9 +378,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d > <term><option>--partitions=<replaceable>NUM</replaceable></option></term> > <listitem> > <para> > - Create a partitioned <literal>pgbench_accounts</literal> table with > - <replaceable>NUM</replaceable> partitions of nearly equal size for > - the scaled number of accounts. > + Create partitioned <literal>pgbench_accounts</literal> and <literal>pgbench_history</literal> > + tables with <replaceable>NUM</replaceable> partitions of nearly equal size for > + the scaled number of accounts - and future history records. > Default is <literal>0</literal>, meaning no partitioning. > </para> I would just leave out the "-" and write "number of accounts and future history records". > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c > index 2e1650d0ad..87adaf4d8f 100644 > --- a/src/bin/pgbench/pgbench.c > +++ b/src/bin/pgbench/pgbench.c > […] > @@ -889,8 +889,10 @@ usage(void) > " --index-tablespace=TABLESPACE\n" > " create indexes in the specified tablespace\n" > " --partition-method=(range|hash)\n" > - " partition pgbench_accounts with this method (default: range)\n" > - " --partitions=NUM partition pgbench_accounts into NUM parts (default: 0)\n" > + " partition pgbench_accounts and pgbench_history with this method" > + " (default: range)." > + " --partitions=NUM partition pgbench_accounts and pgbench_history into NUM parts" > + " (default: 0)\n" > " --tablespace=TABLESPACE create tables in the specified tablespace\n" > " --unlogged-tables create tables as unlogged tables\n" > "\nOptions to select what to run:\n" There's a missing newline after "(default: range).". I read through the rest of the patch closely. It looks fine to me. It applies, builds, and does create the partitions as intended. -- Abhijit
Hi Abhijit,
Thanks for your input. Please accept my updated patch.
Ciao,
Gabriele
On Tue, 16 Jan 2024 at 12:53, Abhijit Menon-Sen <ams@toroid.org> wrote:
At 2023-11-30 11:29:15 +0100, gabriele.bartolini@enterprisedb.com wrote:
>
> With the attached patch, I extend the partitioning capability to the
> pgbench_history table too.
>
> I have been thinking of adding an option to control this, but I preferred
> to ask in this list whether it really makes sense or not (I struggle indeed
> to see use cases where accounts is partitioned and history is not).
I don't have a strong opinion about this, but I also can't think of a
reason to want to create partitions for pgbench_accounts but leave out
pgbench_history.
> From ba8f507b126a9c5bd22dd40bb8ce0c1f0c43ac59 Mon Sep 17 00:00:00 2001
> From: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
> Date: Thu, 30 Nov 2023 11:02:39 +0100
> Subject: [PATCH] Include pgbench_history in partitioning method for pgbench
>
> In case partitioning, make sure that pgbench_history is also partitioned with
> the same criteria.
I think "If partitioning" or "If we're creating partitions" would read
better here. Also, same criteria as what? Maybe you could just add "as
pgbench_accounts" to the end.
> diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
> index 05d3f81619..4c02d2a61d 100644
> --- a/doc/src/sgml/ref/pgbench.sgml
> +++ b/doc/src/sgml/ref/pgbench.sgml
> […]
> @@ -378,9 +378,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
> <term><option>--partitions=<replaceable>NUM</replaceable></option></term>
> <listitem>
> <para>
> - Create a partitioned <literal>pgbench_accounts</literal> table with
> - <replaceable>NUM</replaceable> partitions of nearly equal size for
> - the scaled number of accounts.
> + Create partitioned <literal>pgbench_accounts</literal> and <literal>pgbench_history</literal>
> + tables with <replaceable>NUM</replaceable> partitions of nearly equal size for
> + the scaled number of accounts - and future history records.
> Default is <literal>0</literal>, meaning no partitioning.
> </para>
I would just leave out the "-" and write "number of accounts and future
history records".
> diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
> index 2e1650d0ad..87adaf4d8f 100644
> --- a/src/bin/pgbench/pgbench.c
> +++ b/src/bin/pgbench/pgbench.c
> […]
> @@ -889,8 +889,10 @@ usage(void)
> " --index-tablespace=TABLESPACE\n"
> " create indexes in the specified tablespace\n"
> " --partition-method=(range|hash)\n"
> - " partition pgbench_accounts with this method (default: range)\n"
> - " --partitions=NUM partition pgbench_accounts into NUM parts (default: 0)\n"
> + " partition pgbench_accounts and pgbench_history with this method"
> + " (default: range)."
> + " --partitions=NUM partition pgbench_accounts and pgbench_history into NUM parts"
> + " (default: 0)\n"
> " --tablespace=TABLESPACE create tables in the specified tablespace\n"
> " --unlogged-tables create tables as unlogged tables\n"
> "\nOptions to select what to run:\n"
There's a missing newline after "(default: range).".
I read through the rest of the patch closely. It looks fine to me. It
applies, builds, and does create the partitions as intended.
-- Abhijit
Attachment
Hello Gabriele, I think the improvement makes sense (it's indeed a bit strange to not partition the history table), and the patch looks good. I did think about whether this should be optional in some way - that is, separate from partitioning the accounts table, and users would have to explicitly enable (or disable) it. But I don't think we need to do that. The vast majority of users simply want to partition everything. And this is just one way to partition the database anyway, it's our opinion on how to do that, but there's many other options how we might partition the tables, and we don't (and don't want too) have options for that. The only case that I can think of where this might matter is when running a benchmarks that will be compared to some earlier results (executed using an older pgbench version). That will be affected by this, but I don't think we make many promises about compatibility in this regard ... it's probably better to always compare results only from the same pgbench version, I guess. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 16, 2024 at 12:50 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hello Gabriele, > > I think the improvement makes sense (it's indeed a bit strange to not > partition the history table), and the patch looks good. > > I did think about whether this should be optional in some way - that is, > separate from partitioning the accounts table, and users would have to > explicitly enable (or disable) it. But I don't think we need to do that. > > The vast majority of users simply want to partition everything. And this > is just one way to partition the database anyway, it's our opinion on > how to do that, but there's many other options how we might partition > the tables, and we don't (and don't want too) have options for that. I wonder how common it would be to partition a history table by account ID? I sort of imagined the most common kind of partitioning for an audit table is by time (range). Anyway, I'm not objecting to doing it by account ID, just asking if there is a reason to do so. Speaking of which, Tomas said users might want to "partition everything" -- so any reason not to also partition tellers and branches? This change to the docs seems a bit misleading: <listitem> <para> - Create a partitioned <literal>pgbench_accounts</literal> table with - <replaceable>NUM</replaceable> partitions of nearly equal size for - the scaled number of accounts. + Create partitioned <literal>pgbench_accounts</literal> and <literal>pgbench_history</literal> + tables with <replaceable>NUM</replaceable> partitions of nearly equal size for + the scaled number of accounts and future history records. Default is <literal>0</literal>, meaning no partitioning. </para> </listitem> It says that partitions of "future history records" will be equal in size. While it's true that at the end of a pgbench run, if you use a random distribution for aid, the pgbench_history partitions should be roughly equally sized, it is confusing to say it will "create pgbench_history with partitions of equal size". Maybe it would be better to write a new sentence about partitioning pgbench_history without worrying about mirroring the sentence structure of the existing sentence. > The only case that I can think of where this might matter is when > running a benchmarks that will be compared to some earlier results > (executed using an older pgbench version). That will be affected by > this, but I don't think we make many promises about compatibility in > this regard ... it's probably better to always compare results only from > the same pgbench version, I guess. As a frequent pgbench user, I always use the same pgbench version even when comparing different versions of Postgres. Other changes have made it difficult to compare results across pgbench versions without providing it as an option (see 06ba4a63b85e). So, I don't think it is a problem if it is noted in release notes. - Melanie
On Fri, Feb 16, 2024 at 3:14 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > [ review comments ] Since there has been no response to these review comments for more than 3 months, I have set https://commitfest.postgresql.org/48/4679/ to Returned with Feedback. Please feel free to update the status when there is a new version of the patch (or at least a response to these comments). -- Robert Haas EDB: http://www.enterprisedb.com