Re: pgbench - allow to create partitioned tables - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - allow to create partitioned tables
Date
Msg-id alpine.DEB.2.21.1909100818130.1895@lancre
Whole thread Raw
In response to Re: pgbench - allow to create partitioned tables  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: pgbench - allow to create partitioned tables
Re: pgbench - allow to create partitioned tables
List pgsql-hackers
Hello Amit,

Thanks for the feedback.

> + case 11: /* partitions */
> + initialization_option_set = true;
> + partitions = atoi(optarg);
> + if (partitions < 0)
> + {
> + fprintf(stderr, "invalid number of partitions: \"%s\"\n",
> + optarg);
> + exit(1);
> + }
> + break;
>
> Is there a reason why we treat "partitions = 0" as a valid value?

Yes. It is an explicit "do not create partitioned tables", which differ 
from 1 which says "create a partitionned table with just one partition".

> Also, shouldn't we keep some max limit for this parameter as well?

I do not think so. If someone wants to test how terrible it is to use 
100000 partitions, we should not prevent it.

> Forex. how realistic it will be if the user gives the value of
> partitions the same or greater than the number of rows in
> pgbench_accounts table?

Although I agree that it does not make much sense, for testing purposes 
why not, to test overheads in critical cases for instance.

> I understand it is not sensible to give such a value, but I guess the 
> API should behave sanely in such cases as well.

Yep, it should work.

> I am not sure what will be the good max value for it, but I
> think there should be one.

I disagree. Pgbench is a tool for testing performance for given 
parameters. If postgres accepts a parameter there is no reason why pgbench 
should reject it.

> @@ -3625,6 +3644,7 @@ initCreateTables(PGconn *con)
>  const char *bigcols; /* column decls if accountIDs are 64 bits */
>  int declare_fillfactor;
>  };
> +
>  static const struct ddlinfo DDLs[] = {
>
> Spurious line change.

Indeed.

> *
> +    "  --partitions=NUM         partition account table in NUM parts
> (defaults: 0)\n"
> +    "  --partition-method=(range|hash)\n"
> +    "                           partition account table with this
> method (default: range)\n"
>
> Refer complete table name like pgbench_accounts instead of just account. 
> It will be clear and in sync with what we display in some other options 
> like --skip-some-updates.

Ok.

> *
> +    "  --partitions=NUM         partition account table in NUM parts
> (defaults: 0)\n"
>
> /defaults/default.

Ok.

> I think we should print the information about partitions in
> printResults.  It can help users while analyzing results.

Hmmm. Why not, with some hocus-pocus to get the information out of 
pg_catalog, and trying to fail gracefully so that if pgbench is run 
against a no partitioning-support version.

> *
> +enum { PART_NONE, PART_RANGE, PART_HASH }
> + partition_method = PART_NONE;
> +
>
> I think it is better to follow the style of QueryMode enum by using
> typedef here, that will make look code in sync with nearby code.

Hmmm. Why not. This means inventing a used-once type name for 
partition_method. My great creativity lead to partition_method_t.

> *
> - int i;
>
>  fprintf(stderr, "creating tables...\n");
>
> - for (i = 0; i < lengthof(DDLs); i++)
> + for (int i = 0; i < lengthof(DDLs); i++)
>
> This is unnecessary change as far as this patch is concerned.  I
> understand there is no problem in writing either way, but let's not
> change the coding pattern here as part of this patch.

The reason I did that is that I had a stupid bug in a development version 
which was due to an accidental reuse of this index, which would have been 
prevented by this declaration style. Removed anyway.

> + if (partitions >= 1)
> + {
> + int64 part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
> + char ff[64];
> + ff[0] = '\0';
> + append_fillfactor(ff, sizeof(ff));
> +
> + fprintf(stderr, "creating %d partitions...\n", partitions);
> +
> + for (int p = 1; p <= partitions; p++)
> + {
> + char query[256];
> +
> + if (partition_method == PART_RANGE)
> + {
>
> part_size can be defined inside "if (partition_method == PART_RANGE)"
> as it is used here.

I just wanted to avoid recomputing the value in the loop, but indeed it 
may be computed needlessly. Moved.

> In general, this part of the code can use some comments.

Ok.

Attached an updated version.

-- 
Fabien.
Attachment

pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Next
From: Alvaro Herrera from 2ndQuadrant
Date:
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files