Benchmark function for uuidv7() - Mailing list pgsql-hackers
From | Sergey Prokhorenko |
---|---|
Subject | Benchmark function for uuidv7() |
Date | |
Msg-id | 1756972323.5090422.1733344622055@mail.yahoo.com Whole thread Raw |
In response to | Re: Отв.: Re: UUID v7 (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Friday 29 November 2024 at 08:55:09 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Nov 29, 2024 at 5:59 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
>
>
>
> Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au
>
>
> On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
>
> <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> > I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.
>
>
> I don't think we should add such benchmark functions at least to this
> patch. If there already is a well-established workload using UUIDv7
> and UUIDv4 etc, users can use pgbench with custom scripts, or it might
> make sense to add it to pgbench as a built-in workload. Which however
> should be a separate patch. Having said that, I think users should use
> benchmarks that fit their workloads, and it would not be easy to
> establish workloads that are reasonable for most systems.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>
>
>
>
>
>
> Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.
There are still 4 months left until the feature freeze. We can discuss
this topic and might find solutions. I don't think it's a blocker of
this patch (UUIDv7 implementation patch).
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
_________________________________________________________________________________________
_________________________________________________________________________________________
Regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au
<sergeyprokhorenko@yahoo.com.au> wrote:
>
>
>
> Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au
>
>
> On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
>
> <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> > I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.
>
>
> I don't think we should add such benchmark functions at least to this
> patch. If there already is a well-established workload using UUIDv7
> and UUIDv4 etc, users can use pgbench with custom scripts, or it might
> make sense to add it to pgbench as a built-in workload. Which however
> should be a separate patch. Having said that, I think users should use
> benchmarks that fit their workloads, and it would not be easy to
> establish workloads that are reasonable for most systems.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>
>
>
>
>
>
> Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.
There are still 4 months left until the feature freeze. We can discuss
this topic and might find solutions. I don't think it's a blocker of
this patch (UUIDv7 implementation patch).
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
_________________________________________________________________________________________
_________________________________________________________________________________________
I am not a programmer, but a systems analyst. Therefore, I cannot develop a function for the benchmark myself, but I can describe the requirements for this function.
The function for the benchmark could be implemented as a separate patch (not UUIDv7 implementation patch) to avoid blocking the UUIDv7 implementation patch.
Requirements for the uuidv7() benchmark function
1. Benchmark function
The function for uuidv7() benchmark should be called uuidv7_benchmark().
This function should have parameters:
• optional parameter mock_table_record_count. If the parameter is not passed, then the value of 1 million records is taken
• optional parameter payload_size_b. If the parameter is not passed, then the value of 2048 bytes is taken
2. Benchmark results table
The result of each benchmark step for a certain surrogate key type should be dumped into a separate row of the uuidv7_benchmark_results table. This table should contain the following columns:
• benchmark_start_datetime
• mock_table_record_count
• step_name
• surrogate_key_type
• rate_per_ms (nullable)
• cpu_usage_ percent (nullable)
• memory_usage_mb (nullable)
• drive_usage_mb_per_s (nullable)
The table rows should be sequentially sorted by the columns benchmark_start_datetime, step_name, surrogate_key_type.
Sample benchmark results table:
| benchmark_start_datetime | mock_table_record_count | step_name | surrogate_key_type | rate_per_ms | cpu_usage \_ percent | memory_usage_mb | drive_usage_mb_per_s |
| ------------------------ | ----------------------- | ----------------- | ------------------ | ----------- | -------------------- | --------------- | -------------------- |
| | | 1_insert | UUIDv7 | | | | |
| | | 1_insert | BIGSERIAL | | | | |
| | | 1_insert | UUIDv4 | | | | |
| | | 2_parallel_insert | UUIDv7 | | | | |
| | | 2_parallel_insert | BIGSERIAL | | | | |
| | | 2_parallel_insert | UUIDv4 | | | | |
| | | 3_left_join | UUIDv7 | | | | |
| | | 3_left_join | BIGSERIAL | | | | |
| | | 3_left_join | UUIDv4 | | | | |
| | | … | … | | | | |
3. Mock table
Each record in the mock_table table must contain the following columns:
• id with the UUID data type, PRIMARY KEY (indexed)
• payload with the bytea data type
4. Compared surrogate key types
Each benchmark step is run sequentially with the following surrogate key types in the mock_table.id column:
• UUIDv7
• BIGSERIAL
• The name of the function for generating surrogate keys (for example, one of the formats: UUIDv4, ULID or Snowflake ID), if the user specifies such a function and makes it available
5. Benchmark steps
Before running the benchmark steps, a mock table is created.
Pseudocode:
CREATE TABLE mock_table (
id UUID PRIMARY KEY DEFAULT uuidv7(),
payload BYTEA
);
The benchmark must have the following steps:
step_name = '1_insert'
Pseudocode:
INSERT INTO mock_table (payload)
SELECT filled_payload(payload_size_b)
FROM generate_series(1, mock_table_record_count);
step_name = '2_parallel_insert'
The algorithm is at the discretion of the developer.
step_name = '3_left_join'
Pseudocode:
SELECT COUNT(*)
FROM mock_table a
LEFT JOIN mock_table b ON b.id = a.id
WHERE b.id IS NULL;
step_name = '4_inner_join'
Pseudocode:
SELECT COUNT(*)
FROM mock_table a
INNER JOIN mock_table b ON b.id = a.id
WHERE b.id IS NULL;
step_name = '5_group_by'
Pseudocode:
SELECT id, COUNT(*)
FROM mock_table
GROUP BY id
HAVING COUNT(*) > 1;
step_name = '6_delete'
Pseudocode:
DELETE FROM mock_table a
USING mock_table b
WHERE b.id = a.id;
Regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au
pgsql-hackers by date: