
From Amit Langote
Subject Re: Skip partition tuple routing with constant partition key
Date
Msg-id CA+HiwqGEj3EoK3FgSc9w19Kp++w9sNJOAKmoG0_hbz09ZmBYMg@mail.gmail.com
In response to Re: Skip partition tuple routing with constant partition key  (Zhihong Yu <zyu@yugabyte.com>)
Responses Re: Skip partition tuple routing with constant partition key  (Greg Stark <stark@mit.edu>)
Re: Skip partition tuple routing with constant partition key  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Thu, Mar 24, 2022 at 1:55 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> On Wed, Mar 23, 2022 at 5:52 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> I've attached an updated version of the patch, though I haven't
>> changed the threshold constant.
> + * Threshold of the number of tuples to need to have been processed before
> + * maybe_cache_partition_bound_offset() (re-)assesses whether caching must be
>
> The first part of the comment should be:
>
> Threshold of the number of tuples which need to have been processed

Sounds the same to me, so leaving it as it is.

> +       (double) pd->n_tups_inserted / pd->n_offset_changed > 1)
>
> I think division can be avoided - the condition can be written as:
>
>   pd->n_tups_inserted > pd->n_offset_changed
>
> +           /* Check if the value is below the high bound */
>
> high bound -> upper bound

Both done, thanks.
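
To illustrate (the struct and function below are paraphrased stand-ins
for this email, not the literal patch code), the adjusted check amounts
to something like:

#include <stdbool.h>

/* Paraphrased stand-in for the per-partition dispatch counters */
typedef struct PartitionDispatchCounters
{
    int     n_tups_inserted;    /* tuples routed since the last reset */
    int     n_offset_changed;   /* times the bound offset changed */
} PartitionDispatchCounters;

/*
 * Caching pays off only when, on average, more than one tuple in a row
 * lands on the same bound offset, i.e. when n_tups_inserted /
 * n_offset_changed > 1.  Comparing the counters directly expresses the
 * same condition without the floating-point division.
 */
static inline bool
caching_looks_worthwhile(const PartitionDispatchCounters *pd)
{
    return pd->n_tups_inserted > pd->n_offset_changed;
}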

In the attached updated patch, I've also lowered the threshold for the
number of tuples to wait before re-enabling caching, from 1000 down to
10.  AFAICT, this only makes things better for the cases in which the
proposed caching is supposed to help, while not affecting the cases in
which caching might actually make things worse.
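
Concretely, the re-assessment works along these lines (again with
illustrative stand-in names rather than the patch's actual identifiers,
reusing the counters struct sketched above):

/* Illustrative constant: tuples to process before re-assessing caching */
#define CACHING_REASSESS_THRESHOLD  10

/*
 * Called for every routed tuple.  Once the threshold many tuples have
 * been processed, decide whether caching the bound offset is (still)
 * worthwhile and start a fresh counting window.
 */
static void
maybe_reassess_caching(PartitionDispatchCounters *pd, bool *use_cache)
{
    pd->n_tups_inserted++;
    if (pd->n_tups_inserted >= CACHING_REASSESS_THRESHOLD)
    {
        *use_cache = caching_looks_worthwhile(pd);
        pd->n_tups_inserted = 0;
        pd->n_offset_changed = 0;
    }
}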

I've repeated the benchmark mentioned in [1]:

-- creates a range-partitioned table with 1000 partitions
create unlogged table foo (a int) partition by range (a);
select 'create unlogged table foo_' || i ||
       ' partition of foo for values from (' || (i-1)*100000+1 ||
       ') to (' || i*100000+1 || ');'
from generate_series(1, 1000) i;
\gexec

-- generates a 100 million record file
copy (select generate_series(1, 100000000)) to '/tmp/100m.csv' csv;

HEAD:

postgres=# copy foo from '/tmp/100m.csv' csv; truncate foo;
COPY 100000000
Time: 39445.421 ms (00:39.445)
TRUNCATE TABLE
Time: 381.570 ms
postgres=# copy foo from '/tmp/100m.csv' csv; truncate foo;
COPY 100000000
Time: 38779.235 ms (00:38.779)

Patched:

postgres=# copy foo from '/tmp/100m.csv' csv; truncate foo;
COPY 100000000
Time: 33136.202 ms (00:33.136)
TRUNCATE TABLE
Time: 394.939 ms
postgres=# copy foo from '/tmp/100m.csv' csv; truncate foo;
COPY 100000000
Time: 33914.856 ms (00:33.915)
TRUNCATE TABLE
Time: 407.451 ms

So, roughly 38 seconds with HEAD vs. 33 seconds with the patch applied.

(Curiously, the numbers for both HEAD and patched look worse this time
around: back in May 2021 they were 31 seconds with HEAD vs. 26 seconds
with the patch.  Unless that's measurement noise, it may be something
to look into.)

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqFbMSLDMinPRsGQVn_gfb-bMy0J2z_rZ0-b9kSfxXF%2BAg%40mail.gmail.com

