Home > mailing lists

[Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From	k.jamison@fujitsu.com
Subject	[Patch] Optimize dropping of relation buffers using dlist
Date	October 28, 2019 11:13:19
Msg-id	OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com Whole thread Raw
Responses	RE: [Patch] Optimize dropping of relation buffers using dlist
List	pgsql-hackers

Tree view

Hi,

Currently, we need to scan the WHOLE shared buffers when VACUUM

truncated off any empty pages at end of transaction or when relation

is TRUNCATEd.

As for our customer case, we periodically truncate thousands of tables,

and it's possible to TRUNCATE single table per transaction. This can be

problematic later on during recovery which could take longer, especially

when a sudden failover happens after those TRUNCATEs and when we

have to scan a large-sized shared buffer. In the performance test below,

it took almost 12.5 minutes for recovery to complete for 100GB shared

buffers. But we want to keep failover very short (within 10 seconds).

Previously, I made an improvement in speeding the truncates of relation

forks from 3 scans to one scan.[1] This time, the aim of this patch is

to further speedup the invalidation of pages, by linking the cached pages

of the target relation in a doubly-linked list and just traversing it

instead of scanning the whole shared buffers. In DropRelFileNodeBuffers,

we just get the number of target buffers to invalidate for the relation.

There is a significant win in this patch, because we were able to

complete failover and recover in 3 seconds more or less.

I performed similar tests to what I did in the speedup truncates of

relations forks.[1][2] However, this time using 100GB shared_buffers.

[Machine spec used in testing]

Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz

CPU: 16, Number of cores per socket: 8

RHEL6.5, Memory: 256GB++

[Test]

1. (Master) Create table (ex. 10,000 tables). Insert data to tables.

2. (Master) DELETE FROM TABLE (ex. all rows of 10,000 tables)

(Standby) To test with failover, pause the WAL replay on standby server.

(SELECT pg_wal_replay_pause();)

3. (M) psql -c "\timing on" (measures total execution of SQL queries)

4. (M) VACUUM (whole db)

5. (M) Stop primary server. pg_ctl stop -D $PGDATA -w

6. (S) Resume wal replay and promote standby.[2]

[Results]

A. HEAD (origin/master branch)

A1. Vacuum execution on Primary server

Time: 730932.408 ms (12:10.932) ~12min 11s

A2. Vacuum + Failover (WAL Recovery on Standby)

waiting for server to promote...........................

.................................... stopped waiting

pg_ctl: server did not promote in time

2019/10/25_12:13:09.692─┐

2019/10/25_12:25:43.576─┘

-->Total: 12min34s

B. PATCH

B1. Vacuum execution on Primary/Master

Time: 6.518333s = 6518.333 ms

B2. Vacuum + Failover (WAL Recovery on Standby)

2019/10/25_14:17:21.822

waiting for server to promote...... done

server promoted

2019/10/25_14:17:24.827

2019/10/25_14:17:24.833

-->Total: 3.011s

[Other Notes]

Maybe one disadvantage is that we can have a variable number of

relations, and allocated the same number of relation structures as

the size of shared buffers. I tried to reduce the use of memory when

doing hash table lookup operation by having a fixed size array (100)

or threshold of target buffers to invalidate.

When doing CachedBufLookup() to scan the count of each buffer in the

dlist, I made sure to reduce the number of scans (2x at most).

First, we scan the dlist of cached buffers of relations.

Then store the target buffers in buf_id_array. Non-target buffers

would be removed from dlist but added to temporary dlist.

After reaching end of main dlist, we append the temporary dlist to

tail of main dlist.

I also performed pgbench buffer test, and this patch did not cause

overhead to normal DB access performance.

Another one that I'd need feedback of is the use of new dlist operations

for this cached buffer list. I did not use in this patch the existing

Postgres dlist architecture (ilist.h) because I want to save memory space

as much as possible especially when NBuffers become large. Both dlist_node

& dlist_head are 16 bytes. OTOH, two int pointers for this patch is 8 bytes.

Hope to hear your feedback and comments.

Thanks in advance,

Kirk Jamison

[1] https://www.postgresql.org/message-id/flat/D09B13F772D2274BB348A310EE3027C64E2067%40g01jpexmbkw24

[2] https://www.postgresql.org/message-id/D09B13F772D2274BB348A310EE3027C6502672%40g01jpexmbkw24

Attachment

v1-Optimize-dropping-of-relation-buffers-using-dlist.patch

pgsql-hackers by date:

From: Michael Paquier
Date: 28 October 2019, 10:55:46
Subject: Re: update ALTER TABLE with ATTACH PARTITION lock mode

From: Amit Langote
Date: 28 October 2019, 11:16:54
Subject: Re: [PATCH] Do not use StdRdOptions in Access Methods

[Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

Attachment

Previous

Next