RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From k.jamison@fujitsu.com
Subject RE: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id OSBPR01MB23413F14ED6B2D0D007698F4EFED0@OSBPR01MB2341.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [Patch] Optimize dropping of relation buffers using dlist  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
List pgsql-hackers
On Thursday, October 22, 2020 3:15 PM, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> I'm not sure about the exact steps of the test, but it can be expected if we
> have many small relations to truncate.
>
> Currently BUF_DROP_FULL_SCAN_THRESHOLD is set to Nbuffers / 512,
> which is quite arbitrary that comes from a wild guess.
>
> Perhaps we need to run benchmarks that drops one relation with several
> different ratios between the number of buffers to-be-dropped and Nbuffers,
> and preferably both on spinning rust and SSD.

Sorry to get back to you on this just now.
Since we're prioritizing the vacuum patch, we also need to finalize which threshold value to use.
I proceeded testing with my latest set of patches because Amit-san's comments on the code, the ones we addressed,
don't really affect the performance. I'll post the updated patches for 0002 & 0003 after we come up with the proper
boolean parameter name for smgrnblocks and the buffer full scan threshold value.

Test the VACUUM performance with the following thresholds:
   NBuffers/512, NBuffers/256, NBuffers/128,
and determine which of the ratio has the best performance in terms of speed.

I tested this on my machine (CPU 4v, 8GB memory, ext4) running on SSD.
Configure streaming replication environment.
shared_buffers = 100GB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min

[Steps]
1. Create TABLE
2. INSERT data
3. DELETE from TABLE
4. Pause WAL replay on standby
5. VACUUM. Stop the primary.
6. Resume WAL replay and promote standby.

With 1 relation, there were no significant changes that we can observe:
(In seconds)
| s_b   | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
|-------|--------|--------------|--------------|--------------|
| 128MB | 0.106  | 0.105        | 0.105        | 0.105        |
| 100GB | 0.106  | 0.105        | 0.105        | 0.105        |

So I tested with 100 tables and got more convincing measurements:

| s_b   | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
|-------|--------|--------------|--------------|--------------|
| 128MB | 1.006  | 1.007        | 1.006        | 0.107        |
| 1GB   | 0.706  | 0.606        | 0.606        | 0.605        |
| 20GB  | 1.907  | 0.606        | 0.606        | 0.605        |
| 100GB | 7.013  | 0.706        | 0.606        | 0.607        |

The threshold NBuffers/128 has the best performance for default shared_buffers (128MB)
with 0.107 s, and equally performing with large shared_buffers up to 100GB.

We can use NBuffers/128 for the threshold, although I don't have a measurement for HDD yet.
However, I wonder if the above method would suffice to determine the final threshold that we
can use. If anyone has suggestions on how we can come up with the final value, like if I need
to modify some steps above, I'd appreciate it.

Regarding the parameter name. Instead of accurate, we can use "cached" as originally intended from
the early versions of the patch since it is the smgr that handles smgrnblocks to get the the block
size of smgr_cached_nblocks.. "accurate" may confuse us because the cached value may not
be actually accurate..

Regards,
Kirk Jamison




pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Nikhil Benesch
Date:
Subject: Re: Why does to_json take "anyelement" rather than "any"?