RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From k.jamison@fujitsu.com
Subject RE: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id OSBPR01MB234176B1829AECFE9FDDFCC2EFE90@OSBPR01MB2341.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses Re: [Patch] Optimize dropping of relation buffers using dlist  (Amit Kapila <amit.kapila16@gmail.com>)
RE: [Patch] Optimize dropping of relation buffers using dlist  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
> From: k.jamison@fujitsu.com <k.jamison@fujitsu.com>
> On Thursday, October 22, 2020 3:15 PM, Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > I'm not sure about the exact steps of the test, but it can be expected
> > if we have many small relations to truncate.
> >
> > Currently BUF_DROP_FULL_SCAN_THRESHOLD is set to Nbuffers / 512,
> which
> > is quite arbitrary that comes from a wild guess.
> >
> > Perhaps we need to run benchmarks that drops one relation with several
> > different ratios between the number of buffers to-be-dropped and
> > Nbuffers, and preferably both on spinning rust and SSD.
>
> Sorry to get back to you on this just now.
> Since we're prioritizing the vacuum patch, we also need to finalize which
> threshold value to use.
> I proceeded testing with my latest set of patches because Amit-san's
> comments on the code, the ones we addressed, don't really affect the
> performance. I'll post the updated patches for 0002 & 0003 after we come up
> with the proper boolean parameter name for smgrnblocks and the buffer full
> scan threshold value.
>
> Test the VACUUM performance with the following thresholds:
>    NBuffers/512, NBuffers/256, NBuffers/128, and determine which of the
> ratio has the best performance in terms of speed.
>
> I tested this on my machine (CPU 4v, 8GB memory, ext4) running on SSD.
> Configure streaming replication environment.
> shared_buffers = 100GB
> autovacuum = off
> full_page_writes = off
> checkpoint_timeout = 30min
>
> [Steps]
> 1. Create TABLE
> 2. INSERT data
> 3. DELETE from TABLE
> 4. Pause WAL replay on standby
> 5. VACUUM. Stop the primary.
> 6. Resume WAL replay and promote standby.
>
> With 1 relation, there were no significant changes that we can observe:
> (In seconds)
> | s_b   | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
> |-------|--------|--------------|--------------|--------------|
> | 128MB | 0.106  | 0.105        | 0.105        | 0.105        |
> | 100GB | 0.106  | 0.105        | 0.105        | 0.105        |
>
> So I tested with 100 tables and got more convincing measurements:
>
> | s_b   | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
> |-------|--------|--------------|--------------|--------------|
> | 128MB | 1.006  | 1.007        | 1.006        | 0.107        |
> | 1GB   | 0.706  | 0.606        | 0.606        | 0.605        |
> | 20GB  | 1.907  | 0.606        | 0.606        | 0.605        |
> | 100GB | 7.013  | 0.706        | 0.606        | 0.607        |
>
> The threshold NBuffers/128 has the best performance for default
> shared_buffers (128MB) with 0.107 s, and equally performing with large
> shared_buffers up to 100GB.
>
> We can use NBuffers/128 for the threshold, although I don't have a
> measurement for HDD yet.
> However, I wonder if the above method would suffice to determine the final
> threshold that we can use. If anyone has suggestions on how we can come
> up with the final value, like if I need to modify some steps above, I'd
> appreciate it.
>
> Regarding the parameter name. Instead of accurate, we can use "cached" as
> originally intended from the early versions of the patch since it is the smgr
> that handles smgrnblocks to get the the block size of smgr_cached_nblocks..
> "accurate" may confuse us because the cached value may not be actually
> accurate..

Hi,

So I proceeded to update the patches using the "cached" parameter and updated
the corresponding comments to it in 0002.

I've addressed the suggestions and comments of Amit-san on 0003:
1. For readability, I moved the code block to a new static function FindAndDropRelFileNodeBuffers()
2. Initialize the bool cached with false.
3. It's also decided that we don't need the extra pre-checking of RelFileNode
when locking the bufhdr in FindAndDropRelFileNodeBuffers

I repeated the recovery performance test for vacuum. (I made a mistake previously in NBuffers/128)
The 3 kinds of thresholds are almost equally performant. I chose NBuffers/256 for this patch.

| s_b   | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
|-------|--------|--------------|--------------|--------------|
| 128MB | 1.006  | 1.007        | 1.007        | 1.007        |
| 1GB   | 0.706  | 0.606        | 0.606        | 0.606        |
| 20GB  | 1.907  | 0.606        | 0.606        | 0.606        |
| 100GB | 7.013  | 0.706        | 0.606        | 0.606        |

Although we said that we'll prioritize vacuum optimization first, I've also updated the 0004 patch
(truncate optimization) which addresses the previous concern of slower truncate due to
redundant lookup of already-dropped buffers. In the new patch, we initially drop relation buffers
using the optimized DropRelFileNodeBuffers() if the buffers do not exceed the full-scan threshold,
then later on we drop the remaining buffers using full-scan.

Regards,
Kirk Jamison

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: logical streaming of xacts via test_decoding is broken
Next
From: Tom Lane
Date:
Subject: Re: remove spurious CREATE INDEX CONCURRENTLY wait