Re: Use streaming read I/O in BRIN vacuuming - Mailing list pgsql-hackers
From: Arseniy Mukhin
Subject: Re: Use streaming read I/O in BRIN vacuuming
Msg-id: CAE7r3MLCU68tKC0-3d1iKV-3mYpDy5O7ireaFQsAOsyJEFtPXQ@mail.gmail.com
In response to: Re: Use streaming read I/O in BRIN vacuuming (Nazir Bilal Yavuz <byavuz81@gmail.com>)
List: pgsql-hackers
Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
> <arseniy.mukhin.dev@gmail.com> wrote:
> >
> > Hi!
> >
> > On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > >
> > > Hi!
> > >
> > > > On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> > > >
> > > > PFA the patch that migrates BRIN vacuum to the read stream API.
> > >
> > > The patch is nice and straightforward. Looks good to me.
> >
> > Thank you for the review!
> >
> > > Some notes that do not seem to me problem of this patch:
> > > 1. This comment is copied 7 times already across codebase.
> > > "It is safe to use batchmode as block_range_read_stream_cb"
> > > Maybe we can refactor comments and function names...
> >
> > Yes, I had similar thoughts, but having these comments at callsites
> > has its own benefits, there is a thread about these comments [0]...
> >
> > > 2. Somehow brin_vacuum_scan() avoid dance of getting
> > > RelationGetNumberOfBlocks() many times to be entirely sure
> > > everything is scanned. Unlike other index vacuums, see
> > > btvacuumscan() for example.
> >
> > If I understand correctly, in other access methods you need to be sure
> > that you read the relation up to the end, so you don't leave any index
> > tuples that should be pruned. BRIN doesn't have a prune phase, there
> > is only a cleanup phase. So it seems it's not a big deal if you miss
> > several pages that were allocated during the vacuum.
>
> Thank you for proposing the patch! I've reviewed the patch and have
> some comments:

Thank you for the review!

> + stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
> +                                     READ_STREAM_FULL |
> +                                     READ_STREAM_SEQUENTIAL |
> +                                     READ_STREAM_USE_BATCHING,
> +                                     strategy,
> +                                     idxrel,
> +                                     MAIN_FORKNUM,
> +                                     block_range_read_stream_cb,
> +                                     &p,
> +                                     0);
>
> Unlike other index AM's it uses READ_STREAM_SEQUENTIAL.
> If there are
> some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].

> ---
> Have you done any performance testing with this patch? Since BRIN
> indexes typically don't grow very large, I'm curious how much benefit
> the read_stream provides for BRIN index cleanup.

Yeah, BRIN is known as a compact index, so these changes should have
less impact, but I thought they should still be useful, just on a
smaller scale. I didn't run any measurements, as the changes are
similar to those made to other index AMs, where this was considered an
improvement. But after your question, I decided to try running some
benchmarks.

------------------------------------------
Performance testing
------------------------------------------

Setup & data (fill_data.sql)

shared_buffers = 1GB
Table size: ~970MB
Index size: ~30MB (the BRIN index was artificially made quite big, but
that lets us see the patch's impact better).

Test cases

1) Cold data - restart db, drop the page cache, seqscan the table, run
vacuum. This way we read index data from the disk
(restart_with_cache_drop.sh)
2) Page cache - restart db, seqscan the table, run vacuum. This way we
read index data from the page cache (restart_with_wo_cache_drop.sh)
3) Shared buffers - run vacuum with the data in shared buffers (use a
query with a bitmap scan to warm the cache) (vacuum_cached.sh)

Every io_method was used with every test case. I also tried removing
READ_STREAM_SEQUENTIAL and measured that too (it's called
patched_wo_seq).

What was measured:
1) Duration of brin_vacuum_scan() (without FreeSpaceMapVacuum(idxrel))
2) Avg read rate from the vacuum report. The index has 4260 pages, and
the vacuum report says 4300 pages were read, so it looks like almost
all of the read rate comes from brin_vacuum_scan().
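As a rough sanity check that the reported read rate is dominated by brin_vacuum_scan(), the throughput implied by the page count and duration can be recomputed from first principles, assuming the default 8 kB PostgreSQL block size (the helper below is illustrative, not part of the benchmark scripts):

```python
BLOCK_SIZE = 8192  # PostgreSQL's default block size in bytes

def read_rate_mb_s(pages_read, duration_ms):
    """Approximate read rate in MB/s from pages read and elapsed time."""
    bytes_read = pages_read * BLOCK_SIZE
    return bytes_read / (1024 * 1024) / (duration_ms / 1000.0)

# 4300 pages over ~14.6 ms (the master/io_uring cold-data duration)
rate = read_rate_mb_s(4300, 14.6)
print(f"{rate:.0f} MB/s")
```

This lands in the same ballpark as the ~2068 MB/s the vacuum report shows for that case, which is consistent with almost all of the reads coming from the index scan itself.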
Results

--- Cold data ---

Method   | Version        | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master         | 14.6 ± 0.8    | 2068.1 ± 127.6
         | patched        |  9.9 ± 1.3    | 2947.3 ± 363.0
         | patched_wo_seq | 10.5 ± 1.5    | 2817.5 ± 359.7
sync     | master         | 16.1 ± 1.9    | 1921.6 ± 211.6
         | patched        | 15.4 ± 1.6    | 1985.9 ± 182.3
         | patched_wo_seq | 13.9 ± 1.2    | 2155.7 ± 180.4
worker   | master         | 16.4 ± 1.6    | 1850.0 ± 168.6
         | patched        |  9.2 ± 0.5    | 3077.3 ± 154.3
         | patched_wo_seq |  9.0 ± 0.5    | 3133.3 ± 158.1

--- Page cache ---

Method   | Version        | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master         | 10.4 ± 1.7    | 2971.3 ± 469.6
         | patched        |  7.3 ± 1.7    | 4152.7 ± 910.6
         | patched_wo_seq |  7.0 ± 1.4    | 4208.0 ± 742.2
sync     | master         | 10.4 ± 2.3    | 3037.9 ± 655.6
         | patched        |  9.2 ± 1.5    | 3319.8 ± 646.7
         | patched_wo_seq |  9.1 ± 1.6    | 3384.7 ± 674.9
worker   | master         | 10.2 ± 1.7    | 3013.4 ± 490.0
         | patched        |  3.0 ± 0.2    | 7878.6 ± 540.3
         | patched_wo_seq |  3.0 ± 0.3    | 8127.3 ± 1093.7

--- Shared buffers ---

Method   | Version        | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master         | 3.1 ± 0.3     | 0.0 ± 0.0
         | patched        | 3.2 ± 0.5     | 0.0 ± 0.0
         | patched_wo_seq | 3.3 ± 0.4     | 0.0 ± 0.0
sync     | master         | 3.1 ± 0.4     | 0.0 ± 0.0
         | patched        | 3.3 ± 0.5     | 0.0 ± 0.0
         | patched_wo_seq | 3.1 ± 0.5     | 0.0 ± 0.0
worker   | master         | 3.2 ± 0.4     | 0.0 ± 0.0
         | patched        | 3.4 ± 0.3     | 0.0 ± 0.0
         | patched_wo_seq | 3.6 ± 0.3     | 0.0 ± 0.0

Looks like we have a win for the 'cold data' and 'page cache' cases
with worker and io_uring, and some modest improvement for sync.
READ_STREAM_SEQUENTIAL doesn't seem to change anything.

I'm new to benchmarking, so I hope I haven't made any serious mistakes.

Thank you!
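For reference, the "mean ± stddev" figures in the tables above can be produced from repeated runs with a small helper like this (a minimal sketch; the sample values are hypothetical and the actual benchmark scripts may aggregate differently):

```python
import statistics

def summarize(samples):
    """Format a list of measurements as 'mean ± stddev',
    matching the layout of the result tables above."""
    mean = statistics.mean(samples)
    # Sample standard deviation; defined only for two or more runs.
    sd = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return f"{mean:.1f} ± {sd:.1f}"

# Hypothetical brin_vacuum_scan() durations (ms) over five runs
durations_ms = [14.1, 15.2, 13.9, 14.8, 15.0]
print(summarize(durations_ms))  # → 14.6 ± 0.6
```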
[0] https://www.postgresql.org/message-id/CAE7r3MJ5NHb1BMo1oCNWmfrG%3DYtx-GKX44YNdA21FuQQQeq_Qg%40mail.gmail.com

Best regards,
Arseniy Mukhin