Re: Parallel heap vacuum - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Parallel heap vacuum |
Date | |
Msg-id | 64197238-b57d-42d4-aca4-5d482b9dc311@vondra.me Whole thread Raw |
In response to | Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
RE: Parallel heap vacuum
Re: Parallel heap vacuum |
List | pgsql-hackers |
Hi, Thanks for working on this. I took a quick look at this today, to do some basic review. I plan to do a bunch of testing, but that's mostly to get a better idea of what kind of improvements to expect - the initial results look quite nice and sensible. A couple basic comments: 1) I really like the idea of introducing "compute_workers" callback to the heap AM interface. I faced a similar issue with calculating workers for index builds, because right now plan_create_index_workers is doing that the logic works for btree, but really not brin etc. It didn't occur to me we might make this part of the index AM ... 2) I find it a bit weird vacuumlazy.c needs to include optimizer/paths.h because it really has nothing to do with planning / paths. I realize it needs the min_parallel_table_scan_size, but it doesn't seem right. I guess it's a sign this bit of code (calculating parallel workers based on log of relation size) should in some "shared" location. 3) The difference in naming ParallelVacuumState vs. PHVState is a bit weird. I suggest ParallelIndexVacuumState and ParallelHeapVacuumState to make it consistent and clear. 4) I think it would be good to have some sort of README explaining how the parallel heap vacuum works, i.e. how it's driven by FSM. Took me a while to realize how the workers coordinate which blocks to scan. 5) Wouldn't it be better to introduce the scan_stats (grouping some of the fields in a separate patch)? Seems entirely independent from the parallel part, so doing it separately would make it easier to review. Also, maybe reference the fields through scan_stats->x, instead of through vacrel->scan_stats->x, when there's the pointer. 6) Is it a good idea to move NewRelfrozenXID/... to the scan_stats? AFAIK it's not a statistic, it's actually a parameter affecting the decisions, right? 7) I find it a bit strange that heap_vac_scan_next_block() needs to check if it's a parallel scan, and redirect to the parallel callback. I mean, shouldn't the caller know which callback to invoke? Why should the serial callback care about this? 8) It's not clear to me why do_lazy_scan_heap() needs to "advertise" the current block. Can you explain? 9) I'm a bit confused why the code checks IsParallelWorker() in so many places. Doesn't that mean the leader can't participate in the vacuum like a regular worker? 10) I'm not quite sure I understand the comments at the end of do_lazy_scan_heap - it says "do a cycle of vacuuming" but I guess that means "index vacuuming", right? And then it says "pause without invoking index and heap vacuuming" but isn't the whole point of this block to do that cleanup so that the TidStore can be discarded? Maybe I just don't understand how the work is divided between the leader and workers ... 11) Why does GlobalVisState need to move to snapmgr.h? If I undo this the patch still builds fine for me. thanks -- Tomas Vondra
pgsql-hackers by date: