Re: Online checksums patch - once again - Mailing list pgsql-hackers
From | Daniel Gustafsson |
---|---|
Subject | Re: Online checksums patch - once again |
Date | |
Msg-id | A6B7AB7C-138B-4A95-8CA6-864CC326D067@yesql.se Whole thread Raw |
In response to | Re: Online checksums patch - once again (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
> On 28 Jul 2020, at 04:33, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Jun 25, 2020 at 11:43:00AM +0200, Daniel Gustafsson wrote: >> The attached v19 fixes a few doc issues I had missed. > > + They can also be enabled or disabled at a later timne, either as an offline > => time Fixed. > + * awaiting shutdown, but we can continue turning off checksums anyway > => a waiting This was intentional, as it refers to the impending requested shutdown and not one which is blocked waiting. I've reworded the comment to make this clearer. > + * We are starting a checksumming process scratch, and need to start by > => FROM scratch Fixed. > + * to inprogress new relations will set relhaschecksums in pg_class so it > => inprogress COMMA Fixed. > + * Relation no longer exist. We don't consider this an error since > => exists Fixed. > + * so when the cluster comes back up processing will habe to be resumed. > => have Fixed. > + "completed one pass over all databases for checksum enabling, %i databases processed", > => I think this will be confusing to be hardcoded "one". It'll say "one" over > and over. Good point, I've reworded this based on the number of processed databases to indicate what will happen next. This call should probably be removed in case we merge and ship this feature, but it's handy for this part of the patch process. > + * still exist. > => exists Fixed. > In many places, you refer to "datachecksumsworker" (sums) but in nine places > you refer to datachecksumworker (sum). Good catch, they should all be using "sums". Fixed. > +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy) > +{ > + BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum); > > => I think looping over numblocks is safe since new blocks are intended to be > written with checksum, right? Maybe it's good to say that here. Fixed. > + BlockNumber b; > > blknum will be easier to grep for Fixed. > + (errmsg("background worker \"datachecksumsworker\" starting for database oid %d", > => Should be %u or similar (several of these) Fixed. As per Roberts review downthread, these will be reworded in a future version. > It looks like you rewrite every page, even if it already has correct checksum, > to handle replicas. I wonder if it's possible/reasonable/good to skip pages > with correct checksum when wal_level=minimal ? That would AFAICT be possible, but I'm not sure it's worth adding that before the patch is deemed safe in its simpler form. I've added a comment to record this as a potential future optimization. > It looks like it's not possible to change the checksum delay while a checksum > worker is already running. That may be important to allow: 1) decreased delay > during slow periods; 2) increased delay if the process is significantly done, > but needs to be throttled to avoid disrupting production environment. > > Have you collaborated with Julien about this one? His patch adds new GUCs: > https://www.postgresql.org/message-id/20200714090808.GA20780@nol > checksum_cost_delay > checksum_cost_page > checksum_cost_limit I honestly hadn't thought about that, but I very much agree that any controls introduced should work the same for both of these patches. > Maybe you'd say that Julien's pg_check_relation() should accept parameters > instead of adding GUCs. I think you should be in agreement on that. It'd be > silly if the verification function added three GUCs and allowed adjusting > throttle midcourse, but the checksum writer process didn't use them. Agreed. I'm not a fan of using yet more GUCs for controlling this, but I don't a good argument against it. It's in line with the cost-based vacuum delays so I guess it's the most appropriate interface. > If you used something like that, I guess you'd also want to distinguish > checksum_cost_page_read vs write. Possibly, the GUCs part should be a > preliminary shared patch 0001 that you both used. +1. Thanks for the review! I will attach a v20 to Robers email with these changes included as well. cheers ./daniel
pgsql-hackers by date: