Re: Online checksums patch - once again - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Online checksums patch - once again
Msg-id CA+TgmoZrwxbra85d7tjgT9CmrT3H5iHGTi40A=6gcReN7WgnHg@mail.gmail.com
In response to Re: Online checksums patch - once again  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Online checksums patch - once again  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On Wed, Jan 22, 2020 at 2:50 PM Magnus Hagander <magnus@hagander.net> wrote:
> The reasoning that led us to *not* doing that is that it's a one-off
> operation. That along with the fact that we hope to at some point be
> able to change the default to checksums on (and it wouldn't be
> necessary for the transition on->off as that is very fast), it would
> become an increasingly rare one-off operation. And by adding these
> flags to the catalogs, everybody is paying the overhead for this
> one-off rare operation. Another option would be to add the flag on the
> pg_database level which would decrease the overhead, but our guess was
> that this would also decrease the usefulness in most cases to make it
> not worth it (most people with big databases don't have many big
> databases in the same cluster -- it's usually just one or two, so in
> the end the results would be more or less the same as we have now, as
> it would have to keep re-doing the big ones).

I understand, but the point for me is that the patch does not seem
robust as written. Nobody's going to be happy if there are reasonably
high-probability scenarios where it turns checksums part way on and
then just stops. Now, that can probably be improved to some degree
without adding catalog flags, but I bet it can be improved more and
for less effort if we do add catalog flags. Maybe being able to
survive a cluster restart without losing track of progress is not a
hard requirement for this feature, but it certainly seems nice. And I
would venture to say that continuing to run without giving up if there
happen to be no background workers available for a while IS a hard
requirement, because that can easily happen due to normal use of
parallel query. We do not normally commit features if, without any
error occurring, they might just give up part way through the
operation.

I think the argument about adding catalog flags adding overhead is
pretty much bogus. Fixed-width fields in catalogs are pretty cheap.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
