Re: Online enabling of checksums - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: Online enabling of checksums
Date	February 25, 2018 00:56:57
Msg-id	CABUevEzMuHn6Hc2GeCrjcefxXTnwdMb0Fg7zPkMCH-EArA5suA@mail.gmail.com
In response to	Re: Online enabling of checksums (Andres Freund <andres@anarazel.de>)
Responses	Re: Online enabling of checksums
List	pgsql-hackers

On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database, once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?

It's probably at least partially unrelated, you are right. I may have misread our reluctance to add more values there as a general reluctance to add more values to central columns.

> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And all
> those that have initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that big an overhead given that there are already plenty of columns there. But the point being you can never remove that column, and it will be there for users who never even considered running without checksums. It's certainly not a large overhead, but it's also not zero.

> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By far.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
