Re: [PATCH] pg_stat_toast v10 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [PATCH] pg_stat_toast v10
Msg-id CA+TgmobwX3Xnq-69yBM=SCLyzHo2=mhJd7Tt3oPMoHzk0_Xs3Q@mail.gmail.com
In response to Re: [PATCH] pg_stat_toast v10  (Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de>)
Responses Re: [PATCH] pg_stat_toast  (Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de>)
List pgsql-hackers
On Thu, Mar 31, 2022 at 9:16 AM Gunnar "Nick" Bluth
<gunnar.bluth@pro-open.de> wrote:
> That was meant to say "v10", sorry!

Hi,

From my point of view, at least, it would be preferable if you'd stop
changing the subject line every time you post a new version.

Based on the test results in
http://postgr.es/m/42bfa680-7998-e7dc-b50e-480cdd986ffc@pro-open.de
and the comments from Andres in
https://www.postgresql.org/message-id/20211212234113.6rhmqxi5uzgipwx2%40alap3.anarazel.de
my judgement would be that, as things stand today, this patch has no
chance of being accepted due to its overhead. Now, Andres is currently
working on an overhaul of the statistics collector and perhaps that
would reduce the overhead of something like this to an acceptable
level. If it does, that would be great news; I just don't know whether
that's the case.

As far as the statistics themselves are concerned, I am somewhat
skeptical about whether it's really worth adding code for this.
According to the documentation, the purpose of the patch is to let you
assess the choice of storage and compression method settings for a
column, and it is not intended to be enabled permanently. However, it
seems to me that you could assess that pretty easily without this
patch: just create a couple of tables with different settings, load
the same data into each one via COPY, and see what happens. Now you
might answer that with the patch you would get more detailed and
accurate statistics, and I think that's true, but it doesn't really
look like that additional level of detail would be critical for making
a proper assessment. You might also say that creating multiple copies
of the table and loading the data multiple times would be expensive,
and that's also true, but you don't really need to load all of it. A
representative sample of 1GB or so would probably suffice in most
cases, and that doesn't seem likely to be a huge load on the system.
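
For example, a quick comparison could look something like this (the
table and file names are just placeholders, and the lz4 column
compression needs a 14+ server built with --with-lz4):

-- one table per setting you want to compare
CREATE TABLE t_pglz (payload text COMPRESSION pglz);
CREATE TABLE t_lz4  (payload text COMPRESSION lz4);
-- (a third table with ALTER TABLE ... ALTER COLUMN ... SET STORAGE EXTERNAL
--  would let you compare storage strategies as well)

-- load the same representative sample into each
COPY t_pglz (payload) FROM '/tmp/sample.csv' (FORMAT csv);
COPY t_lz4  (payload) FROM '/tmp/sample.csv' (FORMAT csv);

-- total on-disk size, including the TOAST relation
SELECT pg_size_pretty(pg_total_relation_size('t_pglz')) AS pglz,
       pg_size_pretty(pg_total_relation_size('t_lz4'))  AS lz4;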

Also, as we add more compression options, it's going to be hard to
assess this sort of thing without trying stuff anyway. For example, if
you can set the lz4 compression level, you're not going to know which
level actually works best without trying out a bunch of them and
seeing what happens. Similarly, if we allow access to other sorts of
compression parameters, like zstd's "long" option, you're going to
have to try it out if you really care.
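
Whatever parameters end up being available, the outcome of such
experiments can already be inspected with existing functions; a rough
sketch (reusing the hypothetical t_lz4 table from above, on 14+):

-- which method a stored value actually ended up with, and how large it is
SELECT pg_column_compression(payload) AS method,
       pg_column_size(payload) AS bytes
FROM t_lz4
ORDER BY bytes DESC
LIMIT 10;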

So my feeling is that this is a lot of machinery and a lot of
worst-case overhead to solve a problem that's really pretty easy to
solve without any new code at all, and therefore I'd be inclined to
reject it. However, it's a well-known fact that sometimes my feelings
about things are pretty stupid, and this might be one of those times.
If so, I hope someone will enlighten me by telling me what I'm
missing.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com


