Re: [PATCH] pg_stat_toast - Mailing list pgsql-hackers

From Gunnar \"Nick\" Bluth
Subject Re: [PATCH] pg_stat_toast
Date
Msg-id 83538137-f40b-5ead-7966-9b91a4af1fac@pro-open.de
Whole thread Raw
In response to Re: [PATCH] pg_stat_toast  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Am 06.04.22 um 18:55 schrieb Andres Freund:
> Hi,
> 
> On 2022-04-06 12:24:20 -0400, Robert Haas wrote:
>> On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth
>> <gunnar.bluth@pro-open.de> wrote:
>>> Fair enough. At that point, a lot of things become unexpectedly painful.
>>> How many % of the installed base may that be though?
>>
>> I don't have statistics on that, but it's large enough that the
>> expense associated with the statistics collector is a reasonably
>> well-known pain point, and for some users, a really severe one.
> 
> Yea. I've seen well over 100MB/s of write IO solely due to stats files writes
> on production systems, years ago.

Wow. Yeah, I tend to forget there's systems like ads' out there ;-)


>> I'm fairly sure it's not going to make things so cheap that we can afford to
>> add all the statistics anybody wants, but it's so painful that even modest
>> relief would be more than welcome.
> 
> It definitely doesn't make stats free. But I'm hopefull that avoiding the
> regular writing out / readin back in, and the ability to only optionally store
> some stats (by varying allocation size or just having different kinds of
> stats), will reduce the cost sufficiently that we can start keeping more
> stats.

Knock on wood!


> Which is not to say that these stats are the right ones (nor that they're the
> wrong ones).

;-)


> I think if I were to tackle providing more information about toasting, I'd
> start not by adding a new stats view, but by adding a function to pgstattuple
> that scans the relation and collects stats for each toasted column. An SRF
> returning one row for each toastable column. With information like
> 
> - column name
> - #inline datums
> - #compressed inline datums
> - sum(uncompressed inline datum size)
> - sum(compressed inline datum size)
> - #external datums
> - #compressed external datums
> - sum(uncompressed external datum size)
> - sum(compressed external datum size)
> 
> IIRC this shouldn't require visiting the toast table itself.

But it would still require a seqscan and quite some cycles. However,
sure, something like that is an option.

> Perhaps also an SRF that returns information about each compression method
> separately (i.e. collect above information, but split by compression method)?
> Perhaps even with the ability to measure how big the gains of recompressing
> into another method would be?

Even more of the above, but yeah, sound nifty.

>>>> However, experience has taught me that a lot of skepticism is
>>>> warranted when it comes to claims about how cheap extensions to the
>>>> statistics system will be.
>>>
>>> Again, fair enough!
>>> Maybe we first need statistics about statistics collection and handling? ;-)
>>
>> Heh.
> 
> I've wondered about adding pg_stat_stats the other day, actually :)
> https://postgr.es/m/20220404193435.hf3vybaajlpfmbmt%40alap3.anarazel.de

OMG LOL!

-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Remove error message hints mentioning configure options
Next
From: Michael Paquier
Date:
Subject: Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN