Thread: Can't we give better table bloat stats easily?

Can't we give better table bloat stats easily?

From
Greg Stark
Date:
Everywhere I've worked I've seen people struggle with table bloat. It's hard to even measure how much of it you have or where, let alone actually fix it.

If you search online you'll find dozens of different queries estimating how much empty space are in your tables and indexes based on pg_stats statistics and suppositions about header lengths and padding and plugging them into formulas of varying credibility. 

But isn't this all just silliiness these days? We could actually sum up the space recorded in the fsm and get a much more trustworthy number in milliseconds. 

I rigged up a quick proof of concept and the code seems super simple and quick. There's one or two tables where the number is a bit suspect and there's no fsm if vacuum hasn't run but that seems pretty small potatoes for such a huge help in reducing user pain.


Re: Can't we give better table bloat stats easily?

From
Andres Freund
Date:
Hi,

On 2019-08-16 20:39:21 -0400, Greg Stark wrote:
> But isn't this all just silliiness these days? We could actually sum up the
> space recorded in the fsm and get a much more trustworthy number in
> milliseconds.

You mean like pgstattuple_approx()?

https://www.postgresql.org/docs/current/pgstattuple.html

Or something different?

> I rigged up a quick proof of concept and the code seems super simple and
> quick. There's one or two tables where the number is a bit suspect and
> there's no fsm if vacuum hasn't run but that seems pretty small potatoes
> for such a huge help in reducing user pain.

Hard to comment on what you propose, without more details. But note that
you can't just look at the FSM, because in a lot of workloads it is
often hugely out of date. And fairly obviously it doesn't provide you
with information about how much space is currently occupied by dead
tuples.  What pgstattuple_approx does is to use the FSM for blocks that
are all-visible, and look at the page otherwise.

Greetings,

Andres Freund



Re: Can't we give better table bloat stats easily?

From
Masahiko Sawada
Date:
On Sat, Aug 17, 2019 at 9:59 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-08-16 20:39:21 -0400, Greg Stark wrote:
> > But isn't this all just silliiness these days? We could actually sum up the
> > space recorded in the fsm and get a much more trustworthy number in
> > milliseconds.
>
> You mean like pgstattuple_approx()?
>
> https://www.postgresql.org/docs/current/pgstattuple.html
>
> Or something different?
>
> > I rigged up a quick proof of concept and the code seems super simple and
> > quick. There's one or two tables where the number is a bit suspect and
> > there's no fsm if vacuum hasn't run but that seems pretty small potatoes
> > for such a huge help in reducing user pain.
>
> Hard to comment on what you propose, without more details. But note that
> you can't just look at the FSM, because in a lot of workloads it is
> often hugely out of date. And fairly obviously it doesn't provide you
> with information about how much space is currently occupied by dead
> tuples.  What pgstattuple_approx does is to use the FSM for blocks that
> are all-visible, and look at the page otherwise.
>

It's just an idea but we could have pgstattuple_approx use sample scan
to estimate the table bloat more faster.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Can't we give better table bloat stats easily?

From
Jeff Janes
Date:
On Fri, Aug 16, 2019 at 8:39 PM Greg Stark <stark@mit.edu> wrote:
Everywhere I've worked I've seen people struggle with table bloat. It's hard to even measure how much of it you have or where, let alone actually fix it.

If you search online you'll find dozens of different queries estimating how much empty space are in your tables and indexes based on pg_stats statistics and suppositions about header lengths and padding and plugging them into formulas of varying credibility. 

There is not much we can do to suppress bad advice that people post on their own blogs.  If wiki.postgresql.org is hosting bad advice, by all means we should fix that.
 
But isn't this all just silliiness these days? We could actually sum up the space recorded in the fsm and get a much more trustworthy number in milliseconds. 

If you have bloat problems, then you probably have vacuuming problems.  If you have vacuuming problems, how much can you trust fsm anyway?

Cheers,

Jeff