Calculage avg. width when operator = is missing - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Calculage avg. width when operator = is missing
Date
Msg-id CACACo5QGXRc5daiUytveu99qfM=xkV=2SxEkZwuqCOqoz1wwaA@mail.gmail.com
Whole thread Raw
Responses Re: Calculage avg. width when operator = is missing  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Hi Hackers,

I've recently stumbled upon a problem with table bloat estimation in case there are columns of type JSON.

The quick bloat estimation queries use sum over pg_statistic.stawidth of table's columns, but in case of JSON the corresponding entry is never created by the ANALYZE command due to equality comparison operator missing.  I understand why there is no such operator defined for this particular type, but shouldn't we still try to produce meaningful average width estimation?

In my case the actual bloat is around 40% as verified with pgstattuple, while the bloat reported by quick estimate can be between 75% and 95%(!) in three instances of this problem.  We're talking about some hundreds of GB of miscalculation.

Attached patch against master makes the std_typanalyze still try to compute the minimal stats even if there is no "=" operator.  Makes sense?

I could also find this report in archives that talks about similar problem, but due to all values being over the analyze threshold:


I think we could try harder, otherwise any estimate relying on average width can be way off in such cases.

--
Alex

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Improving test coverage of extensions with pg_dump
Next
From: Robert Haas
Date:
Subject: Re: 9.5: Can't connect with PGSSLMODE=require on Windows