Re: Calculage avg. width when operator = is missing - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: Calculage avg. width when operator = is missing
Date
Msg-id CACACo5T=D=0Kiok=Ok53G3ijQTcY82kfSuDEqjL9antMqnWn9Q@mail.gmail.com
Whole thread Raw
In response to Re: Calculage avg. width when operator = is missing  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Calculage avg. width when operator = is missing  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
<p dir="ltr">On Sep 22, 2015 8:58 PM, "Andrew Dunstan" <<a
href="mailto:andrew@dunslane.net">andrew@dunslane.net</a>>wrote:<br /> ><br /> ><br /> ><br /> > On
09/22/201512:16 PM, Shulgin, Oleksandr wrote:<br /> >><br /> >> Hi Hackers,<br /> >><br /> >>
I'verecently stumbled upon a problem with table bloat estimation in case there are columns of type JSON.<br />
>><br/> >> The quick bloat estimation queries use sum over pg_statistic.stawidth of table's columns, but in
caseof JSON the corresponding entry is never created by the ANALYZE command due to equality comparison operator
missing. I understand why there is no such operator defined for this particular type, but shouldn't we still try to
producemeaningful average width estimation?<br /> >><br /> >> In my case the actual bloat is around 40% as
verifiedwith pgstattuple, while the bloat reported by quick estimate can be between 75% and 95%(!) in three instances
ofthis problem.  We're talking about some hundreds of GB of miscalculation.<br /> >><br /> >> Attached
patchagainst master makes the std_typanalyze still try to compute the minimal stats even if there is no "=" operator. 
Makessense?<br /> >><br /> >> I could also find this report in archives that talks about similar problem,
butdue to all values being over the analyze threshold:<br /> >><br /> >> <a
href="http://www.postgresql.org/message-id/flat/12480.1389370514@sss.pgh.pa.us#12480.1389370514@sss.pgh.pa.us">http://www.postgresql.org/message-id/flat/12480.1389370514@sss.pgh.pa.us#12480.1389370514@sss.pgh.pa.us</a><br
/>>><br /> >> I think we could try harder, otherwise any estimate relying on average width can be way off
insuch cases.<br /> ><br /> > Yes, "/revenons/ à /nos moutons/." You can set up text based comparison ops fairly
easilyfor json - you just need to be aware of the limitations. See <a
href="https://gist.github.com/adunstan/32ad224d7499d2603708">https://gist.github.com/adunstan/32ad224d7499d2603708</a><p
dir="ltr">Yes,I've already tried this approach and have found that analyze performance degrades an order of magnitude
dueto sql-level function overhead and casts to text.  In my tests, from 200ms to 2000ms with btree ops on a default
sampleof 30,000 rows.<p dir="ltr">Should have mentioned that.<p dir="ltr">There is a very hacky way to substitute
bttextcmpfor the sort support function after defining the opclass by updating pg_amproc, buy I would rather avoid that.
:-)<p dir="ltr">--<br /> Alex<br /> 

pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: [COMMITTERS] pgsql: Use gender-neutral language in documentation
Next
From: Christopher Browne
Date:
Subject: Re: [COMMITTERS] pgsql: Use gender-neutral language in documentation