Re: Choosing values for multivariate MCV lists - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Choosing values for multivariate MCV lists |
Date | |
Msg-id | 20190629130126.53rshkfu2go6atkl@development |
In response to | Re: Choosing values for multivariate MCV lists (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: Choosing values for multivariate MCV lists |
List | pgsql-hackers |
On Tue, Jun 25, 2019 at 11:18:19AM +0200, Tomas Vondra wrote:
>On Mon, Jun 24, 2019 at 02:54:01PM +0100, Dean Rasheed wrote:
>>On Mon, 24 Jun 2019 at 00:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>>On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote:
>>>>On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote:
>>>>>On Sat, 22 Jun 2019 at 15:10, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>>>One annoying thing I noticed is that the base_frequency tends to end up
>>>>>>being 0, most likely due to getting too small. It's a bit strange, though,
>>>>>>because with statistic target set to 10k the smallest frequency for a
>>>>>>single column is 1/3e6, so for 2 columns it'd be ~1/9e12 (which I think is
>>>>>>something the float8 can represent).
>>>>>>
>>>>>
>>>>>Yeah, it should be impossible for the base frequency to underflow to
>>>>>0. However, it looks like the problem is with mcv_list_items()'s use
>>>>>of %f to convert to text, which is pretty ugly.
>>>>>
>>>>
>>>>Yeah, I realized that too, eventually. One way to fix that would be
>>>>adding %.15f to the sprintf() call, but that just adds ugliness. It's
>>>>probably time to rewrite the function to build the tuple from datums,
>>>>instead of relying on BuildTupleFromCStrings.
>>>>
>>>
>>>OK, attached is a patch doing this. It's pretty simple, and it does
>>>resolve the issue with frequency precision.
>>>
>>>There's one issue with the signature, though - currently the function
>>>returns null flags as bool array, but values are returned as simple
>>>text value (formatted in array-like way, but still just a text).
>>>
>>>In the attached patch I've reworked both to proper arrays, but obviously
>>>that'd require a CATVERSION bump - and there's not much appetite for that
>>>past beta2, I suppose. So I'll just undo this bit.
>>>
>>
>>Hmm, I didn't spot that the old code was using a single text value
>>rather than a text array. That's clearly broken, especially since it
>>wasn't even necessarily constructing a valid textual representation of
>>an array (e.g., if an individual value's textual representation
>>included the array markers "{" or "}").
>>
>>IMO fixing this to return a text array is worth doing, even though it
>>means a catversion bump.
>>
>
>Yeah :-(
>
>It used to be just a "debugging" function, but now that we're using it
>e.g. in pg_stats_ext definition, we need to be more careful about the
>output. Presumably we could keep the text output and make sure it's
>escaped properly etc. We could even build an array internally and then
>run it through an output function. That'd not require catversion bump.
>
>I'll cleanup the patch changing the function signature. If others think
>the catversion bump would be too significant annoyance at this point, I
>will switch back to the text output (with proper formatting).
>
>Opinions?
>

Attached is a cleaned-up version of that patch. The main difference is
that instead of using construct_md_array() this uses ArrayBuildState to
construct the arrays, which is much easier. The docs don't need any
update because those were already using text[] for the parameter, the
code was inconsistent with it.

This does require catversion bump, but as annoying as it is, I think
it's worth it (and there's also the thread discussing the serialization
issues).

Barring objections, I'll get it committed later next week, once I get
back from PostgresLondon.

As I mentioned before, if we don't want any additional catversion bumps,
it's possible to pass the arrays through output functions - that would
allow us keeping the text output (but correct, unlike what we have now).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
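
For anyone following the thread who wants to see the %f precision problem
in isolation, here is a minimal standalone sketch in plain C. It is not
the code from the patch and does not touch mcv_list_items(); the helper
names (format_fixed, format_roundtrip, parses_back_to) are made up for
illustration. It only shows why a value around 1/9e12 appears as zero once
it has gone through a "%f" conversion, while the double itself is fine.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: format a frequency with plain "%f", which always
 * prints exactly six digits after the decimal point.
 */
static void
format_fixed(double freq, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "%f", freq);
}

/*
 * Hypothetical helper: format with 17 significant digits, which is enough
 * to reproduce any IEEE 754 double exactly when the text is parsed back.
 */
static void
format_roundtrip(double freq, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "%.17g", freq);
}

/* Returns 1 if parsing the text yields exactly the expected double. */
static int
parses_back_to(const char *text, double expected)
{
    return strtod(text, NULL) == expected;
}
```

With freq = 1.0 / 9e12, format_fixed() produces "0.000000", so the value
looks like zero even though the double is nowhere near underflow. The
"%.17g" output parses back to the same double. Switching to "%.15f" would
keep the value non-zero but retain only three significant digits for it,
which is part of why building the tuple from datums (and skipping the
text conversion entirely) is the cleaner fix.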