Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id ec49a434-b814-2cfa-f602-02a0cc9e518c@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
Hi,

Attached is an updated version of the patch series, addressing issues
pointed out by Alvaro. Let me go through the main changes:


1) I've updated / reworked the docs, updating the XML docs. There were
some obsolete references to functions that got renamed later, and I've
also reworked some of the user-facing docs with the aim to meet Alvaro's
suggestions. I've removed the references to READMEs etc, and at this
point I'm not sure I have a good idea how to improve this further ...


2) I got rid of the UPDATE_RESULT macro, along with counting the
matches. Initially I intended to just expand the macro and fix the match
counting (as mentioned in the FIXME), but I came to the conclusion it's
not really worth the complexity.

The idea was that by keeping the count of matching MCV items / histogram
buckets, we can terminate early in some cases. For example when
evaluating AND-clause, we can just terminate when (nmatches==0). But I
have no numbers demonstrating this actually helps, and furthermore it
was not implemented in histograms (well, we still counted the matches
but never terminated).

So I've just ripped that out and we can put it back later if needed.


3) Regarding the pg_mcv_list_items() and pg_histogram_buckets()
functions, it occurred to me that people can't really inject malicious
values because are no casts to the custom data types used to store MCV
lists and histograms in pg_statistic_ext.

The other issue was the lack of knowledge of data types for values
stored in the statistics. The code used OID of the statistic to get this
information (by looking at the relation). But it occurred to me this
could be solved the same way the regular statistics solve this - by
storing OID of the types. The anyarray does this automatically, but
there's no reason we can't do that too in pg_mcv_list and pg_histogram.

So I've done that, and the functions now take the custom data types
instead of the OID. I've also tweaked the documentation to use the
lateral syntax (good idea!) and added a new section into funcs.sgml.


4) I've merged the 0001 and 0002 patches. The 0001 was not really a bug
fix, and it was a behavior change required by introducing the MCV list,
so merging it seems right.


5) I've moved some changes from the histogram patch to MCV. The original
patch series was structured so that it introduced some code in mcv.c and
them moved it into extended_statistic.c so that it can be shared. Now
it's introduced in mcv.c right away, which makes it easier to understand
and reduces size of the patches.


6) I've fixed a bunch of comments, obsolete FIXMEs, etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: ECPG installcheck tests fail if PGDATABASE is set
Next
From: Chapman Flack
Date:
Subject: Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility