Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id 20180310171906.gd6vyz6hnou2n3h7@alvherre.pgsql
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 0002:

In terms of docs, I think it's better not to have anything user-facing
in the README.  Consider that users are going to be reading the HTML
docs only, and many of them may not have the README available at all.
So anything that could be useful to users must be in the XML docs only;
keep in the README only stuff that would be useful to a developer (a
section such as "not yet implemented" would belong there, for example).
Stuff that's in the XML should not appear in the README (because DRY).
For the same reason, having the XML docs end with "see the README" seems
a bad idea to me.

UPDATE_RESULT() is a bit weird to me.  I think after staring at it for a
while it looks okay, but why was it such a shock?  In 0002 it's only
used in one place so I would suggest to have it expanded, but I see you
use it in 0003 also, three times I think.  IMO for clarity it seems
better to just have the expanded code rather than the macro.

find_ext_attnums (and perhaps other places) have references to renamed
columns, "starelid" and others.  Also there is this comment:
/* Prepare to scan pg_statistic_ext for entries having indrelid = this rel. */
which is outdated since it uses syscache, not a scan.  Just remove the
comment ...

Please add a comment on what does build_attnums() do.

pg_stats_ext_mcvlist_items is odd.  I suppose you made it take oid to
avoid having to deal with a malicious bytea?  The query in docs is
pretty odd-looking,

SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2'));
If we keep the function as is, I would suggest to use LATERAL instead,
  SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(oid) m WHERE stxname = 'stts2';
but seems like it should be more like this instead:
  SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
and not have the output formatting function load the data again from the
table.  It'd be a bit like a type-specific UNNEST.

There are a few elog(ERROR) messages.  The vast majority seem to be just
internal messages so they're okay, but there is one that should be
ereport:

+   if (total_length > (1024 * 1024))
+       elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
I think we have some precedent for better wording, such as
    errmsg("index row size %zu exceeds maximum %zu for index \"%s\""
so I would say
   errmsg("serialized MCV list size %zu exceedes maximum %zu" )
though I wonder when is this error thrown -- if this is detected during
analyze for example, what happens?

There is this FIXME:
+    * FIXME Should skip already estimated clauses (using the estimatedclauses
+    * bitmap).
Are you planning on implementing this before commit?

There are other FIXMEs also.  This in particular caught my attention:

+           /* merge the bitmap into the existing one */
+           for (i = 0; i < mcvlist->nitems; i++)
+           {
+               /*
+                * Merge the result into the bitmap (Min for AND, Max for OR).
+                *
+                * FIXME this does not decrease the number of matches
+                */
+               UPDATE_RESULT(matches[i], or_matches[i], is_or);
+           }

We come back to UPDATE_RESULT again ... and note how the comment makes
no sense unless you know what UPDATE_RESULT does internally.  This is
one more indication that the macro is not a great thing to have.  Let's
lose it.  But while at it, what to do about the FIXME?

You also have this
+   /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+   if (!HeapTupleIsValid(htup))
+       return NULL;
but what does it mean?  Is it here to cover for some unknown bug?
Should we maybe not have this at all?

Another XXX comment says
+ * XXX All the memory is allocated in a single chunk, so that the caller
+ * can simply pfree the return value to release all of it.

but I would say just remove the XXX and leave the rest of the comment.

There is another XXX comment that says "this is useless", and I agree.
Just take it all out ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] [PATCH] Incremental sort
Next
From: Alvaro Herrera
Date:
Subject: Re: FOR EACH ROW triggers on partitioned tables