Re: Using multiple extended statistics for estimates - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Using multiple extended statistics for estimates
Date
Msg-id 20191107.133820.1542779476322302789.horikyota.ntt@gmail.com
In response to Re: Using multiple extended statistics for estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Using multiple extended statistics for estimates
List pgsql-hackers
Hello.

At Wed, 6 Nov 2019 20:58:49 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in 
> >Here is a slightly polished v2 of the patch, the main difference being
> >that computing clause_attnums was moved to a separate function.
> >
> 
> This time with the attachment ;-)

This patch is fairly straightforward: it repeats what the previous
statext_mcv_clauselist_selectivity did for as long as the remaining
clauses match any of the MV-MCVs. There is almost no regression in the
cases where zero or just one MV-MCV applies to the given clause list.

It applies cleanly on the current master and seems to work as
expected.


I have some comments.

Could we have a description in the documentation of how multiple
MV-MCVs are used in a query? And don't we need some regression tests?


+/*
+ * statext_mcv_clause_attnums
+ *        Recalculate attnums from compatible but not-yet-estimated clauses.

It returns attnums collected from multiple clause*s*. Is the name
"clause_attnums" OK?

The comment reads as if the function checks the compatibility of each
clause, but that work is done on the caller side. I'm not sure such
strictness is required, but it might be better if the comment
described exactly what the function does.


+ */
+static Bitmapset *
+statext_mcv_clause_attnums(int nclauses, Bitmapset **estimatedclauses,
+                           Bitmapset **list_attnums)

The last two parameters have the same type in notation but are
actually different: one is a pointer to a Bitmapset *, and the other
is an array of Bitmapset *. The code in the function itself suggests
that, but it would be helpful to have a brief explanation of the
parameters in the function comment.
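
To illustrate the kind of parameter description meant here, below is a
toy, self-contained sketch, not the patch's code. ToyBits and
toy_clause_attnums are hypothetical names, and a 64-bit mask stands in
for Bitmapset, so 0 plays the role of NULL. The body only follows what
the function comment says (collect attnums from not-yet-estimated
clauses) and is an assumption about the real behavior.

```c
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: one bit per member, at most 64. */
typedef uint64_t ToyBits;

/*
 * toy_clause_attnums
 *      Union of the attnums of all clauses that are not estimated yet.
 *
 * nclauses     - number of entries in list_attnums
 * estimated    - pointer to ONE set holding the indexes of the clauses
 *                that have already been estimated
 * list_attnums - ARRAY of nclauses sets, one per clause, holding the
 *                attnums referenced by that clause; 0 means the clause
 *                is not compatible with extended statistics
 */
static ToyBits
toy_clause_attnums(int nclauses, const ToyBits *estimated,
                   const ToyBits *list_attnums)
{
    ToyBits     result = 0;

    for (int i = 0; i < nclauses; i++)
    {
        /* skip clauses that were estimated in an earlier round */
        if ((*estimated >> i) & 1)
            continue;

        /* incompatible clauses contribute nothing, since their set is 0 */
        result |= list_attnums[i];
    }

    return result;
}
```

Both pointer parameters look identical in the signature, which is why
spelling out "one set" versus "array of sets" in the comment helps.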

+        /*
+         * Recompute attnums in the remaining clauses (we simply use the bitmaps
+         * computed earlier, so that we don't have to inspect the clauses again).
+         */
+        clauses_attnums = statext_mcv_clause_attnums(list_length(clauses),

Couldn't we avoid calling this function twice with the same parameters
in the first iteration of the loop?

+        foreach(l, clauses)
         {
-            stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
-            *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
+            /*
+             * If the clause is compatible with the selected statistics, mark it
+             * as estimated and add it to the list to estimate.
+             */
+            if (list_attnums[listidx] != NULL &&
+                bms_is_subset(list_attnums[listidx], stat->keys))
+            {
+                stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+                *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
+            }

The loop runs through all clauses every time. I agree that this is
better than using a copy of the clauses to avoid stepping on already
estimated clauses, but maybe we need an assertion that listidx is
not a member of estimatedclauses, to make sure that no clause is
estimated twice.
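
A toy, self-contained sketch of that check follows; it is not the
patch's code. ToySet and the toy_* helpers are hypothetical stand-ins
for Bitmapset, bms_is_member, bms_add_member and bms_is_subset, and in
the real loop the check would presumably be written with Assert() and
bms_is_member() instead of the standard assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: one bit per member, at most 64. */
typedef uint64_t ToySet;

static bool
toy_is_member(int x, ToySet s)
{
    return ((s >> x) & 1) != 0;
}

static ToySet
toy_add_member(ToySet s, int x)
{
    return s | ((uint64_t) 1 << x);
}

static bool
toy_is_subset(ToySet a, ToySet b)
{
    return (a & ~b) == 0;
}

/*
 * One round of the loop: mark every clause covered by stat_keys as
 * estimated. Returns the number of clauses marked in this round.
 */
static int
toy_mark_round(int nclauses, const ToySet *list_attnums,
               ToySet stat_keys, ToySet *estimated)
{
    int         nmarked = 0;

    for (int listidx = 0; listidx < nclauses; listidx++)
    {
        if (list_attnums[listidx] != 0 &&
            toy_is_subset(list_attnums[listidx], stat_keys))
        {
            /* the suggested check: a clause must not be picked twice */
            assert(!toy_is_member(listidx, *estimated));

            *estimated = toy_add_member(*estimated, listidx);
            nmarked++;
        }
    }

    return nmarked;
}
```

In this toy model the assertion would fire if a later statistic also
covered a clause that was already estimated, which is the case the
check is meant to catch.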

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


