Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id 0c41dd0a-78bc-2bbb-43a1-3f8993514240@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Mark Dilger <hornschnorter@gmail.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists
List pgsql-hackers

On 12/20/2017 02:44 AM, Mark Dilger wrote:
> 
>> On Dec 19, 2017, at 4:31 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Hi,
>>
>> On 12/19/2017 08:17 PM, Mark Dilger wrote:
>>>
>>> I tested your latest patches on my mac os x laptop and got one test
>>> failure due to the results of 'explain' coming up differently.  For the record,
>>> I followed these steps:
>>>
>>> cd postgresql/
>>> git pull
>>> # this got my directory up to 8526bcb2df76d5171b4f4d6dc7a97560a73a5eff with no local changes
>>> patch -p 1 < ../0001-multivariate-MCV-lists.patch
>>> patch -p 1 < ../0002-multivariate-histograms.patch
>>> ./configure --prefix=/Users/mark/master/testinstall --enable-cassert --enable-tap-tests --enable-depend && make -j4
&&make check-world
 
>>>
>>
>> Yeah, those steps sounds about right.
>>
>> Apparently this got broken by ecc27d55f4, although I don't quite
>> understand why - but it works fine before. Can you try if it works fine
>> on 9f4992e2a9 and fails with ecc27d55f4?
> 
> It succeeds with 9f4992e2a9.  It fails with ecc27d55f4.  The failures look
> to be the same as I reported previously.
> 

Gah, this turned out to be a silly bug. The ecc27d55f4 commit does:

    ... and fix dependencies_clauselist_selectivity() so that
    estimatedclauses actually is a pure output argument as stated by
    its API contract.

which does bring the code in line with the comment stating that
'estimatedclauses' is an output parameter. It wasn't meant to be
strictly output, though, but an input/output one instead (to pass
information about already estimated clauses when applying multiple
statistics).

With only dependencies it did not matter, but with the new MCV and
histogram patches we do this:

   Bitmapset *estimatedclauses = NULL;

   s1 *= statext_clauselist_selectivity(..., &estimatedclauses);

   s1 *= dependencies_clauselist_selectivity(..., &estimatedclauses);

Since ecc27d55f4, the first thing dependencies_clauselist_selectivity
does is resetting estimatedclauses to NULL, throwing away  information
about which clauses were estimated by MCV and histogram stats.

Of course, that's something ecc27d55f4 could not predict, but the reset
of estimatedclauses also makes the first loop over clauses rather
confusing, as it also checks the estimatedclauses bitmapset:

    listidx = 0;
    foreach(l, clauses)
    {
        Node *clause = (Node *) lfirst(l);

        if (!bms_is_member(listidx, *estimatedclauses))
        {
            ...
        }

        listidx++;
    }

Of course, the index can never be part of the bitmapset - we've just
reset it to NULL, and it's the first loop. This does not break anything,
but it's somewhat confusing.

Attached is an updated patch series, where the first patch fixes this by
removing the reset of estimatedclauses (and tweaking the comment).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: January CommitFest is underway!
Next
From: Alexander Korotkov
Date:
Subject: Re: LIKE foo% optimization easily defeated by OR?