Re: [sqlsmith] Crash in mcv_get_match_bitmap - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [sqlsmith] Crash in mcv_get_match_bitmap
Date
Msg-id 20190715013425.v2hqfq5ubcbslw67@development
Whole thread Raw
In response to Re: [sqlsmith] Crash in mcv_get_match_bitmap  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [sqlsmith] Crash in mcv_get_match_bitmap  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
OK, attached is a sequence of WIP fixes for the issues discussed here.


1) using column-specific collations (instead of type/default ones)

The collations patch is pretty simple, but I'm not sure it actually does
the right thing, particularly during estimation where it uses collation
from the Var node (varcollid). But looking at 5e0928005, this should use
the same collation as when building the extended statistics (which we
get from the per-column stats, as stored in pg_statistic.stacoll#).

But we don't actually store collations for extended statistics, so we
can either modify pg_statistic_ext_data and store it there, or lookup
the per-column statistic info during estimation, and use that. I kinda
think the first option is the right one, but that'd mean yet another
catversion bump.

OTOH 5e0928005 actually did modify the extended statistics (mvdistinct
and dependencies) to use type->typcollation during building, so maybe we
want to use the default type collation for some reason?


2) proper extraction of Var/Const from opclauses

This is the primary issue discussed in this thread - I've renamed the
function to examine_opclause_expression() so that it kinda resembles
examine_variable() and I've moved it to the "internal" header file. We
still need it from two places so it can't be static, but hopefully this
naming is acceptable.


3) handling of NULL values (Const and MCV items)

Aside from the issue that Const may represent NULL, I've realized the
code might do the wrong thing for NULL in the MCV item itself. It did
treat it as mismatch and update the bitmap, but it might have invoke the
operator procedure anyway (depending on whether it's AND/OR clause,
what's the current value in the bitmap, etc.). This patch should fix
both issues by treating them as mismatch, and skipping the proc call.


4) refactoring of the bitmap updates

This is not a bug per se, but while working on (3) I've realized the
code updating the bitmap is quite repetitive and it does stuff like

   if (is_or)
      bitmap[i] = Max(bitmap[i], match)
   else
      bitmap[i] = Min(bitmap[i], match)

over and over on various places. This moves this into a separate static
function, which I think makes it way more readable. Also, it replaces
the Min/Max with a plain boolean operators (the patch originally used
three states, not true/false, hence the Min/Max).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Using unique btree indexes for pathkeys with one extra column
Next
From: Thomas Munro
Date:
Subject: Re: refactoring - share str2*int64 functions