Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id d571f2de-6246-4f6e-ca1d-32cc53a2016a@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Hi,

Attached is a rebased patch series, merging the changes from the last
review into the 0003 patch, and with a WIP patch 0004 reworking the
tracking of expressions (to address the inefficiency due to relying on
MaxHeapAttributeNumber).

The 0004 passes is very much an experimental patch with a lot of ad hoc
changes. It passes make check, but it definitely needs much more work,
cleanup and testing. At this point it's more a demonstration of what
would be needed to rework it like this.

The main change is that instead of handling expressions by assigning
them attnums above MaxHeapAttributeNumber, we assign them system-like
attnums, i.e. negative ones. So the first one gets -1, the second one
-2, etc. And then we shift all attnums above 0, to allow using the
bitmapset as before.

Overall, this works, but the shifting is kinda pointless - it allows us
to build a bitmapset, but it's mostly useless because it depends on how
many expressions are in the statistics definition. So we can't compare
or combine bitmapsets for different statistics, and (more importantly)
we can't easily compare bitmapset on attnums from clauses.

Using MaxHeapAttributeNumber allowed using the bitmapsets at least for
regular attributes. Not sure if that's a major advantage, outweighing
wasting some space.

I wonder if we should just ditch the bitmapsets, and just use simple
arrays of attnums. I don't think we expect too many elements here,
especially when dealing with individual statistics. So now we're just
building and rebuilding the bitmapsets ... seems pointless.


One thing I'd like to improve (independently of what we do with the
bitmapsets) is getting rid of the distinction between attributes and
expressions when building the statistics - currently all the various
places have to care about whether the item is attribute or expression,
and look either into the tuple or array of pre-calculated value, do
various shifts to get the indexes, etc. That's quite tedious, and I've
made a lot of errors in that (and I'm sure there are more). So IMO we
should simplify this by replacing this with something containing values
for both attributes and expressions, handling it in a unified way.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Next
From: Michael Paquier
Date:
Subject: Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?