Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | CAEZATCUuj5FPjtOU2=7Z2pTC8LEvrf79+Go7R8yQ+eE0ezfgfQ@mail.gmail.com Whole thread Raw |
In response to | Re: PoC/WIP: Extended statistics on expressions (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
On Tue, 19 Jan 2021 at 01:57, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > A slightly bigger issue that I don't like is the way it assigns > > attribute numbers for expressions starting from > > MaxHeapAttributeNumber+1, so the first expression has an attnum of > > 1601. That leads to pretty inefficient use of Bitmapsets, since most > > tables only contain a handful of columns, and there's a large block of > > unused space in the middle the Bitmapset. > > > > An alternative approach might be to use regular attnums for columns > > and use negative indexes -1, -2, -3, ... for expressions in the stored > > stats. Then when storing and retrieving attnums from Bitmapsets, it > > could just offset by STATS_MAX_DIMENSIONS (8) to avoid negative values > > in the Bitmapsets, since there can't be more than that many > > expressions (just like other code stores system attributes using > > FirstLowInvalidHeapAttributeNumber). > > Well, I tried this but unfortunately it's not that simple. We still need > to build the bitmaps, so I don't think add_expression_to_attributes can > be just removed. I mean, we need to do the offsetting somewhere, even if > we change how we do it. Hmm, I was imagining that the offsetting would happen in each place that adds or retrieves an attnum from a Bitmapset, much like a lot of other code does for system attributes, and then you'd know you had an expression if the resulting attnum was negative. I was also thinking that it would be these negative attnums that would be stored in the stats data, so instead of something like "1, 2 => 1601", it would be "1, 2 => -1", so in some sense "-1" would be the "real" attnum associated with the expression. > But the main issue is that in some cases the number of expressions is > not really limited by STATS_MAX_DIMENSIONS - for example when applying > functional dependencies, we "merge" multiple statistics, so we may end > up with more expressions. So we can't just use STATS_MAX_DIMENSIONS. Ah, I see. I hadn't really fully understood what that code was doing. ISTM though that this is really an internal problem to the dependencies code, in that these "merged" Bitmapsets containing attrs from multiple different stats objects do not (and must not) ever go outside that local code that uses them. So that code would be free to use a different offset for its own purposes -- e..g., collect all the distinct expressions across all the stats objects and then offset by the number of distinct expressions. > Also, if we offset regular attnums by STATS_MAX_DIMENSIONS, that inverts > the order of processing (so far we've assumed expressions are after > regular attnums). So the changes are more extensive - I tried doing that > anyway, and I'm still struggling with crashes and regression failures. > Of course, that doesn't mean we shouldn't do it, but it's far from > mechanical. (Some of that is probably a sign this code needs a bit more > work to polish.) Interesting. What code assumes expressions come after attributes? Ideally, I think it would be cleaner if no code assumed any particular order, but I can believe that it might be convenient in some circumstances. > But I wonder if it'd be easier to just calculate the actual max attnum > and then use it instead of MaxHeapAttributeNumber ... Hmm, I'm not sure how that would work. There still needs to be an attnum that gets stored in the database, and it has to continue to work if the user adds columns to the table. That's why I was advocating storing negative values, though I haven't actually tried it to see what might go wrong. Regards, Dean
pgsql-hackers by date: