Re: Abbreviated keys for Numeric - Mailing list pgsql-hackers
From | Andrew Gierth |
---|---|
Subject | Re: Abbreviated keys for Numeric |
Date | |
Msg-id | 87a8z7gokj.fsf@news-spur.riddles.org.uk Whole thread Raw |
In response to | Re: Abbreviated keys for Numeric (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Abbreviated keys for Numeric
|
List | pgsql-hackers |
>>>>> "Peter" == Peter Geoghegan <pg@heroku.com> writes: Peter> Attached is a revision of this patch, that I'm calling v2. WhatPeter> do you think, Andrew? "No." is I think the best summary of my response. I strongly suggest whichever committer ends up looking at this consider my original version unchanged in preference to this. The cost/benefit decision of supporting abbreviation on 32bit platforms is a point that can be debated (I strongly support retaining the 32bit code, obviously), but the substantive changes here are actively wrong. Peter> Other than that, I've tried to keep things closer to the textPeter> opclass. For example, the cost model now hasa few debuggingPeter> traces (disabled by default). I have altered the ad-hoc costPeter> model so that it no longer concernsitself with NULL inputs,Peter> which seemed questionable (not least since the abbreviationPeter> conversion functionisn't actually called for NULL inputs. AnyPeter> attempt to track the full count within numeric code thereforePeter>cannot work.). This is simply wrong. The reason why the cost model (in my version) tracks non-null values by having its own counter is precisely BECAUSE the passed-in memtupcount includes nulls, and therefore the code will UNDERESTIMATE the fraction of successfully abbreviated values if the comparison is based on memtupcount. In your version, if there are 9999 null values at the start of the input, then on the first non-null value after that, memtupcount will be 10000 and there will be only 1 distinct abbreviated value, causing abbreviation to spuriously abort. The test to clamp the estimate to 1.0 is just nonsensical and serves no purpose whatever, and the comment for it is wrong. You should fix the text abbreviation code, not propagate your mistakes further. (BTW, there's an outright typo in your code, ';;' for ';' at the end of a line. Sloppy.) Peter> I also now allocate a buffer of scratch memory separately fromPeter> the main sortsupport object - doing one allocationfor allPeter> sortsupport state, bunched together as a buffer seemed like aPeter> questionable micro-optimization. It's yet another cache line... I admit I did not benchmark that choice, but then neither did you. Peter> It seemed unwise to silently disable abbreviation when someonePeter> happened to build with DEC_DIGITS != 4. A staticassertion nowPeter> gives these unusual cases the opportunity to make an informedPeter> decision about either disablingabbreviation or not changingPeter> DEC_DIGITS in light of the performance penalty, in aPeter> self-documenting way. A) Nobody in their right minds is ever going to do that anyway B) Anybody who does that is either not concerned about performance or is concerned only about performance of the low-level numeric ops, and abbreviation is the last thing they're going to be worried about in either case. -- Andrew (irc:RhodiumToad)
pgsql-hackers by date: