Re: Abbreviated keys for text cost model fix - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Abbreviated keys for text cost model fix
Date
Msg-id 54EB7CB1.8000308@2ndquadrant.com
In response to Re: Abbreviated keys for text cost model fix  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Abbreviated keys for text cost model fix  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 23.2.2015 19:22, Peter Geoghegan wrote:
> On Mon, Feb 23, 2015 at 8:40 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The durations are much higher than without the single unsorted row added
>> at the end. Queries often take 20x longer to finish (on the same code),
>> depending on the scale.
> 
> I knew this would happen.  :-)
> 
>> What's interesting here is that some queries are much faster, but query
>> #3 is slow until we hit 2M rows:
>>
>>     select * from (select * from stuff_int_desc order by randint
>>                    offset 100000000000) foo
> 
> Not sure why that would be. Can you see what a "#define 
> DEBUG_ABBREV_KEYS" (within varlena.c) build shows happens with the 
> cost model? Enable debug1 output for that.

test=# select * from (select * from stuff_text_asc order by randtxt
offset 100000000000) foo;
DEBUG:  abbrev_distinct after 160: 152.862464 (key_distinct: 155.187096,
norm_abbrev_card: 0.955390, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 320: 314.784336 (key_distinct: 313.425344,
norm_abbrev_card: 0.983701, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 640: 629.050490 (key_distinct: 643.945298,
norm_abbrev_card: 0.982891, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 1280: 1194.271440 (key_distinct:
1257.153875, norm_abbrev_card: 0.933025, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 2560: 2402.820431 (key_distinct:
2631.772790, norm_abbrev_card: 0.938602, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 5120: 4490.142750 (key_distinct:
5261.865309, norm_abbrev_card: 0.876981, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 10240: 8000.884171 (key_distinct:
10123.941172, norm_abbrev_card: 0.781336, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 20480: 13596.546899 (key_distinct:
20563.364696, norm_abbrev_card: 0.663894, prop_card: 0.130000)
DEBUG:  abbrev_distinct after 40960: 23369.899021 (key_distinct:
40500.600680, norm_abbrev_card: 0.570554, prop_card: 0.084500)
DEBUG:  abbrev_distinct after 81920: 30884.544030 (key_distinct:
81466.848436, norm_abbrev_card: 0.377009, prop_card: 0.054925)
 randtxt
---------
(0 rows)
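
For reference, here is a minimal standalone sketch of the kind of abort
decision those estimates feed. This is not the actual varlena.c code - the
names are illustrative, and the 0.65 decay factor and the 10000-tuple
cutoff are simply inferred from the prop_card sequence in the output above:

/*
 * Standalone illustration only -- NOT the actual varlena.c cost model.
 * The 0.65 decay and the 10000-tuple cutoff are inferred from the DEBUG
 * output above.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct AbbrevState
{
    double      prop_card;  /* required proportion of distinct abbreviated keys */
} AbbrevState;

/*
 * Decide whether to give up on abbreviated keys, given running estimates
 * of the number of distinct abbreviated keys and distinct full keys.
 */
static bool
abbrev_should_abort(AbbrevState *state, int memtupcount,
                    double abbrev_distinct, double key_distinct)
{
    double      norm_abbrev_card = abbrev_distinct / (double) memtupcount;

    printf("abbrev_distinct after %d: %f (key_distinct: %f, "
           "norm_abbrev_card: %f, prop_card: %f)\n",
           memtupcount, abbrev_distinct, key_distinct,
           norm_abbrev_card, state->prop_card);

    /* Abbreviated keys still resolve enough of the full keys: keep going */
    if (abbrev_distinct > key_distinct * state->prop_card)
    {
        /* Demand progressively less as more tuples are seen */
        if (memtupcount > 10000)
            state->prop_card *= 0.65;
        return false;
    }

    /* Too many abbreviated-key collisions: abort abbreviation */
    return true;
}

int
main(void)
{
    AbbrevState state = {0.20};

    /* First and last samples from the DEBUG output above */
    abbrev_should_abort(&state, 160, 152.862464, 155.187096);
    abbrev_should_abort(&state, 81920, 30884.544030, 81466.848436);
    return 0;
}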

>> Looking at the previous tests, I see this is exactly what's
>> happening to this query with 'random' dataset - it's slightly
>> slower than master up until 2M rows, when it suddenly jumps to the
>> same speedup as the other queries. Can we do something about that?
> 
> Possibly.
> 
>> Anyway, I'm wondering what conclusion we can draw from this? I believe
>> the vast majority of datasets in production won't be perfectly sorted,
>> because when the table is CLUSTERed by an index we tend to use an index
>> scan to do the sort (so no problem), or the data are not actually
>> perfectly sorted (and here we get a significant speedup).
> 
> One conclusion might be that commit a3f0b3d6 should not have added the
> pre-check for perfectly sorted input, which is not in the Bentley &
> Mcilroy original - exploiting the fact that the set is already
> partially sorted is a fine thing, but we're not doing the necessary

Would that pre-check hurt even the random test case? I can imagine it
might behave strangely with almost perfectly sorted data (when only the
very last row is unsorted), but not for random data (but see below).
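
For concreteness, here is a minimal sketch of what such a pre-check looks
like - an illustration of the idea only, not the code added by a3f0b3d6:

/*
 * Illustration only -- not the actual pg_qsort code.  Scan once up front
 * and skip sorting entirely if the input is already in order.
 */
#include <stdlib.h>

typedef int (*cmp_fn) (const void *a, const void *b);

static int
presorted(const char *base, size_t nel, size_t width, cmp_fn cmp)
{
    size_t      i;

    for (i = 1; i < nel; i++)
    {
        /* One out-of-order pair is enough to force a full sort */
        if (cmp(base + (i - 1) * width, base + i * width) > 0)
            return 0;
    }
    return 1;
}

static void
sort_with_precheck(void *base, size_t nel, size_t width, cmp_fn cmp)
{
    /* Perfectly sorted input: done after a single O(n) pass */
    if (presorted(base, nel, width, cmp))
        return;

    /*
     * Otherwise do a full sort.  With only the last row out of order, the
     * pre-check above scans almost the whole array before failing, which
     * is pure overhead on top of the sort itself.
     */
    qsort(base, nel, width, cmp);
}

static int
cmp_int(const void *a, const void *b)
{
    int         ia = *(const int *) a;
    int         ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

int
main(void)
{
    /* Sorted except for the very last element -- the pre-check's worst case */
    int         v[] = {1, 2, 3, 4, 5, 0};

    sort_with_precheck(v, sizeof(v) / sizeof(v[0]), sizeof(v[0]), cmp_int);
    return 0;
}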

> bookkeeping. But that's not really why I suggested that test case.
> Rather, I wanted to put the possible regression in the event of
> perfectly sorted input in perspective. Maybe that regression isn't
> worth worrying about, if in the event of a single tuple being out of
> order at the end of the set the outcome dramatically shifts to
> strongly favor abbreviation. Another way of looking at it would be
> that the pre-sorted case is an argument against strxfrm() sorting in
> general more than abbreviated keys in particular, an argument that
> seems new and strange to me. The analysis of algorithms focuses on the
> worst case and average case for a reason, and (say) the glibc
> documentation never considers this when discussing strxfrm().
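
To make the strxfrm() point concrete, here is a plain libc sketch of the
transform-once, compare-cheaply idea being referred to (not PostgreSQL
code; abbreviated keys keep only a prefix of the transformed key):

/*
 * Sketch of strxfrm()-based sorting: pay the collation cost once per
 * string, then order by cheap byte-wise comparison.  Plain libc, not
 * PostgreSQL code.
 */
#include <stdio.h>
#include <string.h>
#include <locale.h>

int
main(void)
{
    const char *a = "apple";
    const char *b = "Banana";
    char        ka[128];
    char        kb[128];

    setlocale(LC_COLLATE, "");

    /* One strxfrm() per string; assume the keys fit in the buffers */
    if (strxfrm(ka, a, sizeof(ka)) >= sizeof(ka) ||
        strxfrm(kb, b, sizeof(kb)) >= sizeof(kb))
        return 1;

    /*
     * From here on, ordering is a plain strcmp() of the transformed keys
     * instead of a strcoll() per comparison; the signs agree.
     */
    printf("strcoll: %d, strcmp of strxfrm keys: %d\n",
           strcoll(a, b), strcmp(ka, kb));
    return 0;
}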

Yeah, every algorithm has a worst case - either you get very fast
execution on average with a few rare cases where it's much slower, or you
get very bad average performance.

But after looking at the data a bit more, I actually think this is a red
herring. Comparing the actual runtimes (and not just the speedups), I get
this:
     scale   query#    master   cost-model   speedup
    100000        1       884          103      856%
   1000000        1     12153         1506      807%
   2000000        1     26187         3894      673%
   3000000        1     38147         6601      578%
   4000000        1     56332        10976      513%
   5000000        1     77009        16409      469%

    100000        2       883          110      805%
   1000000        2     12114         1574      770%
   2000000        2     26104         4038      646%
   3000000        2     38028         6828      557%
   4000000        2     56138        11294      497%
   5000000        2     76720        16829      456%

    100000        3       102          106       97%
   1000000        3      1507         1537       98%
   2000000        3     26984         3977      678%
   3000000        3     39090         6753      579%
   4000000        3     57239        11228      510%
   5000000        3     78150        16769      466%

So while it's true that for the 3rd query we get much worse results than
for the other queries (a ~3% slowdown relative to master instead of a
>400% speedup), that's mostly because master performs exceptionally well
on this query with small datasets. Once we get to 2M rows, master's
performance drops significantly while the cost-model branch keeps its
performance characteristics, and the speedup jumps back to ~700%, which
is nice.

These numbers are for the 'ASC + unsorted row' test, but I do get
exactly the same pattern for the 'random' tests done previously.

It would be nice if we could address the 3% regression for the last
query, but I guess it's not a big deal. The numbers in general are
absolutely impressive. Kudos.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


