Re: sortsupport for text - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: sortsupport for text |
Date | |
Msg-id | CAEYLb_VjJijP4B-ZvZjKvfF=weJSvn76M6BiWOdoRDfH2p8FHg@mail.gmail.com |
In response to | Re: sortsupport for text (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: sortsupport for text |
List | pgsql-hackers |
On 14 June 2012 20:32, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, but *it doesn't matter*. If you test this on strings that are
> long enough that they get pushed out to TOAST, you'll find that it
> doesn't measurably improve performance, because the overhead of
> detoasting so completely dominates any savings on the palloc side that
> you can't pick them out of the inter-run noise.

That's probably true, but it's also beside the point. As recently as a
few hours ago, you yourself said "my guess is that most values people
sort by are pretty short, making this concern mostly academic". Why
are you getting hung up on toasting now?

> Here we know that it doesn't matter, so the application of Knuth's first law
> of optimization is appropriate.

I'm not advocating some Byzantine optimisation, or even something that
could reasonably be described as an optimisation at all here. I'm
questioning why you've unnecessarily complicated the code by having
the buffer size just big enough to fit the biggest value seen so far,
but arbitrarily aligned to a value that is completely irrelevant to
bttextfastcmp_locale(), rather than using simple geometric expansion,
which is more or less the standard way of managing the growth of a
dynamic array.

You have to grow the array in some way. The basic approach I've
outlined has something to recommend it - why does it make sense to
align the size of the buffer to TEXTBUFLEN in particular though? It's
quite easy to imagine what you've done here resulting in an excessive
number of allocations (and pfree()s), which *could* be expensive. If
you're so conservative about allocating memory, don't grow the array
at quite so aggressive a rate as doubling it each time. There is a
trade-off between space and time to be made here, but I don't know why
you think that the right choice is to use almost the smallest possible
amount of memory in all cases.

>> Another concern is that it seems fairly pointless to have two buffers.
>> Wouldn't it be more sensible to have a single buffer that was
>> partitioned to make two logical, equally-sized buffers, given that in
>> general each buffer is expected to grow at exactly the same rate?
>
> Sure, but it would be making the code more complicated in return for
> no measurable performance benefit. We generally avoid that.

Fair enough.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
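
To make the two growth policies discussed above concrete, here is a minimal standalone sketch. It is not the code from the patch under review: ScratchBuf, scratch_reserve_geometric() and scratch_reserve_aligned() are hypothetical names, malloc()/free() stand in for palloc()/pfree(), and the 64-byte starting size is an arbitrary assumption. Each buffer counts how often it had to be reallocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical scratch buffer; not a PostgreSQL structure. */
typedef struct
{
    char   *data;       /* current allocation, or NULL */
    size_t  cap;        /* bytes currently allocated */
    int     nallocs;    /* number of (re)allocations so far */
} ScratchBuf;

/* Replace the allocation; old contents are scratch and need not be kept. */
static int
scratch_replace(ScratchBuf *b, size_t newcap)
{
    char   *p = malloc(newcap);

    if (p == NULL)
        return 0;       /* leave the old buffer untouched on failure */
    free(b->data);
    b->data = p;
    b->cap = newcap;
    b->nallocs++;
    return 1;
}

/* Geometric expansion: keep doubling until the request fits. */
static int
scratch_reserve_geometric(ScratchBuf *b, size_t need)
{
    size_t  newcap;

    if (need <= b->cap)
        return 1;
    newcap = (b->cap > 0) ? b->cap : 64;    /* assumed initial size */
    while (newcap < need)
    {
        if (newcap > (size_t) -1 / 2)
        {
            newcap = need;                  /* avoid overflow when doubling */
            break;
        }
        newcap *= 2;
    }
    return scratch_replace(b, newcap);
}

/* "Just big enough": round the request up to a multiple of chunk. */
static int
scratch_reserve_aligned(ScratchBuf *b, size_t need, size_t chunk)
{
    assert(chunk > 0);
    if (need <= b->cap)
        return 1;
    if (need > (size_t) -1 - (chunk - 1))
        return 0;                           /* rounding up would overflow */
    return scratch_replace(b, ((need + chunk - 1) / chunk) * chunk);
}
```

With requested lengths rising from 100 to 100000 bytes in steps of 100, and a chunk of 1024 picked purely for illustration, the doubling version reallocates 11 times and ends at 131072 bytes, while the chunk-aligned version reallocates 98 times and ends at 100352 bytes. That is the space/time trade-off in miniature; whether the extra allocations are measurable in a real sort is exactly the point in dispute.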