Re: sortsupport for text - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: sortsupport for text
Date	June 14, 2012 19:31:12
Msg-id	CAEYLb_VjJijP4B-ZvZjKvfF=weJSvn76M6BiWOdoRDfH2p8FHg@mail.gmail.com
In response to	Re: sortsupport for text (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: sortsupport for text
List	pgsql-hackers

On 14 June 2012 20:32, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, but *it doesn't matter*.  If you test this on strings that are
> long enough that they get pushed out to TOAST, you'll find that it
> doesn't measurably improve performance, because the overhead of
> detoasting so completely dominates any savings on the palloc side that
> you can't pick them out of the inter-run noise.

That's probably true, but it's also beside the point. As recently as a
few hours ago, you yourself said "my guess is that most values people
sort by are pretty short, making this concern mostly academic". Why
are you getting hung up on toasting now?

> Here we know that it doesn't matter, so the application of Knuth's first law
> of optimization is appropriate.

I'm not advocating some Byzantine optimisation, or even something that
could reasonably be described as an optimisation at all here. I'm
questioning why you've unnecessarily complicated the code by having
the buffer size just big enough to fit the biggest value seen so far,
but arbitrarily aligned to a value that is completely irrelevant to
bttextfastcmp_locale(), rather than using simple geometric expansion,
which is more or less the standard way of managing the growth of a
dynamic array.

You have to grow the array in some way. The basic approach I've
outlined has something to recommend it. Why does it make sense to
align the size of the buffer to TEXTBUFLEN in particular, though? It's
quite easy to imagine what you've done here resulting in an excessive
number of allocations (and pfree()s), which *could* be expensive. If
you're that conservative about allocating memory, you could grow the
array by a smaller factor than doubling each time.

There is a trade-off between space and time to be made here, but I
don't know why you think that the right choice is to use almost the
smallest possible amount of memory in all cases.

>> Another concern is that it seems fairly pointless to have two buffers.
>> Wouldn't it be more sensible to have a single buffer that was
>> partitioned to make two logical, equally-sized buffers, given that in
>> general each buffer is expected to grow at exactly the same rate?
>
> Sure, but it would be making the code more complicated in return for
> no measurable performance benefit.  We generally avoid that.

Fair enough.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
