Re: sortsupport for text - Mailing list pgsql-hackers

From Robert Haas
Subject Re: sortsupport for text
Date
Msg-id CA+Tgmoa9jBOwQGhdEZnY1V1Te0hZadtHqQV4nFnLspqUOSB+ag@mail.gmail.com
Whole thread Raw
In response to Re: sortsupport for text  (Greg Stark <stark@mit.edu>)
Responses Re: sortsupport for text  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 12789    28.2686  libc-2.13.so             strcoll_l
>> 6802     15.0350  postgres                 text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs. I don't think the sortsupport stuff actually
> makes this any easier though. Since using it requires storing the
> binary blob somewhere I think the support would have to be baked into
> tuplesort (or hacked into the sortkey as an expr that was evaluated
> earlier somehow).

Well, the real problem here is that the strxfrm'd representations
aren't just bigger - they are huge.  On MacBook Pro, if the input
representation is n characters, the strxfrm'd representation is 9x+3
characters.  If the we're sorting very wide tuples of which the sort
key is only a small part, maybe that would be acceptable, but if the
sort key makes up the bulk of the tuple than caching the strxfrm()
representation works out to slashing work_mem tenfold.  That might be
just fine if the sort is going to fit in work_mem either way, but if
it turns a quicksort into a heap sort then I feel pretty confident
that it's going to be a loser.  Keep in mind that even if the
strxfrm'd representation were no larger at all, it would still amount
to an additional copy of the data, so you'd still potentially be
eating up lots of work_mem that way.  The fact that it's an order of
magnitude larger is just making a bad problem worse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade and pg_config dependency
Next
From: Robert Haas
Date:
Subject: Re: sortsupport for text