On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 12789 28.2686 libc-2.13.so strcoll_l
>> 6802 15.0350 postgres text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs. I don't think the sortsupport stuff actually
> makes this any easier though. Since using it requires storing the
> binary blob somewhere I think the support would have to be baked into
> tuplesort (or hacked into the sortkey as an expr that was evaluated
> earlier somehow).
Well, the real problem here is that the strxfrm'd representations
aren't just bigger - they are huge. On MacBook Pro, if the input
representation is n characters, the strxfrm'd representation is 9x+3
characters. If the we're sorting very wide tuples of which the sort
key is only a small part, maybe that would be acceptable, but if the
sort key makes up the bulk of the tuple than caching the strxfrm()
representation works out to slashing work_mem tenfold. That might be
just fine if the sort is going to fit in work_mem either way, but if
it turns a quicksort into a heap sort then I feel pretty confident
that it's going to be a loser. Keep in mind that even if the
strxfrm'd representation were no larger at all, it would still amount
to an additional copy of the data, so you'd still potentially be
eating up lots of work_mem that way. The fact that it's an order of
magnitude larger is just making a bad problem worse.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company