Re: Parallel Sort - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Parallel Sort
Date
Msg-id CAM3SWZTz3FtcNT=_gOOhX_5Qt_QRvG-5oma6x1GsuR6VC1AWzQ@mail.gmail.com
In response to Parallel Sort  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Sort  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Mon, May 13, 2013 at 7:28 AM, Noah Misch <noah@leadboat.com> wrote:
> We should decide whether to actually sort in parallel based on the comparator
> cost and the data size.  The system currently has no information on comparator
> cost: bt*cmp (and indeed almost all built-in functions) all have procost=1,
> but bttextcmp is at least 1000x slower than btint4cmp.

I think that this effort could justify itself independently of any
attempt to introduce parallelism to in-memory sorting. I abandoned a
patch to introduce timsort to Postgres, because I knew that there was
no principled way to reap the benefits. Unless you introduce
parallelism, it's probably going to be virtually impossible to come up
with an algorithm that does in-memory sorting faster (in terms of the
amount of system time taken) than a highly optimized quicksort when
sorting integers. But sorting types with really expensive comparators
(even considerably more expensive than bttextcmp) for
pass-by-reference Datums (where the memory locality advantage of
quicksort doesn't really help so much) makes timsort much more
compelling. That's why it's used for Python lists.
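
To make the comparator-cost argument concrete, here is a rough sketch in
Python that counts comparator calls rather than wall-clock time. It is
only an illustration: three_way stands in for an expensive comparator,
the quicksort is a plain textbook version (not the optimized one in
Postgres), and all function names are made up for this example. Python's
built-in sorted() is timsort, so it serves as the timsort side.

```python
import functools


def three_way(a, b):
    """Stand-in for an expensive comparator; returns <0, 0, or >0."""
    return (a > b) - (a < b)


def timsort_comparisons(items, cmp):
    """Sort with Python's built-in timsort and count comparator calls."""
    calls = 0

    def counting_cmp(a, b):
        nonlocal calls
        calls += 1
        return cmp(a, b)

    result = sorted(items, key=functools.cmp_to_key(counting_cmp))
    return result, calls


def quicksort_comparisons(items, cmp):
    """Sort with a textbook three-way quicksort and count comparator calls.

    Hypothetical helper for illustration only; it uses the middle element
    as the pivot and is not tuned in any way.
    """
    calls = 0

    def sort(seq):
        nonlocal calls
        if len(seq) <= 1:
            return seq
        pivot = seq[len(seq) // 2]
        less, equal, greater = [], [], []
        for x in seq:
            calls += 1
            c = cmp(x, pivot)
            if c < 0:
                less.append(x)
            elif c > 0:
                greater.append(x)
            else:
                equal.append(x)
        return sort(less) + equal + sort(greater)

    return sort(list(items)), calls


# Presorted input: timsort detects the existing run, so it needs far
# fewer comparator calls than the quicksort above.
data = list(range(1000))
_, ts_calls = timsort_comparisons(data, three_way)
_, qs_calls = quicksort_comparisons(data, three_way)
print("timsort comparator calls:  ", ts_calls)
print("quicksort comparator calls:", qs_calls)
```

When each comparator call is expensive, the number of calls dominates
the running time, which is where the gap on presorted or partially
sorted input would matter. With cheap integer comparisons, that gap is
much less likely to pay for timsort's extra bookkeeping.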


-- 
Peter Geoghegan


