Re: [HACKERS] qsort again (was Re: Strange Create - Mailing list pgsql-performance

From PFC
Subject Re: [HACKERS] qsort again (was Re: Strange Create
Date
Msg-id op.s435ywc1cigqcu@apollo13
Whole thread Raw
In response to Re: [HACKERS] qsort again (was Re: Strange Create  (Markus Schaber <schabi@logix-tt.com>)
List pgsql-performance
    Has anybody got profiler data on the amount of time spent in
comparisons during a sort? The proposals here would give the most
gains on simple types like INTEGER, so it would be interesting to know
how much time is currently spent in comparisons when sorting a column of
ints. If it's something like 10% of the total time, well...

    More hand-waving:

    What are the use cases for sorts?

    - potentially huge data sets: CREATE INDEX, big joins, reporting
queries, etc.
    - small data sets: typically a query with an ORDER BY that returns a
small number of rows (website stuff), or joins not small enough to use a
HashAggregate, but not big enough to justify creating an index just for
them.

    The cost of a comparison vs. moving and fetching data is probably
very different in these two cases. If everything fits neatly in
sort_mem, you can do fancy things (like sorting on SortKeys), which will
need to access the data in almost random order when the time comes to hand
the sorted data back. So I guess the SortKey idea would rather apply to
the latter case only, which is CPU-limited.

    Anyway, I was wondering about queries with multipart keys, like ORDER BY
zipcode, client_name, date and the like. Using just an int64 as the key
isn't going to help much here. Why not use a binary string of limited
length? I'd expect it to be not much slower than comparing ints, and
faster than calling each column's comparison function separately. Each
key part would be assigned a byte range in the string.
    It would waste some memory, but using 2x the memory for half the
time would be a good tradeoff when the amount of memory involved is on
the order of megabytes.
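    To illustrate the idea (this is just my sketch, not an actual
PostgreSQL implementation; the layout and the pack_key name are made up):
each key part gets a fixed byte range, encoded so that a plain bytewise
comparison agrees with the column-by-column comparison.

```python
import struct

def pack_key(zipcode: int, name: str, date_ord: int) -> bytes:
    # Hypothetical layout: 4-byte big-endian zipcode, 16-byte
    # zero-padded name, 4-byte big-endian date ordinal.  Big-endian
    # encoding makes bytewise order agree with numeric order for
    # unsigned values; zero padding makes shorter names sort first.
    name_part = name.encode("ascii", "replace")[:16].ljust(16, b"\x00")
    return struct.pack(">I", zipcode) + name_part + struct.pack(">I", date_ord)

rows = [(75001, "martin", 200), (13001, "dupont", 100), (75001, "albert", 150)]
# Sorting the packed keys with a single memcmp-style comparison gives
# the same order as comparing the three columns one by one.
assert sorted(pack_key(*r) for r in rows) == [pack_key(*r) for r in sorted(rows)]
```

    Note the truncation to 16 bytes: ties beyond the truncated prefix
would still need a fallback to the full comparison.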
    It would also help with the problem of locales. Locale-aware
comparisons are slow, but strings can be converted to a canonical form
(removing accents and so on) and then binary-sorted.
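    A minimal sketch of that canonicalization, assuming a simple
accent-stripping transform (real locale collation is more involved; the
canonical name here is mine):

```python
import unicodedata

def canonical(s: str) -> bytes:
    # Decompose accented characters (NFD), drop the combining marks,
    # and lowercase, so that a plain byte comparison approximates an
    # accent- and case-insensitive sort.
    decomposed = unicodedata.normalize("NFD", s.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("utf-8")

names = ["Émile", "edgar", "Ève", "emma"]
assert sorted(names, key=canonical) == ["edgar", "Émile", "emma", "Ève"]
```

    The expensive transform runs once per row instead of once per
comparison, which is the whole point.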

    I'll also insert a plug for the idea that the Sort needs to know
whether a LIMIT follows; that way it could reduce its working set by
simply discarding the rows which would have been discarded anyway by the
LIMIT. Say you want the first 100 rows out of a million ordered rows. If
the sort knows this, it can be performed in the amount of memory needed
for a 100-row sort.
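    That is essentially a bounded-heap selection; a sketch using
Python's standard library (the row count is scaled down here, but the
memory behavior is the point):

```python
import heapq
import random

random.seed(0)
rows = [random.random() for _ in range(100_000)]

# heapq.nsmallest keeps a heap of only the current best 100 candidates,
# so the "sort" needs memory proportional to the LIMIT, not to the
# total number of input rows.
top100 = heapq.nsmallest(100, rows)
assert top100 == sorted(rows)[:100]
```

    Each incoming row is compared against the worst of the current top
100 and usually discarded immediately, so the full input never has to be
held in sort memory at once.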

