Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Mithun Cy
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAD__OuhgUbZgszr-Ftf=bqgz-YwD60T5xMOikvXYiBQUCNpPqQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Mithun Cy <mithun.cy@enterprisedb.com>)
Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers

On Tue, Dec 29, 2015 at 4:33 AM, Peter Geoghegan <pg@heroku.com> wrote:
>Attached is a revision that significantly overhauls the memory patch,
>with several smaller changes.

I just ran some tests on above patch. Mainly to compare
how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for sorting.
I have 8GB of ram and ssd storage.

Settings and Results.
----------------------------
Work_mem= DEFAULT (4mb).
key width = 520.


CASE 1. Data is pre-sorted as per  sort key order.

CASE 2. Data is sorted in opposite order of sort key.

CASE 3. Data is randomly distributed.


Key length 520




Number of records
320000064000001280000025600000

1.7 GB3.5GB7 GB14GB





CASE 1



RS23654.67735172.81144965.442106420.155
Qsort14100.36240612.829101068.107334893.391





CASE 2



RS13427.37836882.89898492.644310670.15
Qsort12475.13332559.074100772.531322080.602





CASE 3



RS17202.96645163.234122323.299337058.856
Qsort12530.72623343.75359431.315152862.837


If data is sorted as same as sort key order then current code performs better than proposed patch
as sort size increases.

It appears new algo do not seem have any major impact if rows are presorted in opposite order.

For randomly distributed order quick sort performs well when compared to current sort method (RS).


======================================================
Now Increase the work_mem to 64MB and for 14 GB of data to sort.

CASE 1: We can see Qsort is able to catchup with current sort method(RS).
CASE 2:  No impact.
CASE 3: RS is able to catchup with Qsort.


CASE 1RS128822.735

Qsort90857.496

CSAE 2RS105631.775

Qsort105938.334

CASE 3RS152301.054

Qsort149649.347


I think for long keys both old (RS) and new (Qsort) sort method has its own characteristics
based on data distribution. I think work_mem is the key If properly set new method(Qsort) will
be able to fit most of the cases. If work_mem is not tuned right it, there are cases it can regress.


--
Thanks and Regards
Mithun C Y

Attachment

pgsql-hackers by date:

Previous
From: Oleg Bartunov
Date:
Subject: Re: Fuzzy substring searching with the pg_trgm extension
Next
From: Alexander Korotkov
Date:
Subject: Re: [PATCH] Refactoring of LWLock tranches