Re: Sort optimizations: Making in-memory sort cache-aware - Mailing list pgsql-hackers

From Ankit Kumar Pandey
Subject Re: Sort optimizations: Making in-memory sort cache-aware
Date
Msg-id 15cd9c50-eb5a-2fad-1d01-1c93b836f2c0@gmail.com
Whole thread Raw
In response to Re: Sort optimizations: Making in-memory sort cache-aware  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
List pgsql-hackers
Hi Andres,


I took a stab at naive version of this but been stuck for sometime now.

I have added logic to sort on first column at first pass,

realloc all tuples and do full sort at second pass, but I am not seeing

any benefit (it is actually regressing) at all.

Tried doing above both at bulk and at chunks of data.

 > You effectively end up with a bounded number of pre-sorted blocks, so 
the most
 > obvious thing to try is to build a heap of those blocks and 
effectively do a
 > heapsort between the presorted blocks.

I am not very clear about implementation for this. How can we do 
heapsort between

  the presorted blocks? Do we keep changing state->bound=i, i+n, i+2n 
something like

this and keep calling make_bounded_heap/sort_bounded_heap?

 > A related, but separate, improvement is to reduce / remove the memory
 > allocation overhead.

This is still pending from my side.


I have attached some benchmarking results with script and POC

patches (which includes GUC switch to enable optimization for ease of 
testing) for the same.

Tested on WORK_MEM=3 GB for 1 and 10 Million rows data.

Please let me know things which I can fix and re-attempt.


Thanks,

Ankit



Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Add LZ4 compression in pg_dump
Next
From: Tom Lane
Date:
Subject: Re: Add SHELL_EXIT_CODE to psql