Home > mailing lists

Re: Sort optimizations: Making in-memory sort cache-aware - Mailing list pgsql-hackers

From	Ankit Kumar Pandey
Subject	Re: Sort optimizations: Making in-memory sort cache-aware
Date	March 2, 2023 18:32:21
Msg-id	15cd9c50-eb5a-2fad-1d01-1c93b836f2c0@gmail.com Whole thread Raw
In response to	Re: Sort optimizations: Making in-memory sort cache-aware (Ankit Kumar Pandey <itsankitkp@gmail.com>)
List	pgsql-hackers

Tree view

Hi Andres,


I took a stab at naive version of this but been stuck for sometime now.

I have added logic to sort on first column at first pass,

realloc all tuples and do full sort at second pass, but I am not seeing

any benefit (it is actually regressing) at all.

Tried doing above both at bulk and at chunks of data.

 > You effectively end up with a bounded number of pre-sorted blocks, so 
the most
 > obvious thing to try is to build a heap of those blocks and 
effectively do a
 > heapsort between the presorted blocks.

I am not very clear about implementation for this. How can we do 
heapsort between

  the presorted blocks? Do we keep changing state->bound=i, i+n, i+2n 
something like

this and keep calling make_bounded_heap/sort_bounded_heap?

 > A related, but separate, improvement is to reduce / remove the memory
 > allocation overhead.

This is still pending from my side.


I have attached some benchmarking results with script and POC

patches (which includes GUC switch to enable optimization for ease of 
testing) for the same.

Tested on WORK_MEM=3 GB for 1 and 10 Million rows data.

Please let me know things which I can fix and re-attempt.


Thanks,

Ankit

Attachment

pgsql-hackers by date:

From: Tomas Vondra
Date: 02 March 2023, 18:27:00
Subject: Re: Add LZ4 compression in pg_dump

From: Tom Lane
Date: 02 March 2023, 18:35:27
Subject: Re: Add SHELL_EXIT_CODE to psql

Re: Sort optimizations: Making in-memory sort cache-aware - Mailing list pgsql-hackers

Attachment

Previous

Next