Hi Andres,
I took a stab at naive version of this but been stuck for sometime now.
I have added logic to sort on first column at first pass,
realloc all tuples and do full sort at second pass, but I am not seeing
any benefit (it is actually regressing) at all.
Tried doing above both at bulk and at chunks of data.
> You effectively end up with a bounded number of pre-sorted blocks, so
the most
> obvious thing to try is to build a heap of those blocks and
effectively do a
> heapsort between the presorted blocks.
I am not very clear about implementation for this. How can we do
heapsort between
the presorted blocks? Do we keep changing state->bound=i, i+n, i+2n
something like
this and keep calling make_bounded_heap/sort_bounded_heap?
> A related, but separate, improvement is to reduce / remove the memory
> allocation overhead.
This is still pending from my side.
I have attached some benchmarking results with script and POC
patches (which includes GUC switch to enable optimization for ease of
testing) for the same.
Tested on WORK_MEM=3 GB for 1 and 10 Million rows data.
Please let me know things which I can fix and re-attempt.
Thanks,
Ankit