Re: [HACKERS] Small improvement to compactify_tuples - Mailing list pgsql-hackers

From Sokolov Yura
Subject Re: [HACKERS] Small improvement to compactify_tuples
Date
Msg-id 0dcb7a5afa3fae6001bd1712253cf81d@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] Small improvement to compactify_tuples  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [HACKERS] Small improvement to compactify_tuples  (Sokolov Yura <funny.falcon@postgrespro.ru>)
List pgsql-hackers
Heikki Linnakangas писал 2017-05-15 12:06:
> On 05/14/2017 09:47 PM, Sokolov Yura wrote:
>> Good day, everyone.
>> 
>> I've been playing a bit with unlogged tables - just random updates on
>> simple
>> key-value table. I've noticed amount of cpu spent in a 
>> compactify_tuples
>> (called by PageRepareFragmentaion). Most of time were spent in qsort 
>> of
>> itemidbase items.
> 
> Ah, I played with this too a couple of years ago, see
> https://www.postgresql.org/message-id/546B89DE.7030906%40vmware.com,
> but got distracted by other things and never got around to commit
> that.
> 
>> itemidbase array is bounded by number of tuples in a page, and
>> itemIdSortData
>> structure is simple, so specialized version could be a better choice.
>> 
>> Attached patch adds combination of one pass of prefix sort with
>> insertion
>> sort for larger array and shell sort for smaller array.
>> Insertion sort and shell sort are implemented as macros and could be
>> reused.
> 
> Cool! Could you compare that against the bucket sort I posted in the
> above thread, please?
> 
> At a quick glance, your "prefix sort" seems to be the the same
> algorithm as the bucket sort that I implemented. You chose 256
> buckets, where I picked 32. And you're adding a shell sort
> implementation, for small arrays, while I used a straight insertion
> sort. Not sure what these differences mean in practice.
> 
> - Heikki

Thank you for attention.

My first version of big page sort was almost exactly same to yours.
I had a bug in my insertion sort, so I had to refactor it.
(bug were fixed)

I found that items in itemidbase are almost sorted, so it is important
to try keep its order in prefix sort. So I've changed --count[i] to
count[i+1]++.

And it looks like it is better to have more buckets:
- with 256 buckets, size of single bucket is almost always less than 2,
so array is almost always sorted after prefix sort pass.

But it looks like it is better to sort each bucket separately, as you
did, and as it was in my early version.

Also I used your names for functions and some comments.

I attached new version of the patch.

I left memcpy intact cause it looks like it doesn't take noticable
cpu time.

-- 
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian PostgreSQL Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] UPDATE of partition key
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Bug in ExecModifyTable function and trigger issues forforeign tables