Re: [HACKERS] Small improvement to compactify_tuples - Mailing list pgsql-hackers

From Sokolov Yura
Subject Re: [HACKERS] Small improvement to compactify_tuples
Date
Msg-id e49befcc6f1d7099834c6fdf5c675a60@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] Small improvement to compactify_tuples  (Sokolov Yura <funny.falcon@postgrespro.ru>)
Responses Re: [HACKERS] Small improvement to compactify_tuples  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Sokolov Yura писал 2017-05-15 15:08:
> Heikki Linnakangas писал 2017-05-15 12:06:
>> On 05/14/2017 09:47 PM, Sokolov Yura wrote:
>>> Good day, everyone.
>>> 
>>> I've been playing a bit with unlogged tables - just random updates on
>>> simple
>>> key-value table. I've noticed amount of cpu spent in a 
>>> compactify_tuples
>>> (called by PageRepareFragmentaion). Most of time were spent in qsort 
>>> of
>>> itemidbase items.
>> 
>> Ah, I played with this too a couple of years ago, see
>> https://www.postgresql.org/message-id/546B89DE.7030906%40vmware.com,
>> but got distracted by other things and never got around to commit
>> that.
>> 
>>> itemidbase array is bounded by number of tuples in a page, and
>>> itemIdSortData
>>> structure is simple, so specialized version could be a better choice.
>>> 
>>> Attached patch adds combination of one pass of prefix sort with
>>> insertion
>>> sort for larger array and shell sort for smaller array.
>>> Insertion sort and shell sort are implemented as macros and could be
>>> reused.
>> 
>> Cool! Could you compare that against the bucket sort I posted in the
>> above thread, please?
>> 
>> At a quick glance, your "prefix sort" seems to be the the same
>> algorithm as the bucket sort that I implemented. You chose 256
>> buckets, where I picked 32. And you're adding a shell sort
>> implementation, for small arrays, while I used a straight insertion
>> sort. Not sure what these differences mean in practice.
>> 
>> - Heikki
> 
> Thank you for attention.
> 
> My first version of big page sort was almost exactly same to yours.
> I had a bug in my insertion sort, so I had to refactor it.
> (bug were fixed)
> 
> I found that items in itemidbase are almost sorted, so it is important
> to try keep its order in prefix sort. So I've changed --count[i] to
> count[i+1]++.
> 
> And it looks like it is better to have more buckets:
> - with 256 buckets, size of single bucket is almost always less than 2,
> so array is almost always sorted after prefix sort pass.
> 
> But it looks like it is better to sort each bucket separately, as you
> did, and as it was in my early version.
> 
> Also I used your names for functions and some comments.
> 
> I attached new version of the patch.
> 
> I left memcpy intact cause it looks like it doesn't take noticable
> cpu time.

In a sequel, I propose to simplify PageRepairFragmentation in attached
patch.

-- 
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] NOT NULL constraints on range partition key columns
Next
From: Mark Dilger
Date:
Subject: Re: [HACKERS] Event triggers + table partitioning cause server crash in current master