Re: Sorting writes during checkpoint - Mailing list pgsql-patches

From ITAGAKI Takahiro
Subject Re: Sorting writes during checkpoint
Date
Msg-id 20080416125802.78C9.52131E4D@oss.ntt.co.jp
In response to Re: Sorting writes during checkpoint  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Sorting writes during checkpoint
Re: Sorting writes during checkpoint
List pgsql-patches
Greg Smith <gsmith@gregsmith.com> wrote:

> On Tue, 15 Apr 2008, ITAGAKI Takahiro wrote:
>
> > 2x Quad core Xeon, 16GB RAM, 4x HDD (RAID-0)
>
> What is the disk controller in this system?  I'm specifically curious
> about what write cache was involved, so I can get a better feel for the
> hardware your results came from.

I used an HP ProLiant DL380 G5 with a Smart Array P400 controller (256MB cache)
(http://h10010.www1.hp.com/wwpc/us/en/sm/WF06a/15351-15351-3328412-241644-241475-1121516.html)
and ext3 on LVM under CentOS 5.1 (Linux version 2.6.18-53.el5).
The dirty region of the database was probably larger than the disk
controller's cache.


> buf_to_write = (BufAndTag *) palloc(NBuffers * sizeof(BufAndTag));
>
> If shared_buffers(=NBuffers) is set to something big, this could give some
> memory churn.  And I think it's a bad idea to allocate something this
> large at checkpoint time, because what happens if that fails?  Really not
> the time you want to discover there's no RAM left.

Hmm, but I think we need to copy the buffer tags into bgwriter's local memory
so that the sort does not have to lock each tag many times. Would it be better
to allocate the sort buffer once, at the first checkpoint, and reuse it from
then on?
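
A minimal sketch of the allocate-once idea, not the patch itself. It uses
plain realloc() so that it compiles outside the backend (the patch uses
palloc), and the names get_sort_buffer, sort_buf and sort_buf_len, as well as
the field names inside BufAndTag, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: a buffer id plus the four tag fields, five 4-byte ints. */
typedef struct BufAndTag
{
    int32_t  buf_id;
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
    uint32_t blockNum;
} BufAndTag;

static BufAndTag *sort_buf = NULL;      /* kept across checkpoints */
static size_t     sort_buf_len = 0;     /* capacity, in entries */

/*
 * Return a buffer with room for nbuffers entries.  Memory is requested
 * only on the first call, or when a larger capacity is asked for; later
 * calls hand back the same pointer.  Returns NULL if the allocation
 * fails, so the caller can fall back to writing in unsorted order.
 */
static BufAndTag *
get_sort_buffer(size_t nbuffers)
{
    BufAndTag *p;

    assert(nbuffers > 0);

    if (sort_buf != NULL && sort_buf_len >= nbuffers)
        return sort_buf;                /* reuse the existing buffer */

    if (nbuffers > SIZE_MAX / sizeof(BufAndTag))
        return NULL;                    /* size computation would overflow */

    p = realloc(sort_buf, nbuffers * sizeof(BufAndTag));
    if (p == NULL)
        return NULL;                    /* the old buffer, if any, is kept */

    sort_buf = p;
    sort_buf_len = nbuffers;
    return sort_buf;
}
```

With this shape an out-of-memory condition would show up at the first
checkpoint rather than at an arbitrary later one, and since NBuffers is fixed
after startup the buffer would never need to grow.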


> BufAndTag is a relatively small structure (5 ints).  Let's call it 40
> bytes; even that's only a 0.5% overhead relative to the shared buffer
> allocation.  If we can speed checkpoints significantly with that much
> overhead it sounds like a good tradeoff to me.

I think sizeof(BufAndTag) is 20 bytes, because sizeof(int) is 4 on typical
platforms (and if it is not, I should rewrite the patch so that it always is).
That is 0.25% of shared buffers; when shared_buffers is set to 10GB,
the array takes 25MB of process-local memory. If we want to consume less
memory, the RelFileNode in BufferTag could be hashed and packed into a single
integer: the blockNum order matters for this purpose, but the RelFileNode
order does not. That reduces the overhead to 12 bytes per page (0.15%).
Is it worth doing?
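
The arithmetic above can be checked with a small standalone sketch. It
assumes the default 8kB page size; PackedBufAndTag, hash_relfilenode,
packed_cmp and sort_array_bytes are hypothetical names, and the FNV-1a style
hash is only one possible choice, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192     /* assumed: default PostgreSQL page size */

/* Full entry: buffer id plus the four tag fields (field names assumed). */
typedef struct BufAndTag
{
    int32_t  buf_id;
    uint32_t spcNode, dbNode, relNode;
    uint32_t blockNum;
} BufAndTag;

/* Packed entry: the three RelFileNode fields collapsed into one hash. */
typedef struct PackedBufAndTag
{
    int32_t  buf_id;
    uint32_t relHash;
    uint32_t blockNum;
} PackedBufAndTag;

_Static_assert(sizeof(BufAndTag) == 20, "five 4-byte fields");
_Static_assert(sizeof(PackedBufAndTag) == 12, "three 4-byte fields");

/* FNV-1a style mix of the three RelFileNode fields (illustrative only). */
static uint32_t
hash_relfilenode(uint32_t spc, uint32_t db, uint32_t rel)
{
    uint32_t h = 2166136261u;

    h = (h ^ spc) * 16777619u;
    h = (h ^ db) * 16777619u;
    h = (h ^ rel) * 16777619u;
    return h;
}

/* qsort() comparator: group by relation hash, then order by block number. */
static int
packed_cmp(const void *a, const void *b)
{
    const PackedBufAndTag *x = a;
    const PackedBufAndTag *y = b;

    if (x->relHash != y->relHash)
        return x->relHash < y->relHash ? -1 : 1;
    if (x->blockNum != y->blockNum)
        return x->blockNum < y->blockNum ? -1 : 1;
    return 0;
}

/* Bytes needed for one sort entry per shared buffer page. */
static uint64_t
sort_array_bytes(uint64_t shared_buffers_bytes, uint64_t entry_size)
{
    assert(shared_buffers_bytes % BLCKSZ == 0);
    return shared_buffers_bytes / BLCKSZ * entry_size;
}
```

For 10GB of shared buffers, sort_array_bytes gives 25MB with the 20-byte
entry and 15MB with the 12-byte one. Two relations whose hashes collide would
have their blocks interleaved in the sorted output; in this sketch that only
costs some ordering quality for those relations, because each entry still
carries its buf_id.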

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


