Re: Controlling Load Distributed Checkpoints - Mailing list pgsql-hackers

From PFC
Subject Re: Controlling Load Distributed Checkpoints
Date
Msg-id op.ttvrtciocigqcu@apollo13
Whole thread Raw
In response to Re: Controlling Load Distributed Checkpoints  ("Jim C. Nasby" <decibel@decibel.org>)
Responses Re: Controlling Load Distributed Checkpoints  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
>> >If we extended relations by more than 8k at a time, we would know a lot
>> >more about disk layout, at least on filesystems with a decent amount of
>> >free space.
>>
>> I doubt it makes that much difference. If there was a significant amount
>> of fragmentation, we'd hear more complaints about seq scan performance.
>>
>> The issue here is that we don't know which relations are on which drives
>> and controllers, how they're striped, mirrored etc.
>
> Actually, isn't pre-allocation one of the tricks that Greenplum uses to
> get it's seqscan performance?
My tests here show that, at least on reiserfs, after a few hours of  
benchmark torture (this represents several million write queries), table  
files become significantly fragmented. I believe the table and index files  
get extended more or less simultaneously and end up somehow a bit mixed up  
on disk. Seq scan perf suffers. reiserfs doesn't have an excellent  
fragmentation behaviour... NTFS is worse than hell in this respect. So,  
pre-alloc could be a good idea. Brutal Defrag (cp /var/lib/postgresql to  
somewhere and back) gets seq scan perf back to disk throughput.
Also, by the way, InnoDB uses a BTree organized table. The advantage is  
that data is always clustered on the primary key (which means you have to  
use something as your primary key that isn't necessary "natural", you have  
to choose it to get good clustering, and you can't always do it right, so  
it somehow, in the end, sucks rather badly). Anyway, seq-scan on InnoDB is  
very slow because, as the btree grows (just like postgres indexes) pages  
are split and scanning the pages in btree order becomes a mess of seeks.  
So, seq scan in InnoDB is very very slow unless periodic OPTIMIZE TABLE is  
applied. (caveat to the postgres TODO item "implement automatic table  
clustering"...)


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: EXPLAIN omits schema?
Next
From: Andrew Dunstan
Date:
Subject: Re: Bug in UTF8-Validation Code?