Re: pg_restore takes ages - Mailing list pgsql-general
From | scott.marlowe |
---|---|
Subject | Re: pg_restore takes ages |
Date | |
Msg-id | Pine.LNX.4.33.0310031607310.28368-100000@css120.ihs.com Whole thread Raw |
In response to | Re: pg_restore takes ages (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pg_restore takes ages
|
List | pgsql-general |
On Fri, 3 Oct 2003, Tom Lane wrote: > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > Yes, you are. Basically, with fsync on, things have to happen in order. > > I.e. > > write to WAL what you're gonna do. WAIT for confirmation on write > > write the tuples out. wait for confirmation > > checkpoint the WAL. wait for confirmation > > Not really. With fsync on, we *only* sync the WAL writes. Data writes > can happen whenever, so long as we know the corresponding WAL writes > went down first. We only wait for data writes to complete before > considering that a checkpoint is complete --- which is something that is > not in the main line of execution and doesn't block other activity. > > This is one good reason for keeping WAL on a separate drive from the > data files --- you are then freeing the system to schedule data I/O as > optimally as it can. Oh, Ok. That's why the WAL is such a choking point. > > Note that if you're running on IDE drives, you already ARE probably > > running with fsync off if write caching is enabled, so you'll need to turn > > it off (hdparm -W0 /dev/hdx in linux) to ensure fsync actually works. > > It'd be interesting to think about whether a write-caching IDE drive > could safely be used for data storage, if WAL is elsewhere. Well, I just so happen to have a machine with two drives in it. I'll get back to you on that. > Right offhand I think the only problem is how to know when it's safe > to consider a checkpoint complete. Maybe all that would be needed is > a long enough time delay after issuing sync(2) in the checkpoint code. > Do these drives guarantee "data will be written within 30 seconds" or > something like that? Or can the delay be indefinite when load is heavy? I don't know. My guess is that they probably don't wait too long, they just wait until they have enough to write to make it "worth their while" i.e. they don't write 8k at a time, they write 64k or 512k at a time, now that many have 8 Meg of cache, it would make sense to group writes to one area together. I'll test this out. Do you think "pgbench -c 50 -t 100000000" is enough thrash, or do I need more connections at once? The machine I'm on has 2 gig ram, so it can probably open several hundred connections, but the time to bring the database back up by replaying the WAL with hundreds of concurrent transactions is gonna be a bit.
pgsql-general by date: