Thread: large inserts and fsync
Forgive me if this is a many-times-rehashed topic. I'm very new to PostgreSQL; most of my background is in Oracle.
Running PostgreSQL 8.2.5 with one master and three slaves (using Slony).
For an upcoming release there is a 16 million row insert that on our test cluster takes about 2.5 hours to complete with all indices dropped beforehand.
If I turn off fsync, it completes in under 10 minutes.
Other than losing the protection fsync gives me should there be a crash in the middle of such a process, my thinking was to turn off fsync for this part of the release, get the insert done, then re-enable fsync once the insert is complete.
Am I opening myself up to any dangers that aren’t obvious by doing this? Any advice to the contrary? And of course, if I am out of my mind for doing this, please let me know.
Thanks in advance
Aaron
On Fri, Sep 05, 2008 at 09:16:41AM -0400, Aaron Burnett wrote:
> For an upcoming release there is a 16 million row insert that on our test
> cluster takes about 2.5 hours to complete with all indices dropped
> beforehand.
>
> If I turn off fsync, it completes in under 10 minutes.

Have you tried bundling all the INSERT statements into a single
transaction? If you haven't then PG will run each statement in its own
transaction and then commit each INSERT statement to disk separately,
incurring large overheads.

Sam
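Sam's single-transaction suggestion amounts to something like the following sketch; the table and column names here are hypothetical, made up for illustration:

```sql
-- Wrap the whole batch in one transaction: one synchronous WAL flush
-- at COMMIT rather than one per INSERT statement.
-- "release_data" and its columns are hypothetical.
BEGIN;
INSERT INTO release_data (id, payload) VALUES (1, 'first row');
INSERT INTO release_data (id, payload) VALUES (2, 'second row');
-- ... remaining rows ...
COMMIT;
```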
Yes, the developer already made sure of that and I verified.

On 9/5/08 11:10 AM, "Sam Mason" <sam@samason.me.uk> wrote:
> On Fri, Sep 05, 2008 at 09:16:41AM -0400, Aaron Burnett wrote:
>> For an upcoming release there is a 16 million row insert that on our test
>> cluster takes about 2.5 hours to complete with all indices dropped
>> beforehand.
>>
>> If I turn off fsync, it completes in under 10 minutes.
>
> Have you tried bundling all the INSERT statements into a single
> transaction? If you haven't then PG will run each statement in its own
> transaction and then commit each INSERT statement to disk separately,
> incurring large overheads.
>
> Sam
On Fri, Sep 05, 2008 at 11:19:13AM -0400, Aaron Burnett wrote:
> On 9/5/08 11:10 AM, "Sam Mason" <sam@samason.me.uk> wrote:
> > Have you tried bundling all the INSERT statements into a single
> > transaction?
>
> Yes, the developer already made sure of that and I verified.

I was under the impression that the only time PG synced the data to disk
was when the transaction was COMMITed. I've never needed to turn off
fsync for performance reasons, even when pulling in hundreds of millions
of rows. I do tend to use a single large COPY rather than many small
INSERT statements; PG spends an inordinate amount of time parsing
millions of SQL statements, whereas a tab-delimited file is much easier
to parse.

Could you try bumping "checkpoint_segments" up a bit, or have you tried
that already?

Sam
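The COPY approach Sam describes would look roughly like this; the table name and file path are made up for illustration:

```sql
-- Bulk-load a tab-delimited file in one statement instead of millions
-- of individual INSERTs.  Table and path are hypothetical.  The
-- server-side form below requires superuser and a file readable by the
-- server; psql's \copy variant reads the file from the client instead.
COPY release_data (id, payload) FROM '/var/tmp/release_data.tsv';
```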
> > Have you tried bundling all the INSERT statements into a single
> > transaction?
>
> Yes, the developer already made sure of that and I verified.

I would verify that again, because fsync shouldn't make much of a
difference in that circumstance. I might not do all 16 million in one
transaction, but if you're doing 10 or 100 thousand at a time, it should
be pretty fast. Perhaps a language-level auto-commit still needs to be
disabled?

--
Alan
Aaron Burnett <aburnett@bzzagent.com> writes:
> On 9/5/08 11:10 AM, "Sam Mason" <sam@samason.me.uk> wrote:
>> Have you tried bundling all the INSERT statements into a single
>> transaction?

> Yes, the developer already made sure of that and I verified.

Hmm, in that case the penalty probably comes from pushing WAL data out
to disk synchronously. It might be worth playing with wal_sync_method
and/or raising wal_buffers.

The trouble with turning fsync off is that a system crash midway through
the import might leave you with a corrupt database. If you're willing
to start over from initdb then okay, but if you are importing into a
database that already contains valuable data, I wouldn't recommend it.

			regards, tom lane
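The knobs Tom mentions live in postgresql.conf. The values below are only illustrative starting points for an 8.2-era server, not recommendations:

```
# postgresql.conf sketch (8.2-era settings; values are illustrative)
wal_buffers = 1MB              # default is small; more helps large transactions
checkpoint_segments = 32       # default 3; fewer checkpoints during the bulk load
#wal_sync_method = fdatasync   # best choice is platform-dependent; benchmark the alternatives
```

Changing wal_sync_method and wal_buffers requires a server restart on this version, so it's worth testing on the test cluster first.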
On Fri, 5 Sep 2008, Tom Lane wrote:
> The trouble with turning fsync off is that a system crash midway through
> the import might leave you with a corrupt database. If you're willing
> to start over from initdb then okay, but if you are importing into a
> database that already contains valuable data, I wouldn't recommend it.

If you have enough disk space, and you're realistically going to run with
fsync off, you should set up enough of the PITR features to take a base
backup first, or just copy the database directory if you can take the
server down for a bit. Then your worst case becomes starting over from
that backup rather than from initdb.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
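Greg's base-backup step can be sketched with the 8.x backup functions; the label and backup path here are hypothetical:

```sql
-- Take a filesystem-level base backup before running with fsync off,
-- so a crash mid-import means restoring this copy rather than initdb.
-- The label is arbitrary; copy $PGDATA from the shell between the calls.
SELECT pg_start_backup('pre-release-import');
-- (from a shell: tar -czf /backups/base.tar.gz $PGDATA)
SELECT pg_stop_backup();
```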
Yeah, the backup is standard operating procedure before the start of the release process.
I'm going to try a few of the suggestions offered here first, then fall back on the fsync option if I can't get an appreciable increase in speed.
thanks all for the help and thoughts.
-----Original Message-----
From: Greg Smith [mailto:gsmith@gregsmith.com]
Sent: Sat 9/6/2008 4:45 AM
To: Tom Lane
Cc: Aaron Burnett; Sam Mason; pgsql-general@postgresql.org
Subject: Re: [GENERAL] large inserts and fsync