Re: autocommit (true/false) for more than 1 million records - Mailing list pgsql-performance

From Felipe Santos
Subject Re: autocommit (true/false) for more than 1 million records
Date
Msg-id CAPYcRiURd0Zzg-Oo9R0126xideKz42A+tNNem+a0LW0rwzc=-w@mail.gmail.com
In response to Re: autocommit (true/false) for more than 1 million records  (Emi Lu <emilu@encs.concordia.ca>)
List pgsql-performance
Hi Emi,

Databases that comply with the ACID standard (http://en.wikipedia.org/wiki/ACID) ensure that there is no data loss by first writing data changes to the database log, as opposed to updating the actual data in the datafiles on the filesystem first.

Each database has its own way of doing this, but it basically consists of writing the changes to the logfile at each COMMIT and writing them to the datafiles only when necessary.

So the COMMIT command is a way of telling the database to write the data changes to the logfile.
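
In JDBC terms this maps onto the Connection API: with autocommit on, the driver issues an implicit COMMIT after every statement; with autocommit off, nothing is committed to the logfile until you call commit(). A minimal sketch (the connection string and the "demo" table are made up for illustration):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CommitDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; adjust for your setup.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "secret");

            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO demo (id) VALUES (1)");
                st.executeUpdate("INSERT INTO demo (id) VALUES (2)");
            }
            conn.commit(); // both rows reach the logfile in one COMMIT
            conn.close();
        }
    }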

Both logfiles and datafiles reside on the filesystem, so why is writing to the logfile faster?

It is because the logfile is written sequentially, while writes to the datafiles are scattered across the disk, and the files themselves may even be fragmented.

In short: autocommit false is faster because you avoid a trip to the hard disk for every single row; the changes are kept in RAM and flushed to the logfile only when you COMMIT (every 10K rows, for instance).

Be aware that you will eventually need to write the data to the logfile; that part cannot be avoided. But performance is usually better if you write X rows at a time rather than writing each row one by one, because of the per-write overhead of the hard disk.

The commit interval that gives the best performance depends on your environment, and finding it is pretty much a matter of trial and error. For millions of rows, I usually commit every 10K or 50K rows.
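
For the JDBC load in your question, the pattern looks roughly like the sketch below (table and column names are invented, and COMMIT_INTERVAL is the knob to tune by testing):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BulkInsert {
        private static final int COMMIT_INTERVAL = 10000; // tune for your environment

        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; adjust for your setup.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "secret")) {
                conn.setAutoCommit(false); // option (2) from the original question

                String sql = "INSERT INTO big_table (id, payload) VALUES (?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (int i = 1; i <= 1000000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "row " + i);
                        ps.addBatch();
                        if (i % COMMIT_INTERVAL == 0) {
                            ps.executeBatch(); // ship the batch to the server
                            conn.commit();     // one logfile flush per 10K rows
                        }
                    }
                    ps.executeBatch(); // remainder, if any
                    conn.commit();
                }
            }
        }
    }

The addBatch/executeBatch calls also cut network round trips, which is a separate win on top of the commit interval itself.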

Regards,

Felipe




2014-08-25 10:40 GMT-03:00 Emi Lu <emilu@encs.concordia.ca>:
>> Good morning,
>>
>> Trying to insert into one table with 1 million records through Java
>> JDBC into psql8.3. May I know whether (1) or (2) is better, please?
>>
>> (1) set autocommit(true)
>> (2) set autocommit(false)
>>       commit every n records (e.g., 100, 500, 1000, etc.)
>
> It depends on what you need.
>
> Data will be available to concurrent processes earlier with (1), while
> (2) will go faster.

No need to worry about locks or losing records, because after the data
loading we will do a check. For now, I'd like the fastest way. Would
you suggest committing every 1000 or 3000 records?

> The improvement drops off pretty quickly in my experience, but it
> depends on the size of the records and other things.

The table is huge, with almost 170 columns.

> Try it and see..?  It's almost certainly going to depend on your
> specific environment.

Can you let me know what the "specific environment" covers, please? Such as: ......

By the way, could someone let me know why set autocommit(false) is for
sure faster than true, or point me to some online docs that talk about
this?

Thanks a lot!
Emi



