Thread: PostgreSQL Performance issue
Dear All Experts,
I am using a PostgreSQL 8.3.5 database on a Windows 64-bit OS.
I have a batch program written in Java which processes the data and populates it into tables in the Postgres database.
I have 622,000 records, but it is taking almost 4 and a half hours to load them into the tables.
I have a simple function in the database which is called from the Java batch program to populate the records into tables from flat files.
I have the below system configuration for my database server:
Database Server  | PostgreSQL v8.3.5
Operating System | Windows 2003 Server 64-bit, Service Pack 2
CPU              | 2 * Quad Core AMD Opteron Processor 2376 @ 2.29 GHz
Memory           | 16 GB RAM
Disk Space       | 2.5 TB total [C drive: 454 GB, D drive: 1.99 TB]
I have set my postgresql.conf parameters as below.
======================================================================
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------
# - Memory -
shared_buffers = 1GB
temp_buffers = 256MB
max_prepared_transactions = 100
work_mem = 512MB
maintenance_work_mem = 512MB
# - Free Space Map -
max_fsm_pages = 1600000
max_fsm_relations = 10000
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
wal_buffers = 5MB # min 32kB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------
# - Planner Method Configuration -
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
effective_cache_size = 8GB
========================================================================
Please advise me on the best way to set these parameters to achieve better performance.
Also note that when I set shared_buffers = 2GB or higher, Postgres throws an error: “shared_buffer size cannot be more than size_t”.
I would be very grateful if anyone could help me with this.
Many thanks
Most likely you are doing one insert per transaction. Set autocommit to false and commit only after all the inserts are done.
-n.
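A minimal JDBC sketch of that change, assuming the PostgreSQL JDBC driver is on the classpath; the connection string, table, and column names below are placeholders, not taken from the original program:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BulkInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
        conn.setAutoCommit(false);  // no implicit commit after every INSERT

        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO target_table (id, payload) VALUES (?, ?)");
        for (int i = 0; i < 622000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();            // queue the row client-side
            if (i % 1000 == 999) {
                ps.executeBatch();    // one network round trip per 1000 rows
            }
        }
        ps.executeBatch();
        conn.commit();                // single commit at the very end
        ps.close();
        conn.close();
    }
}

Committing once at the end (or every few thousand rows) avoids paying a WAL flush per row, which is usually the dominant cost of row-at-a-time loading.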
On Tue, Apr 27, 2010 at 10:11, <A.Bhattacharya@sungard.com> wrote:

> I am using a PostgreSQL 8.3.5 database on a Windows 64-bit OS.

You really need to upgrade, at least to 8.3.10. It has many important bugfixes.

> I have 622,000 records, but it is taking almost 4 and a half hours to load them into the tables.
> Disk Space | 2.5 TB total [C drive: 454 GB, D drive: 1.99 TB]

The interesting point is not how much disk you have, but what configuration you have it in. Either way, 622k records in 4 1/2 hours is obviously crappy even for a single disk.

> shared_buffers = 1GB

You might want to try lowering that one drastically, say to 128MB. In some cases this has been known to give better performance on Windows; not in all cases though, so you have to try it out.

> max_prepared_transactions = 100

Are you really using two-phase commit on this system? If not, turn this off. This is prepared transactions, not prepared statements.

> Also note that when I set shared_buffers = 2GB or higher, Postgres throws an error: “shared_buffer size cannot be more than size_t”.

That is normal, since your binary is 32-bit. In fact, having shared_buffers at 1GB may give you some trouble with your fairly high work_mem as well, as the *total* amount of memory in the process is limited. That's another reason to try a lower shared_buffers.

(Other than that, read the comments from Thom.)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
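In postgresql.conf terms, these suggestions would look something like the following sketch. The values are illustrative, and the work_mem line is an extra assumption drawn from the 32-bit address-space point above, not a figure Magnus gave:

shared_buffers = 128MB            # much lower than 1GB often works better on Windows (restart required)
max_prepared_transactions = 0     # disable unless two-phase commit is actually used (restart required)
work_mem = 16MB                   # assumption: keep per-operation memory modest in a 32-bit process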
A.Bhattacharya@sungard.com wrote:
> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.
> CPU | 2 * Quad Core AMD Opteron Processor 2376 @ 2.29 GHz

With one thread doing inserts, the other 7 cores will be idle, but you're almost certainly disk-IO bound. OTOH, if you're calling a function (is that PL/pgSQL, PL/Java, PL/Perl, or what?) for each insert, you could be compute-bound on that single core. It really depends on what that function is doing.
On 27 Apr 2010, at 10:11, <A.Bhattacharya@sungard.com> wrote:
> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.

Wow, that's averaging almost 40s per insert! I don't think those inserts not being in a single transaction explains that much of a delay. Not that changing that doesn't help, but there's more going on here.

Maybe you have some foreign key constraints involved that don't have indices on the foreign keys? In that case I expect that you either have a lot of foreign references, only a few but from rather large tables (several million rows at least), or something in between. Any other constraints that could be relevant?

> I have a simple function in db which is being called from Java batch program to populate the records into tables from flat files.

Did you verify that most of the time is spent waiting on the database? I'm not entirely sure what you mean by the above. Do you have a batch program that starts the Java interpreter for each flat file? Are they running synchronously (one after the other) or in parallel? In the latter case you may be waiting on locks.

Is the data that your program needs to insert on one line in the flat file, or does it need to collect data from multiple lines throughout the file?

How much memory does your Java program use? Could it be that it causes Postgres to be swapped out?

Did you do any benchmarking on your "simple function in db" or on the queries it performs (assuming it does perform any)?

> I have the below system configuration for my database server.

A system like that should be able to insert that small a number of records in no time.

> Also note that when I set shared_buffers = 2GB or higher, Postgres throws an error: “shared_buffer size cannot be more than size_t”.

That's odd... Is this a 32-bit Postgres build, or is a 64-bit Windows incapable of assigning more than a 32-bit number for the amount of shared memory? Are you running in some kind of 32-bit compatibility mode maybe (PAE comes to mind)?

That said, I haven't used Windows for anything more serious than gaming since last century - I'm not exactly an expert on its behaviour.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
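A minimal sketch of one way to answer Alban's first question, separating parse time from insert time; parseFlatFiles() and insertRecords() are hypothetical stand-ins for whatever the real batch program does in each phase:

// Hypothetical instrumentation: split total runtime into parse vs. insert.
long t0 = System.nanoTime();
List<Record> records = parseFlatFiles(inputDir);   // hypothetical parser
long t1 = System.nanoTime();
insertRecords(conn, records);                      // hypothetical DB loader
long t2 = System.nanoTime();
System.out.printf("parse: %.1fs  insert: %.1fs%n",
        (t1 - t0) / 1e9, (t2 - t1) / 1e9);

If nearly all the time lands in the insert phase, the database side (commits, constraints, the function) is where to look.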
Thanks a lot for your help. However, I am new to the Postgres database, so it would be nice if you could let me know how to set autocommit off.
I know that issuing “\set Autocommit Off” from the psql client should set it off, but unfortunately it doesn’t.
Many thanks for your help.
On 27 Apr 2010, at 11:15, <A.Bhattacharya@sungard.com> wrote:
> Thanks a lot for your help. However, I am new to the Postgres database, so it would be nice if you could let me know how to set autocommit off.
> I know that issuing “\set Autocommit Off” from the psql client should set it off, but unfortunately it doesn’t.

It's a client-side setting, not a server-side one. Hence you need to turn it off in your Java program, probably in the database connection settings.

P.S. Your message was so mangled that I couldn't find which part of the original message you were referring to, so I deleted the remainder. Top-posting is considered bad form in mailing lists.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.
On Tue, Apr 27, 2010 at 5:17 AM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
> Wow, that's averaging almost 40s per insert! I don't think those inserts not being in a single transaction explains that much of a delay. Not that changing that doesn't help, but there's more going on here.

You got that backwards: it's 40 inserts/sec (622,000 rows over roughly 4.5 hours is about 16,200 seconds, i.e. ~38 rows per second). This is about par for the course with Windows-style fsync on slow hardware, IIRC. The solution is to use transactions, or play with fsync.

merlin
Alban Hertroys wrote:
> It's a client-side setting, not a server-side one. Hence you need to turn it off in your Java program, probably in the database connection settings.
>
> P.S. Your message was so mangled that I couldn't find which part of the original message you were referring to, so I deleted the remainder. Top-posting is considered bad form in mailing lists.

Actually, top-posting is considered bad form in THIS mailing list. All the others I participate in top-post regularly. That said though, it is considered good form to post in the manner expected by the list that one is posting to. ;-)

Bayless
A.Bhattacharya@sungard.com wrote:
> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.
> I have a simple function in db which is being called from Java batch program to populate the records into tables from flat files.

Four likely possibilities here, in order of how easy they are to test for and therefore resolve:

1) Your hardware doesn't handle commits very well. You can turn off the synchronous_commit setting while doing the data loading to see if that helps. See http://www.postgresql.org/docs/8.3/static/wal-async-commit.html

2) You're doing a commit after every single transaction. See http://java.sun.com/docs/books/tutorial/jdbc/basics/transactions.html for an intro to disabling this. Common practice is to commit every 100 to 1000 transactions instead.

3) The simple function in your database is executing very slowly.

4) The design of the database involves a lot of overhead, such as difficult-to-process foreign keys or an excess of indexes on some tables.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
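A hedged sketch of options 1 and 2 combined, for a session that already has a JDBC Connection; target_table, the Record type, and 'records' are placeholders, not names from the original program. Note that synchronous_commit = off only risks losing the most recently reported commits after a crash; it does not corrupt data:

// Option 1: stop waiting for a WAL flush on every commit (session-local).
Statement st = conn.createStatement();
st.execute("SET synchronous_commit TO OFF");
st.close();

// Option 2: commit every 1000 rows instead of every row.
conn.setAutoCommit(false);
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO target_table (id, payload) VALUES (?, ?)");
int n = 0;
for (Record r : records) {        // 'records': hypothetical parsed input
    ps.setInt(1, r.getId());
    ps.setString(2, r.getPayload());
    ps.addBatch();
    if (++n % 1000 == 0) {
        ps.executeBatch();
        conn.commit();            // one WAL flush per 1000 rows at most
    }
}
ps.executeBatch();
conn.commit();
ps.close();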
2010/4/27 Greg Smith <greg@2ndquadrant.com>:
> A.Bhattacharya@sungard.com wrote:
>> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.

Without the schema and the queries, all you can get is guessing.

--
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS
I am curious to know how much of your delay is due to PostgreSQL and how much to your Java batch program. If you comment out the call to the database function, so that you are reading your input file but not doing anything with the data, how long does your batch program take to run?
RobR
--
The Java program hardly takes a minute to process all the flat files, but when it comes to inserting the records into the db, the entire process takes more than 4 hours.
Many thanks