Thread: PostgreSQL Performance issue
Dear All Experts,
I am using a PostgreSQL 8.3.5 database on a Windows 64-bit OS.
I have a batch program written in Java which processes the data and populates it into tables in the Postgres database.
I have 622,000 records, but it is taking almost 4 and a half hours to load them into the tables.
I have a simple function in the database which is called from the Java batch program to populate the records into tables from flat files.
I have the below system configuration for my database server:
Database Server  | PostgreSQL v8.3.5
Operating System | Windows 2003 Server 64-bit, Service Pack 2
CPU              | 2 * Quad Core AMD Opteron Processor 2376 @ 2.29 GHz
Memory           | 16 GB RAM
Disk Space       | 2.5 TB total [C drive: 454 GB, D drive: 1.99 TB]
I have set my postgresql.conf parameters as below.
======================================================================
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------
# - Memory -
shared_buffers = 1GB
temp_buffers = 256MB
max_prepared_transactions = 100
work_mem = 512MB
maintenance_work_mem = 512MB
# - Free Space Map -
max_fsm_pages = 1600000
max_fsm_relations = 10000
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
wal_buffers = 5MB # min 32kB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------
# - Planner Method Configuration -
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
effective_cache_size = 8GB
========================================================================
Please advise me on the best way to set these parameters to achieve better performance.
Also note that when I set shared_buffers = 2GB or higher, Postgres throws an error: “shared_buffer size cannot be more than size_t”.
I would be very grateful if anyone could help me with this.
Many thanks
Most likely you are doing one insert per transaction. Set autocommit to false and commit only after all the inserts are done.
-n.
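A minimal JDBC sketch of that change, assuming the PostgreSQL JDBC driver is on the classpath; the connection string, table, and column names below are placeholders, not taken from the original program:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BulkInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
        conn.setAutoCommit(false);  // no implicit commit after every INSERT

        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO target_table (id, payload) VALUES (?, ?)");
        for (int i = 0; i < 622000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();            // queue the row client-side
            if (i % 1000 == 999) {
                ps.executeBatch();    // one network round trip per 1000 rows
            }
        }
        ps.executeBatch();
        conn.commit();                // single commit at the very end
        ps.close();
        conn.close();
    }
}

Committing once at the end (or every few thousand rows) avoids paying a WAL flush per row, which is usually the dominant cost of row-at-a-time loading.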
On Tue, Apr 27, 2010 at 10:11, <A.Bhattacharya@sungard.com> wrote:

> I am using a PostgreSQL 8.3.5 database on a Windows 64-bit OS.

You really need to upgrade, at least to 8.3.10. It has many important bugfixes.

> I have 622,000 records, but it is taking almost 4 and a half hours to load them into the tables.
> Disk Space | 2.5 TB total [C drive: 454 GB, D drive: 1.99 TB]

The interesting point is not how much disk you have, but what configuration you have it in. Either way, 622k records in 4 1/2 hours is obviously crappy even for a single disk.

> shared_buffers = 1GB

You might want to try lowering that one drastically, say to 128MB. In some cases this has been known to give better performance on Windows; not in all cases though, so you have to try it out.

> max_prepared_transactions = 100

Are you really using two-phase commit on this system? If not, turn this off. This is prepared transactions, not prepared statements.

> Also note that when I set shared_buffers = 2GB or higher, Postgres throws an error: “shared_buffer size cannot be more than size_t”.

That is normal, since your binary is 32-bit. In fact, having shared_buffers at 1GB may give you some trouble with your fairly high work_mem as well, as the *total* amount of memory in the process is limited. That's another reason to try a lower shared_buffers.

(Other than that, read the comments from Thom.)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
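In postgresql.conf terms, these suggestions would look something like the following sketch. The values are illustrative, and the work_mem line is an extra assumption drawn from the 32-bit address-space point above, not a figure Magnus gave:

shared_buffers = 128MB            # much lower than 1GB often works better on Windows (restart required)
max_prepared_transactions = 0     # disable unless two-phase commit is actually used (restart required)
work_mem = 16MB                   # assumption: keep per-operation memory modest in a 32-bit process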
A.Bhattacharya@sungard.com wrote:
> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.
> CPU | 2 * Quad Core AMD Opteron Processor 2376 @ 2.29 GHz

With one thread doing inserts, the other 7 cores will be idle, but you're almost certainly disk-IO bound. OTOH, if you're calling a function (is that PL/pgSQL, PL/Java, PL/Perl, or what?) for each insert, you could be compute-bound on that single core. It really depends on what that function is doing.
On 27 Apr 2010, at 10:11, <A.Bhattacharya@sungard.com> wrote:
> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.

Wow, that's averaging almost 40s per insert! I don't think those inserts not being in a single transaction explains that much of a delay. Not that changing that doesn't help, but there's more going on here.

Maybe you have some foreign key constraints involved that don't have indices on the foreign keys? In that case I expect that you either have a lot of foreign references, only a few but from rather large tables (several million rows at least), or something in between. Any other constraints that could be relevant?

> I have a simple function in db which is being called from Java batch program to populate the records into tables from flat files.

Did you verify that most of the time is spent waiting on the database? I'm not entirely sure what you mean by the above. Do you have a batch program that starts the Java interpreter for each flat file? Are they running synchronously (one after the other) or in parallel? In the latter case you may be waiting on locks.

Is the data that your program needs to insert on one line in the flat file, or does it need to collect data from multiple lines throughout the file?

How much memory does your Java program use? Could it be that it causes Postgres to be swapped out?

Did you do any benchmarking on your "simple function in db" or on the queries it performs (assuming it does perform any)?

> I have the below system configuration for my database server.

A system like that should be able to insert that small a number of records in no time.

> Also note that when I set shared_buffers = 2GB or higher, Postgres throws an error: “shared_buffer size cannot be more than size_t”.

That's odd... Is this a 32-bit Postgres build, or is a 64-bit Windows incapable of assigning more than a 32-bit number for the amount of shared memory? Are you running in some kind of 32-bit compatibility mode maybe (PAE comes to mind)?

That said, I haven't used Windows for anything more serious than gaming since last century - I'm not exactly an expert on its behaviour.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
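A minimal sketch of one way to answer Alban's first question, separating parse time from insert time; parseFlatFiles() and insertRecords() are hypothetical stand-ins for whatever the real batch program does in each phase:

// Hypothetical instrumentation: split total runtime into parse vs. insert.
long t0 = System.nanoTime();
List<Record> records = parseFlatFiles(inputDir);   // hypothetical parser
long t1 = System.nanoTime();
insertRecords(conn, records);                      // hypothetical DB loader
long t2 = System.nanoTime();
System.out.printf("parse: %.1fs  insert: %.1fs%n",
        (t1 - t0) / 1e9, (t2 - t1) / 1e9);

If nearly all the time lands in the insert phase, the database side (commits, constraints, the function) is where to look.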
Thanks a lot for your help. However, I am new to the Postgres database, so it would be nice if you could let me know how to set autocommit off.
I know that issuing “\set Autocommit Off” from the psql client should set it off, but unfortunately it doesn’t.
Many thanks for your help.
On 27 Apr 2010, at 11:15, <A.Bhattacharya@sungard.com> wrote:
> Thanks a lot for your help. However, I am new to the Postgres database, so it would be nice if you could let me know how to set autocommit off.
> I know that issuing “\set Autocommit Off” from the psql client should set it off, but unfortunately it doesn’t.

It's a client-side setting, not a server-side one. Hence you need to turn it off in your Java program, probably in the database connection settings.

P.S. Your message was so mangled that I couldn't find which part of the original message you were referring to, so I deleted the remainder. Top-posting is considered bad form in mailing lists.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.
On Tue, Apr 27, 2010 at 5:17 AM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
> Wow, that's averaging almost 40s per insert! I don't think those inserts not being in a single transaction explains that much of a delay. Not that changing that doesn't help, but there's more going on here.

You got that backwards: it's 40 inserts/sec (622,000 rows over roughly 4.5 hours is about 16,200 seconds, i.e. ~38 rows per second). This is about par for the course with Windows-style fsync on slow hardware, IIRC. The solution is to use transactions, or play with fsync.

merlin
Alban Hertroys wrote:
> It's a client-side setting, not a server-side one. Hence you need to turn it off in your Java program, probably in the database connection settings.
>
> P.S. Your message was so mangled that I couldn't find which part of the original message you were referring to, so I deleted the remainder. Top-posting is considered bad form in mailing lists.

Actually, top-posting is considered bad form in THIS mailing list. All the others I participate in top-post regularly. That said though, it is considered good form to post in the manner expected by the list that one is posting to. ;-)

Bayless
A.Bhattacharya@sungard.com wrote:
> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.
> I have a simple function in db which is being called from Java batch program to populate the records into tables from flat files.

Four likely possibilities here, in order of how easy they are to test for and therefore resolve:

1) Your hardware doesn't handle commits very well. You can turn off the synchronous_commit setting while doing the data loading to see if that helps. See http://www.postgresql.org/docs/8.3/static/wal-async-commit.html

2) You're doing a commit after every single transaction. See http://java.sun.com/docs/books/tutorial/jdbc/basics/transactions.html for an intro to disabling this. Common practice is to commit every 100 to 1000 transactions instead.

3) The simple function in your database is executing very slowly.

4) The design of the database involves a lot of overhead, such as difficult-to-process foreign keys or an excess of indexes on some tables.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
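A hedged sketch of options 1 and 2 combined, for a session that already has a JDBC Connection; target_table, the Record type, and 'records' are placeholders, not names from the original program. Note that synchronous_commit = off only risks losing the most recently reported commits after a crash; it does not corrupt data:

// Option 1: stop waiting for a WAL flush on every commit (session-local).
Statement st = conn.createStatement();
st.execute("SET synchronous_commit TO OFF");
st.close();

// Option 2: commit every 1000 rows instead of every row.
conn.setAutoCommit(false);
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO target_table (id, payload) VALUES (?, ?)");
int n = 0;
for (Record r : records) {        // 'records': hypothetical parsed input
    ps.setInt(1, r.getId());
    ps.setString(2, r.getPayload());
    ps.addBatch();
    if (++n % 1000 == 0) {
        ps.executeBatch();
        conn.commit();            // one WAL flush per 1000 rows at most
    }
}
ps.executeBatch();
conn.commit();
ps.close();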
2010/4/27 Greg Smith <greg@2ndquadrant.com>:
> A.Bhattacharya@sungard.com wrote:
>> I have 622,000 records but it is taking almost 4 and a half hours to load these data into the tables.

Without the schema and the queries, all you can get is guessing.

--
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS
I am curious to know how much of your delay is due to PostgreSQL and how much to your Java batch program. If you comment out the call to the database function, so that you are reading your input file but not doing anything with the data, how long does your batch program take to run?
RobR
--
The Java program hardly takes a minute to process all the flat files, but when it comes to inserting the records into the db, the entire process takes more than 4 hours.
Many thanks