Thread: High update activity, PostgreSQL vs BigDBMS
I don't want to violate any license agreement by discussing performance, so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as BigDBMS here.

I'm trying to convince my employer to replace BigDBMS with PostgreSQL for at least some of our Java applications. As a proof of concept, I started with a high-volume (but conceptually simple) network data collection application. This application collects files of 5-minute usage statistics from our network devices, and stores a raw form of these stats into one table and a normalized form into a second table. We are currently storing about 12 million rows a day in the normalized table, and each month we start new tables. For the normalized data, the app inserts rows initialized to zero for the entire current day first thing in the morning, then throughout the day as stats are received, executes updates against existing rows. So the app has very high update activity.

In my test environment, I have a dual-x86 Linux platform running the application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and PostgreSQL 8.2.0 (only one at a time.) The Sun box has 4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those familiar with these devices.) The arrays are set up with RAID5. So I'm working with a consistent hardware platform for this comparison. I'm only processing a small subset of files (144.)

BigDBMS processed this set of data in 20000 seconds, with all foreign keys in place. With all foreign keys in place, PG took 54000 seconds to complete the same job. I've tried various approaches to autovacuum (none, 30-seconds) and it doesn't seem to make much difference. What does seem to make a difference is eliminating all the foreign keys; in that configuration, PG takes about 30000 seconds. Better, but BigDBMS still has it beat significantly.

I've got PG configured so that the system database is on disk array 2, as are the transaction log files. The default table space for the test database is disk array 3. I've got all the reference tables (the tables to which the foreign keys in the stats tables refer) on this array. I also store the stats tables on this array. Finally, I put the indexes for the stats tables on disk array 4. I don't use disk array 1 because I believe it is a software array.

I'm out of ideas how to improve this picture any further. I'd appreciate some suggestions. Thanks.

--
Guy Rouillier
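For readers trying to picture the workload, here is a minimal sketch of the insert-then-update pattern described above. The table, the columns, and the devices lookup table are invented for illustration; the real schema is not shown in the thread.

  -- Hypothetical normalized stats table (one table per month).
  CREATE TABLE stats_2006_12 (
      device_id   integer   NOT NULL,
      sample_time timestamp NOT NULL,
      bytes_in    bigint    NOT NULL DEFAULT 0,
      bytes_out   bigint    NOT NULL DEFAULT 0,
      PRIMARY KEY (device_id, sample_time)
  );

  -- First thing in the morning: pre-insert a zeroed row for every device
  -- and every 5-minute slot of the current day (288 slots).
  INSERT INTO stats_2006_12 (device_id, sample_time)
  SELECT d.device_id,
         '2006-12-28 00:00'::timestamp + (t.n * interval '5 minutes')
  FROM devices d,
       generate_series(0, 287) AS t(n);

  -- As each stats file arrives, update the matching pre-inserted row.
  UPDATE stats_2006_12
     SET bytes_in = 123456, bytes_out = 654321
   WHERE device_id = 42
     AND sample_time = '2006-12-28 10:05';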
Here are my few recommendations that might help you:
- You will need to do table partitioning (http://www.postgresql.org/docs/current/static/ddl-partitioning.html) as you are storing quite a lot of data in one table per day; a rough sketch follows this list.
- You are using a RAID5 setup, which can also affect performance, so switching to RAID1 might help you there, though with 12 disks in a RAID5 that shouldn't be that much of a problem.
- Have you tuned the postgresql.conf parameters? If not, you really need to do this for things like checkpoint_segments, random_page_cost, shared_buffers, effective_cache_size, max_fsm_pages, vacuum_cost_delay, work_mem, the bgwriter settings, etc. You can get good advice for tuning these parameters at --> http://www.powerpostgresql.com/PerfList/
- For autovacuum you need to properly tune the thresholds so that vacuum and analyze run at the right time without affecting database server performance. (You can find help for this at http://www.postgresql.org/docs/current/static/routine-vacuuming.html under "22.1.4. The auto-vacuum daemon".)
- You will need to separate your transaction logs, i.e. the pg_xlog folder, onto a different drive than your database drive. This can be done by creating a symlink for the pg_xlog folder.
- I hope you are doing proper connection pool management, because good use of database connections can really affect overall performance; connections can be expensive to create, and they consume memory if they are not properly closed.
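A rough sketch of what the partitioning suggestion could look like on 8.2 (inheritance-based, as in the linked docs); the table and column names are hypothetical:

  -- Parent table that queries will reference.
  CREATE TABLE stats (
      device_id   integer,
      sample_time timestamp NOT NULL,
      bytes_in    bigint,
      bytes_out   bigint
  );

  -- One child table per month, each with a CHECK constraint on its range.
  CREATE TABLE stats_2006_12 (
      CHECK (sample_time >= '2006-12-01' AND sample_time < '2007-01-01')
  ) INHERITS (stats);

  CREATE TABLE stats_2007_01 (
      CHECK (sample_time >= '2007-01-01' AND sample_time < '2007-02-01')
  ) INHERITS (stats);

  CREATE INDEX stats_2006_12_time ON stats_2006_12 (sample_time);
  CREATE INDEX stats_2007_01_time ON stats_2007_01 (sample_time);

  -- With constraint_exclusion on, a range query against the parent
  -- only scans the matching child partition(s).
  SET constraint_exclusion = on;
  SELECT sum(bytes_in)
  FROM stats
  WHERE sample_time >= '2007-01-01' AND sample_time < '2007-01-02';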
Hope that helps your tests...
----------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
Guy,

Did you tune postgresql? How much memory does the box have?

Dave
"Shoaib Mir" <shoaibmir@gmail.com> writes: > Here are my few recommendations that might help you: > [ snip good advice ] Another thing to look at is whether you are doing inserts/updates as individual transactions, and if so see if you can "batch" them to reduce the per-transaction overhead. regards, tom lane
At 12:46 AM 12/28/2006, Guy Rouillier wrote:
>BigDBMS processed this set of data in 20000 seconds, with all
>foreign keys in place. With all foreign keys in place, PG took
>54000 seconds to complete the same job. [...] Better, but BigDBMS
>still has it beat significantly.

If you are using pg configured as default installed, you are not getting pg's best performance. Ditto using data structures optimized for BigDBMS.

A= Go through each query and see what work_mem needs to be for that query to be as RAM resident as possible. If you have enough RAM, set work_mem for that query that large. Remember that work_mem is =per query=, so queries running in parallel eat the sum of each of their work_mem's.

B= Make sure shared_buffers is set reasonably. A good rule of thumb for 8.x is that shared_buffers should be at least ~1/4 your RAM. If your E4500 is maxed with RAM, there's a good chance shared_buffers should be considerably more than 1/4 of RAM.

C= What file system are you using? Unlike BigDBMS, pg does not have its own native one, so you have to choose the one that best suits your needs. For update-heavy applications involving lots of small updates, jfs and XFS should both be seriously considered.

D= Your table schema and physical table layout probably need to change. What BigDBMS likes here is most likely different from what pg likes.

E= pg does not actually update records in place. It appends new records to the table and marks the old version invalid. This means that things like page size, RAID stripe size, etc. may need to have different values than they do for BigDBMS. Another consequence is that pg likes RAID 10 even more than most of its competitors.

F= This may seem obvious, but how many of the foreign keys and other overhead do you actually need? Get rid of the unnecessary.

G= Bother the folks at Sun, like Josh Berkus, who know pg inside and out +and+ know your HW (or have access to those that do ;-) ) inside and out. I'll bet they'll have ideas I'm not thinking of.

H= Explain Analyze is your friend. Slow queries may need better table statistics, or better SQL, or may be symptoms of issues "C" or "D" above, or ...

>I've got PG configured so that the system database is on disk array 2,
>as are the transaction log files. [...] I don't use disk array 1
>because I believe it is a software array.

I= With 4 arrays of 12 HDs each, you definitely have enough spindles to place pg_xlog somewhere separate from all the other pg tables. In addition, you should analyze your table access patterns and then scatter them across your 4 arrays in such a way as to minimize head contention.

Hope this helps,
Ron Peacetree
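To make point I concrete, a sketch of the tablespace commands involved, with hypothetical mount points for the arrays (Guy already splits tables across the arrays; this only illustrates the mechanism):

  -- One tablespace per array; adjust paths to the real mount points.
  CREATE TABLESPACE array3_data  LOCATION '/array3/pgdata';
  CREATE TABLESPACE array4_index LOCATION '/array4/pgindex';

  -- Put a hot table and its index on different spindles.
  ALTER TABLE stats_2006_12 SET TABLESPACE array3_data;
  ALTER INDEX stats_2006_12_pkey SET TABLESPACE array4_index;

  -- pg_xlog is not tablespace-managed; with the server stopped it can be
  -- moved and symlinked, e.g.:
  --   mv $PGDATA/pg_xlog /array2/pg_xlog
  --   ln -s /array2/pg_xlog $PGDATA/pg_xlog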
Hi all,

> A= go through each query and see what work_mem needs to be for that
> query to be as RAM resident as possible. If you have enough RAM, set
> work_mem for that query that large. Remember that work_mem is =per
> query=, so queries running in parallel eat the sum of each of their
> work_mem's.

How can I know how much work_mem a query needs?

Regards
--
Arnau
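There is no exact per-query counter in 8.2, but one rough approach to the question above is to raise work_mem for a single session and compare EXPLAIN ANALYZE timings; newer releases additionally report whether a sort spilled to disk. A sketch, against a hypothetical table:

  -- Raise work_mem for this session only; the global default is untouched.
  SET work_mem = '256MB';

  EXPLAIN ANALYZE
  SELECT device_id, sum(bytes_in)
  FROM stats_2006_12
  GROUP BY device_id
  ORDER BY sum(bytes_in) DESC;

  -- Compare runtimes at different settings, then put it back.
  RESET work_mem;

  -- Inside a transaction, SET LOCAL limits the change to that transaction.
  BEGIN;
  SET LOCAL work_mem = '256MB';
  -- ... the big query ...
  COMMIT;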
Ron wrote:
> C= What file system are you using? Unlike BigDBMS, pg does not have
> its own native one, so you have to choose the one that best suits
> your needs. For update heavy applications involving lots of small
> updates jfs and XFS should both be seriously considered.

Actually it has been suggested that a combination of ext2 (for WAL) and ext3 (for data, with data journalling disabled) is a good performer. AFAIK you don't want the overhead of journalling for the WAL partition.

--
Alvaro Herrera   http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, 29 Dec 2006, Alvaro Herrera wrote:
> Actually it has been suggested that a combination of ext2 (for WAL) and
> ext3 (for data, with data journalling disabled) is a good performer.
> AFAIK you don't want the overhead of journalling for the WAL partition.

When benchmarking various options for a new PG server at one of my clients, I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be fastest to have ext2 for the WAL. The winning time was 157m46.713s for ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3 data=writeback. This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/ 1GB BBU cache.

This config benched out faster than a 6-disk RAID10 + 2-disk RAID1, for those of you who have been wondering if the BBU write-back cache mitigates the need for a separate WAL (at least on this workload). Those are the fastest times for each config, but ext2 WAL was always faster than the other two options. I didn't test any other filesystems in this go-around.

--
Jeff Frost, Owner   <jeff@frostconsultingllc.com>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908   FAX: 650-649-1954
You should search the archives for Luke Lonergan's posting about how IO in Postgresql is significantly bottlenecked because it's not async. A 12 disk array is going to max out Postgresql's max theoretical write capacity to disk, and therefore BigRDBMS is always going to win in such a config. You can also look towards Bizgres, which allegedly eliminates some of these problems, and is cheaper than most BigRDBMS products.
Alex.
Tom Lane wrote:
> Another thing to look at is whether you are doing inserts/updates as
> individual transactions, and if so see if you can "batch" them to
> reduce the per-transaction overhead.

Thank you everyone who replied with suggestions. Unfortunately, this is a background activity for me, so I can only work on it when I can squeeze in time. Right now, I can't do anything; I swapped out a broken switch in our network and the DB server is currently inaccessible ;(. I will eventually work through all suggestions, but I'll start with the ones I can respond to without further investigation.

I'm not doing updates as individual transactions. I cannot use the Java batch functionality because the code uses stored procedures to do the inserts and updates, and the PG JDBC driver cannot handle executing stored procedures in batch. Briefly, executing a stored procedure returns a result set, and Java batches don't expect result sets. So, in the code I turn autocommit off, and do a commit every 100 executions of the stored proc. The exact same code is running against BigDBMS, so any penalty from this approach should be evenly felt.

--
Guy Rouillier
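For reference, a minimal JDBC sketch of the commit-every-100-calls pattern Guy describes. The stored procedure name and its parameters are invented; only the autocommit and commit handling reflects what he reports doing.

  // Hypothetical loader: one CallableStatement, one commit per 100 calls.
  import java.sql.CallableStatement;
  import java.sql.Connection;
  import java.sql.SQLException;

  public class StatsLoader {
      public static void load(Connection conn, Iterable<int[]> samples)
              throws SQLException {
          conn.setAutoCommit(false);
          // update_stats(...) is a placeholder for the real stored procedure.
          CallableStatement cs = conn.prepareCall("{ call update_stats(?, ?, ?) }");
          try {
              int count = 0;
              for (int[] s : samples) {
                  cs.setInt(1, s[0]);      // device id
                  cs.setInt(2, s[1]);      // bytes in
                  cs.setInt(3, s[2]);      // bytes out
                  cs.execute();            // returns a result set, so addBatch() is out
                  if (++count % 100 == 0) {
                      conn.commit();       // one commit per 100 calls, not per call
                  }
              }
              conn.commit();
          } finally {
              cs.close();
          }
      }
  }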
Alvaro Herrera wrote: > Ron wrote: > >> C= What file system are you using? Unlike BigDBMS, pg does not have >> its own native one, so you have to choose the one that best suits >> your needs. For update heavy applications involving lots of small >> updates jfs and XFS should both be seriously considered. > > Actually it has been suggested that a combination of ext2 (for WAL) and > ext3 (for data, with data journalling disabled) is a good performer. > AFAIK you don't want the overhead of journalling for the WAL partition. I'm curious as to why ext3 for data with journalling disabled? Would that not be the same as ext2? -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
On þri, 2007-01-02 at 09:04 -0500, Geoffrey wrote: > Alvaro Herrera wrote: > > > > Actually it has been suggested that a combination of ext2 (for WAL) and > > ext3 (for data, with data journalling disabled) is a good performer. > > AFAIK you don't want the overhead of journalling for the WAL partition. > > I'm curious as to why ext3 for data with journalling disabled? Would > that not be the same as ext2? I believe Alvaro was referring to ext3 with journalling enabled for meta-data, but not for data. I also believe this is the standard ext3 configuration, but I could be wrong on that. gnari
On 2 Jan 2007, at 14:54, Ragnar wrote:
> I believe Alvaro was referring to ext3 with journalling enabled
> for meta-data, but not for data.
> I also believe this is the standard ext3 configuration, but I
> could be wrong on that.

It doesn't really belong here, but ext3 has three modes: data=journal (data and metadata journalled), data=ordered (metadata journalled, data written before metadata; the default), and data=writeback (metadata-only journalling). The performance difference between ordered and metadata-only journalling should be very small anyway.

--
Best regards,
Lars Heidieker
lars@heidieker.de
http://paradoxon.info

Mystical explanations. Mystical explanations are considered deep; the truth is that they are not even superficial.
-- Friedrich Nietzsche
More specifically, you should set the noatime,data=writeback options in fstab on ext3 partitions for best performance. Correct?

> It doesn't really belong here, but ext3 has three modes: data=journal
> (data and metadata journalled), data=ordered (metadata journalled, data
> written before metadata; the default), and data=writeback (metadata-only
> journalling). The performance difference between ordered and
> metadata-only journalling should be very small anyway.
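For what it's worth, the /etc/fstab entries being asked about would look something like the following; the device names and mount points are hypothetical:

  # Data files: ext3, metadata-only journalling, no atime updates.
  /dev/sdb1   /pgdata   ext3   noatime,data=writeback   0 2
  # WAL (pg_xlog): its own ext2 partition, as discussed above.
  /dev/sdc1   /pgxlog   ext2   noatime                  0 2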
On Fri, 2006-12-29 at 07:52 -0500, Ron wrote:
> A= go through each query and see what work_mem needs to be for that
> query to be as RAM resident as possible. If you have enough RAM, set
> work_mem for that query that large. Remember that work_mem is =per
> query=, so queries running in parallel eat the sum of each of their
> work_mem's.

Just to clarify, from the docs on work_mem at http://www.postgresql.org/docs/current/static/runtime-config-resource.html :

"Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries."

Regards,
Jeff Davis
I've got back access to my test system. I ran another test run with the same input data set. This time I put pg_xlog on a different RAID volume (the unused one that I suspect is a software RAID), and I turned fsync=off in postgresql.conf. I left the rest of the configuration alone (all foreign keys removed), etc. Unfortunately, this only dropped elapsed time down to about 28000 seconds (from 30000), still significantly more than BigDBMS. Additional info inline below.

Shoaib Mir wrote:
> Here are my few recommendations that might help you:
>
> - You will need to do table partitioning
> (http://www.postgresql.org/docs/current/static/ddl-partitioning.html)
> as you are storing quite a lot of data in one table per day.

I'm focusing on the detailed perspective for now. The 144 files I'm processing represent not even two hours of data, so that surely wouldn't be split up.

> - You are using a RAID5 setup which is something that can also affect
> performance so switching to RAID1 might help you there, but again you
> have a RAID5 with 12 disks so hmm that shouldn't be that much of a problem.

Agreed.

> - Have you done the tuning for postgresql.conf parameters? if not then
> you really need to do this for like checkpoint segments, random page
> cost, shared buffers, cache size, fsm pages, vacuum cost delay,
> work_mem, bgwriter etc etc. You can get good advice for tuning these
> parameters at --> http://www.powerpostgresql.com/PerfList/

The box has 3 GB of memory. I would think that BigDBMS would be hurt by this more than PG. Here are the settings I've modified in postgresql.conf:

autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

> - For autovacuuming you need to properly tune the thresholds so that the
> vacuum and analyze is done at the right time not affecting the database
> server performance. (You can find help for this at
> http://www.postgresql.org/docs/current/static/routine-vacuuming.html
> under "22.1.4. The auto-vacuum daemon")

The real-life load on this database would be fairly constant throughout the day. Stats from network devices are received every 15 minutes from each device, but they are staggered. As a result, the database is almost constantly being updated, so there is no dead time to do vacuums.

> - You will need to separate your transactional logs i.e. pg_xlog folder
> to a different drive other then your database server drive. This can be
> done by creating symlinks for pg_xlog folder.

Done, see opening remarks. Unfortunately minor impact.

> - I hope you are doing proper connection pool management, because good
> use of database connections can be really effect the overall
> performance, connections can be expensive to create, and consume memory
> if they are not properly exited.

I probably should have mentioned this originally but was afraid of information overload. The application runs on JBoss and uses JBoss connection pools. So connections are pooled, but I don't know how they would compare to native PG connection pools. Essentially, JBoss gets native JDBC connections, and the pools simply allow them to be re-used without opening and closing each time. So if the native PG connection pools provide any pooling optimizations beyond that, those advantages are not being realized.

> Hope that helps your tests...

Thanks to everyone for providing suggestions, and I apologize for my delay in responding to each of them.

--
Guy Rouillier
Guy Rouillier wrote:
> I've got back access to my test system. I ran another test run with the
> same input data set. This time I put pg_xlog on a different RAID volume
> (the unused one that I suspect is a software RAID), and I turned
> fsync=off in postgresql.conf. Unfortunately, this only dropped elapsed
> time down to about 28000 seconds (from 30000), still significantly more
> than BigDBMS.

Although tuning is extremely important, you also have to look at the application itself. I discovered (the hard way) that there's simply no substitute for a bit of redesign/rewriting of the schema and/or SQL statements.

Many of us who "grew up" on Oracle assume that their SQL is standard stuff, and that Oracle's optimizer is "the way it's done." But in fact most Oracle applications are tweaked and tuned to take advantage of Oracle's strengths and avoid its weaknesses. If you designed an application from the ground up to use Postgres, then migrated to Oracle, you would probably be equally frustrated by Oracle's poor performance on your Postgres-tuned application.

I don't know if you have access to the application's SQL, or the time to experiment a bit, but unless your schema is trivial and your SQL is boneheaded simple, you're not going to get equal performance from Postgres until you do some analysis of your application under real-world conditions, and optimize the problem areas. In my case, I found just a few specific SQL constructs that, with a bit of tuning, made massive differences in performance.

Craig
On 5-Jan-07, at 9:51 PM, Guy Rouillier wrote:
> The box has 3 GB of memory. I would think that BigDBMS would be
> hurt by this more than PG. Here are the settings I've modified in
> postgresql.conf:
> [...]
> shared_buffers = 128MB
> work_mem = 16MB
> [...]

As I said, you need to set shared_buffers to at least 750MB; this is the starting point, it can actually go higher. Additionally, effective cache should be set to 2.25 G. Turning off fsync is not a real world situation. Additional tuning of file systems can provide some gain; however, as Craig pointed out, some queries may need to be tweaked.

> I probably should have mentioned this originally but was afraid of
> information overload. The application runs on JBoss and uses JBoss
> connection pools. So connections are pooled, but I don't know how
> they would compare to native PG connection pools. Essentially,
> JBoss gets native JDBC connections, and the pools simply allow them
> to be re-used without opening and closing each time. So if the
> native PG connection pools provide any pooling optimizations beyond
> that, those advantages are not being realized.

The PG connection pools will not help; they do not currently provide any extra optimization.

Dave
Craig A. James wrote: > I don't know if you have access to the application's SQL, or the time to > experiment a bit, but unless your schema is trival and your SQL is > boneheaded simple, you're not going to get equal performance from > Postgres until you do some analysis of your application under real-world > conditions, and optimize the problem areas. Craig, thanks for taking the time to think about this. Yes, I have all the application source code, and all the time in the world, as I'm doing this experimentation on my own time. The test hardware is old stuff no one intends to use for production work ever again, so I can use it as long as I want. The application is fairly straightforward, but as you say, what is working okay with BigDBMS isn't working as well under PG. I'm going to try other configuration suggestions made by others before I attempt logic changes. The core logic is unchangeable; millions of rows of data in a single table will be updated throughout the day. If PG can't handle high volume updates well, this may be brick wall. -- Guy Rouillier
Dave Cramer wrote:
> As I said you need to set shared_buffers to at least 750MB this is the
> starting point, it can actually go higher. Additionally effective cache
> should be set to 2.25 G turning fsync is not a real world situation.
> Additional tuning of file systems can provide some gain, however as
> Craig pointed out some queries may need to be tweaked.

Dave, thanks for the hard numbers, I'll try them. I agree turning fsync off is not a production option. In another reply to my original posting, Alex mentioned that BigDBMS gets an advantage from its async IO. So simply as a test, I turned fsync off in an attempt to open wide all the pipes.

Regarding shared_buffers=750MB, the last discussions I remember on this subject said that anything over 10,000 (8K buffers = 80 MB) had unproven benefits. So I'm surprised to see such a large value suggested. I'll certainly give it a try and see what happens.

--
Guy Rouillier
> Regarding shared_buffers=750MB, the last discussions I remember on this
> subject said that anything over 10,000 (8K buffers = 80 MB) had unproven
> benefits. So I'm surprised to see such a large value suggested. I'll
> certainly give it a try and see what happens.

That is old news :) As of 8.1 it is quite beneficial to go well above the aforementioned amount.

J

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Guy,

> The application is fairly straightforward, but as you say, what is
> working okay with BigDBMS isn't working as well under PG. I'm going to
> try other configuration suggestions made by others before I attempt
> logic changes. The core logic is unchangeable; millions of rows of data
> in a single table will be updated throughout the day. If PG can't
> handle high volume updates well, this may be brick wall.

Here are a couple things I learned.

ANALYZE is VERY important, surprisingly so even for small tables. I had a case last week where a temporary "scratch" table with just 100 rows was joined to two more tables of 6 and 12 million rows. You might think that a 100-row table wouldn't need to be analyzed, but it does: Without the ANALYZE, Postgres generated a horrible plan that took many minutes to run; with the ANALYZE, it took milliseconds. Any time a table's contents change dramatically, ANALYZE it, ESPECIALLY if it's a small table. After all, changing 20 rows in a 100-row table has a much larger effect on its statistics than changing 20 rows in a million-row table.

Postgres functions like count() and max() are "plug ins", which has huge architectural advantages. But in pre-8.1 releases, there was a big speed penalty for this: functions like count() were very, very slow, requiring a full table scan. I think this is vastly improved from 8.0x to 8.1 and forward; others might be able to comment whether count() is now as fast in Postgres as Oracle. The "idiom" to replace count() was "select col from tbl order by col desc limit 1". It worked miracles for my app.

Postgres has explicit garbage collection via VACUUM, and you have to design your application with this in mind. In Postgres, update is delete+insert, meaning updates create garbage. If you have very "wide" tables, but only a subset of the columns are updated frequently, put these columns in a separate table with an index to join the two tables. For example, my original design was something like this:

   integer primary key
   very large text column
   ... a bunch of integer columns, float columns, and small text columns

The properties were updated by the application, but the large text column never changed. This led to huge garbage-collection problems as the large text field was repeatedly deleted and reinserted by the updates. By separating these into two tables, one with the large text column, and the other table with the dynamic, but smaller, columns, garbage is massively reduced, and performance increased, both immediately (smaller data set to update) and long term (smaller vacuums). You can use views to recreate your original combined columns, so the changes to your app are limited to where updates occur.

If you have a column that is *frequently* updated (say, for example, a user's last-access timestamp each time s/he hits your web server) then you definitely want this in its own table, not mixed in with the user's name, address, etc.

Partitioning in Postgres is more powerful than in Oracle. Use it if you can.

Partial indexes are VERY nice in Postgres, if your data is poorly distributed (for example, a mostly-NULL column with a small percentage of very important values).

I'm sure there are more things that others can contribute.

Craig
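A sketch of the wide-table split Craig describes, with invented table and column names, plus the view that recreates the combined shape:

  -- Static, rarely-changing data (including the large text column).
  CREATE TABLE item_static (
      item_id   integer PRIMARY KEY,
      big_text  text
  );

  -- Frequently-updated columns live in their own narrow table.
  CREATE TABLE item_dynamic (
      item_id      integer PRIMARY KEY REFERENCES item_static (item_id),
      hit_count    integer NOT NULL DEFAULT 0,
      last_access  timestamp
  );

  -- A view recreates the original combined columns for readers,
  -- so only the update paths in the application need to change.
  CREATE VIEW item AS
  SELECT s.item_id, s.big_text, d.hit_count, d.last_access
  FROM item_static s
  JOIN item_dynamic d USING (item_id);

  -- Updates now rewrite only the narrow row, not the large text value.
  UPDATE item_dynamic
     SET hit_count = hit_count + 1, last_access = now()
   WHERE item_id = 42;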
Guy Rouillier wrote: > The application is fairly straightforward, but as you say, what is > working okay with BigDBMS isn't working as well under PG. I'm going to > try other configuration suggestions made by others before I attempt > logic changes. The core logic is unchangeable; millions of rows of data > in a single table will be updated throughout the day. If PG can't > handle high volume updates well, this may be brick wall. I understand your reluctance to change your working design in the change over to PostgreSQL but - 1. Your table definitions may or may not be the issue and a small change in design (even only choice of datatype) may be all that is needed to get the needed performance out of PostgreSQL. These changes would be done before you put PostgreSQL into production use so the amount of current usage is not relevant when deciding/analyzing these changes but they may affect your ability to use PostgreSQL as an alternative. 2. I think that the idea of logic changes suggested earlier was more aimed at your select/update commands than the structure of your tables. You should expect to have some SQL changes between any database and using select/update's designed to take advantage of PostgreSQL strengths can give you performance improvements. -- Shane Ambler pgSQL@007Marketing.com Get Sheeky @ http://Sheeky.Biz
On 6-Jan-07, at 11:32 PM, Guy Rouillier wrote:
> Regarding shared_buffers=750MB, the last discussions I remember on
> this subject said that anything over 10,000 (8K buffers = 80 MB)
> had unproven benefits. So I'm surprised to see such a large value
> suggested. I'll certainly give it a try and see what happens.

That is 25% of your available memory. This is just a starting point. There are reports that going as high as 50% can be advantageous, however you need to measure it yourself.
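Putting Dave's starting point next to the 3 GB box under test, the postgresql.conf changes being discussed amount to roughly the following (the values are the ones quoted elsewhere in the thread, not a general recommendation):

  shared_buffers       = 768MB     # ~25% of the 3 GB box, per Dave's starting point
  effective_cache_size = 2048MB    # roughly what the OS can be expected to cache
  work_mem             = 16MB
  maintenance_work_mem = 64MB
  checkpoint_segments  = 30
  fsync                = on        # left on; turning it off is not a production option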
Craig A. James wrote:
> The "idiom" to replace count() was
> "select col from tbl order by col desc limit 1". It worked miracles for
> my app.

Sorry, I meant to write, "the idiom to replace MAX()", not count()... MAX() was the function that was killing me, 'tho count() also gave me problems.

Craig
I'm using 8.2, and using order by & limit is still faster than MAX() even though MAX() now seems to rewrite to an almost identical plan internally.

Count(*) still seems to use a full table scan rather than an index scan. Using one of our tables, MySQL/Oracle/MS-SQL all return instantly while PG takes longer than 700ms. Luckily we can design around this issue.
"Adam Rich" <adam.r@sbcglobal.net> writes: > I'm using 8.2 and using order by & limit is still faster than MAX() > even though MAX() now seems to rewrite to an almost identical plan > internally. Care to quantify that? AFAICT any difference is within measurement noise, at least for the case of separately-issued SQL commands. > Count(*) still seems to use a full table scan rather than an index scan. Yup. Don't hold your breath for something different. Postgres has made design choices that make certain cases fast and others slow, and count(*) is one case that has come out on the short end of the stick. If that's your most important measure of performance, then indeed you should select a different database that's made different tradeoffs. regards, tom lane
On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote: > I'm using 8.2 and using order by & limit is still faster than MAX() > even though MAX() now seems to rewrite to an almost identical plan > internally. Gonna need you to back that up :) Can we get an explain analyze? > Count(*) still seems to use a full table scan rather than an index scan. > There is a TODO out there to help this. Don't know if it will get done. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Dave Cramer wrote:
> That is 25% of your available memory. This is just a starting point.
> There are reports that going as high as 50% can be advantageous, however
> you need to measure it yourself.

Ok, I ran with the settings below, but with

shared_buffers=768MB
effective_cache_size=2048MB
fsync=on

This run took 29000 seconds. I'm beginning to think configuration changes are not going to buy significant additional improvement. Time to look at the app implementation.

> > autovacuum=on
> > stats_row_level = on
> > max_connections = 10
> > listen_addresses = 'db01,localhost'
> > shared_buffers = 128MB
> > work_mem = 16MB
> > maintenance_work_mem = 64MB
> > temp_buffers = 32MB
> > max_fsm_pages = 204800
> > checkpoint_segments = 30
> > redirect_stderr = on
> > log_line_prefix = '%t %d'

--
Guy Rouillier
Here's the queries and explains... Granted, it's not a huge difference here,
but both timings are absolutely consistent. Using max(), this runs at almost
15 queries/sec and "limit 1" runs at almost 40 queries/sec.

Is the difference in explain analyze expected behavior? (rows=168196 vs.
rows=1) (The table is freshly analyzed.)

select max(item_id) from receipt_items

Result  (cost=0.04..0.05 rows=1 width=0) (actual time=0.030..0.031 rows=1 loops=1)
  InitPlan
    ->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.023..0.024 rows=1 loops=1)
          ->  Index Scan Backward using receipt_items_pkey on receipt_items  (cost=0.00..6883.71 rows=168196 width=4) (actual time=0.020..0.020 rows=1 loops=1)
                Filter: (item_id IS NOT NULL)
Total runtime: 0.067 ms

select item_id from receipt_items order by item_id desc limit 1

Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1)
  ->  Index Scan Backward using receipt_items_pkey on receipt_items  (cost=0.00..6883.71 rows=168196 width=4) (actual time=0.008..0.008 rows=1 loops=1)
Total runtime: 0.026 ms

A couple more similar examples from this table:

select max(create_date) from receipt_items

Result  (cost=0.05..0.06 rows=1 width=0) (actual time=0.032..0.032 rows=1 loops=1)
  InitPlan
    ->  Limit  (cost=0.00..0.05 rows=1 width=8) (actual time=0.025..0.026 rows=1 loops=1)
          ->  Index Scan Backward using test_idx_1 on receipt_items  (cost=0.00..7986.82 rows=168196 width=8) (actual time=0.022..0.022 rows=1 loops=1)
                Filter: (create_date IS NOT NULL)
Total runtime: 0.069 ms

select create_date from receipt_items order by create_date desc limit 1;

Limit  (cost=0.00..0.05 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
  ->  Index Scan Backward using test_idx_1 on receipt_items  (cost=0.00..7986.82 rows=168196 width=8) (actual time=0.009..0.009 rows=1 loops=1)
Total runtime: 0.027 ms
Ron wrote:
> C= What file system are you using? Unlike BigDBMS, pg does not have its
> own native one, so you have to choose the one that best suits your
> needs. For update heavy applications involving lots of small updates
> jfs and XFS should both be seriously considered.

Ron, thanks for your ideas. Many of them I've addressed in response to
suggestions from others. I wanted to address this one in particular.
Unfortunately, I do not have the liberty to change file systems on this old
Sun box. All file systems are formatted Sun UFS. BigDBMS is equally subject
to whatever pluses or minuses can be attributed to this file system, so I'm
thinking that this issue would be a wash between the two.

I've come to the conclusion that configuration changes to PG alone will not
level the playing field. My next step is to try to determine where the
biggest payback will be regarding changing the implementation.

--
Guy Rouillier
Here's another, more drastic example... Here the order by / limit version
runs in less than 1/7000 the time of the MAX() version.

select max(item_id)
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id

Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual time=816.382..816.383 rows=1 loops=1)
  ->  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual time=155.177..675.870 rows=147383 loops=1)
        Hash Cond: (ri.receipt_id = r.receipt_id)
        ->  Seq Scan on receipt_items ri  (cost=0.00..4097.56 rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
        ->  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual time=155.146..155.146 rows=24571 loops=1)
              ->  Hash Join  (cost=506.84..2010.69 rows=24571 width=4) (actual time=34.803..126.452 rows=24571 loops=1)
                    Hash Cond: (r.event_id = e.event_id)
                    ->  Seq Scan on receipts r  (cost=0.00..663.58 rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
                    ->  Hash  (cost=469.73..469.73 rows=14843 width=4) (actual time=34.780..34.780 rows=14843 loops=1)
                          ->  Seq Scan on events e  (cost=0.00..469.73 rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
Total runtime: 816.645 ms

select item_id
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
order by item_id desc limit 1

Limit  (cost=0.00..0.16 rows=1 width=4) (actual time=0.047..0.048 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual time=0.044..0.044 rows=1 loops=1)
        ->  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8) (actual time=0.032..0.032 rows=1 loops=1)
              ->  Index Scan Backward using receipt_items_pkey on receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual time=0.016..0.016 rows=1 loops=1)
              ->  Index Scan using receipts_pkey on receipts r  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
                    Index Cond: (r.receipt_id = ri.receipt_id)
        ->  Index Scan using events_pkey on events e  (cost=0.00..0.04 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
              Index Cond: (e.event_id = r.event_id)
Total runtime: 0.112 ms
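A note for readers on why the gap is so large here: as far as I recall, the
8.2 min/max-to-index-scan rewrite only fires for an aggregate over a single
table, so the MAX() over a join above falls back to hashing the entire join.
Besides the ORDER BY/LIMIT form shown, another rewrite with the same result
(a sketch, assuming item_id is the receipt_items primary key as the plans
suggest) pushes the join into an EXISTS so the planner can still walk the
index backward:

-- same answer as max(item_id) over the join, sketched as an EXISTS:
select item_id
from receipt_items ri
where exists (select 1
              from receipts r
              join events e on e.event_id = r.event_id
              where r.receipt_id = ri.receipt_id)
order by item_id desc
limit 1;

Like the ORDER BY/LIMIT version, this returns zero rows instead of NULL when
the tables are empty, which is usually an acceptable difference.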
Craig A. James wrote:
> Postgres functions like count() and max() are "plug ins" which has huge
> architectural advantages. But in pre-8.1 releases, there was a big
> speed penalty for this: functions like count() were very, very slow,
> requiring a full table scan. I think this is vastly improved from 8.0x
> to 8.1 and forward; others might be able to comment whether count() is
> now as fast in Postgres as Oracle. The "idiom" to replace count() was
                             ^^^^^^

BigDBMS == Oracle. ;-)

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
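Craig's quote is cut off above, so without claiming this is the idiom he had
in mind, the workaround most often cited on this list for exact counts is a
trigger-maintained counter table. A rough sketch with made-up object names
(the counter row serializes writers, so it only pays off when reads of the
count vastly outnumber inserts and deletes):

-- exact count without scanning the big table (names are illustrative only)
create table receipt_items_count (n bigint not null);
insert into receipt_items_count values (0);

create or replace function receipt_items_count_trig() returns trigger as $$
begin
    if tg_op = 'INSERT' then
        update receipt_items_count set n = n + 1;
    elsif tg_op = 'DELETE' then
        update receipt_items_count set n = n - 1;
    end if;
    return null;  -- return value is ignored for AFTER triggers
end;
$$ language plpgsql;

create trigger receipt_items_count_trig
    after insert or delete on receipt_items
    for each row execute procedure receipt_items_count_trig();

-- the "count(*)" then becomes:
select n from receipt_items_count;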
On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:
> When benchmarking various options for a new PG server at one of my clients,
> I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be
> fastest to have ext2 for the WAL. The winning time was 157m46.713s for
> ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3
> data=writeback. This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/
> 1GB BBU cache. This config benched out faster than a 6disk RAID10 + 2 disk
> RAID1 for those of you who have been wondering if the BBU write back cache
> mitigates the need for separate WAL (at least on this workload). Those are
> the fastest times for each config, but ext2 WAL was always faster than the
> other two options. I didn't test any other filesystems in this go around.

Uh, if I'm reading this correctly, you're saying that WAL on a separate ext2
vs. one big ext3 with data=writeback saved ~39 seconds out of ~158.5 minutes,
or 0.4%? Is that even above the noise for your measurements? I suspect the
phase of the moon might play a bigger role ;P

--
Jim Nasby   jim@nasby.net
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
On Sun, Jan 07, 2007 at 11:26:01PM -0500, Guy Rouillier wrote:
> Ok, I ran with the settings below, but with
>
> shared_buffers=768MB
> effective_cache_size=2048MB
> fsync=on
>
> This run took 29000 seconds. I'm beginning to think configuration
> changes are not going to buy significant additional improvement. Time
> to look at the app implementation.

Very likely, but one thing I haven't seen mentioned is what your bottleneck
actually is. Is it CPU? Disk? Something else (i.e., neither CPU nor IO is at
100%)? Additionally, since you have multiple arrays, are you sure they're
being utilized equally? Having something like MRTG or cricket will make your
tuning much easier.

Unlike Oracle, PostgreSQL has no ability to avoid hitting the base table even
if an index could cover a query... so compared to Oracle you'll need to
dedicate a lot more IO to the base tables.

Search around for PostgreSQL on Solaris tuning tips... there are some OS
settings that can make a huge difference. In particular, by default Solaris
will only dedicate a fraction of memory to disk caching. That won't bother
Oracle much but it's a big deal to PostgreSQL. I think there are some other
relevant OS parameters as well.

For vacuum, you're going to need to tune the vacuum_cost_* settings so that
you can balance the IO impact of vacuums with the need to complete the
vacuums in a reasonable time. You'll find this easiest to tune by running
manual vacuums and monitoring IO activity.

You'll also likely need to tune the bgwriter so that checkpoints aren't
killing you. If you're targeting a checkpoint every 5 minutes you'll need to
at least up bgwriter_all_maxpages to shared_buffers (in pages) / 300 / 5. I'd
round up a bit. As with everything, you'll need to tweak your values from
there. If you're using stock bgwriter settings then you'll probably be seeing
a big IO spike every time a checkpoint occurs.

Speaking of which... how often are checkpoints? If you can tolerate 5 minutes
of recovery time (the default checkpoint_timeout), I suggest setting
checkpoint_warning to 290 seconds or so; that way if you're getting
checkpoints much more often than every 5 minutes you'll be able to see it in
the logs.

Speaking of which, going longer between checkpoints will likely help
performance, if you can tolerate longer recovery times. I haven't actually
tested the correlation, but I would expect recovery to complete in a maximum
of checkpoint_timeout seconds. If you can tolerate being in recovery mode for
10 minutes after a crash, try bumping checkpoint_timeout, checkpoint_warning
and checkpoint_segments and see what it does for performance (if you do that
you'll also want to tweak bgwriter further... in this case increasing
bgwriter_delay would be easiest).

Given what sounds like decent IO capabilities, you'll likely get better query
plans from decreasing random_page_cost, probably to between 2 and 3.

Speaking of IO... if you can switch to RAID10 you'll likely get better
performance since your write load is so heavy. Normally RAID5 is a complete
performance killer as soon as you're doing much writing, but I'm guessing
that those nice expensive Sun arrays are better than most RAID controllers.

All that being said... generally the biggest tuning impact to be had for any
database environment is in how the application is using the database. A few
sub-optimal things in the application/database design could easily erase
every gain you'll get from all your tuning.
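To make the bgwriter rule of thumb concrete, here is a small sketch using the
768MB shared_buffers value from earlier in the thread. The 300 presumably
comes from the default 200 ms bgwriter_delay, i.e. roughly 300 bgwriter
rounds per minute, and the 5 is the 5-minute checkpoint target.

-- current checkpoint/bgwriter settings (8.2):
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'checkpoint%' OR name LIKE 'bgwriter%';

-- suggested bgwriter_all_maxpages for a 5-minute checkpoint target:
--   shared_buffers = 768MB = 98304 8kB pages
--   98304 / 300 / 5 = 65.5, so round up to something like 70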
I suggest running EXPLAIN ANALYZE on the queries that are run most often and
seeing what that shows.

--
Jim Nasby   jim@nasby.net
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
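For anyone following along, that looks something like the sketch below. The
table and column names are purely illustrative, since the app's actual schema
isn't shown in the thread.

-- EXPLAIN ANALYZE actually executes the statement, so wrap data-modifying
-- statements in a transaction and roll back to discard the change:
BEGIN;
EXPLAIN ANALYZE
UPDATE stats_normalized                      -- hypothetical table name
   SET bytes_in = bytes_in + 1000
 WHERE device_id = 42                        -- hypothetical columns
   AND sample_time = '2007-01-09 10:05:00';
ROLLBACK;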
On Tue, 9 Jan 2007, Jim C. Nasby wrote:
> Uh, if I'm reading this correctly, you're saying that WAL on a separate
> ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
> ~158.5 minutes, or 0.4%? Is that even above the noise for your
> measurements? I suspect the phase of the moon might play a bigger role
> ;P

That's what I thought too... 'cept I ran it 20 times and ext2 won by that
margin every time, so it was quite repeatable. :-/

--
Jeff Frost, Owner   <jeff@frostconsultingllc.com>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908   FAX: 650-649-1954
On Tue, Jan 09, 2007 at 09:10:51AM -0800, Jeff Frost wrote:
> That's what I thought too... 'cept I ran it 20 times and ext2 won by that
> margin every time, so it was quite repeatable. :-/

Even so, you've got to really be hunting for performance to go through the
hassle of different filesystems just to gain 0.4%... :)

--
Jim Nasby   jim@nasby.net
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
On Wed, 10 Jan 2007, Jim C. Nasby wrote:
>> That's what I thought too... 'cept I ran it 20 times and ext2 won by that
>> margin every time, so it was quite repeatable. :-/
>
> Even so, you've got to really be hunting for performance to go through
> the hassle of different filesystems just to gain 0.4%... :)

Indeed, but actually, I did the math again and it appears that it saves close
to 2 minutes versus one big ext3. I guess the moral of the story is that
having a separate pg_xlog, even on the same physical volume, tends to be
slightly faster for write-oriented workloads. Ext2 is slightly faster than
ext3, but of course you could likely go with another filesystem yet and be
even slightly faster as well. :-)

I guess the real moral of the story is that you can probably use one big ext3
with the default config and it won't matter much more than 1-2% if you have a
BBU.

--
Jeff Frost, Owner   <jeff@frostconsultingllc.com>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908   FAX: 650-649-1954
I originally posted the question that started this thread back in Dec 2006,
and many helpful suggestions resulted. Unfortunately, since this was a closet
effort, my official duties pushed further exploration to the back burner, and
then I lost my original test environment. So while I can no longer compare to
BigDBMS, I've just made some discoveries that I thought others might find
helpful.

The app (which I inherited) was implemented making exhaustive use of stored
procedures. All inserts and updates are done using procs. When configuration
changes produced no noticeable improvements in performance, I turned to the
application architecture. In a new environment, I updated an insert/update
intensive part of the app to use embedded insert and update statements
instead of invoking stored procedures that did the same work. All the
remaining code, database implementation, hardware, etc. remains the same.

The results were significant. Running a repeatable test set of data produced
the following results:

With stored procs:              2595 seconds
With embedded inserts/updates:   991 seconds

So at least in this one scenario, it looks like the extensive use of stored
procs is contributing significantly to long run times.

--
Guy Rouillier
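To illustrate the kind of change involved, here is a sketch with made-up
object names, since the actual schema and procedures aren't shown in the
thread: instead of calling a PL/pgSQL function once per incoming stat, the
app prepares a plain UPDATE once and executes it repeatedly, avoiding the
per-call function overhead.

-- before (hypothetical): one function call per incoming stat
SELECT update_stat(42, '2007-01-09 10:05:00', 1000, 2000);

-- after (hypothetical): a prepared statement executed directly by the app
PREPARE upd_stat (int, timestamp, bigint, bigint) AS
    UPDATE stats_normalized
       SET bytes_in  = bytes_in  + $3,
           bytes_out = bytes_out + $4
     WHERE device_id = $1
       AND sample_time = $2;

EXECUTE upd_stat(42, '2007-01-09 10:05:00', 1000, 2000);

From a Java app this would typically be a JDBC PreparedStatement reused
across executions, and batched where the driver allows it.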