Thread: Performance - moving from Oracle to PostgreSQL
Hi,

I'm beginning the push at our company to look at running PostgreSQL in production here. We have a dual-CPU 2.8 GHz Xeon box running Oracle. Typical CPU load runs between 20% and 90%. Raw DB size is about 200 GB. We hit the disk at roughly 15 MB/s read volume and 3 MB/s write. At any given time we have from 2 to 70 sessions running on the instance. Sessions often persist for 24 hours or more.

Tablespace         Total Mb   Free Mb   Free %
IDXS_EXT10             2000       290     14.5
DATA_EXT100           10000      3200     32
SYSTEM                  220      95.2     43.3
IDXS_EXT100           20000      9600     48
DATA_EXT10             6000      2990     49.8
UNDOB                  4000    2561.1     64
TEMP                   8000    5802.9     72.5
DATA_LOB_EXT20         2000      1560     78
IDXS_EXT1               500       401     80.2
DATA_EXT1              4000      3758     94
Total Instance        56720   30258.2     53.3

There are some immediate questions from our engineers about performance:

"- Oracle has one particular performance enhancement that Postgres is
missing. If you do a select that returns 100,000 rows in a given order,
and all you want are rows 99101 to 99200, then Oracle can do that very
efficiently. With Postgres, it has to read the first 99200 rows and
then discard the first 99100. But... if we really want to look at
performance, then we ought to put together a set of benchmarks of some
typical tasks."

Is this accurate? According to
http://www.postgresql.org/docs/8.0/interactive/queries-limit.html
"The rows skipped by an OFFSET clause still have to be computed inside
the server; therefore a large OFFSET can be inefficient."

What are the key performance areas I should be looking at? Where is
PostgreSQL not appropriate as a replacement for Oracle?

Thanks in advance. Apologies if this comes across as spam; please send
replies to me off-list.
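For concreteness, the pagination pattern the engineers describe maps onto a LIMIT/OFFSET query along these lines (a minimal sketch; the orders table and its columns are hypothetical, not from the original post):

    SELECT order_id, customer_id, order_date
    FROM   orders
    ORDER  BY order_date, order_id
    LIMIT  100 OFFSET 99100;   -- rows 99101 to 99200 of the ordered result

As the quoted 8.0 documentation notes, the server still has to produce the 99,100 skipped rows before it can return the 100 that were asked for.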
> There are some immediate questions from our engineers about performance:
>
> "- Oracle has one particular performance enhancement that Postgres is
> missing. If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently. With Postgres, it has to read the first 99200 rows and
> then discard the first 99100. But... if we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks."
>
> Is this accurate? According to
> http://www.postgresql.org/docs/8.0/interactive/queries-limit.html
> "The rows skipped by an OFFSET clause still have to be computed inside
> the server; therefore a large OFFSET can be inefficient."

Yes, that's accurate. First you need to determine whether PostgreSQL's method is fast enough for that specific query, and whether the performance gains for other operations (inserts, updates, deletes) from reduced index management even out your concern. All performance gains through design changes either increase complexity dramatically or have a performance trade-off elsewhere.

I find it rather odd that anyone would issue a single one-off select for 0.1% of the data about 99.1% of the way through, without doing anything with the rest. Perhaps you want to take a look at using a CURSOR?

> Where is PostgreSQL not appropriate as a replacement for Oracle?

Anything involving reporting with complex aggregates, or very long-running selects that Oracle can divide amongst multiple CPUs. Well, PostgreSQL can do it if you give it enough time to run the query, but a CUBE in PostgreSQL on a TB-sized table would likely take significantly longer to complete. It's mostly just that the Pg developers haven't implemented those features optimally, or at all, yet.

--
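A minimal sketch of the cursor approach suggested above, reusing the same hypothetical orders table as before. The server still steps over the skipped rows, but nothing is shipped to the client, and subsequent pages can be fetched from the same open cursor within the transaction:

    BEGIN;
    DECLARE big_cur CURSOR FOR
        SELECT order_id, customer_id, order_date
        FROM   orders
        ORDER  BY order_date, order_id;
    MOVE FORWARD 99100 IN big_cur;    -- skip to just before row 99101
    FETCH FORWARD 100 FROM big_cur;   -- rows 99101 to 99200
    -- later, if needed: FETCH FORWARD 100 FROM big_cur;  -- the next page
    CLOSE big_cur;
    COMMIT;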
> "- Oracle has one particular performance enhancement that Postgres is > missing. If you do a select that returns 100,000 rows in a given order, > and all you want are rows 99101 to 99200, then Oracle can do that very > efficiently. With Postgres, it has to read the first 99200 rows and > then discard the first 99100. When I was reading up on resultset pagination on AskTom I got a clear impression that the same happens in Oracle as well. Resultset is like: 0....START...STOP...END 0............STOP START...END You first select all the rows from 0 to STOP and then from that select the rows from START to end (which is now STOP). This is done using ROWNUM twice and subselects. It was discussed over there that this obviously produces higher response times as you move towards the end of a very large resultset. Tom even pointed out the same effect when using google search, as you move forward through a very large (millions) search result. Regards, -- Radu-Adrian Popescu CSA, DBA, Developer Aldrapay MD Aldratech Ltd. +40213212243
Hi all,

I'm wondering if and how the size of a table affects the speed of inserts into it? What if the table has indexes; does that alter the answer?

Thanks
Hi,

At 13:24 27/06/2005, Praveen Raja wrote:
>I'm wondering if and how the size of a table affects the speed of inserts
>into it? What if the table has indexes; does that alter the answer?

Many parameters will affect the result:
- whether there are any indexes (including the primary key, unique constraints...) to update or not
- whether there are any foreign keys from or to that table
- the size of the rows
- whether the table (or at least the bits being updated) fits in RAM or not
- whether the table has "holes" (due to former updates/deletes and vacuum) and how they are placed
- and probably a bunch of other things...

Obviously, if you have an append-only (no updates, no deletes) table with no indexes and no foreign keys, the size of the table should not matter much. As soon as one of those conditions is not met, table size will have an impact: probably small as long as whatever is needed can be held in RAM, a lot bigger once that's no longer the case.

Hope that helps,

Jacques.
Just to clear things up a bit, the scenario that I'm interested in is a table with a large number of indexes on it (maybe 7-8). In this scenario, other than the overhead of having to maintain the indexes (which I'm guessing is the same regardless of the size of the table), does the size of the table play a role in determining insert performance (and I mean only insert performance)?
Hi,

At 13:50 27/06/2005, Praveen Raja wrote:
>Just to clear things up a bit, the scenario that I'm interested in is a
>table with a large number of indexes on it (maybe 7-8).

If you're after performance you'll want to carefully consider which indexes are really useful and/or redesign your schema so that you can have fewer indexes on that table. 7 or 8 indexes is quite a lot, and that really has a cost.

>In this scenario, other than the overhead of having to maintain the
>indexes (which I'm guessing is the same regardless of the size of the
>table)

Definitely not: indexes grow with the size of the table. Depending on what columns you index (and their types), the indexes may be a fraction of the size of the table, or they may be very close in size (in extreme cases they may even be larger). With 7 or 8 indexes, that can be quite a large volume of data to manipulate, especially if the values of the inserted columns can span the whole range of the index (rather than being solely id- or time-based, for instance, in which case index updates are concentrated in a small area of each of the indexes), as this means you'll need to keep a majority of the indexes in RAM if you want to maintain decent performance.

>does the size of the table play a role in determining insert performance
>(and I mean only insert performance)?

In this case, it's really the indexes that'll cause you trouble, though heavily fragmented tables (due to lots of deletes or updates) will also incur a penalty just for the data part of the inserts.

Also, don't forget the usual hints if you are going to do lots of inserts (a sketch follows below):
- batch them in large transactions, don't do them one at a time
- better yet, use COPY rather than INSERT
- in some situations, you might be better off dropping the indexes, doing large batch inserts, then re-creating the indexes. YMMV depending on the existing/new ratio, whether you need to maintain indexed access to the tables, etc.
- pay attention to foreign keys

Jacques.
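A rough sketch of the bulk-load hints above (hypothetical table, index, and file names; whether dropping and re-creating the indexes actually pays off depends on the existing/new ratio and on whether you can afford to lose indexed access during the load):

    BEGIN;
    -- Optionally drop non-essential indexes before a large batch load.
    DROP INDEX idx_events_user_id;
    DROP INDEX idx_events_created_at;
    -- COPY is much faster than row-by-row INSERTs for bulk data.
    COPY events (event_id, user_id, created_at, payload)
        FROM '/tmp/events.csv' WITH CSV;
    -- Re-create the indexes once, instead of updating them per row.
    CREATE INDEX idx_events_user_id ON events (user_id);
    CREATE INDEX idx_events_created_at ON events (created_at);
    COMMIT;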
> There are some immediate questions from our engineers about performance:
>
> "- Oracle has one particular performance enhancement that Postgres is
> missing. If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently. With Postgres, it has to read the first 99200 rows and
> then discard the first 99100. But... if we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks."

I agree with Rod: you are correct, but this is a very odd objection. You are declaring a set but are only interested in a tiny subset of it, based on arbitrary criteria. You can do this with cursors or with clever querying (not without materializing the full set, however), but why?

Merlin
I assume you took size to mean the row size? What I really meant was: does the number of rows a table has affect the performance of new inserts into the table (just INSERTs), all other things remaining constant? Sorry for the confusion.

I know that having indexes on the table adds an overhead, but again, does this overhead increase (for an INSERT operation) with the number of rows the table contains?

My instinct says no to both. If I'm wrong, can someone explain why the number of rows in a table affects INSERT performance?

Thanks again
Hi,

At 11:50 28/06/2005, Praveen Raja wrote:
>I assume you took size to mean the row size?

Nope, the size of the table.

>What I really meant was: does the number of rows a table has affect the
>performance of new inserts into the table (just INSERTs), all other
>things remaining constant? Sorry for the confusion.

As I said previously, in most cases it does. One of the few cases where it doesn't would be an append-only table: no holes, no indexes, no foreign keys...

>I know that having indexes on the table adds an overhead, but again, does
>this overhead increase (for an INSERT operation) with the number of rows
>the table contains?

It depends on what you are indexing. If the index key is something that grows monotonically (e.g. a unique ID or a timestamp), then the size of the table (and hence of the indexes) should have a very limited influence on the INSERTs. If the index key is anything else (and that must definitely be the case if you have 7 or 8 indexes!), then updates will happen all over the indexes, which means a lot of read and write activity, and once the total size of your indexes exceeds what can be cached in RAM, performance will decrease quite a bit. Of course, if your keys are concentrated in a few limited areas of the key ranges, it might help.

>My instinct says no to both. If I'm wrong, can someone explain why the
>number of rows in a table affects INSERT performance?

As described above, maintaining indexes when you "hit" anywhere in said indexes is very costly. The larger the table, the larger the indexes, the higher the number of levels in the trees, etc. As long as it all fits in RAM, it shouldn't be a problem. Once you exceed that threshold, you start getting a lot of random I/O, and that's expensive.

Again, it depends a lot on your exact schema, the nature of the data, the spread of the different values, etc., but I would believe it's more often the case than not.

Jacques.
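To make the monotonic-versus-scattered distinction above concrete, here is a hypothetical schema (not from the thread): inserts keep landing near the "right edge" of indexes on monotonically increasing keys, while an index on an effectively random key gets touched all over its range on every insert:

    CREATE TABLE events (
        event_id    serial PRIMARY KEY,          -- monotonically increasing
        created_at  timestamptz DEFAULT now(),   -- (nearly) monotonic
        session_key text                         -- effectively random values
    );
    -- Updates to this index stay concentrated near its right edge:
    CREATE INDEX idx_events_created_at ON events (created_at);
    -- Updates to this one are scattered across the whole index, so it
    -- needs to be largely cached in RAM to keep insert performance up:
    CREATE INDEX idx_events_session_key ON events (session_key);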
"Praveen Raja" <praveen.raja@netlight.se> writes: > I know that having indexes on the table adds an overhead but again does > this overhead increase (for an INSERT operation) with the number of rows > the table contains? Typical index implementations (such as b-tree) have roughly O(log N) cost to insert or lookup a key in an N-entry index. So yes, it grows, though slowly. regards, tom lane