Thread: Performance - moving from oracle to postgresql

Performance - moving from oracle to postgresql

From
"Greg Maples"
Date:
Hi:
    I'm beginning the push at our company to look at running
PostgreSQL in production here.  We have a dual-CPU 2.8 GHz Xeon
box running Oracle.  Typical CPU load runs between 20% and 90%.
Raw DB size is about 200 GB.  We hit the disk at roughly 15 MB/s
read volume and 3 MB/s write.
    At any given time we have from 2 to 70 sessions running
on the instance.  Sessions often persist for 24 hours or more.

                      Total MB     Free MB    Free %
 IDXS_EXT10               2000       290        14.5
 DATA_EXT100             10000      3200        32
 SYSTEM                    220        95.2      43.3
 IDXS_EXT100             20000      9600        48
 DATA_EXT10               6000      2990        49.8
 UNDOB                     4000      2561.1     64
 TEMP                      8000      5802.9     72.5
 DATA_LOB_EXT20            2000      1560       78
 IDXS_EXT1                  500       401       80.2
 DATA_EXT1                 4000      3758       94
 Total Instance           56720     30258.2     53.3


There are some immediate questions from our engineers about performance

"- Oracle has one particular performance enhancement that Postgres is
missing.  If you do a select that returns 100,000 rows in a given order,
and all you want are rows 99101 to 99200, then Oracle can do that very
efficiently.  With Postgres, it has to read the first 99200 rows and
then discard the first 99100.  But...  If we really want to look at
performance, then we ought to put together a set of benchmarks of some
typical tasks."

Is this accurate? According to
http://www.postgresql.org/docs/8.0/interactive/queries-limit.html
  -- " The rows skipped by an OFFSET clause still have to be computed
inside the server; therefore a large OFFSET can be inefficient."
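
In PostgreSQL terms, the query they are describing would look roughly like
this (table and column names here are just placeholders):

    SELECT *
      FROM some_table
     ORDER BY some_column
     LIMIT 100 OFFSET 99100;   -- rows 99101..99200
    -- the server still has to scan past the first 99,100 rows
    -- before it can return the 100 rows that were asked for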


What are the key performance areas I should be looking at?
Where is PostgreSQL not appropriate to replace Oracle?

Thanks in advance; apologies if this comes across as spam.  Please send
replies to me off-list.

Re: Performance - moving from oracle to postgresql

From
Rod Taylor
Date:
> There are some immediate questions from our engineers about performance
>
> "- Oracle has one particular performance enhancement that Postgres is
> missing.  If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently.  With Postgres, it has to read the first 99200 rows and
> then discard the first 99100.  But...  If we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks."
>
> Is this accurate? According to
> http://www.postgresql.org/docs/8.0/interactive/queries-limit.html
>   -- " The rows skipped by an OFFSET clause still have to be computed
> inside the server; therefore a large OFFSET can be inefficient."

Yes. That's accurate. First you need to determine whether PostgreSQL's
method is fast enough for that specific query, and then whether the
performance gains for other queries (inserts, updates, deletes) from
reduced index management even out your concern. All performance gains
through design changes either increase complexity dramatically or have a
performance trade-off elsewhere.


I find it rather odd that anyone would issue a single one-off select for
0.1% of the data about 99.1% of the way through, without doing anything
with the rest. Perhaps you want to take a look at using a CURSOR?
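
A minimal sketch of the cursor approach (names are placeholders; the MOVE
still steps over the skipped rows, but a long-lived session can then page
onwards without re-reading from the start each time):

    BEGIN;
    DECLARE page_cur CURSOR FOR
        SELECT * FROM some_table ORDER BY some_column;
    MOVE FORWARD 99100 IN page_cur;   -- skip the rows you don't want
    FETCH 100 FROM page_cur;          -- rows 99101..99200
    -- further FETCHes continue from here within the same transaction
    COMMIT;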

> Where is PostgreSQL not appropriate to replace Oracle?

Anything involving reporting using complex aggregates or very long
running selects which Oracle can divide amongst multiple CPUs.

Well, PostgreSQL can do it if you give it enough time to run the query,
but a CUBE in PostgreSQL on a TB-sized table would likely take
significantly longer to complete. It's mostly just that the Pg
developers haven't implemented those features optimally, or at all, yet.

--


Re: Performance - moving from oracle to postgresql

From
"Radu-Adrian Popescu"
Date:
> "- Oracle has one particular performance enhancement that Postgres is
> missing.  If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently.  With Postgres, it has to read the first 99200 rows and
> then discard the first 99100.

When I was reading up on resultset pagination on AskTom I got a clear
impression that the same happens in Oracle as well.
Resultset is like:
0....START...STOP...END
0............STOP
     START...END
You first select all the rows from 0 to STOP, and then from that result
select the rows from START to the end (which is now STOP). This is done
using ROWNUM twice and subselects.
It was discussed over there that this obviously produces higher response
times as you move towards the end of a very large resultset. Tom even
pointed out the same effect in Google search, as you page forward
through a very large (millions of hits) result set.
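
For reference, the classic pagination pattern discussed on AskTom looks
roughly like this (the inner query and bind variables are placeholders):

    select *
      from ( select a.*, rownum rnum
               from ( select ... order by ... ) a
              where rownum <= :STOP )
     where rnum >= :START;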

Regards,
--
Radu-Adrian Popescu
CSA, DBA, Developer
Aldrapay MD
Aldratech Ltd.
+40213212243

Insert performance vs Table size

From
"Praveen Raja"
Date:
Hi all

I'm wondering if and how the size of a table affects the speed of inserts
into it. And if the table has indexes, does that alter the answer?

Thanks



Re: Insert performance vs Table size

From
Jacques Caron
Date:
Hi,

At 13:24 27/06/2005, Praveen Raja wrote:
>I'm wondering if and how the size of a table affects the speed of inserts
>into it. And if the table has indexes, does that alter the answer?

Many parameters will affect the result:
- whether there are any indexes (including the primary key, unique
constraints...) to update or not
- whether there are any foreign keys from or to that table
- the size of the rows
- whether the table (or at least the bits being updated) fits in RAM or not
- whether the table has "holes" (due to former updates/deletes and vacuum)
and how they are placed
- and probably a bunch of other things...

Obviously, if you have an append-only (no updates, no deletes) table with
no indexes and no foreign keys, the size of the table should not matter
much. As soon as one of those conditions is not met, table size will have
an impact: probably small as long as whatever is needed can be held in RAM,
a lot bigger once that is no longer the case.
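
A rough way to see this for yourself (table and column names are invented
for illustration) is to time bulk inserts into a bare table versus an
indexed one from psql as they grow:

    CREATE TABLE t_plain   (id integer, payload text);
    CREATE TABLE t_indexed (id integer, payload text);
    CREATE INDEX t_indexed_id ON t_indexed (id);

    \timing
    -- repeat these a few times and watch how the timings evolve
    INSERT INTO t_plain   SELECT g, 'x' FROM generate_series(1, 100000) g;
    INSERT INTO t_indexed SELECT g, 'x' FROM generate_series(1, 100000) g;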

Hope that helps,

Jacques.



Re: Insert performance vs Table size

From
"Praveen Raja"
Date:
Just to clear things up a bit, the scenario that I'm interested in is a
table with a large number of indexes on it (maybe 7-8). In this scenario
other than the overhead of having to maintain the indexes (which I'm
guessing is the same regardless of the size of the table), does the size
of the table play a role in determining insert performance (and I mean
only insert performance)?



Re: Insert performance vs Table size

From
Jacques Caron
Date:
Hi,

At 13:50 27/06/2005, Praveen Raja wrote:
>Just to clear things up a bit, the scenario that I'm interested in is a
>table with a large number of indexes on it (maybe 7-8).

If you're after performance you'll want to carefully consider which indexes
are really useful and/or redesign your schema so that you can have fewer
indexes on that table. 7 or 8 indexes is quite a lot, and that really has a
cost.

>  In this scenario
>other than the overhead of having to maintain the indexes (which I'm
>guessing is the same regardless of the size of the table)

Definitely not: indexes grow with the size of the table. Depending on what
columns you index (and their types), the indexes may be a fraction of the
size of the table, or they may be very close in size (in extreme cases they
may even be larger). With 7 or 8 indexes, that can be quite a large volume
of data to manipulate, especially if the values of the columns inserted can
span the whole range of the index (rather than being solely id- or
time-based, for instance, in which case index updates are concentrated in a
small area of each of the indexes), as this means you'll need to have a
majority of the indexes in RAM if you want to maintain decent performance.

>does the size of the table play a role in determining insert performance
>(and I mean
>only insert performance)?

In this case, it's really the indexes that'll cause you trouble, though
heavily fragmented tables (due to lots of deletes or updates) will also
incur a penalty just for the data part of the inserts.

Also, don't forget the usual hints if you are going to do lots of inserts:
- batch them in large transactions, don't do them one at a time
- better yet, use COPY rather than INSERT (see the sketch after this list)
- in some situations, you might be better off dropping the indexes, doing
large batch inserts, then re-creating the indexes. YMMV depending on the
existing/new ratio, whether you need to maintain indexed access to the
tables, etc.
- pay attention to foreign keys
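
A minimal sketch of the first two hints (table, columns and file name are
placeholders):

    -- many INSERTs inside one transaction instead of one per transaction
    BEGIN;
    INSERT INTO my_table (id, payload) VALUES (1, 'a');
    INSERT INTO my_table (id, payload) VALUES (2, 'b');
    -- ... and so on ...
    COMMIT;

    -- or, usually much faster for bulk loads, from psql:
    \copy my_table (id, payload) FROM 'data.tsv'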

Jacques.



Re: Performance - moving from oracle to postgresql

From
"Merlin Moncure"
Date:
> There are some immediate questions from our engineers about performance
>
> "- Oracle has one particular performance enhancement that Postgres is
> missing.  If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently.  With Postgres, it has to read the first 99200 rows and
> then discard the first 99100.  But...  If we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks."

I agree with Rod: you are correct, but this is a very odd objection.  You
are declaring a set but are only interested in a tiny subset of it, based
on arbitrary criteria.  You can do this with cursors or with clever
querying (not without materializing the full set, however), but why?
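
One commonly suggested alternative, if the ordering column is indexed and
the application remembers where the previous page ended, is to avoid
OFFSET entirely (names here are hypothetical):

    -- seek past the last value already shown instead of using OFFSET
    SELECT *
      FROM orders
     WHERE order_id > :last_seen_id
     ORDER BY order_id
     LIMIT 100;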

Merlin


Re: Insert performance vs Table size

From
"Praveen Raja"
Date:
I assume you took size to mean the row size? What I really meant was:
does the number of rows a table has affect the performance of new
inserts into the table (just INSERTs), all other things remaining
constant? Sorry for the confusion.

I know that having indexes on the table adds an overhead, but again, does
this overhead increase (for an INSERT operation) with the number of rows
the table contains?

My instinct says no to both. If I'm wrong, can someone explain why the
number of rows in a table affects INSERT performance?

Thanks again



Re: Insert performance vs Table size

From
Jacques Caron
Date:
Hi,

At 11:50 28/06/2005, Praveen Raja wrote:
>I assume you took size to mean the row size?

Nope, the size of the table.

>  What I really meant was
>does the number of rows a table has affect the performance of new
>inserts into the table (just INSERTs) all other things remaining
>constant. Sorry for the confusion.

As I said previously, in most cases it does. One of the few cases where it
doesn't would be an append-only table, no holes, no indexes, no foreign keys...

>I know that having indexes on the table adds an overhead but again does
>this overhead increase (for an INSERT operation) with the number of rows
>the table contains?

It depends on what you are indexing. If the index key is something that
grows monotonically (e.g. a unique ID or a timestamp), then the size of the
table (and hence of the indexes) should have a very limited influence on
the INSERTs. If the index key is anything else (and that must definitely be
the case if you have 7 or 8 indexes!), then that means updates will happen
all over the indexes, which means a lot of read and write activity, and
once the total size of your indexes exceeds what can be cached in RAM,
performance will decrease quite a bit. Of course if your keys are
concentrated in a few limited areas of the key ranges it might help.
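
A contrived illustration of the difference (all names invented): insert
monotonically increasing keys into one indexed table and scattered keys
into another, and compare timings as the tables grow:

    CREATE TABLE seq_keys  (k bigint);
    CREATE INDEX seq_keys_k  ON seq_keys (k);
    CREATE TABLE rand_keys (k text);
    CREATE INDEX rand_keys_k ON rand_keys (k);

    \timing
    -- ordered keys: inserts touch only the "right edge" of the index
    INSERT INTO seq_keys  SELECT g FROM generate_series(1, 100000) g;
    -- scattered keys: inserts touch pages all over the index
    INSERT INTO rand_keys SELECT md5(g::text) FROM generate_series(1, 100000) g;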

>My instinct says no to both. If I'm wrong can someone explain why the
>number of rows in a table affects INSERT performance?

As described above, maintaining indexes when you "hit" anywhere in said
indexes is very costly. The larger the table, the larger the indexes, the
higher the number of levels in the trees, etc. As long as it fits in RAM,
it shouldn't be a problem. Once you exceed that threshold, you start
getting a lot of random I/O, and that's expensive.

Again, it depends a lot on your exact schema, the nature of the data, the
spread of the different values, etc, but I would believe it's more often
the case than not.

Jacques.



Re: Insert performance vs Table size

From
Tom Lane
Date:
"Praveen Raja" <praveen.raja@netlight.se> writes:
> I know that having indexes on the table adds an overhead but again does
> this overhead increase (for an INSERT operation) with the number of rows
> the table contains?

Typical index implementations (such as b-tree) have roughly O(log N)
cost to insert or lookup a key in an N-entry index.  So yes, it grows,
though slowly.
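
(Illustrative arithmetic only: with a b-tree fanout of around 200 keys per
page, an index over 1 million rows is roughly 3 levels deep
(log(10^6)/log(200) ~ 2.6), and one over 100 million rows roughly 4 levels
deep (log(10^8)/log(200) ~ 3.5), so a 100x larger table costs each insert
only about one extra level of descent per index.)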

            regards, tom lane