Thread: questions about PG update performance
Dear all,
I have a question about the update performance of PG.
Because of the internal implementation of MVCC in PG,
an update of a row is actually an insertion of a new row version.
So if the size of a row is large, this incurs some overhead compared to an in-place update strategy.
Let's assume that a table has 200 columns,
and a user updates one of the columns of one row in the table.
Then PG will rewrite the whole contents of the updated row,
including both the updated and the not-updated columns.
I'm not sure about the implementation of Oracle's update.
But if Oracle can overwrite only the updated column,
the performance difference between Oracle and PG in that case may be significant.
I searched the mailing lists and Google for this issue,
but I have not found anything related to it.
Thank you.
Kisung Kim.
(C)Bitnine, Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : kskim@bitnine.co.kr
Office phone : 070-4800-3321
Mobile phone : 010-7136-0834
Fax : 02-568-1332
On Mon, Oct 26, 2015 at 10:52 AM, Kisung Kim wrote:
> Because of the internal implementation of MVCC in PG,
> an update of a row is actually an insertion of a new row version.
> So if the size of a row is large, this incurs some overhead compared to an
> in-place update strategy.
Yeah, that's how UPDATE works in Postgres with MVCC. The xmax of the old row version is set, and a new row version is inserted with an xmin equal to that xmax. So if you update tuple fields one by one, the cost is going to be high.
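You can watch this happen through the xmin, xmax and ctid system columns (a minimal sketch; the table and values are made up):

    -- each UPDATE leaves the old tuple behind and creates a new one
    CREATE TABLE t (id int PRIMARY KEY, val text);
    INSERT INTO t VALUES (1, 'before');
    SELECT xmin, xmax, ctid FROM t;   -- e.g. (1001, 0, (0,1))
    UPDATE t SET val = 'after' WHERE id = 1;
    SELECT xmin, xmax, ctid FROM t;   -- e.g. (1002, 0, (0,2)): new tuple, new ctid

The visible row now lives at a different ctid: nothing was overwritten in place.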
> Let's assume that a table has 200 columns,
> and a user updates one of the columns of one row in the table.
> Then PG will rewrite the whole contents of the updated row,
> including both the updated and the not-updated columns.
When a table has a large number of columns, usually I would say that you have a normalization problem; such a schema could be split into a smaller set of tables, minimizing the UPDATE cost.
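For example (an illustrative sketch with made-up names), the frequently-updated column can live in a narrow table of its own:

    -- the 200 mostly-static columns stay in one table...
    CREATE TABLE item_static (id int PRIMARY KEY, col1 text, col2 text /* ... */);
    -- ...and the hot column moves to a narrow table keyed on the same id
    CREATE TABLE item_counter (id int PRIMARY KEY REFERENCES item_static,
                               hits bigint);
    -- an update now rewrites only the narrow row
    UPDATE item_counter SET hits = hits + 1 WHERE id = 42;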
> I'm not sure about the implementation of Oracle's update.
> But if Oracle can overwrite only the updated column,
> the performance difference between Oracle and PG in that case may be
> significant.
> I searched the mailing lists and Google for this issue,
> but I have not found anything related to it.
What you are looking at here is columnar storage; Alvaro and the 2ndQuadrant folks have been doing some work in this area recently:
http://www.postgresql.org/message-id/20150831225328.GM2912@alvherre.pgsql
Also, you may want to have a look at cstore_fdw: https://github.com/citusdata/cstore_fdw.
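From memory of the cstore_fdw README (check the repo for the exact options), basic usage looks roughly like this; as far as I know it was append-only at the time (INSERT/COPY, no UPDATE or DELETE):

    -- cstore_fdw stores a table column-by-column behind a foreign table
    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
    CREATE FOREIGN TABLE measurements (id int, reading float8)
        SERVER cstore_server
        OPTIONS (compression 'pglz');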
Regards,
--
Michael
2015-10-26 11:12 GMT+09:00 Michael Paquier <michael.paquier@gmail.com>:
Thank you for your reply.
I already know about column stores and that they are optimized for the case I described.
However, what I want to know about is the update performance difference between PG and Oracle, if there is any.
The case I described was meant to exaggerate the difference between PG and Oracle.
I want to be able to explain to our clients that PG's update performance is comparable to Oracle's.
Regards,
----- Quote from Kisung Kim (kskim@bitnine.co.kr), on 26.10.2015 at 04:36 -----

> However, what I want to know about is the update performance difference
> between PG and Oracle, if there is any.
> The case I described was meant to exaggerate the difference between PG
> and Oracle.
>
> I want to be able to explain to our clients that PG's update performance
> is comparable to Oracle's.

Oracle is also using MVCC but copies the old row in the rollback segment and rewrites the values in-place. It is still 2 writes as in Postgres. The difference is in rollback and in cleaning up the row when it is not needed anymore.

Regards,
--
Luben Karavelov
On Mon, Oct 26, 2015 at 9:03 AM, Любен Каравелов <karavelov@mail.bg> wrote:
>
>
> ----- Quote from Kisung Kim (kskim@bitnine.co.kr), on 26.10.2015 at 04:36 -----
>
> > However, what I want to know about is the update performance difference
> > between PG and Oracle, if there is any.
> > The case I described was meant to exaggerate the difference between PG
> > and Oracle.
> >
> > I want to be able to explain to our clients that PG's update performance
> > is comparable to Oracle's.
> >
>
> Oracle is also using MVCC but copies the old row in the rollback segment and
> rewrites the values in-place.
I think Oracle just copies the changed part of the old row to the rollback
segment. Also in the redo logs, it just writes the changed column value
(both old and new). So for the case we are discussing in this thread (one
changed column out of 200), Oracle will just write the old value of that
column in redo and then in the rollback segment, and write the new value
in redo and then do the in-place update of the heap row.
In PostgreSQL, the whole new row is written to the heap, and a diff tuple
(the difference of the old and new tuples; this optimization was done in
9.4, commit a3115f0d) to WAL. I think we could try to use the same
technique for the heap as is used for WAL to make PostgreSQL more efficient
for such scenarios; however, as of today, my guess is that PostgreSQL's
update would be lagging in this area.
> It is still 2 writes as in Postgres.
The difference is in the amount of data written per write.
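One way to put a number on the WAL written per update (a rough sketch using the 9.4-era function names; wide_table and col42 are hypothetical):

    -- note the WAL insert position before and after a single update
    SELECT pg_current_xlog_insert_location();   -- e.g. '0/16B3D80'
    UPDATE wide_table SET col42 = 'x' WHERE id = 1;
    SELECT pg_current_xlog_insert_location();   -- e.g. '0/16B3F00'
    -- bytes of WAL generated in between:
    SELECT pg_xlog_location_diff('0/16B3F00', '0/16B3D80');

Running the same measurement on narrow and wide rows makes the diff-tuple optimization visible.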
On Mon, Oct 26, 2015 at 10:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think Oracle just copies the changed part of the old row to the rollback
> segment. Also in the redo logs, it just writes the changed column value
> (both old and new). So for the case we are discussing in this thread (one
> changed column out of 200), Oracle will just write the old value of that
> column in redo and then in the rollback segment, and write the new value
> in redo and then do the in-place update of the heap row.
In that case, readers would pay the penalty of reconstructing the row. PostgreSQL does not incur that reconstruction cost. Either the writer or the reader is bound to pay a penalty; if the workload is read-heavy it makes sense to use something like PG, otherwise something like what is described above.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Oct 26, 2015 at 12:07 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> In that case, readers would pay the penalty of reconstructing the row.
Readers that have a snapshot older than the updating transaction need to
pay that cost; all newer transactions can read directly from the page.
Also, not all old-snapshot readers have to pay any such cost.
Not only that, such a design has the advantage that there is no bloat
from older data.
On Mon, Oct 26, 2015 at 4:09 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Readers that have a snapshot older than the updating transaction need to
> pay that cost; all newer transactions can read directly from the page.
> Also, not all old-snapshot readers have to pay any such cost.
Can you please explain your last sentence?
> Not only that, such a design has the advantage that there is no bloat
> from older data.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 10/26/2015 05:49 AM, Amit Kapila wrote:
> On Mon, Oct 26, 2015 at 9:03 AM, Любен Каравелов <karavelov@mail.bg> wrote:
> > Oracle is also using MVCC but copies the old row in the rollback segment
> > and rewrites the values in-place.
>
> I think Oracle just copies the changed part of the old row to the rollback
> segment. Also in the redo logs, it just writes the changed column value
> (both old and new). So for the case we are discussing in this thread (one
> changed column out of 200), Oracle will just write the old value of that
> column in redo and then in the rollback segment, and write the new value
> in redo and then do the in-place update of the heap row.
IMV, Oracle is heavily optimized for "most DML transactions will commit successfully", "no long-running transactions shall ever exist", and "not many transactions will have to read previous snapshots" (based on PIs); Postgres does not actually make any such assumptions.
Hence, for long-running transactions or massive concurrency, with many clients reading and writing older snapshots, Postgres will be faster (less work to do compared to reconstructing rows based on PIs).
Plus, for updates where the size of the new row is bigger than the previous one (think adding text), the overhead is actually greater for Oracle (and they don't compress variable-length values by default / no TOAST)... so here Postgres would be faster.
For text-intensive workloads, Postgres is measurably faster than Oracle mostly due to this fact (and much more efficient in its use of storage/RAM...).
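A quick way to see that compression at work (a sketch; the table and column names are made up):

    -- pg_column_size() reports the stored (possibly compressed) size,
    -- octet_length() the raw size of the value
    CREATE TABLE docs (id int, body text);
    INSERT INTO docs VALUES (1, repeat('PostgreSQL ', 10000));
    SELECT octet_length(body) AS raw_bytes,
           pg_column_size(body) AS stored_bytes
    FROM docs;
    -- repetitive text compresses well, so stored_bytes comes out far below raw_bytes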
> In PostgreSQL, the whole new row is written to the heap, and a diff tuple
> (the difference of the old and new tuples; this optimization was done in
> 9.4, commit a3115f0d) to WAL. I think we could try to use the same
> technique for the heap as is used for WAL to make PostgreSQL more
> efficient for such scenarios; however, as of today, my guess is that
> PostgreSQL's update would be lagging in this area.
Yup. But see above for a potential reason why it might not be that bad, especially after the optimization you mention.
> > It is still 2 writes as in Postgres.
>
> The difference is in the amount of data written per write.
Yes, but compressed (for varlena-based datums), batched (group commit) and therefore mostly sequential, and non-duplicated (WAL vs. REDO+UNDO).
So I guess the difference is quite small nowadays, and any differences will be heavily influenced by the actual workload.
Just my 2 (euro) cents.
/ J.L.
On 10/25/15 9:36 PM, Kisung Kim wrote:
> I want to be able to explain to our clients that PG's update performance
> is comparable to Oracle's.

There's really only 2 ways you can answer that. You can either handwave the
question away ("Yes, update performance is comparable."), or you have to do
actual benchmarking. Trying to answer this from a theoretical standpoint is
completely useless because there's an absurd number of things that will
affect this:

Number of columns
Data types
Size of overall transaction
Percent of transactions that roll back
Size of table
What % of table is updated every day
Underlying hardware
What OS the database is running on
What filesystem the database is running on

... and that's just off the top of my head.

Or to look at it another way, I guarantee you can create a scenario where
Postgres beats the pants off Oracle, *or vice versa*. So you have to either
go with an answer along the lines of "For most workloads the performance of
both databases is similar." or you have to benchmark the actual application
in question. Most performance issues you find will probably be correctable
with a moderate amount of work.

To me, the real tradeoff between Postgres and Oracle (or any other
commercial database) is whether you'd rather spend money on expert
employees or software contracts.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
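If you do benchmark, pgbench with a custom script is the usual tool on the Postgres side. A sketch (the table, column, and numbers are made up; \setrandom is the pgbench syntax of this era, replaced by \set ... random(...) in 9.6):

    -- update_test.sql: update one column of a random row in a wide table
    \setrandom id 1 100000
    UPDATE wide_table SET col42 = md5(random()::text) WHERE id = :id;

Run it with something like "pgbench -n -f update_test.sql -c 8 -T 60 mydb" against the schema variants you care about and compare the reported TPS.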
On 27/10/15 11:37, Jim Nasby wrote:
> So you have to either go with an answer along the lines of "For most
> workloads the performance of both databases is similar." or you have to
> benchmark the actual application in question. Most performance issues you
> find will probably be correctable with a moderate amount of work.
>
> To me, the real tradeoff between Postgres and Oracle (or any other
> commercial database) is whether you'd rather spend money on expert
> employees or software contracts.

And of course, on how you alter the tuning parameters in postgresql.conf,
like temp_buffers and work_mem. The 'correct' values will depend on your
workload and amount of RAM etc.
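For instance (an illustrative sketch only; the values are arbitrary starting points, not recommendations, and ALTER SYSTEM needs 9.4+):

    -- persist new defaults to postgresql.auto.conf, then reload
    ALTER SYSTEM SET work_mem = '64MB';
    ALTER SYSTEM SET temp_buffers = '32MB';
    SELECT pg_reload_conf();

Both parameters can also be set per session with plain SET, which is handy while experimenting.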
On Mon, Oct 26, 2015 at 4:31 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> > Readers that have a snapshot older than the updating transaction need to
> > pay that cost; all newer transactions can read directly from the page.
> > Also, not all old-snapshot readers have to pay any such cost.
>
> Can you please explain your last sentence?

At a broad level, it works this way: when the transaction starts, it
makes a note of the current SCN and then while reading a table or
an index page, it uses the SCN number to determine if the page contains
the effects of transactions that should not be visible to the current
transaction. If the page is found to contain the effects of invisible
transactions, then it recreates an older version of the page by undoing
the effects of each such transaction. Now once this older version
of page is recreated, this can be used to fetch the rows for transactions
older than the current transaction which has updated the page and
newer than the transaction which has recreated the page. For details
you can read the blog [1] I wrote some time back.
[1] - http://amitkapila16.blogspot.in/2015/03/different-approaches-for-mvcc-used-in.html
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com