Re: questions about PG update performance - Mailing list pgsql-hackers

From Kisung Kim
Subject Re: questions about PG update performance
Date
Msg-id CABF0Rr2H7+fVODRgO4+=tRJZjyPZLZQUGMOi5ao2TznCtsaGbg@mail.gmail.com
Whole thread Raw
In response to Re: questions about PG update performance  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: questions about PG update performance  (Любен Каравелов <karavelov@mail.bg>)
Re: questions about PG update performance  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers

2015-10-26 11:12 GMT+09:00 Michael Paquier <michael.paquier@gmail.com>:


On Mon, Oct 26, 2015 at 10:52 AM, Kisung Kim wrote:
Because of the internal implementation of MVCC in PG
the update of a row is actually a insertion of a new version row.
So if the size of a row is huge, then it incurs some overhead compare to in-place update strategy.

Yeah, that's how an UPDATE in Postgres for MVCC usage. The xmax of the old row is updated, and a new row is inserted with an xmin equal to the previous xmax. So if you update tuple fields one by one the cost is going to be high.
 
Let's assume that a table has 200 columns, 
and a user updates one of the columns of one row in the table.
Then PG will rewrite the whole contents of the updated row
including the updated columns and not-updated columns.

When a table has a large number of columns, usually I would say that you have a normalization problem and such schemas could be split into a smaller set of tables, minimizing the UPDATE cost.
 
I'm not sure about the implementation of Oracle's update.
But if the Oracle can overwrite only the updated column,
the performance difference between Oracle and PG in that case may be significant.

I researched about this issues in mailing list and google.
But I've not found anything related to this issues.

What you are looking at here is columnar storage, Alvaro and 2nd Quadrant folks have been doing some work in this area recently:
http://www.postgresql.org/message-id/20150831225328.GM2912@alvherre.pgsql
Also, you may want to have a look at cstore_fdw: https://github.com/citusdata/cstore_fdw.
Regards,
--
Michael

Thank you for your reply.
I already know about the column store and that it is optimized for the case I described.

However, what I want to know is about the update performance difference between PG and Oracle if there any.
The case I described is for exaggerating the difference between PG and Oracle.

I want to explain for our clients that PG's update performance is comparable to Oracle's.

Regards, 



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: pgbench gaussian/exponential docs improvements
Next
From: Любен Каравелов
Date:
Subject: Re: questions about PG update performance