Thread: Insert vs Update

Insert vs Update

From
Robert DiFalco
Date:
First off, I apologize if this question has been beaten to death. I've looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it generally faster to INSERT or UPDATE data? And if there is a performance difference, is it substantial?

I have a situation where I can easily do one or the other to the same effect. For example, I have a journaling schema with a limited number of "states" for an "entry". Currently each state is its own table, so I just insert rows as the states occur. But I could easily have a single "entry" table where the row is updated with column information for states (after the entry's initial insertion).

Not a big deal, but since it's so easy for me to take either approach, I was wondering if one was more efficient (for a large DB) than the other.

Thanks!

Re: Insert vs Update

From
"David G. Johnston"
Date:
On Wed, Jul 15, 2015 at 12:16 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
First off, I apologize if this question has been beaten to death. I've looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it generally faster to INSERT or UPDATE data? And if there is a performance difference, is it substantial?

I have a situation where I can easily do one or the other to the same effect. For example, I have a journaling schema with a limited number of "states" for an "entry". Currently each state is its own table, so I just insert rows as the states occur. But I could easily have a single "entry" table where the row is updated with column information for states (after the entry's initial insertion).

Not a big deal, but since it's so easy for me to take either approach, I was wondering if one was more efficient (for a large DB) than the other.


There is a HOT (heap-only tuple) optimization that can occur if only non-indexed data is altered.  I do not recall the specifics.
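
A rough way to check whether HOT is taking effect for a given table is the statistics collector; this is only a sketch, and the table name `entry` is an assumption standing in for whatever table receives the updates:

```sql
-- Compare total row updates with heap-only tuple (HOT) updates.
-- If n_tup_hot_upd tracks n_tup_upd closely, most updates avoided
-- touching the table's indexes.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'entry';
```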

Dave
 

Re: Insert vs Update

From
Michael Nolan
Date:

On Wed, Jul 15, 2015 at 12:16 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
First off, I apologize if this question has been beaten to death. I've looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it generally faster to INSERT or UPDATE data? And if there is a performance difference, is it substantial?

I have a situation where I can easily do one or the other to the same effect. For example, I have a journaling schema with a limited number of "states" for an "entry". Currently each state is its own table, so I just insert rows as the states occur. But I could easily have a single "entry" table where the row is updated with column information for states (after the entry's initial insertion).

Not a big deal, but since it's so easy for me to take either approach, I was wondering if one was more efficient (for a large DB) than the other.

Thanks
 
If you think of an update as a delete-insert operation (glossing over the fine points of what has to be done for ACID), it seems pretty clear that an update involves more work than an insert.  Measuring that impact on performance is probably a bit more challenging, because it's going to be dependent on the specific table and the contents of the row, among other things. 
--
Mike Nolan


Re: Insert vs Update

From
"David G. Johnston"
Date:
On Wednesday, July 15, 2015, Robert DiFalco <robert.difalco@gmail.com> wrote:
First off, I apologize if this question has been beaten to death. I've looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it generally faster to INSERT or UPDATE data? And if there is a performance difference, is it substantial?

This seems odd.  If you have an option to update but choose to insert, what becomes of the other record?

Re: Insert vs Update

From
Robert DiFalco
Date:


On Wed, Jul 15, 2015 at 10:33 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, July 15, 2015, Robert DiFalco <robert.difalco@gmail.com> wrote:
First off, I apologize if this question has been beaten to death. I've looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it generally faster to INSERT or UPDATE data? And if there is a performance difference, is it substantial?

This seems odd.  If you have an option to update but choose to insert, what becomes of the other record?


Consider these two pseudo-schemas; I'm just making them up for example purposes:

SCHEMA A
=========
meal(id SEQUENCE, user_id, started DEFAULT NOW())
meal_prepared(ref_meal_id, prepared DEFAULT NOW())
meal_abandoned(ref_meal_id, abandoned ...)
meal_consumed(ref_meal_id, consumed ...)
etc.

Then in response to different meal events you always have an insert. 

aMealId = INSERT INTO meal(user_id) VALUES (aUserId);

When preparation starts:

INSERT INTO meal_prepared(ref_meal_id) VALUES (aMealId);

And so on for each event. 

Compare that to this:

SCHEMA B
=========
meal_event(id, started, prepared, abandoned, consumed, ...)

The start of the meal is an INSERT:

aMealId = INSERT INTO meal_event(user_id, started) VALUES (aUserId, NOW());

When preparation starts:

UPDATE meal_event SET prepared = NOW() WHERE id = aMealId;

And so on.
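
In concrete PostgreSQL, the `aMealId = INSERT ...` pseudo-code above would typically use RETURNING to capture the generated id. A sketch under the Schema B shape; the column types are assumptions:

```sql
-- Schema B in concrete form (types are illustrative assumptions).
CREATE TABLE meal_event (
    id       bigserial PRIMARY KEY,
    user_id  bigint NOT NULL,
    started  timestamptz,
    prepared timestamptz,
    consumed timestamptz
);

-- "aMealId = INSERT ..." maps to INSERT ... RETURNING:
INSERT INTO meal_event (user_id, started) VALUES (42, now()) RETURNING id;

-- Later events are updates of the same row:
UPDATE meal_event SET prepared = now() WHERE id = 1;
```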

Basically the same data. In one case you always do inserts and add new tables for new events; in the other you insert only once and then update for each state, adding columns if you have new states.

As I said, this is just an example. But in SCHEMA A you have only inserts and lots of tables, while in SCHEMA B you have a lot of updates and a lot of possibly-NULL columns if certain events don't occur.

Is that more clear?

R.


 

Re: Insert vs Update

From
"David G. Johnston"
Date:
On Wed, Jul 15, 2015 at 1:56 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:


On Wed, Jul 15, 2015 at 10:33 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, July 15, 2015, Robert DiFalco <robert.difalco@gmail.com> wrote:
First off, I apologize if this question has been beaten to death. I've looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it generally faster to INSERT or UPDATE data? And if there is a performance difference, is it substantial?

This seems odd.  If you have an option to update but choose to insert, what becomes of the other record?


Consider these two pseudo-schemas; I'm just making them up for example purposes:

SCHEMA A
=========
meal(id SEQUENCE, user_id, started DEFAULT NOW())
meal_prepared(ref_meal_id, prepared DEFAULT NOW())
meal_abandoned(ref_meal_id, abandoned ...)
meal_consumed(ref_meal_id, consumed ...)
etc.

Then in response to different meal events you always have an insert. 

aMealId = INSERT INTO meal(user_id) VALUES (aUserId);

When preparation starts:

INSERT INTO meal_prepared(ref_meal_id) VALUES (aMealId);

And so on for each event. 

Compare that to this:

SCHEMA B
=========
meal_event(id, started, prepared, abandoned, consumed, ...)

The start of the meal is an INSERT:

aMealId = INSERT INTO meal_event(user_id, started) VALUES (aUserId, NOW());

When preparation starts:

UPDATE meal_event SET prepared = NOW() WHERE id = aMealId;

And so on.

Basically the same data. In one case you always do inserts and add new tables for new events; in the other you insert only once and then update for each state, adding columns if you have new states.

As I said, this is just an example. But in SCHEMA A you have only inserts and lots of tables, while in SCHEMA B you have a lot of updates and a lot of possibly-NULL columns if certain events don't occur.

Is that more clear?


Yes, you are trying to choose between a bunch of one-to-one (optional) relationships versus adding additional columns to a table, all of which can be NULL.

I'd argue that neither option is "normal" (in the DB normalization sense).

CREATE TABLE meal (meal_id bigserial);
CREATE TABLE meal_event_type (meal_event_id bigserial);
CREATE TABLE meal_event (meal_id bigint, meal_event_id bigint, occurred_at timestamptz);

So now the decision is one of how to denormalize; materialized views are one way to do so.  The specific solution would depend in part on the final application queries that you need to write.
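
One sketch of the materialized-view route, pivoting the normalized event rows back into per-meal columns; the event-type ids used here are hypothetical:

```sql
-- Denormalize the event log into one row per meal.
-- Event-type ids (1 = prepared, 2 = consumed) are assumptions.
CREATE MATERIALIZED VIEW meal_summary AS
SELECT meal_id,
       min(occurred_at) FILTER (WHERE meal_event_id = 1) AS prepared,
       min(occurred_at) FILTER (WHERE meal_event_id = 2) AS consumed
FROM meal_event
GROUP BY meal_id;

-- Re-run periodically or after batches of writes:
REFRESH MATERIALIZED VIEW meal_summary;
```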

If you do want to model the denormalized form (which I would likely be tempted to do, given a fixed set of "events" that do not require additional related attributes), one approach would be to place the few event timestamps on the main table and UPDATE them to non-NULL.

In the normal form you will likely find partial indexes to be quite useful.
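
For example, a partial index over just one event type stays small and is maintained only for matching rows; the event-type id is again an assumption:

```sql
-- Index only "prepared" events (meal_event_id = 1 is hypothetical),
-- keeping the index small and cheap to maintain for other writes.
CREATE INDEX meal_event_prepared_idx
    ON meal_event (meal_id)
    WHERE meal_event_id = 1;
```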

David J.

Re: Insert vs Update

From
Robert DiFalco
Date:

On Wed, Jul 15, 2015 at 11:15 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

Yes, you are trying to choose between a bunch of one-to-one (optional) relationships versus adding additional columns to a table, all of which can be NULL.

I'd argue that neither option is "normal" (in the DB normalization sense).

CREATE TABLE meal (meal_id bigserial);
CREATE TABLE meal_event_type (meal_event_id bigserial);
CREATE TABLE meal_event (meal_id bigint, meal_event_id bigint, occurred_at timestamptz);

So now the decision is one of how to denormalize; materialized views are one way to do so.  The specific solution would depend in part on the final application queries that you need to write.

If you do want to model the denormalized form (which I would likely be tempted to do, given a fixed set of "events" that do not require additional related attributes), one approach would be to place the few event timestamps on the main table and UPDATE them to non-NULL.

In the normal form you will likely find partial indexes to be quite useful.

David J.


Thanks David, my example was a big simplification, but I appreciate your guidance. The different event types have differing amounts of related data. Query speed on this schema is not important; it's really the write speed that matters. So I was just wondering, given the INSERT or UPDATE approach (with no indexed data being changed), whether one is likely to be substantially faster than the other.

Re: Insert vs Update

From
"David G. Johnston"
Date:
On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
The different event types have differing amounts of related data. 

​On this basis alone I would select the multiple-table version as my baseline and only consider something different if the performance of this was insufficient and I could prove that an alternative arrangement was more performant.

A single optional date with meta-data embedded in the column name is usually workable, but if you then have a bunch of other columns with names like:

preparation_date, preparation_col1, preparation_col2, consumed_col1, consumed_col2, consumed_date

I would find that to be undesirable.

You may be able to put Table Inheritance to good use here...
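
A sketch of what table inheritance could look like here; the child table and its extra column are invented for illustration:

```sql
-- Common columns live in the parent; each event type inherits them
-- and adds its own attributes.
CREATE TABLE meal_event_base (
    meal_id     bigint      NOT NULL,
    occurred_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE meal_prepared (
    cook_id bigint  -- hypothetical event-specific attribute
) INHERITS (meal_event_base);

-- Querying the parent also scans all children:
SELECT * FROM meal_event_base WHERE meal_id = 42;
```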

I do not know (but doubt) whether HOT optimization works when going from NULL to non-NULL, since the former is stored in a bitmap while the latter occupies normal relation space, and thus the update would likely end up writing an entirely new record upon each event-category recording.

David J.


Re: Insert vs Update

From
Robert DiFalco
Date:


On Wed, Jul 15, 2015 at 12:32 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

You may be able to put Table Inheritance to good use here...

I do not know (but doubt) whether HOT optimization works when going from NULL to non-NULL, since the former is stored in a bitmap while the latter occupies normal relation space, and thus the update would likely end up writing an entirely new record upon each event-category recording.

David J.



Thanks! 

Re: Insert vs Update

From
Michael Nolan
Date:


On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:

Thanks David, my example was a big simplification, but I appreciate your guidance. The different event types have differing amounts of related data. Query speed on this schema is not important; it's really the write speed that matters. So I was just wondering, given the INSERT or UPDATE approach (with no indexed data being changed), whether one is likely to be substantially faster than the other.


As I understand how ACID compliance is done, updating a record will require updating any indexes for that record, even if the index keys are not changing.  That's because any pending transactions still need to be able to find the 'old' data, while new transactions need to be able to find the 'new' data.  And ACID also means an update is essentially a delete-and-insert. 
--
Mike Nolan

Re: Insert vs Update

From
"David G. Johnston"
Date:
On Wed, Jul 15, 2015 at 4:53 PM, Michael Nolan <htfoot@gmail.com> wrote:
On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:

Thanks David, my example was a big simplification, but I appreciate your guidance. The different event types have differing amounts of related data. Query speed on this schema is not important; it's really the write speed that matters. So I was just wondering, given the INSERT or UPDATE approach (with no indexed data being changed), whether one is likely to be substantially faster than the other.


As I understand how ACID compliance is done, updating a record will require updating any indexes for that record, even if the index keys are not changing.  That's because any pending transactions still need to be able to find the 'old' data, while new transactions need to be able to find the 'new' data.  And ACID also means an update is essentially a delete-and-insert.  

I might be a bit pedantic here, but what you describe is a byproduct of the specific implementation that PostgreSQL uses to effect Consistency (the C in ACID), as opposed to a foregone outcome of being ACID compliant.

http://www.postgresql.org/docs/9.4/static/mvcc-intro.html

I'm out of my comfort zone here, but the HOT optimization is designed to leverage the fact that an update to a row that does not affect indexed values can leave the index alone: during lookup the index points to the old tuple, PostgreSQL notices that there is a chain present, and walks that chain to find the currently active tuple.

In short, if the only index is a PK, an update of the row can avoid touching that index.
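
A minimal illustration of that point; the table and column names are invented:

```sql
-- Only the PK is indexed, so updating the non-indexed column is
-- HOT-eligible (given free space on the same page):
CREATE TABLE t (id bigint PRIMARY KEY, note text);
INSERT INTO t VALUES (1, 'a');
UPDATE t SET note = 'b' WHERE id = 1;  -- can be a heap-only update

-- Once "note" itself is indexed, the same update must touch the index:
CREATE INDEX t_note_idx ON t (note);
UPDATE t SET note = 'c' WHERE id = 1;  -- no longer HOT
```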

I mentioned that going from NULL to NOT NULL may disrupt this, but I'm thinking I may have misspoken.

Also, with separate tables the amount of data to write is going to be less because you'd have fewer columns on the affected tables.

While an update is a delete+insert, a delete is mostly just a bit-flip action, at least mid-transaction.  Depending on volume, though, the periodic impact of vacuuming may need to be taken into consideration.

David J.

Re: Insert vs Update

From
Guillaume Lelarge
Date:

On 15 Jul. 2015 11:16 PM, "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> On Wed, Jul 15, 2015 at 4:53 PM, Michael Nolan <htfoot@gmail.com> wrote:
>>
>> On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
>>>
>>>
>>> Thanks David, my example was a big simplification, but I appreciate your guidance. The different event types have differing amounts of related data. Query speed on this schema is not important; it's really the write speed that matters. So I was just wondering, given the INSERT or UPDATE approach (with no indexed data being changed), whether one is likely to be substantially faster than the other.
>>>
>>
>> As I understand how ACID compliance is done, updating a record will require updating any indexes for that record, even if the index keys are not changing.  That's because any pending transactions still need to be able to find the 'old' data, while new transactions need to be able to find the 'new' data.  And ACID also means an update is essentially a delete-and-insert.  
>
>
> I might be a bit pedantic here, but what you describe is a byproduct of the specific implementation that PostgreSQL uses to effect Consistency (the C in ACID), as opposed to a foregone outcome of being ACID compliant.
>
> http://www.postgresql.org/docs/9.4/static/mvcc-intro.html
>
> I'm out of my comfort zone here, but the HOT optimization is designed to leverage the fact that an update to a row that does not affect indexed values can leave the index alone: during lookup the index points to the old tuple, PostgreSQL notices that there is a chain present, and walks that chain to find the currently active tuple.
>

That's true as long as the old and new tuples are stored in the same block.

> In short, if the only index is a PK, an update of the row can avoid touching that index.
>
> I mentioned that going from NULL to NOT NULL may disrupt this, but I'm thinking I may have misspoken.
>
> Also, with separate tables the amount of data to write is going to be less because you'd have fewer columns on the affected tables.
>
> While an update is a delete+insert, a delete is mostly just a bit-flip action, at least mid-transaction.  Depending on volume, though, the periodic impact of vacuuming may need to be taken into consideration.

--
Guillaume