Re: Insert vs Update - Mailing list pgsql-performance

From David G. Johnston
Subject Re: Insert vs Update
Date
Msg-id CAKFQuwbo3mvJM5zwY8dHML=7dwpGM9zF4FiXT0yQfYL7-fpr2A@mail.gmail.com
Whole thread Raw
In response to Re: Insert vs Update  (Robert DiFalco <robert.difalco@gmail.com>)
Responses Re: Insert vs Update  (Robert DiFalco <robert.difalco@gmail.com>)
List pgsql-performance
On Wed, Jul 15, 2015 at 1:56 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:


On Wed, Jul 15, 2015 at 10:33 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, July 15, 2015, Robert DiFalco <robert.difalco@gmail.com> wrote:
First off I apologize if this is question has been beaten to death. I've looked around for a simple answer and could not find one. 

Given a database that will not have it's PKEY or indices modified, is it generally faster to INSERT or UPDATE data. And if there is a performance difference is it substantial?

This seems odd.  If you have an option to update but choose to insert what becomes of the other record? 


Consider the two pseudo-schemas, I'm just making this up for example purposes:

SCHEMA A
=========
meal(id SEQUENCE,user_id, started DEFAULT NOW())
meal_prepared(ref_meal_id, prepared DEFAULT NOW())
meal_abandoned(ref_meal_id, abandoned ...)
meal_consumed(ref_meal_id, consumed ...)
etc.

Then in response to different meal events you always have an insert. 

aMealId = INSERT INTO meal(user_id) VALUES (aUserId);

When preparation starts:

INSERT INTO meal_prepared(ref_meal_id) VALUES (aMealId);

And so on for each event. 

Compare that to this:

SCHEMA B
=========
meal_event(id, started, prepared, abandoned, consumed, ...)

The start of the meal is an INSERT:

aMealId = INSERT INTO meal_event(user_id, started) VALUES (aUserId, NOW());

When preparation starts:

UPDATE meal_event SET prepared = NOW() WHERE id = aMealId;

And so on.

Basically the same data, in one case you always do inserts and add new tables for new events. In the other case you only insert once and then update for each state, then you add columns if you have new states. 

As I said this is just an example. But in SCHEMA A you have only inserts, lots of tables and in SCHEMA B you have a lot of updates and a lot of possibly NULL columns if certain events don't occur. 

Is that more clear?


​Yes, you are trying to choose between a bunch of one-to-one (optional) relationships versus adding additional columns to a table all of which can be null.

​I'd argue that neither option is "normal" (in the DB normalization sense).

CREATE TABLE meal (meal_id bigserial)
CREATE TABLE meal_even​t_type (meal_event_id bigserial)
CREATE TABLE meal_event (meal_id bigint, meal_event_id bigint, occurred_at timestamptz)

​So now the decision is one of how to denormalize.  materialzed views and two ways to do so.  The specific solution would depend in part on the final application queries that you need to write.

If you do want to model the de-normalized form, which I would likely be tempted to do given a fixed set of "events" that do not require additional related attributes, would be to place the few event timestamps on the main table and UPDATE them to non-null.

In the normal form you will likely find partial indexes to be quite useful.

David J.

pgsql-performance by date:

Previous
From: Robert DiFalco
Date:
Subject: Re: Insert vs Update
Next
From: Robert DiFalco
Date:
Subject: Re: Insert vs Update