Re: Bulk Insert / Update / Delete - Mailing list pgsql-general

From Jason Godden
Subject Re: Bulk Insert / Update / Delete
Date
Msg-id 200308211853.50107.jasongodden@optushome.com.au
Whole thread Raw
In response to Re: Bulk Insert / Update / Delete  ("Philip Boonzaaier" <phil@cks.co.za>)
List pgsql-general
SQL (and PostgreSQL) certainly can do what you want to do.  You don't need an
external language to achieve this unless there are specific conditions that
require an external data source.

Essentially if you can bring the data into PostgreSQL (flat text) then using
the UPDATE ... FROM and INSERT ... SELECT FROM syntaxes you can determine:

a) What already exists and needs updating (you need a key for reference - well
you would with if statements anyway)

b) What doesn't exist and needs to be inserted.

If you can't do this then I don't know how my production system has been
running for the last 6 months, updating and bringing in new data every 24
hours  : )  You don't need to use pl/pgsql to get this basic functionality
out of your db.

I recommend reading:

http://www.postgresql.org/docs/7.3/static/sql-update.html
http://www.postgresql.org/docs/7.3/static/sql-insert.html
http://www.postgresql.org/docs/7.3/static/functions-subquery.html
http://www.postgresql.org/docs/7.3/static/queries-union.html

..and see below for an example of something I've been playing with for work.
It relies on two tables, a target (production table) and import table.  It
compares the two and deletes any non existant data from the production table
based on the import table, then updates any existing data and then inserts
new data - all in SQL and all working quite nicely.  And because it's all
wrapped in a transaction the whole lot has to work (or fail) and you achieve
the 'one sql action' effect you were trying to get.  Note that I use the
EXCEPT key word instead of EXISTS here.

begin;

--Group by to remove duplicates

delete from departments where not exists
(select departmenthostkey from ztimetable zt where departmenthostkey is not
null and departments.departmenthostkey = zt.departmenthostkey
group by zt.departmenthostkey);

update departments set department = zt.departmentname from
(select departmenthostkey,departmentname from ztimetable where
departmenthostkey is not null group by departmenthostkey,departmentname) zt
where (departments.departmenthostkey = zt.departmenthostkey and
departments.department <> zt.departmentname);

insert into departments (departmenthostkey,department) select
zt.departmenthostkey,zt.departmentname from
(select departmenthostkey,departmentname from ztimetable where
departmenthostkey is not null group by departmenthostkey,departmentname) zt
except (select departmenthostkey,department from departments);

commit;

Rgds,

Jason

On Fri, 22 Aug 2003 05:37 am, Philip Boonzaaier wrote:
> Hi Ron
>
> Yeah. I see what you are getting at. However, what about using a RULE ?
> This seems to fit what I am trying to do.
>
> Let me tell you what I am doing at the moment. I am migrating a COBOL based
> system to a RDBMS base, and eventually a Perl / Java / Whatever front end.
> As Phase 1, I am simple replicating the data in PostgreSQL. I have created
> tables identical to the 'records' in COBOL. When I INSERT in COBOL, I
> create an INSERT in SQL and action this. This is done externally from
> COBOL, and not using any embedded SQL features. Similarly with UPDATE.
> However, I now want to create a Table based on a sub - set of information,
> in the record in the first attempt, I am creating a table of Telephone
> numbers for an account, which is currently defined as an array of 4
> possibilities within the account record. ). Now, when UPDATING the main
> row, I have no idea if the sub - set of information is already in the
> database, or not. So I want to, simply by writing a SQL statement, INSERT
> or UPDATE the information in the database.
>
> Regards
>
> Phil
>
> ----- Original Message -----
> From: Ron Johnson <ron.l.johnson@cox.net>
> To: PgSQL General ML <pgsql-general@postgresql.org>
> Sent: Thursday, August 21, 2003 9:01 AM
> Subject: Re: [GENERAL] Bulk Insert / Update / Delete
>
> On Thu, 2003-08-21 at 13:33, Philip Boonzaaier wrote:
> > Hi Ron
> >
> > That is just the point. If Postgres cannot tell me which records exist
> > and need updating, and which do not and need inserting, then what can ?
> >
> > In the old world of indexed ISAM files it is very simple - try to get the
> > record ( row ) by primary key. If it is there, update it, if it is not,
> > insert it.
>
> SQL (and, by extension, the relational DBMS) isn't magic.  It just
> makes it easier to do what we did is the "old world of indexed ISAM"
> files.
>
> > Now, one can do this with a higher level language and SQL combined, but
> > is SQL that weak ?
>
> No, not weak.  See below.
>
> > What happens when you merge two tables ? Surely SQL must somehow
> > determine what needs INSERTING and what needs UPDATING.... Or does one
> > try to merge, get a failure, an resort to writing something in Perl or C
> > ?
>
> In this case, SQL will make it easier to tell you what's there,
> and, if the "comparison data" is loaded into a separate table,
> what's not there.
>
> So, yes, you will almost certainly need an "outer" language (C,
> Perl, Python, Tck/Tk, Java, etc).  However, you'll need less
> lines of the outer language if you use SQL.
>
> For example, if you use dumb old ISAM files, the most you can do
> is specify which index key you want the file sorted on before fetching
> *each* *row* *in* *the* *file*, and tough noogies if there are
> 100M rows in it.  And then you must code in IF statements to
> skip over any records that don't meet your criteria.  This is
> just adds more SLOC, thereby increasing the likelihood of bugs.
>
> With SQL, however, you embed the winnowing criteria as predicates
> in the WHERE clause, or maybe even the FROM clause, if you need
> certain kinds of sub-selects.
>
> If you think in terms of guns, SQL is a machine gun, thus giving
> great firepower/usefullness to the programmer.  However, it doesn't
> shoot silver bullets...
>
> Make any sense?
>
> > Please help to un - confuse me !
> >
> > Regards
> >
> > Phil
> > ----- Original Message -----
> > From: Ron Johnson <ron.l.johnson@cox.net>
> > To: PgSQL General ML <pgsql-general@postgresql.org>
> > Sent: Tuesday, August 19, 2003 6:45 PM
> > Subject: Re: [GENERAL] Bulk Insert / Update / Delete
> >
> > On Tue, 2003-08-19 at 22:03, Philip Boonzaaier wrote:
> > > Hi Jason
> > >
> > > Thanks for your prompt response.
> > >
> > > I'm pretty new to SQL, so please excuse the following rather stupid
> >
> > question
> >
> > > How do I use the (NOT) EXISTS or (NOT) IN clause ? Would it be
> > > feasible, using your suggestion, to simply put in two SQL statements,
> > > in the same query - first UPDATE when EXISTS, then INSERT when NOT
> > > EXISTS, to
> >
> > accomplist
> >
> > > this in one go ?
> > >
> > > Regards
> > >
> > > Phil
> >
> > How will you which records were updated, thus able to know which need
> > to be inserted?
> >
> > A temporary table and pl/pgsql should do the trick.
> >
> > > ----- Original Message -----
> > > From: Jason Godden <jasongodden@optushome.com.au>
> > > To: Philip Boonzaaier <phil@cks.co.za>; <pgsql-general@postgresql.org>
> > > Sent: Tuesday, August 19, 2003 4:42 PM
> > > Subject: Re: [GENERAL] Bulk Insert / Update / Delete
> > >
> > >
> > > Hi Philip,
> > >
> > > Pg is more ansi compliant than most (GoodThing (TM)).  You can use the
> > > 'when'
> > > conditional but not to do what you need.  If I understand you correclty
> >
> > you
> >
> > > should be able to acheive the same result using two seperate queries
> > > and
> >
> > the
> >
> > > (NOT) EXISTS or (NOT) IN clause.  Failing that have a look at the fine
> >
> > docs
> >
> > > on pl/pgsql and other postgresql procedural languages which allow you
> > > to
> >
> > use
> >
> > > loops and conditional statements like 'if'.
> > >
> > > Rgds,
> > >
> > > J
> > >
> > > On Wed, 20 Aug 2003 12:21 pm, Philip Boonzaaier wrote:
> > > > I want to be able to generate SQL statements that will go through a
>
> list
>
> > > of
> > >
> > > > data, effectively row by row, enquire on the database if this exists
>
> in
>
> > > the
> > >
> > > > selected table- If it exists, then the colums must be UPDATED, if
> > > > not,
> > >
> > > they
> > >
> > > > must be INSERTED.
> > > >
> > > > Logically then, I would like to SELECT * FROM <TABLE>
> > > > WHERE ....<Values entered here>, and then IF FOUND
> > > > UPDATE <TABLE> SET .... <Values entered here> ELSE
> > > > INSERT INTO <TABLE> VALUES <Values entered here>
> > > > END IF;
> > > >
> > > > The IF statement gets rejected by the parser. So it would appear that
> > > > PostgreSQL does not support an IF in this type of query, or maybe not
>
> at
>
> > > > all.
> > > >
> > > > Does anyone have any suggestions as to how I can achieve this ?
>
> --
> -----------------------------------------------------------------
> Ron Johnson, Jr. ron.l.johnson@cox.net
> Jefferson, LA USA
>
> After listening to many White House, Pentagon & CENTCOM
> briefings in both Gulf Wars, it is my firm belief that most
> "senior correspondents" either have serious agendas that don't
> get shaken by facts, or are dumb as dog feces.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>
> This message is privileged and confidential and intended for the addressee
> only. If you are not the intended recipient you may not disclose, copy or
> in any way use or publish the content hereof, which is subject to
> copyright.If you have received this in error, please destroy the original
> message and contact us at postmaster@cks.co.za. Any views expressed in this
> message are those of the individual sender, except where the sender
> specifically states them to be the view of Computerkit Retail Systems, its
> subsidiaries or associates. Please note that the recipient must scan this
> e-mail and attachments for  viruses. We accept no liability of whatever
> nature for any loss, liability,damage or expense resulting directly or
> indirectly from this transmission of this message and/or attachments.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match


pgsql-general by date:

Previous
From: Ron Johnson
Date:
Subject: Re: Bulk Insert / Update / Delete
Next
From: "Philip Boonzaaier"
Date:
Subject: Re: Bulk Insert / Update / Delete