Re: Truncate if exists - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Truncate if exists
Date
Msg-id CA+TgmoaM0Qb9pHY3p9YgpeTfrS2j_9keMZmP5bibGur5RbBz7w@mail.gmail.com
Whole thread Raw
In response to Re: Truncate if exists  ("Stafford, David x78061" <David.Stafford@broadridge.com>)
List pgsql-hackers
On Tue, Oct 16, 2012 at 2:12 PM, Stafford, David x78061
<David.Stafford@broadridge.com> wrote:
> On Mon, 15 Oct 2012 10:21:18 -0700, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yeah, I think the functionality that we need is pretty much there
>> already today.  What we need to do is to get the syntax to a point
>> where people can write the code they want to write without getting
>> tangled up by it.
>>
>> I think the invention of DO was a big step in the right direction
>> ...
>> With DO, you can write the logic you want
>> as an SQL statement, it's just a clunky and awkward SQL statement.  In
>> my view the goal ought to be to refine that mechanism to remove the
>> clunkiness and awkwardness, rather than to invent something completely
>> new.
>
> As someone who has worked with a number of databases now, none of them really get this DDL integration completely
right. What I'd like to see is 1) a predicate to easily test things about the schema (does this table, column, index,
schema,etc. exist?  does it have the right type?) and 2) a way to conditionally execute DDL (and DML, which should fall
rightout, but it isn't really what this discussion is covering).  I would propose extending the current EXISTS / NOT
EXISTSpredicate as follows: 
>
> [NOT] EXISTS TABLE tab [ ( col [type] [ , col [type]]... ) ]
> [NOT] EXISTS COLUMN tab.col [type]
> [NOT] EXISTS INDEX tab.idx [ ( col [ , col]... ) ]
> [NOT] EXISTS CONSTRAINT tab.cstrt -- not sure what else might be useful here
> [NOT] EXISTS ( select ) -- this is the current EXISTS predicate, of course
> [NOT] EXISTS ANY ROWS FROM tab [join] [WHERE predicate] [GROUP BY col [ , col]...] -- exactly the same as
>                                                                                    -- (select 1 FROM etc.)
>                                                                                    -- only because I like
>                                                                                    -- it better
> (the latter [which by no means am I nuts over; it's just that when extending EXISTS I can't stop neatening it up to
mypersonal preferences] could be extended with [NOT] EXISTS MORE THAN n ROWS FROM and [NOT] EXISTS EXACTLY n ROWS
FROM.)
>
> There is a new SQL statement: IF predicate true-statement [ELSE false-statement].
>
> To actually execute this new IF statement, the executor would need an IF node that evaluates the predicate (with ANDs
andORs, just like all SQL predicates) and then executes the rest only if the predicate came out TRUE (or NOT FALSE; I
forgetwhich is usually used, and the difference with NULL could be useful, as long as it matches other predicates).
Thismoves one more bit of procedural logic into the executor. 
>
> Another wrinkle is that the dependent statement might not compile, due to missing tables or whatnot.  Actually
executingit while it doesn't compile is an error, but we want to defer that error until we actually decide we need to
executeit.  Also, it's probably good to try compiling it again at that point.  So my thought would be to try planning
thedependent statement(s); if they compile, hook them to the IF node; if not, hook a DEFERRED node to the IF node.  The
DEFERREDnode has the parse tree (or raw string, whatever makes sense) of the statement; on execution it tries again to
planthat statement; if it succeeds, run it; if not, error out. 
>
> I'd also add a SEQUENCE node to the executor.  It just runs its children in order (could be n-ary, or if fixed arity
nodesare what is in the planner/executor today, could be binary, first left, then right, and right could be another
SEQUENCE). The DEFERRED node means that a CREATE statement could precede use of what is created in the same sequence
andall could get planned (with some deferral to execution time) in advance and run in one lump.  This implements DO at
theexecutor level. 
>
> The biggest concepts left from plpgsql are looping and variables.  Most variables could be modeled as a single row
value;SQL already can update a row, so the planning of assignments and calculations of scalars (and arrays, I think)
alreadyfits into things the planner knows about.  Table variables (which I don't know that plpgsql supports, but
somedayit should) are less defined.  Adding plpgsql's loops to the executor would let whole functions run under one
tripthrough the executor.  This is beyond just improving the DDL support for scripts. 
>
> I have written a number of database upgrade scripts.  Over time we've made them less fragile, by checking for the
existenceof tables, indexes, and most recently, columns.  The usual sequence is: 
> 1) check the existence of an index; check that the first few columns are correct; if not, drop the index
> 2) repeat for other indexes that have changed definition over time
> 3) check the existence of the table; create with current layout if it is missing
> 4) check for the presence of a column; if missing, alter the table to add it (of course, we can only add new columns
atthe end, and occasionally delete a column) 
> 5) repeat for more columns
> 6) check the existence of an index; if missing, create it
> 7) repeat for all the indexes
> This is doable in most databases, but pretty messy.  You need to join with infoschema tables, or system tables, or
useclunky functions to check for existence; checking types is usually pretty horrid.  Consequently, we only check a few
thingsand trust that the schema is only in a few different states.  A true schema comparator and upgrade solver would
begreat, but I don't know anyone who has written such a thing.  The extended EXISTS predicate that could check tables
andindexes would declutter a lot of our upgrade scripts.  That's the use case for me. 

I agree.  I think something like this would be great.  But figuring
out how to make it happen is, of course, the trick.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Next
From: Christopher Browne
Date:
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility