RE: [HACKERS] CONSTRAINTS... - Mailing list pgsql-hackers

From Jackson, DeJuan
Subject RE: [HACKERS] CONSTRAINTS...
Date
Msg-id F10BB1FAF801D111829B0060971D839F5DAAA4@cpsmail
Whole thread Raw
Responses Re: [HACKERS] CONSTRAINTS...  (jwieck@debis.com (Jan Wieck))
List pgsql-hackers
> -----Original Message-----
>     Generic  triggers  in  C  that  are  argument driven would be
>     possible.  But the drawback is that those triggers have to be
>     very  smart  to  use saved SPI plans (one for every different
>     argument set). And it must be row level triggers, so  for  an
>     update  to  a  2  meg  row table they will be fired 2 million
>     times and run their queries inside - will take some time.
> 
>     More painful in the 2  meg  row  situation  is  that  trigger
>     invocation  has  to be delayed until COMMIT if the constraint
>     is deferred. I think we cannot remember 2 million OLD plus  2
>     million  NEW  tuples  if one tuple can have up to 8K (will be
>     32GB to remember plus overhead), so we need  to  remember  at
>     least  the  CTID's  of  OLD  and NEW and refetch them for the
>     trigger invocation. OUTCH - the OLD ones are at the head  and
>     all the NEW ones are at the end of the tables file!
Agreed scrap the triggers idea.

> > Maybe let's go with the rewrite system, because it works, and is
> > flexible and strangely designed for this type of problem.  
> Similar to
> > how we use the rewrite system for views.
> 
>     And  the  other  changes  I've planned for the rewrite system
>     will improve this much more.
> 
>     1.  Change pg_rewrite.ev_attr into an int28.  This  would  be
>         useful  for  ON  UPDATE  rules  so the rewrite system can
>         easily check if a rule has to be applied or not. If  none
>         of  named  attributes  gets  something different assigned
>         than it's own OLD value, they aren't updated so the  rule
>         could  never  result  in  an  action  and  can be omitted
>         completely.
Would we add a system table for constraints?  I believe that the
DEFERRABLE option means we need special information about all
constraints.  (ie. is_deferrable boolean, is_deferred boolean).

>     2.  Create cross reference catalog that lists  all  relations
>         used in a rule (rangetable).  If we have a DELETE CASCADE
>         constraint, the rule is triggered on the  key  table  and
>         the action is a DELETE from the referencing table. If now
>         the  referencing  table  is  dropped,  the   rule   get's
>         corrupted because the resulting querytree isn't plannable
>         any longer (the relations in  the  rules  rangetable  are
>         identified  by  the OID in pg_class, not by relname). You
>         can see the effect if you create a view and drop  one  of
>         the base tables.
*Bleck*  We can't place cascading or restriction information into the
table definition.  Don't know if you had thought about it, but you could
only create them in reverse order (child<-parent), very messy when you
have many 'child' tables to cascade.

>         Well,  we need to define what to do if a table is dropped
>         that occurs in the crossref. First of all, the rules have
>         to  be dropped too, but in the case of a view rule, maybe
>         the whole view too?
I agree that we should drop rules defined on the dropped table, but not
those that simply access the dropped table, including views.  If I need
to drop a table which has a view defined for it simply to rearrange
column order, I'd like not to have to recreate the view as well and/or
any RI-constraints that reference that table as well.

>         And in the case where a key table to  which  another  one
>         has  a  CHECK  reference is dropped? The rule action will
>         allways abort,  so  it  isn't  useful  any  more.  But  I
>         wouldn't  like to silently drop it, because someone might
>         want to drop and recreate the key table  and  this  would
>         silently  result  in  that  all the constraints have been
>         lost.
This is what I mean about views and RI-constraints.

>         Maybe we should change the rulesystem at all so that  the
>         rangetable  entries  in the rule actions etc. are updated
>         with a lookup from pg_class at rewrite time. Must be done
>         carefully because someone might drop a table and recreate
>         it with a different schema corrupting  the  parsetree  of
>         the rule actions though.
Could we allow a 'recompile' of the rules?  Which would update the rule
actions and report which rules have been corrupted.  And possibly a way
to simple drop all invalid rules.  We could simply mark the rules that
need to be recompiled on a DROP or ALTER TABLE (to allow for dropping a
column, or changing a column type) from the crossref.  

>     3.  Allow  an  unlimited  number  of  rules  on  a  relation.
>         Currently there is a hard coded limit on  the  number  of
>         rules the relation can hold in it's slots.
I agree with this one without reference, especially if we actually
implement RI/constraints using rewrite.

> > I am basically asking for a reason _not_ to use the rewrite 
> system for
> > this.  I can't think of one myself.
> 
>     It  might  interfere with the new MVCC code. The rule actions
>     must see exactly the  OLD  tuples  that  where  used  in  the
>     original  statements.  Not  only  those  in the updated table
>     itself, think of an INSERT...SELECT or an  UPDATE  where  the
>     TLE or qual expressions are values from other tables.
Shouldn't be too hard to accomplish.  We simply need to be sure that the
values are either cached or re-cacheable/re-producible (if there are
memory concerns) until the end of the transaction.  So, either:1. cache all the values2. store the part of the
parsetreeneeded to recreate the values and
 
have MVCC mark any needed rows off limits to vacuum
I know that there are many other subtleties that I'm not addressing such
as indexes, but the above will get you the answer.
FWIW, RI and CONSTRAINTS would be affected by this.

>     Not  a  real reason, just something to have in mind and maybe
>     switching  silently  to  another  MVCC  isolation  level   if
>     constraint  rules get applied, so all tables read from now on
>     will  get  a  read  lock  applied  and  cannot  get   updated
>     concurrently until COMMIT.
Not needed, see the above.  We just need to mark the rows with the tid,
or some-such, because dropped/updated rows will hang around until a
vacuum.

>     And  it's  a  problem I've came across just writing this note
>     where MVCC already could  have  broken  rewrite  rule  system
>     semantics.
Maybe, but I know you'll straighten it all out.  I have faith in you,
Jan.

> Jan-DEJ

P.S. I love the discussion that my simple little request has sparked.


pgsql-hackers by date:

Previous
From: Hal Snyder
Date:
Subject: Re: [HACKERS] references to packaged versions of PostgreSQL
Next
From: Mikhail Terekhov
Date:
Subject: pg_dump again