Thread: pg_dump and DEFAULT column values

pg_dump and DEFAULT column values

From

"Eric Ridge"

Date:

06 November 2001, 15:57:33

I have a function and a table:

  create table foo (
     f1 integer default foo_func();
  );

  create function foo_func returns integer
     as 'select max(f1)+1 from foo'
     language 'sql';


when I use pg_dump, and then psql (\i <dumpfile>) to dump/reload
everything, table foo can't be created because it uses foo_func which
can't be created because it uses a table (foo) which doesn't exist.

So I always have to hand-edit the dump file:

  create table foo (
     f1 integer;
  );

  create function foo_func returns integer
     as 'select max(f1) from foo'
     language 'sql';

  alter table foo alter column f1 set default foo_func();


Is there a way for pg_dump to do this automagically? or should I define
the table and function differently?  Should the default value be
replaced with an on insert trigger?

eric

Re: pg_dump and DEFAULT column values

From

Tom Lane

Date:

06 November 2001, 16:37:37

"Eric Ridge" <ebr@tcdi.com> writes:
> I have a function and a table:
>   create table foo (
>      f1 integer default foo_func();
>   );

>   create function foo_func returns integer
>      as 'select max(f1)+1 from foo'
>      language 'sql';

> when I use pg_dump, and then psql (\i <dumpfile>) to dump/reload
> everything, table foo can't be created because it uses foo_func which
> can't be created because it uses a table (foo) which doesn't exist.

pg_dump has a hard time with circular dependencies :-(

Have you considered using a sequence, rather than generating new values
as shown above?  The approach you are using is extremely fragile:
consider what happens if two backends try to insert at the same time.

            regards, tom lane

Re: pg_dump and DEFAULT column values

From

"Eric Ridge"

Date:

06 November 2001, 17:17:37

> pg_dump has a hard time with circular dependencies :-(

but on the plus side, pg_dump is really fast!  much faster than dump
utilities of other databases I've seen.

> Have you considered using a sequence, rather than generating
> new values
> as shown above?  The approach you are using is extremely fragile:
> consider what happens if two backends try to insert at the same time.

a sequence it is!  thanks.

eric

Re: pg_dump and DEFAULT column values

From

"Eric Ridge"

Date:

06 November 2001, 17:27:33

> > Have you considered using a sequence, rather than generating
> > new values
> > as shown above?  The approach you are using is extremely fragile:
> > consider what happens if two backends try to insert at the
> same time.
>
> a sequence it is!  thanks.

well, but then again, I want the default value of that field to be 1
greater than the max value, not the next value in a sequence.

The client application has the ability to change the value of that
field, but new records need to be max+1.

So I guess to make pg_dump happy, and to solve potential concurrency
issues, I need a trigger for that field?  Or will I have the same
circular reference problem with a trigger?

eric

Re: pg_dump and DEFAULT column values

From

Tom Lane

Date:

07 November 2001, 15:57:55

"Eric Ridge" <ebr@tcdi.com> writes:
>> a sequence it is!  thanks.

> well, but then again, I want the default value of that field to be 1
> greater than the max value, not the next value in a sequence.
> The client application has the ability to change the value of that
> field, but new records need to be max+1.

Hmm.  I think you should think hard about why you believe that the
default has to work that way and you can't just use a sequence.
You're paying a high price to conform to what seems a very questionable
set of assumptions.

> So I guess to make pg_dump happy, and to solve potential concurrency
> issues, I need a trigger for that field?  Or will I have the same
> circular reference problem with a trigger?

The circular reference problem arises because SQL-language functions
are parsed and checked at CREATE FUNCTION time.  A cheezy way to get
around it is to define the function before the table, but write it in
a PL language --- presently, at least, PL function definitions are not
checked until first use.  So you could do

    create function get_the_max() ... language 'plpgsql';

    create table ... default get_the_max()

and it should work and also dump correctly.

            regards, tom lane

Re: pg_dump and DEFAULT column values

From

"Eric Ridge"

Date:

07 November 2001, 17:57:49

> Hmm.  I think you should think hard about why you believe that the
> default has to work that way and you can't just use a sequence.
> You're paying a high price to conform to what seems a very
> questionable set of assumptions.

I use the field for sorting, and since the client application has the
ability to change the value of this field, it is possible they could set
it to a number larger than the next value of the sequence.  then an
insert of a new record would appear before the changed record, instead
of after it.

> The circular reference problem arises because SQL-language functions
> are parsed and checked at CREATE FUNCTION time.  A cheezy way to get
> around it is to define the function before the table, but write it in
> a PL language --- presently, at least, PL function definitions are not
> checked until first use.  So you could do

I ended up creating a "before on insert" trigger in plpgsql.  pg_dump
seems to like this, and it gives me the ability to easily change how I
find the default value for this field.


And thanks for your time.  It is very much appreciated.

eric

Re: pg_dump and DEFAULT column values

From

Tom Lane

Date:

07 November 2001, 18:57:46

"Eric Ridge" <ebr@tcdi.com> writes:
>> Hmm.  I think you should think hard about why you believe that the
>> default has to work that way and you can't just use a sequence.
>> You're paying a high price to conform to what seems a very
>> questionable set of assumptions.

> I use the field for sorting, and since the client application has the
> ability to change the value of this field, it is possible they could set
> it to a number larger than the next value of the sequence.  then an
> insert of a new record would appear before the changed record, instead
> of after it.

But if you allow applications to change the field, then all bets about
sort order are off anyway, no?  It's far from clear exactly what
semantics you think you are guaranteeing here.

In any case it seems like changing the initially-assigned field value
is an infrequent operation, and that you ought to make *that* be the
expensive and not-safe-for-parallelism case, not row insertion.
(Perhaps this is a wrong assumption on my part, in which case ignore
what follows.)  I'd still be inclined to use a sequence for insertion,
and to enforce consistency on update with an AFTER UPDATE trigger that
does something like
    if old.fld != new.fld then
        begin
        lock mytable;
        select setval('seq', max(fld)) from table;
        end;
(Untested, but I hope the idea is clear.)

            regards, tom lane

Re: pg_dump and DEFAULT column values

From

"Eric Ridge"

Date:

08 November 2001, 00:57:51

> But if you allow applications to change the field, then all bets about
> sort order are off anyway, no?  It's far from clear exactly what
> semantics you think you are guaranteeing here.

The only thing I'm trying to guarantee is that new records appear last
(or first, depending on sort direction).  Once the user "physically"
re-sorts the records (by changing the value of this field) then I don't
care what the old values are.  Only that new records are greater than
the last existing value.

> In any case it seems like changing the initially-assigned field value
> is an infrequent operation, and that you ought to make *that* be the
> expensive and not-safe-for-parallelism case, not row insertion.

About as frequent as insertions.  As records are added, they're manually
sorted.  So every INSERT (eventually) causes a manual UPDATE for all
records.

In most cases I know before I do the INSERT, how the record needs to be
sorted, so I can set the this value accordingly.  But other cases
involve a bulk insert process, and a manual resorting process.

> (Perhaps this is a wrong assumption on my part, in which case ignore
> what follows.)  I'd still be inclined to use a sequence for insertion,
> and to enforce consistency on update with an AFTER UPDATE trigger that
> does something like
>     if old.fld != new.fld then
>         begin
>         lock mytable;
>         select setval('seq', max(fld)) from table;
>         end;
> (Untested, but I hope the idea is clear.)

Very clear.  However, this table is going to receive many UPDATEs that
aren't related to this sorting field.  Would the overall overhead of
executing the trigger on every UPDATE outweigh the overhead of doing it
on BEFORE INSERT:

   begin
   if NEW.fld is null then -- user didn't specify a value
      lock mytable;
      NEW.fld := select max(fld)+1 from table;
   end;

I guess it depends on the frequency of updates v/s the frequency of
inserts.  I'm not sure, at this point, which will be greater.

eric

Re: pg_dump and DEFAULT column values

From

"Peter Darley"

Date:

08 November 2001, 16:58:08

You might want to move that logic to your client application or the insert
statement, rather than having it done in the database, if people aren't
going to be going into the database without going through your application.
You could have something like
INSERT INTO Table (Field, Field2, Field3) VALUES ((SELECT MAX(Field) FROM
Table)+1, 'Value2', 17);

Thanks,
Peter Darley

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Eric Ridge
Sent: Wednesday, November 07, 2001 9:46 PM
To: Tom Lane
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] pg_dump and DEFAULT column values


> But if you allow applications to change the field, then all bets about
> sort order are off anyway, no?  It's far from clear exactly what
> semantics you think you are guaranteeing here.

The only thing I'm trying to guarantee is that new records appear last
(or first, depending on sort direction).  Once the user "physically"
re-sorts the records (by changing the value of this field) then I don't
care what the old values are.  Only that new records are greater than
the last existing value.

> In any case it seems like changing the initially-assigned field value
> is an infrequent operation, and that you ought to make *that* be the
> expensive and not-safe-for-parallelism case, not row insertion.

About as frequent as insertions.  As records are added, they're manually
sorted.  So every INSERT (eventually) causes a manual UPDATE for all
records.

In most cases I know before I do the INSERT, how the record needs to be
sorted, so I can set the this value accordingly.  But other cases
involve a bulk insert process, and a manual resorting process.

> (Perhaps this is a wrong assumption on my part, in which case ignore
> what follows.)  I'd still be inclined to use a sequence for insertion,
> and to enforce consistency on update with an AFTER UPDATE trigger that
> does something like
>     if old.fld != new.fld then
>         begin
>         lock mytable;
>         select setval('seq', max(fld)) from table;
>         end;
> (Untested, but I hope the idea is clear.)

Very clear.  However, this table is going to receive many UPDATEs that
aren't related to this sorting field.  Would the overall overhead of
executing the trigger on every UPDATE outweigh the overhead of doing it
on BEFORE INSERT:

   begin
   if NEW.fld is null then -- user didn't specify a value
      lock mytable;
      NEW.fld := select max(fld)+1 from table;
   end;

I guess it depends on the frequency of updates v/s the frequency of
inserts.  I'm not sure, at this point, which will be greater.

eric

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html