Thread: pg_dump / Unique constraints

pg_dump / Unique constraints

From
"Christopher Kings-Lynne"
Date:
I've been examining the pg_dump source and output, and I've come to the
conclusion that I can modify it so that UNIQUE constraints appear as part of
the CREATE TABLE statement, rather than as a separate CREATE INDEX.  I know
it is possible because phpPgAdmin does it!

This change should also be in line with what we have been discussing
earlier, and could be a precursor to getting FOREIGN KEY constraints
appearing as part of CREATE TABLE as well...

Is there any problem with me working on this?

Chris

--
Christopher Kings-Lynne
Family Health Network (ACN 089 639 243)



Re: pg_dump / Unique constraints

From
Philip Warner
Date:
At 15:50 22/11/00 +0800, Christopher Kings-Lynne wrote:
>I've been examining the pg_dump source and output, and I've come to the
>conclusion that I can modify it so that UNIQUE constraints appear as part of
>the CREATE TABLE statement, rather than as a separate CREATE INDEX.
...
>Is there any problem with me working on this?

I actually don't think it's a good idea to force things to work that way. 

Perhaps as an *option*, but even then I'd be inclined to append them as a
series of 'ALTER TABLE ADD CONSTRAINT...' statements.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


RE: pg_dump / Unique constraints

From
"Christopher Kings-Lynne"
Date:
At 15:50 22/11/00 +0800, Christopher Kings-Lynne wrote:
> >I've been examining the pg_dump source and output, and I've come to the
> >conclusion that I can modify it so that UNIQUE constraints
> appear as part of
> >the CREATE TABLE statement, rather than as a separate CREATE INDEX.
> ...
> >Is there any problem with me working on this?
>
> I actually don't think it's a good idea to force things to work that way.

Why, exactly?

What's the difference between this:

--
create table test (a int4,constraint "name" unique (a)
)
--

and this:

--
create table test (a int4
)
create unique index "name" on "test" using btree ( "a" "int4_ops" );
--

I note that when a table is dropped, any unique constraints (in fact all
indices) associated with it are also dropped...

> Perhaps as an *option*, but even then I'd be inclined to append them as a
> series of 'ALTER TABLE ADD CONSTRAINT...' statements.

As far as I can tell, Postgres 7.0.3 only supports adding fk constraints.
The CVS version seems to support adding CHECK constraints, but other than
that, it has to be added as an index.  If you're a database user, it's
conceptually better to see right in your table that you've added a named (or
not) unique constraint, rather than noticing at the bottom of the file that
there's some unique index on one of your columns (IMHO).

Chris



RE: pg_dump / Unique constraints

From
Philip Warner
Date:
At 16:33 22/11/00 +0800, Christopher Kings-Lynne wrote:
>At 15:50 22/11/00 +0800, Christopher Kings-Lynne wrote:
>> >I've been examining the pg_dump source and output, and I've come to the
>> >conclusion that I can modify it so that UNIQUE constraints
>> appear as part of
>> >the CREATE TABLE statement, rather than as a separate CREATE INDEX.
>> ...
>> >Is there any problem with me working on this?
>>
>> I actually don't think it's a good idea to force things to work that way.
>
>Why, exactly?

Having now looked at the code and seen that PK constraints are already
dumped in the table definition, I guess doing unique constraints in the
same way is no worse.

My main concern is that I'd like pg_dump to be able to separate out the
various parts of the schema, and this includes constraints. The ability to
add/drop constraints at any part of the restoration process would be very
nice. The latest incarnations of pg_dump/pg_restore allow people (and
pg_dump/restore) to choose what to restore, and even to define an ordering
for them - and having the constraimts as separate items would be a great
benefit. One example of the problems that I'd like to avoid is in loading
data via INSERT statements - doing:
  Create Table...  Insert many rows...  Add Uniqueness Constraint

is *substantially* faster than INSERTs on a table with constraints already
defined.

At the current time we don't even have a working 'ALTER TABLE...' that
works with all constraint types, so my hopes are probably in vain. I don't
suppose you feel like working on 'ALTER TABLE...ADD/DROP CONSTRAINT...' do
you????


>What's the difference between this:
>
>--
>create table test (
> a int4,
> constraint "name" unique (a)
>)
>--
>
>and this:
>
>--
>create table test (
> a int4
>)
>create unique index "name" on "test" using btree ( "a" "int4_ops" );

The fact that pg_dump/restore will be able to create the index at the end
of the data load.


>
>As far as I can tell, Postgres 7.0.3 only supports adding fk constraints.
>The CVS version seems to support adding CHECK constraints, but other than
>that, it has to be added as an index.

Sounds like a good thing to work on ;-}


>If you're a database user, it's
>conceptually better to see right in your table that you've added a named (or
>not) unique constraint, rather than noticing at the bottom of the file that
>there's some unique index on one of your columns (IMHO).

This is a good argument for modifying the output of '\d' in psql. It is
also probably a valid argument for a new option on pg_dump to specify if
constraints should be kept separate from table definitions. Then we could
also move FK constraints to the end.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump / Unique constraints

From
Bruce Momjian
Date:
> At 16:33 22/11/00 +0800, Christopher Kings-Lynne wrote:
> >At 15:50 22/11/00 +0800, Christopher Kings-Lynne wrote:
> >> >I've been examining the pg_dump source and output, and I've come to the
> >> >conclusion that I can modify it so that UNIQUE constraints
> >> appear as part of
> >> >the CREATE TABLE statement, rather than as a separate CREATE INDEX.
> >> ...
> >> >Is there any problem with me working on this?
> >>
> >> I actually don't think it's a good idea to force things to work that way.
> >
> >Why, exactly?
> 
> Having now looked at the code and seen that PK constraints are already
> dumped in the table definition, I guess doing unique constraints in the
> same way is no worse.

I have a good reason not to use UNIQUE.  As I remember, pg_dump creates
the tables, copies in the data, then creates the indexes.  This is much
faster than doing the copy with the indexes already created.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_dump / Unique constraints

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have a good reason not to use UNIQUE.  As I remember, pg_dump creates
> the tables, copies in the data, then creates the indexes.  This is much
> faster than doing the copy with the indexes already created.

Right, that's the real implementation reason for doing it in two steps.

There's also a more abstract concern: ideally, pg_dump's schema output
should be the same as what the user originally entered.  Converting a
table and separate index declaration into one statement is not any more
correct than doing the reverse.  Thus the real problem here is to know
which way the index got created to begin with.  Currently we do not
know that, because (you guessed it) we have not got a declarative
representation for the UNIQUE constraint, only the execution-oriented
fact that the unique index exists.

My feeling is that there should be a stored indication someplace
allowing us to deduce exactly what caused the index to be created.
An ad-hoc way is to add another field to pg_index, but it might be
cleaner to create a new system catalog that covers all types of
constraint.

The next question is what pg_dump should emit, considering that it has
two conflicting goals: it wants to restore the original state of the
constraint catalog *but* also be efficient about loading data.  ALTER
TABLE ADD CONSTRAINT seems to be an essential requirement there.
But it seems to me that it'd be really whizzy if there were two
different styles of output, one for a full dump (CREATE, load data,
add constraints) and one for schema-only dumps that tries to reproduce
the original table declaration with embedded constraint specs.  That
would be nicer for documentation and editing purposes.
        regards, tom lane


Re: pg_dump / Unique constraints

From
Tom Lane
Date:
I said:
> But it seems to me that it'd be really whizzy if there were two
> different styles of output, one for a full dump (CREATE, load data,
> add constraints) and one for schema-only dumps that tries to reproduce
> the original table declaration with embedded constraint specs.  That
> would be nicer for documentation and editing purposes.

I just had an idea about this, based on the hackery that pg_dump
currently does with triggers: what if there were an ALTER command that
allows disabling and re-enabling constraint checking and index building?
Then the dump script could look like
full CREATE TABLE with all constraints shown
ALTER TABLE DISABLE CONSTRAINTS
COPY data in
ALTER TABLE ENABLE CONSTRAINTS

and there wouldn't have to be any difference between schema and full
dump output for CREATE TABLE.  If we were really brave (foolish?)
the last step could be something like
ALTER TABLE ENABLE CONSTRAINTS NOCHECK

which'd suppress the scan for constraint violations that a normal
ALTER ADD CONSTRAINT would want to do.

It also occurs to me that we should not consider pg_dump as the only
area that needs work to fix this.  Why shouldn't pg_dump simply do
full CREATE TABLE with all constraints shownCREATE all indexes too
-- if not schema dump then:COPY data in

The answer to that of course is that cross-table constraints (like
REFERENCES clauses) must be disabled while loading the data, or the
intermediate states where only some tables have been loaded are likely
to fail.  So we do need some kind of DISABLE CONSTRAINT mode to make
this work.  But it's silly that pg_dump has to go out of its way to
create the indexes last --- if COPY has a performance problem there,
we should be fixing COPY, not requiring pg_dump to contort itself.
Why can't COPY recognize for itself that rebuilding the indexes after
loading data is a better strategy than incremental index update?
(The simplest implementation would restrict this to happen only if the
table is empty when COPY starts, which'd be sufficient for pg_dump.)
        regards, tom lane


Re: pg_dump / Unique constraints

From
Bruce Momjian
Date:
> The answer to that of course is that cross-table constraints (like
> REFERENCES clauses) must be disabled while loading the data, or the
> intermediate states where only some tables have been loaded are likely
> to fail.  So we do need some kind of DISABLE CONSTRAINT mode to make
> this work.  But it's silly that pg_dump has to go out of its way to
> create the indexes last --- if COPY has a performance problem there,
> we should be fixing COPY, not requiring pg_dump to contort itself.
> Why can't COPY recognize for itself that rebuilding the indexes after
> loading data is a better strategy than incremental index update?
> (The simplest implementation would restrict this to happen only if the
> table is empty when COPY starts, which'd be sufficient for pg_dump.)

COPY would have to check to see if the table is already empty.  You can
COPY into a table that already has data.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_dump / Unique constraints

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Why can't COPY recognize for itself that rebuilding the indexes after
>> loading data is a better strategy than incremental index update?
>> (The simplest implementation would restrict this to happen only if the
>> table is empty when COPY starts, which'd be sufficient for pg_dump.)

> COPY would have to check to see if the table is already empty.

That's what I said ... or intended to say, anyway.  If there's already
data then the tradeoff between incremental update and index rebuild is
not so obvious, and the easiest first implementation would just be to
always do incremental update in that case.  Or we could add an option
to the COPY command to tell it which to do, and let the user do the
guessing ;-)

There'd also be a locking issue, now that I think about it: to do an
index rebuild, we'd have to be sure that no other transaction is adding
data to the table at the same time.  So we'd need to get a stronger lock
than a plain write lock to do it that way.  A COPY option is sounding
better and better...
        regards, tom lane


Re: pg_dump / Unique constraints

From
Bruce Momjian
Date:
My feeling is "Let's walk before we run."  We need psql \dt to show
primary/foreign keys and SERIAL first.


> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Why can't COPY recognize for itself that rebuilding the indexes after
> >> loading data is a better strategy than incremental index update?
> >> (The simplest implementation would restrict this to happen only if the
> >> table is empty when COPY starts, which'd be sufficient for pg_dump.)
> 
> > COPY would have to check to see if the table is already empty.
> 
> That's what I said ... or intended to say, anyway.  If there's already
> data then the tradeoff between incremental update and index rebuild is
> not so obvious, and the easiest first implementation would just be to
> always do incremental update in that case.  Or we could add an option
> to the COPY command to tell it which to do, and let the user do the
> guessing ;-)
> 
> There'd also be a locking issue, now that I think about it: to do an
> index rebuild, we'd have to be sure that no other transaction is adding
> data to the table at the same time.  So we'd need to get a stronger lock
> than a plain write lock to do it that way.  A COPY option is sounding
> better and better...
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_dump / Unique constraints

From
Philip Warner
Date:
At 11:34 22/11/00 -0500, Tom Lane wrote:
>
>    full CREATE TABLE with all constraints shown
>
>    ALTER TABLE DISABLE CONSTRAINTS

I think you need something more like:
   SET ALL CONSTRAINTS DISABLED/OFF

since disabling one tables constraints won't work when we have
subselect-in-check (or if it does, then ALTER TABLE <table-name> DISABLE
CONSTRAINTS will be a misleading name). Also, I think FK constraints on
another table that is already loaded will fail until the primary table is
loaded.


>
>and there wouldn't have to be any difference between schema and full
>dump output for CREATE TABLE.

I still see a great deal of value in being able to get a list of 'ALTER
TABLE ADD CONSTRAINT...' statements from pg_dump/restore. 


>If we were really brave (foolish?)
>the last step could be something like
>
>    ALTER TABLE ENABLE CONSTRAINTS NOCHECK

Eek. Won't work for index-based constraints, since they are created anyway.
It *might* be a good idea for huge DBs.


>But it's silly that pg_dump has to go out of its way to
>create the indexes last --- if COPY has a performance problem there,
>we should be fixing COPY, not requiring pg_dump to contort itself.

This is fine for COPY, but doesn't work for data-as-INSERTS.


>Why can't COPY recognize for itself that rebuilding the indexes after
>loading data is a better strategy than incremental index update?

The other aspect of COPY that needs fixing is the ability to specify column
order (I think); from memory that's the reason the regression DB can't be
dumped & loaded. It's also be nice to be able to specify a subset of columns.


>(The simplest implementation would restrict this to happen only if the
>table is empty when COPY starts, which'd be sufficient for pg_dump.)

Does this approach have any implications for recovery/reliability; adding a
row but not updating indexes seems a little dangerous. Or is the plan to
drop the indexes, add the data, and create the indexes?


Stepping back from the discussion for a moment, I am beginning to have
doubts about the approach: having pg_dump put the indexes (and constraints)
at the end of the dump is simple and works in all cases. The only issue,
AFAICT, is generating a single complete table defn for easy-reading. The
suggested solution seems a little extreme (a pg_dump specific hack to COPY,
when there are other more general problems with COPY that more urgently
require attention).

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


RE: pg_dump / Unique constraints

From
"Christopher Kings-Lynne"
Date:
Just a quick question regarding the pg_dump program:

I notice that PRIMARY KEY constraints are currently dumped as:

PRIMARY KEY ("field")

Whereas (to be in line with all the other constraints), it should be dumped
as:

CONSTRAINT "name" PRIMARY KEY ("field")

Otherwise, some poor bugger who went to the trouble of giving his primary
keys custom names will lose them with a dump/restore???

Also, if they have defined a function or trigger that refers to that primary
key by name, won't it fail after a dump/restore? (If the name has changed?)

I'm just asking, because I'm still trying to find something small and
self-contained I can work on!

Chris



ALTER TABLE...ADD CONSTRAINT?

From
Philip Warner
Date:
At 10:21 23/11/00 +0800, Christopher Kings-Lynne wrote:
>
>I'm just asking, because I'm still trying to find something small and
>self-contained I can work on!
>

Is anybody working on:
   alter table <table> add constraint <name> primary key(column,...);

or
   alter table <table> add constraint <name> unique(column,...);

or
   alter table drop constraint

I guess this is not really a small task as it relates to unifying
constraint handling, but for the PK & unique constraints at least, we must
already have code that does the work - all(?) that has to happen is to make
sure the ALTER command calls it...is that right?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


RE: ALTER TABLE...ADD CONSTRAINT?

From
"Christopher Kings-Lynne"
Date:
> Is anybody working on:
>
>     alter table <table> add constraint <name> primary key(column,...);
>
> or
>
>     alter table <table> add constraint <name> unique(column,...);
>
> or
>
>     alter table drop constraint

I'd be more than happy to work on either of the above in the current
implementation, however - I'm not sure it'd be worth it, given that the
constraints system might be up for a reimplementation.

> I guess this is not really a small task as it relates to unifying
> constraint handling, but for the PK & unique constraints at least, we must
> already have code that does the work - all(?) that has to happen
> is to make
> sure the ALTER command calls it...is that right?

That is a thought - can someone point me to the C file that handles CREATE
TABLE so I can see how it's done?  I can't for the life of me find that bit
of code!

Chris



Re: RE: ALTER TABLE...ADD CONSTRAINT?

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> can someone point me to the C file that handles CREATE
> TABLE so I can see how it's done?

backend/parser/analyze.c has the preprocessing (see
transformCreateStmt).  Actual execution starts in
backend/commands/creatinh.c, and there's also important code in
backend/catalog/heap.c.

Plus subroutines scattered here, there, and everywhere :-(.

You really won't get far in reading the PG sources until you have
a tool that will quickly find the definition (and optionally all uses)
of any particular symbol you are interested in.  I'm partial to glimpse,
but you could also use ctags/etags or some other indexing program.
        regards, tom lane