Thread: Teaching pg_dump to use NOT VALID constraints

Teaching pg_dump to use NOT VALID constraints

From
Simon Riggs
Date:
Magnus and I discussed the need for pg_dump to offer the use of NOT
VALID constraints.
Here's the patch.


 pg_dump --no-revalidaton

will add "NOT VALID" onto the recreation SQL for any FKs, but only for
ones that were already known to be valid.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Teaching pg_dump to use NOT VALID constraints

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> Magnus and I discussed the need for pg_dump to offer the use of NOT
> VALID constraints.
> Here's the patch.
> 
> 
>  pg_dump --no-revalidaton
> 
> will add "NOT VALID" onto the recreation SQL for any FKs, but only for
> ones that were already known to be valid.

Well.  Constraints that haven't been validated already have a NOT VALID
emitted by ruleutils.c, yes?  So what this patch does is add such a
clause for all *other* constraints.  Right?  In other words what it aims
to do is speed up loading of data by skipping the validation step on
restore.  Is that right?

ISTM we could have the default pg_dump behavior emit NOT VALID
constraints, and add VALIDATE CONSTRAINT commands at the end; that way
the database is usable sooner but the constraints end up marked as
validated by the time the dump is finished.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Teaching pg_dump to use NOT VALID constraints

From
Simon Riggs
Date:
On 10 November 2014 17:33, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

>>  pg_dump --no-revalidaton
>>
>> will add "NOT VALID" onto the recreation SQL for any FKs, but only for
>> ones that were already known to be valid.
>
> Well.  Constraints that haven't been validated already have a NOT VALID
> emitted by ruleutils.c, yes?  So what this patch does is add such a
> clause for all *other* constraints.  Right?  In other words what it aims
> to do is speed up loading of data by skipping the validation step on
> restore.  Is that right?

Correct. CHECK constraints are added onto main table so they validate at load.

> ISTM we could have the default pg_dump behavior emit NOT VALID
> constraints, and add VALIDATE CONSTRAINT commands at the end; that way
> the database is usable sooner but the constraints end up marked as
> validated by the time the dump is finished.

Yes, may be an even better idea. We'd still want the --no-revalidation
option, AFAICS.

FKs are already "at the end". Perhaps we should add another
"validation" section?

I like the idea, just not sure how long it would take.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Teaching pg_dump to use NOT VALID constraints

From
Jim Nasby
Date:
On 11/10/14, 12:00 PM, Simon Riggs wrote:
> On 10 November 2014 17:33, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>>>   pg_dump --no-revalidaton
>>>
>>> will add "NOT VALID" onto the recreation SQL for any FKs, but only for
>>> ones that were already known to be valid.
>>
>> Well.  Constraints that haven't been validated already have a NOT VALID
>> emitted by ruleutils.c, yes?  So what this patch does is add such a
>> clause for all *other* constraints.  Right?  In other words what it aims
>> to do is speed up loading of data by skipping the validation step on
>> restore.  Is that right?
>
> Correct. CHECK constraints are added onto main table so they validate at load.
>
>> ISTM we could have the default pg_dump behavior emit NOT VALID
>> constraints, and add VALIDATE CONSTRAINT commands at the end; that way
>> the database is usable sooner but the constraints end up marked as
>> validated by the time the dump is finished.
>
> Yes, may be an even better idea. We'd still want the --no-revalidation
> option, AFAICS.
>
> FKs are already "at the end". Perhaps we should add another
> "validation" section?
>
> I like the idea, just not sure how long it would take.

Isn't the real use-case here that if constraints were valid when you dumped then we shouldn't have to *any* re-validate
whenwe load? (Though, we'd have to be careful of that with CHECK because that can call user code...)
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Teaching pg_dump to use NOT VALID constraints

From
Simon Riggs
Date:
On 13 November 2014 00:20, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> Isn't the real use-case here that if constraints were valid when you dumped
> then we shouldn't have to *any* re-validate when we load? (Though, we'd have
> to be careful of that with CHECK because that can call user code...)

Yes, that is the use case this patch would improve considerably.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Teaching pg_dump to use NOT VALID constraints

From
Peter Eisentraut
Date:
On 11/13/14 5:07 AM, Simon Riggs wrote:
> On 13 November 2014 00:20, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> 
>> Isn't the real use-case here that if constraints were valid when you dumped
>> then we shouldn't have to *any* re-validate when we load? (Though, we'd have
>> to be careful of that with CHECK because that can call user code...)
> 
> Yes, that is the use case this patch would improve considerably.

But your patch doesn't really address that.  It just leaves the
constraints as invalid, and someone will have to revalidate them later
(how?).  What Jim was describing was a mode that creates the constraints
as valid but doesn't actually validate them.  I can see both sides of
that kind of feature.




Re: Teaching pg_dump to use NOT VALID constraints

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:
> On 11/13/14 5:07 AM, Simon Riggs wrote:
> > On 13 November 2014 00:20, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > 
> >> Isn't the real use-case here that if constraints were valid when you dumped
> >> then we shouldn't have to *any* re-validate when we load? (Though, we'd have
> >> to be careful of that with CHECK because that can call user code...)
> > 
> > Yes, that is the use case this patch would improve considerably.
> 
> But your patch doesn't really address that.  It just leaves the
> constraints as invalid, and someone will have to revalidate them later
> (how?).  What Jim was describing was a mode that creates the constraints
> as valid but doesn't actually validate them.  I can see both sides of
> that kind of feature.

This might lead to users introducing invalid data by way of declaring
constants as valid but not checked by the system; if they turn out to be
invalid, the final state is a mess.  I would only buy such a feature if
we had some way to pass down the knowledge of the constraint being valid
in the original system through some other means; say emit a CRC of the
copy data in the pg_dump output that can be verified while loading, and
only allow unvalidated constraints to be marked VALID if the sum
matches.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services