Thread: Teaching pg_dump to use NOT VALID constraints
Magnus and I discussed the need for pg_dump to offer the use of NOT VALID constraints. Here's the patch. pg_dump --no-revalidaton will add "NOT VALID" onto the recreation SQL for any FKs, but only for ones that were already known to be valid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Simon Riggs wrote: > Magnus and I discussed the need for pg_dump to offer the use of NOT > VALID constraints. > Here's the patch. > > > pg_dump --no-revalidaton > > will add "NOT VALID" onto the recreation SQL for any FKs, but only for > ones that were already known to be valid. Well. Constraints that haven't been validated already have a NOT VALID emitted by ruleutils.c, yes? So what this patch does is add such a clause for all *other* constraints. Right? In other words what it aims to do is speed up loading of data by skipping the validation step on restore. Is that right? ISTM we could have the default pg_dump behavior emit NOT VALID constraints, and add VALIDATE CONSTRAINT commands at the end; that way the database is usable sooner but the constraints end up marked as validated by the time the dump is finished. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 10 November 2014 17:33, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> pg_dump --no-revalidaton >> >> will add "NOT VALID" onto the recreation SQL for any FKs, but only for >> ones that were already known to be valid. > > Well. Constraints that haven't been validated already have a NOT VALID > emitted by ruleutils.c, yes? So what this patch does is add such a > clause for all *other* constraints. Right? In other words what it aims > to do is speed up loading of data by skipping the validation step on > restore. Is that right? Correct. CHECK constraints are added onto main table so they validate at load. > ISTM we could have the default pg_dump behavior emit NOT VALID > constraints, and add VALIDATE CONSTRAINT commands at the end; that way > the database is usable sooner but the constraints end up marked as > validated by the time the dump is finished. Yes, may be an even better idea. We'd still want the --no-revalidation option, AFAICS. FKs are already "at the end". Perhaps we should add another "validation" section? I like the idea, just not sure how long it would take. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/10/14, 12:00 PM, Simon Riggs wrote: > On 10 November 2014 17:33, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >>> pg_dump --no-revalidaton >>> >>> will add "NOT VALID" onto the recreation SQL for any FKs, but only for >>> ones that were already known to be valid. >> >> Well. Constraints that haven't been validated already have a NOT VALID >> emitted by ruleutils.c, yes? So what this patch does is add such a >> clause for all *other* constraints. Right? In other words what it aims >> to do is speed up loading of data by skipping the validation step on >> restore. Is that right? > > Correct. CHECK constraints are added onto main table so they validate at load. > >> ISTM we could have the default pg_dump behavior emit NOT VALID >> constraints, and add VALIDATE CONSTRAINT commands at the end; that way >> the database is usable sooner but the constraints end up marked as >> validated by the time the dump is finished. > > Yes, may be an even better idea. We'd still want the --no-revalidation > option, AFAICS. > > FKs are already "at the end". Perhaps we should add another > "validation" section? > > I like the idea, just not sure how long it would take. Isn't the real use-case here that if constraints were valid when you dumped then we shouldn't have to *any* re-validate whenwe load? (Though, we'd have to be careful of that with CHECK because that can call user code...) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 13 November 2014 00:20, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > Isn't the real use-case here that if constraints were valid when you dumped > then we shouldn't have to *any* re-validate when we load? (Though, we'd have > to be careful of that with CHECK because that can call user code...) Yes, that is the use case this patch would improve considerably. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/13/14 5:07 AM, Simon Riggs wrote: > On 13 November 2014 00:20, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > >> Isn't the real use-case here that if constraints were valid when you dumped >> then we shouldn't have to *any* re-validate when we load? (Though, we'd have >> to be careful of that with CHECK because that can call user code...) > > Yes, that is the use case this patch would improve considerably. But your patch doesn't really address that. It just leaves the constraints as invalid, and someone will have to revalidate them later (how?). What Jim was describing was a mode that creates the constraints as valid but doesn't actually validate them. I can see both sides of that kind of feature.
Peter Eisentraut wrote: > On 11/13/14 5:07 AM, Simon Riggs wrote: > > On 13 November 2014 00:20, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > > > >> Isn't the real use-case here that if constraints were valid when you dumped > >> then we shouldn't have to *any* re-validate when we load? (Though, we'd have > >> to be careful of that with CHECK because that can call user code...) > > > > Yes, that is the use case this patch would improve considerably. > > But your patch doesn't really address that. It just leaves the > constraints as invalid, and someone will have to revalidate them later > (how?). What Jim was describing was a mode that creates the constraints > as valid but doesn't actually validate them. I can see both sides of > that kind of feature. This might lead to users introducing invalid data by way of declaring constants as valid but not checked by the system; if they turn out to be invalid, the final state is a mess. I would only buy such a feature if we had some way to pass down the knowledge of the constraint being valid in the original system through some other means; say emit a CRC of the copy data in the pg_dump output that can be verified while loading, and only allow unvalidated constraints to be marked VALID if the sum matches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services