Thread: Database design confusing pg_restore, and misc pg_restore issues

Database design confusing pg_restore, and misc pg_restore issues

From
Rick Yorgason
Date:
Hey everyone,

I run a website that sells videogames, and different games have
different registration systems, so I have a database design that goes
something like this:

> registration_type enum('none', 'regtype1', 'regtype2')
>
> products(product_id, registration_type)
>
> order_item(order_id, product_id, check(order_item_has_reginfo(order_id, product_id)))
>
> regtype1_reginfo(order_id, product_id, misc rows)
>
> regtype2_reginfo(order_id, product_id, orthogonally misc rows)
>
> function order_item_has_reginfo(text, text) returns boolean as $$
>     select exists(
>         select 1 from products where product_id = $2
>         and (
>             (reg_type = 'none')
>             or (reg_type = 'regtype1' and (select exists(select 1 from regtype1_reginfo where order_id = $1 and
product_id= $2))) 
>              or (reg_type = 'regtype2' and (select exists(select 1 from regtype2_reginfo where order_id = $1 and
product_id= $2))) 
>          )
>     )
>     $$ LANGUAGE 'SQL';

In other words, (order_id, product_id) of order_item is a foreign key to
either reginfo1, reginfo2, or nothing, depending on which product it is.

The works really well, until I try to use pg_dump/pg_restore, because it
attempts to restore order_items before the reginfo tables.  To get it to
work properly, I need to load the schema, disable the check, load the
data, then re-enable the check.

I'm interested in either a more painless way of importing backups, or a
better design.

Incidentally, using --disable-triggers didn't disable checks, and
--use-list didn't seem to actually work on my dev machine (Vista x64);
it just pretends like everything went fine, without inserting any data.
  Here's what PowerShell prints out:

> PS D:\projects\backup> & 'C:\Program Files (x86)\PostgreSQL\8.4\bin\pg_restore.exe' -v --use-list=backup.list -U
blahblah-p 9999 -C backup.db 
> --
> -- PostgreSQL database dump
> --
>
> -- Started on 2010-05-07 22:22:02
>
> SET statement_timeout = 0;
> SET client_encoding = 'UTF8';
> SET standard_conforming_strings = off;
> SET check_function_bodies = false;
> SET client_min_messages = warning;
> SET escape_string_warning = off;
>
> -- Completed on 2010-05-08 01:15:01
>
> --
> -- PostgreSQL database dump complete
> --
>
> pg_restore.exe : pg_restore: implied data-only restore
> At line:1 char:2
> + & <<<<  'C:\Program Files (x86)\PostgreSQL\8.4\bin\pg_restore.exe' -v --use-list=backup.list -U blahblah -p 9999 -C
backup.db
>     + CategoryInfo          : NotSpecified: (pg_restore: implied data-only restore:String) [], RemoteException
>     + FullyQualifiedErrorId : NativeCommandError

Thanks for your help,

-Rick-

Rick Yorgason <rick@longbowgames.com> writes:
> In other words, (order_id, product_id) of order_item is a foreign key to
> either reginfo1, reginfo2, or nothing, depending on which product it is.

I think you'll find that few people regard that as good database design.

> The works really well, until I try to use pg_dump/pg_restore, because it
> attempts to restore order_items before the reginfo tables.  To get it to
> work properly, I need to load the schema, disable the check, load the
> data, then re-enable the check.

Well, you can hardly expect pg_dump to intuit that there's a dependency
there; it understands nothing about the behavior of that SQL function.
Since you say that --disable-triggers doesn't help, I guess that you're
applying that function not in a trigger but in a CHECK constraint?
That's pretty horrid in itself: CHECK is *not* meant to enforce anything
except local properties of the newly inserted/updated row itself.
Aside from the ordering problems that you've already run into some of,
consider what happens when the referenced row gets deleted.  (Hint:
nothing.)

Sure you can't find a way to unify reginfo1/reginfo2 into one table?
If you have some side information that doesn't fit conveniently into
that table, maybe making an auxiliary table that's foreign-keyed to
the master reginfo table would help.  But you really need a structure
that allows you to declare the order_item table with a regular foreign
key for reginfo.  Foreign keys are not something you can cobble together
from spare parts --- a correct, robust implementation requires magic
that is just not available at the user level in SQL.

            regards, tom lane

Re: Database design confusing pg_restore, and misc pg_restore issues

From
Rick Yorgason
Date:
On 08/05/2010 10:33 PM, Tom Lane wrote:
> Since you say that --disable-triggers doesn't help, I guess that you're
> applying that function not in a trigger but in a CHECK constraint?
> That's pretty horrid in itself: CHECK is *not* meant to enforce anything
> except local properties of the newly inserted/updated row itself.
> Aside from the ordering problems that you've already run into some of,
> consider what happens when the referenced row gets deleted.  (Hint:
> nothing.)

Luckily, they never get deleted :)

Okay, well, I guess one solution is to replace the checks with triggers
on all tables involved.  That's not pretty, and really doesn't express
the concept of a constraint very clearly, but I guess it would work.

> Sure you can't find a way to unify reginfo1/reginfo2 into one table?
> If you have some side information that doesn't fit conveniently into
> that table, maybe making an auxiliary table that's foreign-keyed to
> the master reginfo table would help.  But you really need a structure
> that allows you to declare the order_item table with a regular foreign
> key for reginfo.

So, your first suggestion would look like this:

> reginfo(order_id, product_id, reginfo1_columns, reginfo2_columns, FOREIGN KEY(order_id, product_id) REFERENCES
order_items)

For the sake of illustration, let's say that order_item's foreign key to
this table is NOT NULL.

So, if the product in question uses regtype1, then the reginfo2 columns
are NULL, and vice versa.  If the product doesn't use any registration,
then both the reginfo1 and reginfo2 columns are NULL.

The problem is, how do I express that requirement in a constraint?  And
without updating the schema every time I add a new product?


Your second suggestion would look like this:

> reginfo(order_id, product_id, FOREIGN KEY(order_id, product_id) REFERENCES order_items)
>
> reginfo1(order_id, product_id, reginfo1_columns, FOREIGN KEY(order_id, product_id) REFERENCES reginfo)
>
> reginfo2(order_id, product_id, reginfo2_columns, FOREIGN KEY(order_id, product_id) REFERENCES reginfo)

Well, at that point, the reginfo table is redundant, and the reginfo1
and reginfo2 tables may as well reference order_items directly, which is
exactly what I have, minus my problematic constraint.

My assumption is that most people would simply give up and assume that
this constraint is too difficult to express in SQL, and just rely on the
business logic never being wrong.  I was hoping that wasn't the case :)

Thanks,

-Rick-

Re: Database design confusing pg_restore, and misc pg_restore issues

From
Alban Hertroys
Date:
On 9 May 2010, at 6:49, Rick Yorgason wrote:

> So, your first suggestion would look like this:
>
>> reginfo(order_id, product_id, reginfo1_columns, reginfo2_columns, FOREIGN KEY(order_id, product_id) REFERENCES
order_items)
>
> For the sake of illustration, let's say that order_item's foreign key to this table is NOT NULL.
>
> So, if the product in question uses regtype1, then the reginfo2 columns are NULL, and vice versa.  If the product
doesn'tuse any registration, then both the reginfo1 and reginfo2 columns are NULL. 
>
> The problem is, how do I express that requirement in a constraint?  And without updating the schema every time I add
anew product? 

You can do that with a CHECK constraint: CHECK (reginfo1 IS NULL OR reginfo2 IS NULL).

I see you're talking about multiple columns for both now though. Considering that you can add them as columns to the
reginfotable you could change that into a 1:1 relation with their own respective tables and change reginfo.reginfo1 and
reginfo.reginfo2into nullable foreign keys. 

> Your second suggestion would look like this:
>
>> reginfo(order_id, product_id, FOREIGN KEY(order_id, product_id) REFERENCES order_items)
>>
>> reginfo1(order_id, product_id, reginfo1_columns, FOREIGN KEY(order_id, product_id) REFERENCES reginfo)
>>
>> reginfo2(order_id, product_id, reginfo2_columns, FOREIGN KEY(order_id, product_id) REFERENCES reginfo)
>
> Well, at that point, the reginfo table is redundant, and the reginfo1 and reginfo2 tables may as well reference
order_itemsdirectly, which is exactly what I have, minus my problematic constraint. 

A constraint like that can't be expressed directly, as you mention, but writing BEFORE INSERT and UPDATE triggers that
performthe necessary checks and return NULL if they fail (and raise an error of course) isn't that difficult. 

> My assumption is that most people would simply give up and assume that this constraint is too difficult to express in
SQL,and just rely on the business logic never being wrong.  I was hoping that wasn't the case :) 


The business logic "always" gets it wrong at some point, if it weren't just because business logic tends to perform the
sameaction on data from multiple code paths. That's not (necessarily) bad design in the business logic, it's just that
thetranslation from user interface to data objects often isn't a straight one. I wouldn't be surprised if your
differenttypes of orders originate from different locations in the user interface, for example. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4be6782910411440911937!