Re: Problem with UPDATE and UNIQUE - Mailing list pgsql-general

From Michael Glaesemann
Subject Re: Problem with UPDATE and UNIQUE
Date
Msg-id F2AB0CCC-A90F-43A0-B136-F54E2C147BE9@seespotcode.net
Whole thread Raw
In response to Problem with UPDATE and UNIQUE  ("Frank Millman" <frank@chagford.com>)
Responses Re: Problem with UPDATE and UNIQUE  ("Frank Millman" <frank@chagford.com>)
List pgsql-general
On Aug 22, 2007, at 1:02 , Frank Millman wrote:

> I want to store data in a 'tree' form, with a fixed number of
> levels, so
> that each level has a defined role.

First thought: fixed, predetermined levels, separate tables for each
level. If a more general approach is desired, your options are
generally adjacency list, nested sets, or contrib/ltree. Each has
their own strengths and weaknesses.

> I have the following (simplified) table -
>
> CREATE TABLE treedata (
>   rowid serial primary key,
>   levelno int not null,
>   parentid int references treedata,
>   seq int not null,
>   code varchar not null,
>   description varchar not null
>   );

rowid + parentid looks like adjacency list to me. Note that you're
storing redundant data (the levelno, which can be derived from the
rowid/parentid relationships), which you may want to do for
performance reasons, but does make things more complicated: you're
essentially caching data which brings with it problems of cache
invalidation. In this case, you need to make sure you're updating
levelno whenever it needs to be updated. (Which I'm sure you've
already thought of.)

> To describe each of the levels in the tree, I have the following
> table -
>
> CREATE TABLE treelevels (
>   levelno int primary key,
>   code varchar unique not null,
>   description varchar not null
>   );

Having each level as its own table would make this redundant, but
again, that might not fit with what you're modeling.

> Typical values for this table could be -
>   (0,'Prod','Product code')
>   (1,'Cat','Product category')
>   (2,'*','All products')

This makes me think you'll want to rethink your schema a bit, as
you're mixing different types of data: categories and products. I'd
at least separate this out into a products table and a categories
table. The categories table may in fact still require some kind of
tree structure, but I don't think products belongs as part of it.

CREATE TABLE products
(
     product_code text PRIMARY KEY
     , product_name text NOT NULL UNIQUE
     , product_description TEXT NOT NULL
);

CREATE TABLE categories
(
     category_id INTEGER PRIMARY KEY
     , category_name TEXT NOT NULL UNIQUE
     , category_description TEXT NOT NULL
     , parent_category_id INTEGER NOT NULL
         REFERENCES categories
     , level INTEGER NOT NULL
     , seq INTEGER NOT NULL
);

One difference here is that I have a NOT NULL constraint on
parent_category_id, and I'd define the root as a category_id that has
a parent_category = category_id. You should be able to add a check
constraint that makes sure only level = 0 has parent_category_id =
category_id and a unique index so only one category can have level 0.
Something like:

-- check constraint
CHECK (
     CASE WHEN category_id = parent_category_id
         THEN level = 0
     ELSE category_id <> parent_category_id
     END)
-- unique index
CREATE UNIQUE INDEX categories_root_uniq_idx
ON categories (level)
WHERE level = 0;

These are untested, but I think they should work.

Given that you'd like to determine the order of the items, I think
you might be interested in using nested sets rather than adjacency
list, as this information is automatically encoded for you. This
would look something like:

CREATE FUNCTION valid_nested_interval_bounds(INTEGER, INTEGER)
RETURNS BOOLEAN
IMMUTABLE
LANGUAGE SQL AS $body$
SELECT $1 < $2 -- make sure they're properly ordered
     AND ($2 - $1 - 1) % 2 = 0 -- useful if using the more strict
versions of nested set (where there no gaps)
     -- and you can easily calculate the total number of descendants
     -- from just the upper and lower bounds of the parent node
$body$;

I generally separate the validity test into a separate function as I
often have a couple different nested set hierarchies in the same
database, and it makes the table definition clearer. I believe the
function is inlined in the check constraint,  but I'm not sure. You
can, of course, move the SQL in the function straight into table
definition.

CREATE TABLE categories
(
     , category_name TEXT PRIMARY KEY
     , category_description TEXT NOT NULL
     , category_lower INTEGER NOT NULL UNIQUE -- often called "left"
     , category_upper INTEGER NOT NULL UNIQUE -- often called "right"
     , CHECK (valid_nested_interval_bounds(category_lower,
category_upper))
);

Another advantage of nested sets is that selecting all of the
descendants of a particular node is very easy. The downside of the
nested set model is that inserts and updates are a bit more
complicated and db intensive. If your categories aren't going to
change much, this shouldn't be a problem.

Again, you could add level information into the categories table, but
you'd need to make sure to update it appropriately.

CREATE TABLE category_products
(
     category_name TEXT NOT NULL REFERENCES categories
     , product_code TEXT NOT NULL REFERENCES products
     , UNIQUE (category_name, product_code)
);

My comments below refer to your original design.

> Now for the problem. I want to insert or delete levels dynamically.
> I can
> insert or delete levels in 'treedata' without a problem. However, I
> also
> want to insert or delete a level in 'treelevels'.
>
> Say I want to insert a level between 'code' and 'category' called
> 'group' -
>
> INSERT INTO treelevels VALUES (1,'Group','Product group');

It's a good habit to *always* explicitly list your columns: it's self-
documenting and more robust in the face of schema changes.

> Obviously this will fail with a duplicate levelno. Therefore before
> the
> insert statement I want to do this -
>
> UPDATE treelevels SET levelno = (levelno+1) WHERE levelno >= 1;
>
> The problem is that if there are a number of levels, and they are in
> indeterminate order, I can get duplicate level numbers while the
> command is
> being executed.
>
> My workaround at present is the following -
>
> UPDATE treelevels SET levelno = (levelno+10001) WHERE levelno >= 1;
> UPDATE treelevels SET levelno = (levelno-10000) WHERE levelno >= 1;

This is a general problem with nested sets and your situation where
you're caching the levelno, and you're work around is similar to the
two generally recommended solutions. One is to make updates using an
offset such as what you're doing, and the other is to utilize
negative levels. I'm keen on the latter, as I feel it's a bit more
flexible: you don't need to make sure your offset is large enough.
I'd do something like this:

BEGIN;

UPDATE treelevels
SET levelno = -1 * (levelno + 1)
WHERE levelno >= 1;

INSERT INTO treelevels (levelno, code, description) VALUES (1,
'Group', 'Product group');

UPDATE treelevels
SET levelno = -1 * levelno
WHERE levelno < 0;

COMMIT;

And you can wrap the common operations (insert, move, delete) in
functions for convenience.

Anyway, hope this gives you something to think about.

Michael Glaesemann
grzm seespotcode net



pgsql-general by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: reporting tools
Next
From: "Scott Marlowe"
Date:
Subject: Re: Postgres, fsync and RAID controller with 100M of internal cache & dedicated battery