Thread: Sorting by related tables

Sorting by related tables

From
Bill Moseley
Date:
I have a few beginner questions about using related tables for
sorting.

create table region {
    id          SERIAL PRIMARY KEY,
    name        text,
    -- order this table should be sorted in
    -- a "1" is the top sort level
    sort_order  integer
);

create table city {
    id      SERIAL PRIMARY KEY,
    name    text,
    region  integer REFERENCES region
);

I want a way to adjust the sort order of the "region" table ("move
item up" or "move item down" in a web interface) without requiring
knowledge of the existing "sort_order" for the rows in the region
table.  (i.e.  requiring them to already be in an order).

Here's my "move up" (which is a lower sort_order value) statement.
(__TABLE__ is "region" and ? are $\d bind parameters)

    UPDATE __TABLE__
    SET sort_order =
        CASE
            -- subtract one from the item's sort, unless it's already "1"
            WHEN id = ? AND sort_order > 1 THEN sort_order-1

            -- for other items that are greater or equal to sort-1
            WHEN id != ? AND sort_order >= (select sort_order from __TABLE__ where id = ?)-1
                    THEN sort_order+1

            -- all others, leave alone
            ELSE sort_order
        END;

This works reasonably well for small tables, but doesn't scale and the
logic likely has holes.  And behavior when adding new rows to the
region table is not defined.

1) How do most people do this?  Use linked lists?

    create table region {
        id          SERIAL PRIMARY KEY
        name        text,
        list_head   boolean, -- flag if this is the head of the linked list
        next        integer REFERENCES region
    );

2) As a SQL beginner, I'm not seeing how to display rows from "city"
sorted in order based on the order in the "region" table.

3) Oh, and I have also this for checking IF there are items in
"region" that are "above" the item in question -- to see IF an item
can or cannot be moved up in the sort order relative to others.

    SELECT id FROM __TABLE__
        WHERE
            sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
             AND id != ?;

If that returns any rows then I know I can call the UPDATE to move the
item up.

Again, a very basic question: What method should be used to be sure
that nothing changes between the SELECT and the UPDATE?



--
Bill Moseley
moseley@hank.org


Re: Sorting by related tables

From
Andreas Seltenreich
Date:
Bill Moseley schrob:

> create table region {
>     id          SERIAL PRIMARY KEY,
>     name        text,
>     -- order this table should be sorted in
>     -- a "1" is the top sort level
>     sort_order  integer
> );
>
> create table city {
>     id      SERIAL PRIMARY KEY,
>     name    text,
>     region  integer REFERENCES region
> );
>
> I want a way to adjust the sort order of the "region" table ("move
> item up" or "move item down" in a web interface) without requiring
> knowledge of the existing "sort_order" for the rows in the region
> table.  (i.e.  requiring them to already be in an order).
>
> Here's my "move up" (which is a lower sort_order value) statement.
> (__TABLE__ is "region" and ? are $\d bind parameters)
>
>     UPDATE __TABLE__
>     SET sort_order =
>         CASE
>             -- subtract one from the item's sort, unless it's already "1"
>             WHEN id = ? AND sort_order > 1 THEN sort_order-1
>
>             -- for other items that are greater or equal to sort-1
>             WHEN id != ? AND sort_order >= (select sort_order from __TABLE__ where id = ?)-1
>                     THEN sort_order+1
>
>             -- all others, leave alone
>             ELSE sort_order
>         END;
>
> This works reasonably well for small tables, but doesn't scale and the
> logic likely has holes.  And behavior when adding new rows to the
> region table is not defined.

I guess your approach of maintaining a special attribute for the
custom sort order could be quite fast when using floating point
numbers instead of integers. You then could easily move a record
around without having to update _every_ other record by using proper
fractions. E.g. to move a record A between B and C, just update its
sort_order to (B.sort_order + C.sort_order) / 2.

However, with IEEE754-floats this is only guaranteed to work 1023
times in the worst case when using double precision and "seeding" the
sort_order with integers. So one would have to be careful and
"normalize" the column from time to time.

> 1) How do most people do this?  Use linked lists?

I haven't had this problem yet, so I don't know if there's a standard
answer.

>     create table region {
>         id          SERIAL PRIMARY KEY
>         name        text,
>         list_head   boolean, -- flag if this is the head of the linked list
>         next        integer REFERENCES region
>     );

The "problem" with linked lists is that they don't fit into the
relational model well, and since SQL isn't turing-complete by design,
you'd have to write a proper function with one of the procedural
languages PostgreSQL offers to iterate the list. Searching the list
archives should yield some examples.

> 2) As a SQL beginner, I'm not seeing how to display rows from "city"
> sorted in order based on the order in the "region" table.

IMHO a simple join should do here, e.g.

  select city.name from city, region
         where region.id = city.region
         order by region.sort_order

Joining is explained here:
<http://www.postgresql.org/docs/8.0/static/queries-table-expressions.html>

> 3) Oh, and I have also this for checking IF there are items in
> "region" that are "above" the item in question -- to see IF an item
> can or cannot be moved up in the sort order relative to others.
>
>     SELECT id FROM __TABLE__
>         WHERE
>             sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
>              AND id != ?;
>
> If that returns any rows then I know I can call the UPDATE to move the
> item up.

I guess you want a boolean value here? SELECT EXISTS around your above
query as a subselect should do the trick. You also want to use LIMIT 1
in the statement, to avoid fetching unnecessary records.

> Again, a very basic question: What method should be used to be sure
> that nothing changes between the SELECT and the UPDATE?

You can achieve that using transactions. Concurrency control is
explained here: <http://www.postgresql.org/docs/8.0/static/mvcc.html>.

regards
Andreas
--

Re: Sorting by related tables

From
Bill Moseley
Date:
On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote:
> > 3) Oh, and I have also this for checking IF there are items in
> > "region" that are "above" the item in question -- to see IF an item
> > can or cannot be moved up in the sort order relative to others.
> >
> >     SELECT id FROM __TABLE__
> >         WHERE
> >             sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
> >              AND id != ?;
> >
> > If that returns any rows then I know I can call the UPDATE to move the
> > item up.
>
> I guess you want a boolean value here? SELECT EXISTS around your above
> query as a subselect should do the trick. You also want to use LIMIT 1
> in the statement, to avoid fetching unnecessary records.

Is there much of a difference between using LIMIT 1 and using an
EXISTS subselect?  Frankly, I'm not clear what you are specifically
suggestion with EXISTS.  I'm using Perl's Class::DBI object mapping module so
returning a single row is an easy way to check this as a boolean
result in Perl.

> > Again, a very basic question: What method should be used to be sure
> > that nothing changes between the SELECT and the UPDATE?
>
> You can achieve that using transactions. Concurrency control is
> explained here: <http://www.postgresql.org/docs/8.0/static/mvcc.html>.

My comment was that I want to do the above SELECT and then *only* do
an UPDATE if the SELECT returns at least one row.

So, I should do:

  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Before the SELECT.   And when I UPDATE I need to be prepared to do a
ROLLBACK if I get an error and repeat the process.  (And, I assume,
take some precaution to give up after some number of tries.)

Does that seem reasonable?

--
Bill Moseley
moseley@hank.org


Re: Sorting by related tables

From
Andreas Seltenreich
Date:
Bill Moseley schrob:

> On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote:
>> > 3) Oh, and I have also this for checking IF there are items in
>> > "region" that are "above" the item in question -- to see IF an item
>> > can or cannot be moved up in the sort order relative to others.
>> >
>> >     SELECT id FROM __TABLE__
>> >         WHERE
>> >             sort_order <= (SELECT sort_order FROM __TABLE__ WHERE id = ?)
>> >              AND id != ?;
>> >
>> > If that returns any rows then I know I can call the UPDATE to move the
>> > item up.
>>
>> I guess you want a boolean value here? SELECT EXISTS around your above
>> query as a subselect should do the trick. You also want to use LIMIT 1
>> in the statement, to avoid fetching unnecessary records.
>
> Is there much of a difference between using LIMIT 1 and using an
> EXISTS subselect?

LIMIT 1 does reduce the cost, EXISTS AFAIK only makes the result
boolean and doesn't stop the execution of the subselect by itself when
the first record is found.

> Frankly, I'm not clear what you are specifically
> suggestion with EXISTS.  I'm using Perl's Class::DBI object mapping module so
> returning a single row is an easy way to check this as a boolean
> result in Perl.

Uups, this wasn't question number three yet, and I wrongly inferred
from your uppercase-ifs that you wanted a boolean result here :-/

>> > Again, a very basic question: What method should be used to be sure
>> > that nothing changes between the SELECT and the UPDATE?
>>
>> You can achieve that using transactions. Concurrency control is
>> explained here: <http://www.postgresql.org/docs/8.0/static/mvcc.html>.
>
> My comment was that I want to do the above SELECT and then *only* do
> an UPDATE if the SELECT returns at least one row.
>
> So, I should do:
>
>   SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
> Before the SELECT.   And when I UPDATE I need to be prepared to do a
> ROLLBACK if I get an error and repeat the process.  (And, I assume,
> take some precaution to give up after some number of tries.)
>
> Does that seem reasonable?

This would be one possibility. If you don't want your application to
deal with transactions being aborted because of non-serializable
transactions, you could alternatively use explicit locking (SELECT ...
FOR UPDATE) combined with the Read Committed isolation level (the
default).

Explicit locking is documented here:
<http://www.postgresql.org/docs/8.0/static/explicit-locking.html#LOCKING-ROWS>

regards
Andreas
--

Re: Sorting by related tables

From
Bill Moseley
Date:
On Mon, Aug 15, 2005 at 11:30:32PM +0200, Andreas Seltenreich wrote:
>
> This would be one possibility. If you don't want your application to
> deal with transactions being aborted because of non-serializable
> transactions, you could alternatively use explicit locking (SELECT ...
> FOR UPDATE) combined with the Read Committed isolation level (the
> default).

SELECT FOR UPDATE locks just the rows that are selected, right?  If I
understand correctly, that would not work for my case because I'm
updating different rows than I'm selecting.

My tables are small, so I'm thinking of just manually updating all the
rows in sequence to adjust the order when needed -- to make things a
bit more simple.  But it is a problem that I am curious about how best
to solve in a scalable way.

Thanks very much for your feedback.

--
Bill Moseley
moseley@hank.org