Re: Efficient sorting the results of a join, without denormalization - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Efficient sorting the results of a join, without denormalization
Date
Msg-id CA+bJJbzJA+3vnLt8w-kKcm0_gvHHjUs_DgkEqSBjzYC8v+ZcEg@mail.gmail.com
Whole thread Raw
In response to Re: Efficient sorting the results of a join, without denormalization  ("Glen M. Witherington" <glen@fea.st>)
Responses Re: Efficient sorting the results of a join, without denormalization  ("Glen M. Witherington" <glen@fea.st>)
List pgsql-general
Hi Glen:

On Sun, May 31, 2015 at 6:43 AM, Glen M. Witherington <glen@fea.st> wrote:
> On Sat, May 30, 2015, at 11:33 PM, David G. Johnston wrote:
>> This is one problem with using made up surrogate keys...
>> The PK of A is a component of both the PK of B and the PK of C but you throw that information away by using serial
fieldsfor PKs instead.  You should have unique indexes on B and C that incorporate the ID from A 
>
> That is quite a strange schema, though isn't it? If you imagine it as
> emails:
>
> C = Emails
> B = Folder
> A = User
>
> Now you're suggesting that even though an email belongs to to a folder,
> which belongs to a user ... each email should also contain contain a
> reference to a user? I guess that's fine, but seems unideal from a
> redundancy perspective

It may seem, and be,  unideal from a redundancy perspective, but keys
are more natural. It means you have user (Glen), folder (Glen, PGlist)
and message (Glen,PGlist,27), different from (Glen,Inbox,27) or (Glen,
PgList,28) or (Francisco,PgList,27) ( Where the 'tuples' I've printed
are the PK values ). This has a lot of advantages, which  you pay for
in other ways, like redundancies, but having composite primary keys
sometimes work in your favor as you can express restrictions with the
relationships and build composite indexes for add hoc queries. In this
case ( an email database ), a serial could be used ( instead of the
name ) for the user and folder PK, but still have very fast, simple
queries from a MUA for things like 'select * from messages where
user_id = <Prefetched_id> and not read order by timestamp desc limit
100'. Also it will help catch things like mismatching folder ids, or
using the user id as folder id, which are easily made when all the
keys are synthetic and meaningless numbers.


As an example, I have a currency table, with it's serial key
currency_id, and a seller table, which sells just a currency and whose
pk is (currency_id+seller_id), and a rate table with rates
(currency_id, rate_id), and an allowed rates table ( to see which
rates a seller can use ), with primay key (currency_id, seller_id,
rate_id) and foreign keys (currency_id, seller_id) and (currency_id,
rate_id) ( it is more or less a classical example. The composite keys
guarantee I can only allow a seller to sell rates on her currency.

I can also, if needed, build unique indexes on any single id ( they
are all serials, as I have no other candidate keys ), if I need them,
but given the access patterns I normally have all of them, and things
like populating a drop box to allow new rates for a seller are very
easy.

Francisco Olarte.


pgsql-general by date:

Previous
From: Francisco Olarte
Date:
Subject: Re: How to retrieve Comment text using SQL, not psql?
Next
From: Bill Moran
Date:
Subject: Re: Efficient sorting the results of a join, without denormalization