Thread: Primary Key

Primary Key

From
João Paulo Zavanela
Date:
Hello,

How many fields are recommended for a primary key?
I'm thinking of creating one with 6 fields; is that too many?

Thanks!



Re: Primary Key

From
Sam Mason
Date:
On Fri, Nov 16, 2007 at 03:50:20PM -0300, João Paulo Zavanela wrote:
> How many fields are recommended for a primary key?
> I'm thinking of creating one with 6 fields; is that too many?

Normally a primary key would be just a single column.  Once you get to
that many columns I'd probably use a serial column as the primary key
and put a UNIQUE index on those six fields.  It depends on what you're
doing, though; unless you've got a few years' experience I'd be tempted
to stay away from primary keys of more than a single column.
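As a rough sketch of that layout (table and column names invented for
illustration):

```sql
-- Hypothetical sketch: a serial surrogate as the primary key, with the
-- six "natural" fields demoted to a UNIQUE constraint so duplicate
-- combinations are still rejected.
CREATE TABLE shipment (
    id        serial  PRIMARY KEY,
    region    text    NOT NULL,
    depot     text    NOT NULL,
    carrier   text    NOT NULL,
    route     text    NOT NULL,
    ship_date date    NOT NULL,
    seq_no    integer NOT NULL,
    UNIQUE (region, depot, carrier, route, ship_date, seq_no)
);
```

Other tables can then reference shipment(id) with a single-column
foreign key instead of repeating all six fields.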


  Sam

Re: Primary Key

From
"Joshua D. Drake"
Date:

On Fri, 16 Nov 2007 18:54:22 +0000
Sam Mason <sam@samason.me.uk> wrote:

> On Fri, Nov 16, 2007 at 03:50:20PM -0300, João Paulo Zavanela wrote:
> > How many fields are recommended for a primary key?
> > I'm thinking of creating one with 6 fields; is that too many?
> 
> Normally a primary key would just be a single column.

Uhmm no. Normally a primary key is defined by the number of columns it
takes to determine naturally distinct values.

I would agree that if he is looking at 6 then he probably needs to
normalize further.

Sincerely,

Joshua D. Drake


- -- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Primary Key

From
Ron Johnson
Date:

On 11/16/07 12:50, João Paulo Zavanela wrote:
> Hello,
>
> How many fields are recommended for a primary key?
> I'm thinking of creating one with 6 fields; is that too many?

The number of recommended fields is the *minimum* number required
for uniqueness.  1 or 6 or 24.  Doesn't matter.

- --
Ron Johnson, Jr.
Jefferson LA  USA

%SYSTEM-F-FISH, my hovercraft is full of eels

Re: Primary Key

From
Ron Johnson
Date:

On 11/16/07 12:54, Sam Mason wrote:
> On Fri, Nov 16, 2007 at 03:50:20PM -0300, João Paulo Zavanela wrote:
>> How many fields are recommended for a primary key?
>> I'm thinking of creating one with 6 fields; is that too many?
>
> Normally a primary key would just be a single column.  When you start
> going to that many I'd probably have a serial column as the primary key,
> and a UNIQUE index on those six fields.  Depends on what you're doing,
> though unless you've got a few years experience I'd be tempted to stay
> away from primary keys of more than a single column.

Fie on you evil synthetic key lovers.  Long live the Natural Key!


Re: Primary Key

From
"Joshua D. Drake"
Date:

On Fri, 16 Nov 2007 13:03:23 -0600
Ron Johnson <ron.l.johnson@cox.net> wrote:

> 
> On 11/16/07 12:54, Sam Mason wrote:
> > On Fri, Nov 16, 2007 at 03:50:20PM -0300, João Paulo Zavanela wrote:
> >> How many fields are recommended for a primary key?
> >> I'm thinking of creating one with 6 fields; is that too many?
> > 
> > Normally a primary key would just be a single column.  When you
> > start going to that many I'd probably have a serial column as the
> > primary key, and a UNIQUE index on those six fields.  Depends on
> > what you're doing, though unless you've got a few years experience
> > I'd be tempted to stay away from primary keys of more than a single
> > column.
> 
> Fie on you evil synthetic key lovers.  Long live the Natural Key!

Right with you there buddy. Let's get the pitchforks!

Joshua D. Drake

> 
> - --
> Ron Johnson, Jr.
> Jefferson LA  USA
> 
> %SYSTEM-F-FISH, my hovercraft is full of eels




Re: Primary Key

From
Sam Mason
Date:
  ( Fi Fie Foe Fum, I smell the blood of a religious war )

On Fri, Nov 16, 2007 at 01:03:23PM -0600, Ron Johnson wrote:
> On 11/16/07 12:54, Sam Mason wrote:
> > On Fri, Nov 16, 2007 at 03:50:20PM -0300, João Paulo Zavanela wrote:
> >> How many fields are recommended for a primary key?
> >> I'm thinking of creating one with 6 fields; is that too many?
> >
> > Normally a primary key would just be a single column.  When you start
> > going to that many I'd probably have a serial column as the primary key,
> > and a UNIQUE index on those six fields.  Depends on what you're doing,
> > though unless you've got a few years experience I'd be tempted to stay
> > away from primary keys of more than a single column.
>
> Fie on you evil synthetic key lovers.  Long live the Natural Key!

Really?  I started off with everything using sequences and everything
was good.  Then I found I wanted to do more complicated things so I
started to transition to natural keys and things were better.  Then
I took things too far and wanted something artificial back in my
life.  I'm back to almost never using natural keys now, mainly because
interfacing with the outside world gets too complicated.  When I'm just
doing stuff inside the database then I can use natural keys and all is
good, otherwise things just get too complicated.

I'll probably look back in another few years and realise how young and
naive I am now.


  Sam

Re: Primary Key

From
"Joshua D. Drake"
Date:

On Fri, 16 Nov 2007 20:00:29 +0000
Sam Mason <sam@samason.me.uk> wrote:


> > > Normally a primary key would just be a single column.  When you
> > > start going to that many I'd probably have a serial column as the
> > > primary key, and a UNIQUE index on those six fields.  Depends on
> > > what you're doing, though unless you've got a few years
> > > experience I'd be tempted to stay away from primary keys of more
> > > than a single column.
> > 
> > Fie on you evil synthetic key lovers.  Long live the Natural Key!
> 
> Really?  I started off with everything using sequences and everything
> was good.  Then I found I wanted to do more complicated things so I
> started to transition to natural keys and things were better.  Then
> I took things too far and wanted something artificial back in my

When that is needed I do this:

create table foo(id serial unique, a text, b text, primary key (a, b));

Joshua D. Drake

> life.  I'm back to almost never using natural keys now, mainly because
> interfacing with the outside world gets too complicated.  When I'm
> just doing stuff inside the database then I can use natural keys and
> all is good, otherwise things just get too complicated.
> 
> I'll probably look back in another few years and remember how young
> and naive I was back now.
> 
> 
>   Sam
> 



Re: Primary Key

From
Sam Mason
Date:
On Fri, Nov 16, 2007 at 12:06:46PM -0800, Joshua D. Drake wrote:
> When that is needed I do this:
>
> > create table foo(id serial unique, a text, b text, primary key (a, b));

Humm, so the other way around from what I've ended up doing.  I'll need
to think about the implications of changing things around like this.
There are lots of things that seem as though they'll be pretty awkward
to do; I'm sure it's just because I haven't thought about it enough.


  Sam

Re: Primary Key

From
"Merlin Moncure"
Date:
On Nov 16, 2007 3:21 PM, Sam Mason <sam@samason.me.uk> wrote:
> On Fri, Nov 16, 2007 at 12:06:46PM -0800, Joshua D. Drake wrote:
> > When that is needed I do this:
> >
> > > create table foo(id serial unique, a text, b text, primary key (a, b));
>
> Humm, so the other way around from what I've ended up doing.  I'll need
> to think about the implications of changing things around like this.
> There are lots of things that seem as though they'll be pretty awkward
> to do; I'm sure it's just because I haven't thought about it enough.

there is a lot of nuance to this debate and tons of hyperbole on both
sides.  There are many side effects, pro and con, about choosing
'natural' keys vs. surrogates.  josh's suggestion is the most
reasonable compromise, because it allows you to get the performance
benefits (which are sometimes overrated) when you need it, but still
forces you to think about how your data is _truly_ organized and what
makes each record unique.  the sad fact is that sequences have made
developers lazy, not giving much thought to proper normalization
strategies which in turn often produces lousy databases.  if you know
how to do things properly, you will know what we mean.

merlin

Re: Primary Key

From
David Fetter
Date:
On Fri, Nov 16, 2007 at 12:06:46PM -0800, Joshua D. Drake wrote:
>
> On Fri, 16 Nov 2007 20:00:29 +0000
> Sam Mason <sam@samason.me.uk> wrote:
>
>
> > > > Normally a primary key would just be a single column.  When
> > > > you start going to that many I'd probably have a serial column
> > > > as the primary key, and a UNIQUE index on those six fields.
> > > > Depends on what you're doing, though unless you've got a few
> > > > years experience I'd be tempted to stay away from primary keys
> > > > of more than a single column.
> > >
> > > Fie on you evil synthetic key lovers.  Long live the Natural
> > > Key!
> >
> > Really?  I started off with everything using sequences and
> > everything was good.  Then I found I wanted to do more complicated
> > things so I started to transition to natural keys and things were
> > better.  Then I took things too far and wanted something
> > artificial back in my
>
> When that is needed I do this:
>
> create table foo(id serial unique, a text, b text, primary key (a, b));

By itself, this insufficiently restricts what's going on in the table.
I'd recommend a TRIGGER that disallows updating the synthetic key.
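A minimal sketch of such a trigger, assuming the foo table from above
(function and trigger names are made up for illustration):

```sql
-- Hypothetical sketch: reject any UPDATE that changes the synthetic key.
CREATE FUNCTION forbid_id_change() RETURNS trigger AS $$
BEGIN
    IF NEW.id <> OLD.id THEN
        RAISE EXCEPTION 'surrogate key "id" may not be updated';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_id_immutable
    BEFORE UPDATE ON foo
    FOR EACH ROW EXECUTE PROCEDURE forbid_id_change();
```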

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Primary Key

From
Sam Mason
Date:
On Fri, Nov 16, 2007 at 04:41:42PM -0500, Merlin Moncure wrote:
> On Nov 16, 2007 3:21 PM, Sam Mason <sam@samason.me.uk> wrote:
> > On Fri, Nov 16, 2007 at 12:06:46PM -0800, Joshua D. Drake wrote:
> > > When that is needed I do this:
> > >
> > > create table foo(id serial unique, a text, b text, primary key (a, b));
> >
> > Humm, so the other way around from what I've ended up doing.  I'll need
> > to think about the implications of changing things around like this.
> > There are lots of things that seem as though they'll be pretty awkward
> > to do; I'm sure it's just because I haven't thought about it enough.
>
> there is a lot of nuance to this debate and tons of hyperbole on both
> sides.  There are many side effects, pro and con, about choosing
> 'natural' keys vs. surrogates.  josh's suggestion is the most
> reasonable compromise, because it allows you to get the performance
> benefits (which are sometimes overrated) when you need it,

I'm not sure if performance has ever really come into the decision about
whether to use natural/surrogate keys with me.  The main reason for
using a surrogate key is simplicity; I don't trust myself to maintain a
large database where every relationship is composed of multiple columns.

If I could say somewhere that I want a set of properties to hold (i.e.
there is a 1-to-1 relationship between these tables, there's at most one
row in this table for each of these, etc) and then these constraints
were checked when I actually wrote my queries I'd be much happier.

For example, given the tables:

  CREATE TABLE foo (
    id INTEGER NOT NULL PRIMARY KEY,
    description TEXT );
  CREATE TABLE bar (
    key INTEGER NOT NULL PRIMARY KEY,
    fooid INTEGER NOT NULL REFERENCES foo );

I'd like to be able to write the query:

  SELECT b.key, f.description
  FROM bar b, foo f
  WHERE b.fooid = f.id;

And be able to say that I expect exactly one row for each bar.key.
The database would be able to go back over the definitions and prove
that this constraint holds (because bar.key and foo.id are UNIQUE, the
FOREIGN KEY constraint has checked that bar.fooid always references a
valid foo.id, and that bar.fooid can never be NULL).

Or is this the sort of thing that materialised views are good for, and
I've always just been thinking about them as a performance hack?

> the sad fact is that sequences have made
> developers lazy, not giving much thought to proper normalization
> strategies which in turn often produces lousy databases.  if you know
> how to do things properly, you will know what we mean.

"Properly" is very open ended.  Most people will try to do their best
job (given various external constraints) and we've all experienced bad
design, if only from stuff that we did while learning.  I think I've
experienced this, but you've probably got a very different idea about
what "properly" means than I do.


  Sam

Re: Primary Key

From
Greg Smith
Date:
On Fri, 16 Nov 2007, Merlin Moncure wrote:

> the sad fact is that sequences have made developers lazy

Nah, developers were lazy long before that.  If you ask Larry Wall it's a
virtue.

I gave up on this argument ten years ago after a long battle with
well-known natural key zealot Joe Celko wore me out.  He published one of
his many articles making a case for them using an example from the
automotive industry.  Only problem was, the unique identifier he suggested
wasn't.  At the auto parts company I worked for, I had just spent many
monotonous days contorting keys to work around a problem caused by the
original designer there, who misunderstood some nuances of how the "Big
Three" auto manufacturers assigned part numbers the same way Celko did.

He doesn't use that example anymore but still misses the point I tried to
make.  The ability of the world to invalidate the assumptions that go into
natural key assignment is really impressive.  I particularly enjoy that so
many systems are built presuming that a person's Social Security number
never changes, even though the topic comes up in the SSA's own FAQ about
identity theft:  http://www.socialsecurity.gov/pubs/10064.html

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Need help with complicated SQL statement

From
Ted Byers
Date:
Please consider the following statement (it becomes
obvious if you remember the important thing about the
table is that it has columns for each of stock_id,
price_date, and price).

(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2
ORDER BY T2.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3
ORDER BY T3.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4
ORDER BY T4.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5
ORDER BY T5.price_date ASC LIMIT 1);

This statement works flawlessly, and is blindingly
fast relative to everything else I have tried.  But I
am stuck.

First, while this statement gets me the correct data,
I need to obtain a single record containing the
stock_id, the current price (the one returned by the
first select statement in the union), and, for each of
the prices returned by the subsequent select
statements, the current price minus the price at the
earlier date, divided by the price at the earlier date
and expressed as a percentage.  I do not yet know how
to do this using SQL (it would be trivial if I
exported the data to Java or C++, but it isn't clear
how to do it within SQL).

To make things more difficult, suppose I have another
select statement that returns a set of stock_ids.  How
do I apply the SQL logic I require to only those
stocks in the set returned by a statement like SELECT
stock_id FROM someTable WHERE ...  The result of this
extension would be that I have one record for each
stock in the selected set of stocks.

I do NOT want to have to recompute the set of stocks
for each of the select statements in the above union
(since that would be a waste because the resulting set
of stocks would always be the same for the given
criteria).  Nor do I want to apply the SQL logic I
need for the prices to all the stocks in the database.
 There could be thousands, or even tens of thousands,
of stocks represented in the database and I'd need the
gain/loss logic only for a few dozen at any given
time!

How do I make the two extensions I require?

I expect the SQL I get to be eventually placed in a
stored procedure, which may then be used to construct
a view, but that is the easy part.

Maybe I have been staring at this for too long to see
the obvious solution, but I am exhausted and am not
seeing the next step.  If there IS an obvious next
step, please at least give me a hint.

Thanks

Ted


Re: Primary Key

From
"Merlin Moncure"
Date:
On Nov 16, 2007 9:50 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Fri, 16 Nov 2007, Merlin Moncure wrote:
> > the sad fact is that sequences have made developers lazy
>
> Nah, developers were lazy long before that.  If you ask Larry Wall it's a
> virtue.

well, 'lazy' in the sense that it encourages easy solutions to
difficult problems is arguably virtuous.  intellectual laziness (which
i most certainly am not accusing you [or the OP] of) is another
matter. long years of wrestling with you and many other less talented
individuals on this particular topic has imparted to me a little bit
of weariness as well.  furthermore, i have myself surrogated a
database to victory on various occasions, although usually for
performance reasons...so i'm hardly a zealot.  i do however think that
being able to separate data into tables using unambiguous keys lifted
directly from the data is a critical skill.

> I gave up on this argument ten years ago after a long battle with
> well-known natural key zealot Joe Celko wore me out.  He published one of
> his many articles making a case for them using an example from the
> automotive industry.  Only problem was, the unique identifier he suggested
> wasn't.  At the auto parts company I worked for, I had just spent many
> monotonous days contorting keys to work around a problem caused by the
> original designer there, who misunderstood some nuances of how the "Big
> Three" auto manufacturers assigned part numbers the same way Celko did.

well, nobody's perfect...

> He doesn't use that example anymore but still misses the point I tried to
> make.  The ability of the world to invalidate the assumptions that go into
> natural key assignment is really impressive.  I particularly enjoy that
> so many systems are built presuming that a person's Social Security number
> never changes, even though the topic comes up in the SSA's own FAQ about
> identity theft:  http://www.socialsecurity.gov/pubs/10064.html

that just means that the SSN is only part of the key that
unambiguously defines a person, should that be a requirement :)

database design, like many engineering disciplines, is a series of
trade-offs mixed in with a couple of helpings of artistry and the few
bits of theory that the sql standards committee was not able to snuff
out.  like i said in my opening remarks, the issues at play are
nuanced, without clear-cut answers.

merlin

p.s. no compilation of 80's albums is complete without 'full moon fever'...

Re: Primary Key

From
Ron Johnson
Date:

On 11/16/07 20:50, Greg Smith wrote:
[snip]
>
> He doesn't use that example anymore but still misses the point I tried
> to make.  The ability of the world to invalidate the assumptions that go
> into natural key assignment is really impressive.  I particularly enjoy
> that so many systems are built presuming that a person's Social Security
> number never changes, even though the topic comes up in the SSA's own
> FAQ about identity theft:  http://www.socialsecurity.gov/pubs/10064.html

Natural PKs are *not* set in stone, and only stubborn fools won't
admit that they can't divine all situations.  So, you add a new
column to the PK and keep on going.

But still, there *are* some circumstances where natural PKs just
don't work.  After all, SSNs and credit card numbers are synthetic
(just not generated sequential by the RDBMS).


Re: Need help with complicated SQL statement

From
Shane Ambler
Date:
Ted Byers wrote:
> Please consider the following statement (it becomes
> obvious if you remember the important thing about the
> table is that it has columns for each of stock_id,
> price_date, and price).
>
> (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
> price_date DESC LIMIT 1)
> UNION
> (SELECT * FROM (SELECT * FROM stockprices WHERE
> stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2
> ORDER BY T2.price_date ASC LIMIT 1)
> UNION
> (SELECT * FROM (SELECT * FROM stockprices WHERE
> stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3
> ORDER BY T3.price_date ASC LIMIT 1)
> UNION
> (SELECT * FROM (SELECT * FROM stockprices WHERE
> stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4
> ORDER BY T4.price_date ASC LIMIT 1)
> UNION
> (SELECT * FROM (SELECT * FROM stockprices WHERE
> stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5
> ORDER BY T5.price_date ASC LIMIT 1);
>
> This statement works flawlessly, and is blindingly
> fast relative to everything else I have tried.  But I
> am stuck.

I would have these subselects as -

UNION
(SELECT * FROM stockprices WHERE
  stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264)

I would expect that to give the same result but make the query plan a
bit simpler and quicker using less memory.

> First, while this statement gets me the correct data,
> I need to obtain a single record with stock_id,
> current price (that obtained from the first select
> statement in the union, and each of the prices
> returned by the subsequent select statements as a the
> current price minus the price at the previous date,
> and the result divided by the price at the previous
> date, expressed as a percentage.  I do not yet know
> how to do this using SQL (it would be trivial if I
> exported the data to Java or C++ - but it isn't clear
> how to do it within SQL).

I haven't tested this but I would start with -

CREATE VIEW stock_price_combined AS
SELECT
stock_id

, (SELECT price FROM stockprices WHERE stock_id = OT.stock_id
ORDER BY price_date DESC LIMIT 1) as orig_price

, (SELECT price FROM stockprices WHERE stock_id = OT.stock_id
ORDER BY price_date DESC LIMIT 1 OFFSET 22) as price_two

, (SELECT price FROM stockprices WHERE stock_id = OT.stock_id
ORDER BY price_date DESC LIMIT 1 OFFSET 66) as price_three

, (SELECT price FROM stockprices WHERE stock_id = OT.stock_id
ORDER BY price_date DESC LIMIT 1 OFFSET 132) as price_four

, (SELECT price FROM stockprices WHERE stock_id = OT.stock_id
ORDER BY price_date DESC LIMIT 1 OFFSET 264) as price_five

FROM stockprices OT;


Then you can -

SELECT
orig_price
, (orig_price - price_two) as price_increase
, (100.0 * (orig_price - price_two)/price_two) as pct_change
...
...

FROM stock_price_combined

WHERE stock_id in (SELECT stock_id FROM someTable WHERE ...)


> To make things more difficult, suppose I have another
> select statement that returns a set of stock_ids.  How
> do I apply the SQL logic I require to only those
> stocks in the set returned by a statement like SELECT
> stock_id FROM someTable WHERE ...  The result of this
> extension would be that I have one record for each
> stock in the selected set of stocks.

SELECT * from stockprices WHERE stock_id in (SELECT stock_id FROM
someTable WHERE ...)



If that isn't the answer you want I hope it points you in the right
direction...


--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

Re: Need help with complicated SQL statement

From
Ted Byers
Date:
--- Shane Ambler <pgsql@Sheeky.Biz> wrote:

> Ted Byers wrote:
> > Please consider the following statement (it becomes
> > obvious if you remember the important thing about the
> > table is that it has columns for each of stock_id,
> > price_date, and price).
> >
> > (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
> > price_date DESC LIMIT 1)
> > UNION
> > (SELECT * FROM (SELECT * FROM stockprices WHERE
> > stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2
> > ORDER BY T2.price_date ASC LIMIT 1)
> > UNION
> > (SELECT * FROM (SELECT * FROM stockprices WHERE
> > stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3
> > ORDER BY T3.price_date ASC LIMIT 1)
> > UNION
> > (SELECT * FROM (SELECT * FROM stockprices WHERE
> > stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4
> > ORDER BY T4.price_date ASC LIMIT 1)
> > UNION
> > (SELECT * FROM (SELECT * FROM stockprices WHERE
> > stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5
> > ORDER BY T5.price_date ASC LIMIT 1);
> >
> > This statement works flawlessly, and is blindingly
> > fast relative to everything else I have tried.  But I
> > am stuck.
>
> I would have these subselects as -
>
> UNION
> (SELECT * FROM stockprices WHERE
>   stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264)
>
> I would expect that to give the same result but make the query plan a
> bit simpler and quicker using less memory.
>
It gave apparently correct values, but for some
reason, it insisted on returning thousands upon
thousands of identical records.  There is something
awry there, but I can't place what.  Yes, I know I
could use SELECT DISTINCT, but I worry that it may be
doing a full table scan, as opposed to the relatively
direct lookup I came up with after looking at your
statement.  I don't yet know how long it would take,
because it is the slowest option I tried, and I gave up
after it had returned over 10,000 rows and still
showed no signs of finishing.  I don't understand this,
as explain returned apparently much better results for
yours than it did for mine.

My latest is as follows:

SELECT A1.stock_id,
       A1.price_date,
       A1.adjusted,
       A2.price_date AS pd22,
       100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted
AS gl22pc,
       A3.price_date AS pd66,
       100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted
AS gl66pc,
       A4.price_date AS pd132,
       100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted
AS gl132pc,
       A5.price_date AS pd264,
       100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted
AS gl264pc
    FROM
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1) AS A1
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 22) AS A2
    ON A1.stock_id = A2.stock_id
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 66) AS A3
    ON A1.stock_id = A3.stock_id
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 132) AS A4
    ON A1.stock_id = A4.stock_id
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 264) AS A5
    ON A1.stock_id = A5.stock_id;

This still gives me the correct answer, but is faster
still than anything I came up with before.

Now that I have the correct result for one stock, I
need to adapt it to apply to each stock individually,
in some small selection from a large number of stocks.

Thanks again

Ted

Re: Primary Key

From
Gregory Stark
Date:
"Ron Johnson" <ron.l.johnson@cox.net> writes:

> On 11/16/07 12:50, João Paulo Zavanela wrote:
>> Hello,
>>
>> How many fields are recommended for a primary key?
>> I'm thinking of creating one with 6 fields; is that too many?
>
> The number of recommended fields is the *minimum* number required
> for uniqueness.  1 or 6 or 24.  Doesn't matter.

Unless of course you care about every other table being 24x larger and slower
due to having all these copies of the 24 fields. And of course unless you care
about being able to handle the inevitable day when it turns out the 24 fields
aren't unique and you need to consider adding a 25th column to the table *and
every table referencing it* as well as changing every line of application code
to use the new column.
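To make that maintenance cost concrete, here is a hypothetical sketch of
what widening a natural key forces on just one referencing table (all
table, column, and constraint names are invented):

```sql
-- Hypothetical sketch: after the parent's key grows a column, every
-- referencing table must grow it too and redefine its foreign key
-- (and every query that joins on the key changes as well).
ALTER TABLE order_line DROP CONSTRAINT order_line_order_fk;
ALTER TABLE order_line ADD COLUMN extra_col text NOT NULL DEFAULT '';
ALTER TABLE order_line
    ADD CONSTRAINT order_line_order_fk
    FOREIGN KEY (col_a, col_b, col_c, extra_col)
    REFERENCES orders (col_a, col_b, col_c, extra_col);
```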

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: Primary Key

From
Ron Johnson
Date:

On 11/17/07 01:21, Gregory Stark wrote:
> "Ron Johnson" <ron.l.johnson@cox.net> writes:
>
>> On 11/16/07 12:50, João Paulo Zavanela wrote:
>>> Hello,
>>>
>>> How many fields are recommended for a primary key?
>>> I'm thinking of creating one with 6 fields; is that too many?
>> The number of recommended fields is the *minimum* number required
>> for uniqueness.  1 or 6 or 24.  Doesn't matter.
>
> Unless of course you care about every other table being 24x larger and slower
> due to having all these copies of the 24 fields. And of course unless you care
> about being able to handle the inevitable day when it turns out the 24 fields
> aren't unique and you need to consider adding a 25th column to the table *and
> every table referencing it* as well as changing every line of application code
> to use the new column.

What's got to be done has got to be done.

On one of our systems, the natural PK of an electronic road toll is:
ETC_ACCOUNT_ID    INTEGER
FISCAL_PERIOD    INTEGER
LANE_TX_ID    BIGINT
TX_TYPE_ID    CHAR(1)
TX_SUBTYPE_IND    CHAR(1)

On another, it's:
ETC_ACCOUNT_ID    INTEGER
FISCAL_PERIOD    INTEGER
LANE_TX_ID    BIGINT
DEVICE_NO    CHAR(12) <<<< added column
TX_TYPE_ID    CHAR(1)
TX_SUBTYPE_IND    CHAR(1)


If the PK was synthetic and generated by the engine, then a (buggy)
app could insert duplicate tolls and the system wouldn't utter a
peep.  But the customer sure would when he saw the duplicate entries.

Note the seemingly *synthetic* field LANE_TX_ID.

Records coming in from the lane are inserted into the T_LANE_TX
table which has the PK of LANE_TX_ID.  However, that table also has
a "natural" unique index of LANE_ID, TX_DATE, TX_TIME, TX_SEQ_NUMBER.

Likewise, T_LANE has the synthetic PK of LANE_ID, but it is
back-stopped by a natural unique index PLAZA_ID, EXTERN_LANE_ID.

And... T_PLAZA has the PK of PLAZA_ID and is back-stopped by the
natural unique index AGENCY_ID, EXTERN_PLAZA_ID.

Breaking the chain, T_AGENCY only has the synthetic key AGENCY_ID.

But it only has 27 rows.

- --
Ron Johnson, Jr.
Jefferson LA  USA

%SYSTEM-F-FISH, my hovercraft is full of eels
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHP701S9HxQb37XmcRAk1cAKCFnNraM2Z1s8M8j8sl8Gotxz1r0QCglEfJ
sUCabkDaZTQVc/kCyHGewhQ=
=b9ii
-----END PGP SIGNATURE-----

Re: Need help with complicated SQL statement

From
Shane Ambler
Date:
Ted Byers wrote:

> It gave apparently correct values, but for some
> reason, it insisted on returning thousands upon
> thousands of identical records.  There is something
> awry there, but I can't place what.  Yes, I know I
> could use SELECT DISTINCT, but I worry that it may be
> doing a full table scan, as opposed to the relatively
> direct lookup I came up with after looking at your
> statement.  I don't yet know how long it would take
> because it is the slowest option I tried, and I gave up
> after it had returned over 10,000 rows and still
> showed no signs of finishing.  I don't understand this
> as explain returned apparently much better results for
> yours than it did for mine.

Looking at it again today, I see the problem: you would either need to
use SELECT DISTINCT(stock_id) in the VIEW definition or select the
stock_id from the stocks table instead of the stockprices table.

I set up a little test this time - this is the example I came up with -

CREATE DATABASE stocktest;

\c stocktest

CREATE TABLE stocks
(
   id serial PRIMARY KEY,
   description text
);


CREATE TABLE stockprices
(
    id serial PRIMARY KEY,
    stock_id integer REFERENCES stocks (id),
    stock_price numeric,
    price_date date
);

CREATE INDEX idx_stockprices_date ON stockprices (price_date);
CREATE INDEX idx_stockprices_stock_id ON stockprices (stock_id);


CREATE VIEW stock_prices_combined AS
SELECT
id AS stock_id

, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1) as one_adjusted

, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1) as one_date

, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 22) as two_adjusted

, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 22) as two_date

, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 66) as three_adjusted

, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 66) as three_date

, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 132) as four_adjusted

, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 132) as four_date

, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 264) as five_adjusted

, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY
price_date DESC LIMIT 1 OFFSET 264) as five_date

FROM stocks ST;


CREATE VIEW stock_price_history AS
SELECT
stock_id,

one_date AS pd1,
one_adjusted AS current_price,

two_date AS pd22,
100.0 * (one_adjusted - two_adjusted)/two_adjusted AS gl22pc,

three_date AS pd66,
100.0 * (one_adjusted - three_adjusted)/three_adjusted AS gl66pc,

four_date AS pd132,
100.0 * (one_adjusted - four_adjusted)/four_adjusted AS gl132pc,

five_date AS pd264,
100.0 * (one_adjusted - five_adjusted)/five_adjusted AS gl264pc

FROM stock_prices_combined;


I INSERTed 500 stocks entries and 10,000 stockprices entries for each
stock (that's 5,000,000 price rows), then from

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
I got - Total runtime: 981.618 ms

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN
(10,25,36,45,86,154,368,481)
I got - Total runtime: 8084.217 ms

So that's about 1 second per stock_id returned (on my old machine).


You can change that last query to be -
SELECT * FROM stock_price_history WHERE stock_id IN (SELECT stock_id
FROM sometable WHERE ....)

Which gives you the range of stock_id's from a table that you asked about.
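The pattern behind Shane's view, one correlated scalar subquery per sampled
offset, can be sketched in miniature. The sketch below uses SQLite through
Python purely for illustration; table and column names follow the thread, and
a tiny OFFSET 2 stands in for the real offsets of 22, 66, 132 and 264:

```python
import sqlite3

# Illustrative stand-in for the thread's PostgreSQL schema, run in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stocks (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE stockprices (
    id INTEGER PRIMARY KEY,
    stock_id INTEGER REFERENCES stocks (id),
    stock_price NUMERIC,
    price_date DATE
);
-- One correlated subquery per sampled point in the price history;
-- OFFSET 2 here plays the role of OFFSET 22/66/132/264 in the thread.
CREATE VIEW stock_prices_combined AS
SELECT id AS stock_id,
    (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
     ORDER BY price_date DESC LIMIT 1) AS one_adjusted,
    (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
     ORDER BY price_date DESC LIMIT 1 OFFSET 2) AS two_adjusted
FROM stocks ST;
""")
conn.execute("INSERT INTO stocks VALUES (1, 'demo')")
conn.executemany(
    "INSERT INTO stockprices (stock_id, stock_price, price_date) VALUES (1, ?, ?)",
    [(10, '2007-11-01'), (11, '2007-11-02'), (12, '2007-11-03')])
row = conn.execute(
    "SELECT one_adjusted, two_adjusted FROM stock_prices_combined "
    "WHERE stock_id = 1").fetchone()
# latest price first (12); two rows further back is 10
```

The WHERE clause on the view is what makes it easy to ask for one stock or a
list of them without rewriting the query.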



--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

Re: Primary Key

From
Sam Mason
Date:
On Sun, Nov 18, 2007 at 04:19:01AM +0000, Ron Johnson wrote:
> If the PK was synthetic and generated by the engine, then a (buggy)
> app could insert duplicate tolls and the system wouldn't utter a
> peep.  But the customer sure would when he saw the duplicate entries.

You'd just need to put a UNIQUE constraint on an appropriate set of
columns and the database would complain in all the same places.
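That combination, a surrogate primary key plus a UNIQUE constraint over the
natural columns, is easy to demonstrate. A sketch using SQLite through Python
for illustration; the toll columns are a simplified subset of Ron's example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate primary key plus a UNIQUE constraint over the natural columns:
# the database rejects duplicates exactly as it would with a natural PK.
conn.execute("""
CREATE TABLE tolls (
    id INTEGER PRIMARY KEY,
    etc_account_id INTEGER NOT NULL,
    lane_tx_id INTEGER NOT NULL,
    UNIQUE (etc_account_id, lane_tx_id)
)""")
conn.execute("INSERT INTO tolls (etc_account_id, lane_tx_id) VALUES (42, 1001)")
try:
    # A buggy app re-inserting the same toll still gets an error.
    conn.execute("INSERT INTO tolls (etc_account_id, lane_tx_id) VALUES (42, 1001)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```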


One big benefit that I've been able to discern with having natural,
over synthetic, keys is that you don't need to pull in every table up
the hierarchy when you want to find out something lower down.  An
example would probably help; say we were recording samples from
locations, with these samples being collected in groups on specific
visits.  I've been using synthetic keys more recently, so I'll start
with them:

  CREATE TABLE locations (
    id SERIAL NOT NULL PRIMARY KEY,
    locntype TEXT
  );
  CREATE TABLE visits (
    id SERIAL PRIMARY KEY,
    locnid   INTEGER NOT NULL REFERENCES locations,
    visitnum INTEGER NOT NULL,
      UNIQUE (locnid,visitnum),
    visitdate DATE
  );
  CREATE TABLE samples (
    id SERIAL PRIMARY KEY,
    visitid   INTEGER NOT NULL REFERENCES visits,
    samplenum INTEGER NOT NULL,
      UNIQUE (visitid,samplenum),
    barcode TEXT UNIQUE
  );

Say I wanted to summarise the number of samples per location type, I'd
be forced to join onto the visits table; like this:

  SELECT l.locntype, COUNT(*)
  FROM locations l, visits v, samples s
  WHERE l.id = v.locnid
    AND v.id = s.visitid
  GROUP BY l.locntype;

If we now organise the tables using natural keys (note that the location
id is still pretty artificial as it has to map back to something in the
real world):

  CREATE TABLE locations (
    id TEXT NOT NULL PRIMARY KEY,
    locntype TEXT
  );
  CREATE TABLE visits (
    locnid   TEXT    NOT NULL REFERENCES locations,
    visitnum INTEGER NOT NULL,
      PRIMARY KEY (locnid,visitnum),
    visitdate DATE
  );
  CREATE TABLE samples (
    locnid    TEXT    NOT NULL,
    visitnum  INTEGER NOT NULL,
    samplenum INTEGER NOT NULL,
      PRIMARY KEY (locnid,visitnum,samplenum),
      FOREIGN KEY (locnid,visitnum) REFERENCES visits,
    barcode TEXT UNIQUE
  );

This has a nice simplifying effect on the above query; I can directly
refer to the location for each sample:

  SELECT l.locntype, COUNT(*) AS samples
  FROM locations l, samples s
  WHERE l.id = s.locnid
  GROUP BY l.locntype;

Is that a reasonable example of the different ways of laying things out
in the two styles?
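For what it's worth, the two layouts do produce the same counts. Here is a
cut-down sketch of both schemas, using SQLite through Python for illustration;
the _s and _n suffixes are just to hold both layouts in one database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- synthetic-key layout: samples reach locations only via visits
CREATE TABLE locations_s (id INTEGER PRIMARY KEY, locntype TEXT);
CREATE TABLE visits_s (id INTEGER PRIMARY KEY,
                       locnid INTEGER NOT NULL REFERENCES locations_s);
CREATE TABLE samples_s (id INTEGER PRIMARY KEY,
                        visitid INTEGER NOT NULL REFERENCES visits_s);

-- natural-key layout: samples carry the location id directly
CREATE TABLE locations_n (id TEXT PRIMARY KEY, locntype TEXT);
CREATE TABLE samples_n (locnid TEXT NOT NULL, visitnum INTEGER NOT NULL,
                        samplenum INTEGER NOT NULL,
                        PRIMARY KEY (locnid, visitnum, samplenum));

INSERT INTO locations_s VALUES (1, 'river');
INSERT INTO visits_s VALUES (1, 1);
INSERT INTO samples_s VALUES (1, 1), (2, 1);

INSERT INTO locations_n VALUES ('L1', 'river');
INSERT INTO samples_n VALUES ('L1', 1, 1), ('L1', 1, 2);
""")
# Synthetic keys force the extra join through visits...
synth = conn.execute("""
    SELECT l.locntype, COUNT(*) FROM locations_s l, visits_s v, samples_s s
    WHERE l.id = v.locnid AND v.id = s.visitid GROUP BY l.locntype""").fetchall()
# ...natural keys let samples join straight to locations.
natural = conn.execute("""
    SELECT l.locntype, COUNT(*) FROM locations_n l, samples_n s
    WHERE l.id = s.locnid GROUP BY l.locntype""").fetchall()
```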


The main advantage of synthetic keys is that they introduce another
layer of abstraction into the database, potentially allowing greater
flexibility (of course this can also be a burden).

Natural keys allow the above simplification of queries and ease the
manual fix-up of the data should something go horribly wrong.  It's
reasonably easy to fix things if you have a sample referencing a
non-existent visit with natural keys, but if you've got synthetic keys
you're probably going to have to dump the sample as well.  Of course,
if you've got yourself into this state you've already failed, but some
recourse is nice.


In summary; I see various advantages to either approach, but I don't
see either as being fundamentally "better".  You can express the same
constraints in either style (I'm sure someone will think of a good
counter example though), it's the context in which it's used that will
determine which is more suitable to the task at hand.  Using one style
exclusively is almost certainly bad, but having a preference for one or
the other is probably good as it'll make the database as a whole more
cohesive and subsequently ease maintenance.

Comments?


  Sam

Re: Need help with complicated SQL statement

From
Shane Ambler
Date:
Shane Ambler wrote:
> I INSERTed 500 stocks entries and 10,000 stockprices entries for each
> stock (that's 5,000,000 price rows), then from
>
> EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
> I got - Total runtime: 981.618 ms
>
> EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN
> (10,25,36,45,86,154,368,481)
> I got - Total runtime: 8084.217 ms
>

Actually, I found a better way. After running the example I sent
before, drop the two separate indexes and replace them with a
composite one:

DROP INDEX idx_stockprices_date;
DROP INDEX idx_stockprices_stock_id;

CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id,price_date);


with the same data (5,000,000 price rows) I then get -

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
I got - Total runtime: 6.397 ms

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN
(10,25,36,45,86,154,368,481)
I got - Total runtime: 36.265 ms


Which is probably the speed you want ;-)
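The reason the composite index helps so much is that it satisfies both the
stock_id filter and the price_date ordering, so each LIMIT/OFFSET subquery can
walk the index instead of sorting. A sketch using SQLite through Python for
illustration; the plan text differs from MySQL or PostgreSQL output, but the
index choice is the point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stockprices (stock_id INTEGER, stock_price NUMERIC, price_date DATE);
CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id, price_date);
""")
# The composite index covers both the equality filter and the ORDER BY,
# so the planner searches the index rather than scanning and sorting.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM stockprices "
    "WHERE stock_id = 20 ORDER BY price_date DESC LIMIT 1").fetchall()
plan_text = " ".join(row[-1] for row in plan)
```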



--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

Re: Need help with complicated SQL statement

From
Ted Byers
Date:
Thanks Shane,

It works reasonably well.  It gets the right answer,
but I guess my data set is much larger than your test.

Please consider the appended data.

The first two SQL statements are directly comparable.
My LEFT JOIN is marginally simpler, as shown by
EXPLAIN, and runs to completion in about 3 seconds
(elapsed real time), as opposed to about 30 seconds
for the two views.  It makes a little sense, though,
in that according to EXPLAIN, the LEFT JOIN needs to
consider a dramatically smaller number of rows.  What
I find puzzling, though, is that it takes less time to
get the report for 28 stocks at a given time than it
does to get the report for 1. (Both take about 30
seconds, but for 28 stocks, it takes about 0.005
seconds less time ;-)

This is a case where LEFT JOINS appear to be much
faster than subqueries.

I appreciate all your help, but I am struggling to
figure out how best to adapt my LEFT JOINs in your
VIEWs, so that the latter benefit from the speed of
the JOINs.  The heart of my problem is to figure out
how to use a stock_id in the WHERE clause.

One thing I am not certain of is: is there a way to preserve the
logic of the WHERE clauses, which I use to sample the time series at
22 days ago, 66 days ago, 132 days ago &c., by replacing them with a
"GROUP BY" clause grouping by stock_id?  If so, might that, along
with an additional LEFT JOIN, get me the result I am after?

I created a stored procedure that takes an id argument
(and can usefully invoke it on any stock_id in the
database), but the problem remains as to how to
construct a record set by applying the procedure to
each id in a set of ids returned, e.g., by SELECT
stock_id FROM stocks;

Ted

=========== test data =============
EXPLAIN SELECT A1.stock_id,
        A1.price_date,
        A1.adjusted,
        A2.price_date AS pd22,
        100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc,
        A3.price_date AS pd66,
        100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc,
        A4.price_date AS pd132,
        100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc,
        A5.price_date AS pd264,
        100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc
     FROM
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2
     ON A1.stock_id = A2.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3
     ON A1.stock_id = A3.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4
     ON A1.stock_id = A4.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5
     ON A1.stock_id = A5.stock_id;
+----+-------------+-------------+--------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table       | type   | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------------+--------+---------------+---------+---------+------+-------+-------------+
|  1 | PRIMARY     | <derived2>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
|  1 | PRIMARY     | <derived3>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
|  1 | PRIMARY     | <derived4>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
|  1 | PRIMARY     | <derived5>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
|  1 | PRIMARY     | <derived6>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
|  6 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
|  5 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
|  4 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
|  3 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
|  2 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
+----+-------------+-------------+--------+---------------+---------+---------+------+-------+-------------+
10 rows in set (0.08 sec)

EXPLAIN SELECT * FROM stock_price_history WHERE stock_id = 1;

+----+--------------------+-------------+-------+---------------+---------+---------+-------------------+--------+----------------+
| id | select_type        | table       | type  | possible_keys | key     | key_len | ref               | rows   | Extra          |
+----+--------------------+-------------+-------+---------------+---------+---------+-------------------+--------+----------------+
|  1 | PRIMARY            | <derived3>  | ALL   | NULL          | NULL    | NULL    | NULL              |    494 | Using where    |
|  3 | DERIVED            | ST          | index | NULL          | PRIMARY | 4       | NULL              |    496 | Using index    |
| 13 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id |      1 | Using filesort |
| 12 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
| 11 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
| 10 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
|  9 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
|  8 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
|  7 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
|  6 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
|  5 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
|  4 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
+----+--------------------+-------------+-------+---------------+---------+---------+-------------------+--------+----------------+
12 rows in set (30.52 sec)

EXPLAIN SELECT * FROM stock_price_history WHERE stock_id IN
(SELECT stock_id FROM etf_stocks WHERE etf_id = 397);

+----+--------------------+-------------+--------+------------------+---------+---------+-------------------+--------+--------------------------+
| id | select_type        | table       | type   | possible_keys    | key     | key_len | ref               | rows   | Extra                    |
+----+--------------------+-------------+--------+------------------+---------+---------+-------------------+--------+--------------------------+
|  1 | PRIMARY            | <derived4>  | ALL    | NULL             | NULL    | NULL    | NULL              |    494 | Using where              |
|  4 | DERIVED            | ST          | index  | NULL             | PRIMARY | 4       | NULL              |    496 | Using index              |
| 14 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id |      1 | Using filesort           |
| 13 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
| 12 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
| 11 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
| 10 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
|  9 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
|  8 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
|  7 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
|  6 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
|  5 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
|  2 | DEPENDENT SUBQUERY | etf_stocks  | eq_ref | PRIMARY,stock_id | PRIMARY | 8       | const,func        |      1 | Using where; Using index |
+----+--------------------+-------------+--------+------------------+---------+---------+-------------------+--------+--------------------------+
13 rows in set (30.34 sec)

SELECT A1.stock_id,
        A1.price_date,
        A1.adjusted,
        A2.price_date AS pd22,
        100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc,
        A3.price_date AS pd66,
        100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc,
        A4.price_date AS pd132,
        100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc,
        A5.price_date AS pd264,
        100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc
     FROM
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2
     ON A1.stock_id = A2.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3
     ON A1.stock_id = A3.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4
     ON A1.stock_id = A4.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5
     ON A1.stock_id = A5.stock_id;

+----------+------------+----------+------------+------------+------------+------------+------------+------------+------------+-----------+
| stock_id | price_date | adjusted | pd22       | gl22pc     | pd66       | gl66pc     | pd132      | gl132pc    | pd264      | gl264pc   |
+----------+------------+----------+------------+------------+------------+------------+------------+------------+------------+-----------+
|        1 | 2007-11-13 |    46.10 | 2007-10-12 | -5.2025499 | 2007-08-10 | -3.5564854 | 2007-05-08 | -8.4409136 | 2006-10-25 | 5.0353156 |
+----------+------------+----------+------------+------------+------------+------------+------------+------------+------------+-----------+
1 row in set (0.03 sec)

SELECT * FROM stock_price_history WHERE stock_id = 1;

+----------+------------+---------------+------------+------------+------------+------------+------------+------------+------------+-----------+
| stock_id | pd1        | current_price | pd22       | gl22pc     | pd66       | gl66pc     | pd132      | gl132pc    | pd264      | gl264pc   |
+----------+------------+---------------+------------+------------+------------+------------+------------+------------+------------+-----------+
|        1 | 2007-11-13 |         46.10 | 2007-10-12 | -5.2025499 | 2007-08-10 | -3.5564854 | 2007-05-08 | -8.4409136 | 2006-10-25 | 5.0353156 |
+----------+------------+---------------+------------+------------+------------+------------+------------+------------+------------+-----------+
1 row in set (30.44 sec)

SELECT * FROM stock_price_history WHERE stock_id IN
(SELECT stock_id FROM etf_stocks WHERE etf_id = 397);

+----------+------------+---------------+------------+-------------+------------+-------------+------------+-------------+------------+-------------+
| stock_id | pd1        | current_price | pd22       | gl22pc      | pd66       | gl66pc      | pd132      | gl132pc     | pd264      | gl264pc     |
+----------+------------+---------------+------------+-------------+------------+-------------+------------+-------------+------------+-------------+
|        1 | 2007-11-13 |         46.10 | 2007-10-12 |  -5.2025499 | 2007-08-10 |  -3.5564854 | 2007-05-08 |  -8.4409136 | 2006-10-25 |   5.0353156 |
|        2 | 2007-11-13 |         41.26 | 2007-10-12 | -11.2688172 | 2007-08-10 |  -0.6740491 | 2007-05-08 |  -7.4680422 | 2006-10-25 |   6.5599174 |
|        3 | 2007-11-13 |         93.70 | 2007-10-12 |   0.1710498 | 2007-08-10 |  37.5513799 | 2007-05-08 |  59.7340607 | 2006-10-25 | 117.8563125 |
|        4 | 2007-11-13 |        101.38 | 2007-10-12 | -12.9187425 | 2007-08-10 |  17.5692914 | 2007-05-08 |  42.6079617 | 2006-10-25 |  72.4442932 |
|        5 | 2007-11-13 |         36.63 | 2007-10-12 |  -3.5291019 | 2007-08-10 |   6.0509554 | 2007-05-08 |  -6.4367816 | 2006-10-25 |  35.3658537 |
|        6 | 2007-11-13 |         82.08 | 2007-10-12 |  -1.7829365 | 2007-08-10 |   2.6641651 | 2007-05-08 |  25.0266565 | 2006-10-25 |  40.3316806 |
|        7 | 2007-11-13 |         97.13 | 2007-10-12 |  -0.1233933 | 2007-08-10 |  10.1122322 | 2007-05-08 |  25.6532988 | 2006-10-25 |  44.1525675 |
|        8 | 2007-11-13 |         50.92 | 2007-10-12 |   7.0422535 | 2007-08-10 |  21.7890457 | 2007-05-08 |  22.4627225 | 2006-10-25 |  15.1515152 |
|        9 | 2007-11-13 |         53.30 | 2007-10-12 |  -5.1770148 | 2007-08-10 |   2.8957529 | 2007-05-08 | -17.4539260 | 2006-10-25 |  -6.1289186 |
|       10 | 2007-11-13 |         34.53 | 2007-10-12 |  -3.2230942 | 2007-08-10 |   3.1362007 | 2007-05-08 |  -8.3841868 | 2006-10-25 |   6.4426634 |
|       11 | 2007-11-13 |         73.99 | 2007-10-12 |   1.6346154 | 2007-08-10 |  12.3102611 | 2007-05-08 |  -8.0983729 | 2006-10-25 |  21.1163857 |
|       12 | 2007-11-13 |         68.60 | 2007-10-12 | -12.6114650 | 2007-08-10 |  -7.1969697 | 2007-05-08 |  -7.0712544 | 2006-10-25 |   3.4378770 |
|       13 | 2007-11-13 |         90.85 | 2007-10-12 | -14.1709967 | 2007-08-10 |   5.2967084 | 2007-05-08 | -17.3715325 | 2006-10-25 |  32.9382499 |
|       14 | 2007-11-13 |         90.91 | 2007-10-12 |  -9.1263495 | 2007-08-10 |  -8.6698815 | 2007-05-08 | -19.6695237 | 2006-10-25 |  11.7242227 |
|       15 | 2007-11-13 |         83.82 | 2007-10-12 |  -9.2071057 | 2007-08-10 | -12.3588457 | 2007-05-08 | -28.4812287 | 2006-10-25 |   6.4110702 |
|       16 | 2007-11-13 |         48.82 | 2007-10-12 | -10.5697014 | 2007-08-10 | -12.7279228 | 2007-05-08 |  -5.8074474 | 2006-10-25 |  -2.2231124 |
|       17 | 2007-11-13 |         46.68 | 2007-10-12 |  -0.8917197 | 2007-08-10 |  11.7013640 | 2007-05-08 |  11.5145724 | 2006-10-25 |   5.0168729 |
|       18 | 2007-11-13 |         52.00 | 2007-10-12 |   2.4832479 | 2007-08-10 |   9.9598224 | 2007-05-08 |  22.0943884 | 2006-10-25 |  36.6982124 |
|       19 | 2007-11-13 |         48.87 | 2007-10-12 | -12.9032258 | 2007-08-10 | -12.9497684 | 2007-05-08 | -19.8195242 | 2006-10-25 |  28.5038128 |
|       20 | 2007-11-13 |         31.72 | 2007-10-12 |   0.2211690 | 2007-08-10 |   1.9607843 | 2007-05-08 |  -0.1887980 | 2006-10-25 |  21.6724204 |
|       21 | 2007-11-13 |         64.25 | 2007-10-12 |  -5.5424875 | 2007-08-10 |  -4.4467579 | 2007-05-08 |  -1.7584098 | 2006-10-25 |  14.9579531 |
|       22 | 2007-11-13 |         46.99 | 2007-10-12 |  -9.9635946 | 2007-08-10 |  -9.3382211 | 2007-05-08 |  -9.0926678 | 2006-10-25 |  13.8599467 |
|       23 | 2007-11-13 |         24.56 | 2007-10-12 |  -7.4604371 | 2007-08-10 |  -6.7223699 | 2007-05-08 | -24.9618087 | 2006-10-25 | -15.1933702 |
|       24 | 2007-11-13 |         49.80 | 2007-10-12 |  -6.0554612 | 2007-08-10 |   1.6949153 | 2007-05-08 |  -2.9807130 | 2006-10-25 |  20.3189176 |
|       25 | 2007-11-13 |         26.65 | 2007-10-12 | -14.3362263 | 2007-08-10 |  -9.9966228 | 2007-05-08 | -22.2578763 | 2006-10-25 | -12.1332015 |
|       26 | 2007-11-13 |         53.25 | 2007-10-12 | -19.9248120 | 2007-08-10 |  -3.3399891 | 2007-05-08 |  -9.5157179 | 2006-10-25 | -20.8649131 |
|       27 | 2007-11-13 |         29.19 | 2007-10-12 |   2.9266573 | 2007-08-10 |  -2.9587766 | 2007-05-08 | -11.1685940 | 2006-10-25 | -10.1293103 |
|       28 | 2007-11-13 |         19.22 | 2007-10-12 | -10.2707750 | 2007-08-10 |  -2.4365482 | 2007-05-08 |  -0.6204757 | 2006-10-25 |  10.0801833 |
+----------+------------+---------------+------------+-------------+------------+-------------+------------+-------------+------------+-------------+
28 rows in set (30.39 sec)


Re: Need help with complicated SQL statement

From
Shane Ambler
Date:
Ted Byers wrote:
> Thanks Shane,
>
> It works reasonably well.  It gets the right answer,
> but I guess my data set is much larger than your test.

What indexes have you got?

Using this index on the sample I sent gets the response time to about
5ms (per stock_id) (as opposed to 900ms with these columns indexed
separately)

CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id,price_date);


> Please consider the appended data.
>
> The first two SQL statements are directly comparable.
> My Left join is marginally simpler, as shown by
> EXPLAIN, and runs to completion in about 3 seconds
> (elapsed real time), as opposed to about 30 seconds
> for the two views.  It makes a little sense, though,
> in that according to EXPLAIN, the LEFT JOIN needs to
> consider a dramatically smaller number of rows.  What
> I find puzzling, though, is that it takes less time to
> get the report for 28 stocks at a given time than it
> does to get the report for 1. (Both take about 30
> seconds, but for 28 stocks, it takes about 0.005
> seconds less time ;-)
>
> This is a case where LEFT JOINS appear to be much
> faster than subqueries.
>
> I appreciate all your help, but I am struggling to
> figure out how best to adapt my LEFT JOINs in your
> VIEWs, so that the latter benefit from the speed of
> the JOINs.  The heart of my problem is to figure out
> how to  use a stock_id in the WHERE clause.

That is where I have moved away from your select: the way you are
joining makes it hard to adapt to where you want it to end up (more
than one stock_id per query).

By using the view that generates the rows you want you make the query
sent from the client so much simpler and make it easy to get any single
or list of stock_id you want.

The second view will add little overhead and can be part of the select
sent from the client if you wish. I separated them into two views to
prevent duplicating the same selects for the calculations. You can merge
them into one view if you wish - the first view would become a subselect
for the second view.


> One thing I am not certain of is, "Is there a way to
> preserve the logic of the WHERE clauses by replacing
> the WHERE clause, which I use to sample the time
> series at 22 days ago, 66 days ago, 132 days ago &c.,
> by a "GROUP BY" clause, grouping by stock_id?  If so,
> might that, along with an additional LEFT JOIN, get me
> the result I am after?
>
> I created a stored procedure that takes an id argument
> (and can usefully invoke it on any stock_id in the
> database), but the problem remains as to how to
> construct a record set by applying the procedure to
> each id in a set of ids returned, e.g., by SELECT
> stock_id FROM stocks;
>
> Ted
>

--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

Re: Primary Key

From
Sascha Bohnenkamp
Date:
> Fie on you evil synthetic key lovers.  Long live the Natural Key!

think of a foreign key referencing a primary key with 6 columns *urgs*
never!

Re: Primary Key

From
"Peter Childs"
Date:


On 21/11/2007, Sascha Bohnenkamp <asbohnenkamp@gmx.de> wrote:
> > Fie on you evil synthetic key lovers.  Long live the Natural Key!
>
> think of a foreign key referencing a primary key with 6 columns *urgs*
> never!


The worst thing I meet is people who think primary keys need to be single-field unique integer serial columns.

I tend to agree that primary keys should be single fields if they need to be referenced, but should also be natural if at all possible, i.e. use car number plates rather than some serial int.

Peter.

Re: Primary Key

From
Martijn van Oosterhout
Date:
On Fri, Nov 23, 2007 at 09:33:13AM +0000, Peter Childs wrote:
> I tend to agree that primary keys should be single fields if they need to be
> referenced but should also be natural if at all possible. ie use car number
> plates rather than some serial int.

Car number plates are unique over time? I didn't think so...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment

Re: Primary Key

From
Csaba Nagy
Date:
On Fri, 2007-11-23 at 11:37 +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 23, 2007 at 09:33:13AM +0000, Peter Childs wrote:
> > I tend to agree that primary keys should be single fields if they need to be
> > referenced but should also be natural if at all possible. ie use car number
> > plates rather than some serial int.
>
> Car number plates are unique over time? I didn't think so...

I'm not sure how it is in the US, but here in Germany I just reused a
car plate from the owner it had before me... so now the plate is
uniquely associated at most with the car, not the owner... and I'm
pretty sure that's not unique either.

And what do you do when things shift meaning in your natural key?
That's a very common thing to happen to natural keys. Suddenly what
was unique becomes not unique anymore... and the headaches begin...

You're better off using synthetic keys for references between tables,
and you can still keep your natural keys for lookup, just don't use them
as unique join criteria, only search/filter criteria.
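
That compromise can be sketched in DDL against a hypothetical vehicle table (all table and column names below are invented for illustration, not from the thread):

```sql
-- Sketch: synthetic key for joins, natural key declared for integrity/lookup.
CREATE TABLE vehicle (
    vehicle_id  serial PRIMARY KEY,    -- synthetic key, used for references
    plate       text   NOT NULL,
    plate_state text   NOT NULL,
    UNIQUE (plate, plate_state)        -- natural key, still enforced
);

CREATE TABLE service_record (
    record_id   serial  PRIMARY KEY,
    vehicle_id  integer NOT NULL REFERENCES vehicle,  -- join on the surrogate
    serviced_on date    NOT NULL
);

-- The natural-key columns remain available as search/filter criteria:
SELECT v.vehicle_id, s.serviced_on
FROM vehicle v JOIN service_record s USING (vehicle_id)
WHERE v.plate = 'ABC123' AND v.plate_state = 'CA';
```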

Cheers,
Csaba.



Re: Primary Key

From
Sam Mason
Date:
On Fri, Nov 23, 2007 at 12:00:18PM +0100, Csaba Nagy wrote:
> I'm not sure how it is in the US, but here in Germany I just reused a
> car plate from the owner it had before me... so now the plate is
> uniquely associated at most with the car, not the owner... and I'm
> pretty sure that's not unique either.
>
> And what do you do when the things shift meaning in your natural key ?
> Cause that's a very common thing to happen to natural keys. And suddenly
> what was unique becomes not unique anymore... and the headaches begin...
>
> You're better off using synthetic keys for references between tables,
> and you can still keep your natural keys for lookup, just don't use them
> as unique join criteria, only search/filter criteria.

To me, that just confirms that using natural keys for tracking data
outside the database is wrong.  For the abstractions inside the database
natural keys make a lot of sense.


  Sam

Re: Primary Key

From
Ron Johnson
Date:

On 11/23/07 03:33, Peter Childs wrote:
>
>
> On 21/11/2007, *Sascha Bohnenkamp* <asbohnenkamp@gmx.de
> <mailto:asbohnenkamp@gmx.de>> wrote:
>
>     > Fie on you evil synthetic key lovers.  Long live the Natural Key!
>
>     think of an foreign key referencing a primary key with 6 columns *urgs*
>     never!
>
>
>
> The worse thing I meet is people who think primary keys need to be
> integer single field unique serial fields
>
> I tend to agree that primary keys should be single fields if they need
> to be referenced but should also be natural if at all possible. ie use
> car number plates rather than some serial int.

I wouldn't trust plate number to be unique over time, since the
format "ABC 123" only has a capacity of 17,576,000 vehicles.

--
Ron Johnson, Jr.
Jefferson LA  USA

%SYSTEM-F-FISH, my hovercraft is full of eels

Re: Primary Key

From
Tom Lane
Date:
Ron Johnson <ron.l.johnson@cox.net> writes:
> On 11/23/07 03:33, Peter Childs wrote:
>> The worse thing I meet is people who think primary keys need to be
>> integer single field unique serial fields
>>
>> I tend to agree that primary keys should be single fields if they need
>> to be referenced but should also be natural if at all possible. ie use
>> car number plates rather than some serial int.

> I wouldn't trust plate number to be unique over time, since the
> format "ABC 123" only has a capacity of 17,576,000 vehicles.

There's a worse problem than that: what if you find out that you
mis-entered the value?  Now you have to change the record's primary key
in order to stay consistent with the real world.  Which is exactly what
a primary key should never have to do, at least in the opinion of those
who recommend synthetic primary keys.
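
If you do use a natural primary key anyway, PostgreSQL can at least propagate the correction to referencing tables. A minimal sketch with invented names:

```sql
-- Hypothetical illustration: with a natural primary key, fixing a
-- mis-entered value means changing the key itself; ON UPDATE CASCADE
-- keeps referencing rows consistent while it changes.
CREATE TABLE vehicle (
    plate text PRIMARY KEY
);

CREATE TABLE service_record (
    plate       text REFERENCES vehicle ON UPDATE CASCADE,
    serviced_on date NOT NULL
);

-- Correcting the typo rewrites the key everywhere it is referenced:
UPDATE vehicle SET plate = 'ABC124' WHERE plate = 'ABC123';
```

With a synthetic key the same correction would be a single-row UPDATE of a non-key column, touching no other table.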

            regards, tom lane

Re: Primary Key

From
"James B. Byrne"
Date:
On: Fri, 23 Nov 2007 09:33:13 +0000, "Peter Childs" <peterachilds@gmail.com>
wrote:

> The worse thing I meet is people who think primary keys need to be
> integer single field unique serial fields
>
> I tend to agree that primary keys should be single fields if they
> need to be referenced but should also be natural if at all possible.
> ie use car number plates rather than some serial int.

Why is this desire not better satisfied by an index rather than a key?
Any key into a relation is, in the final analysis, an arbitrary value.
Why is a data value considered intrinsically superior to a sequence?

I am converting a system from HP TurboImage to PostgreSQL and the
framework selected, Ruby on Rails, essentially depends upon the existence
of an integer sequenced primary key for each row.  Originally I had a deep
distaste for the artificiality of integer keys but now I really do not
consider them any more, or less, coercive than many other programming
conventions.

Regards,

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Re: Primary Key

From
Michael Glaesemann
Date:
On Nov 23, 2007, at 17:35 , James B. Byrne wrote:

> Why is this desire not better satisfied by an index rather than a key?

What's your distinction between an index and a key? For what it's
worth, both UNIQUE and PRIMARY KEY constrain a column (or set of
columns) to be unique, or a key in the logical sense. Uniqueness is
implemented in terms of btree indexes in PostgreSQL. It seems to me
you're conflating logical (key) and physical (index) concepts here.
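
The distinction is easy to see directly: declaring either constraint makes PostgreSQL create a backing btree index automatically. A sketch (the table name is invented; the index names follow PostgreSQL's default conventions):

```sql
CREATE TABLE part (
    part_id integer PRIMARY KEY,   -- a logical key...
    code    text    UNIQUE         -- ...and another one
);

-- Each constraint is enforced physically by an auto-created btree index:
SELECT indexname FROM pg_indexes WHERE tablename = 'part';
-- expect names like part_pkey and part_code_key
```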

Michael Glaesemann
grzm seespotcode net



Re: Primary Key

From
Steve Crawford
Date:
Martijn van Oosterhout wrote:
> On Fri, Nov 23, 2007 at 09:33:13AM +0000, Peter Childs wrote:
>
>> I tend to agree that primary keys should be single fields if they need to be
>> referenced but should also be natural if at all possible. ie use car number
>> plates rather than some serial int.
>>
>
> Car number plates are unique over time? I didn't think so...
>

It's worse than that.

If we presume that the plate is a key to a vehicle, then we immediately
run into problems as a vehicle can, over time, have several plates
(lost, stolen, changed to vanity...) and a plate can belong,
sequentially, to several vehicles (especially when vanity plates are
transferred to new cars).

And when you have your char(6) plate-number column, they run out of
numbers and switch to 7-characters requiring changes to all tables that
used the plate as a key. Or you realize that ABC123 could be
ABC123-California, ABC123-Nevada or ABC123-New York (I'm assuming that
AAA999 is a valid format in those states).

Although I haven't seen it much recently, semi-trucks used to regularly
carry numerous plates - one for each state in which they operated.
And some states such as Texas allow you to have the same amateur-radio
plate number on multiple vehicles.

I won't argue that there are no reasonable natural keys. But I have sure
seen plenty of cases where what appeared to be a natural key was
discovered, generally at a very inopportune time in the development
process, to be not-so-natural after all.

Cheers,
Steve

Re: Primary Key

From
"Joshua D. Drake"
Date:

On Mon, 26 Nov 2007 10:11:37 -0800
Steve Crawford <scrawford@pinpointresearch.com> wrote:


> Although I haven't seen it much, recently, semi-trucks used to
> regularly have with numerous plates - one for each state in which
> they operated. And some states such as Texas allow you to have the
> same amateur-radio plate number on multiple vehicles.
> 
> I won't argue that there are no reasonable natural keys. But I have
> sure seen plenty of cases where what appeared to be a natural key was 
> discovered, generally at a very inopportune time in the development 
> process, to be not-so-natural after all.

In "theory" the item that would be a natural key in this instance is
the VIN. You would of course have to make some kind of allowance for
cars that don't have a VIN (nothing in the last, what... 50 years?).

Joshua D. Drake


> 
> Cheers,
> Steve
> 



Re: Primary Key

From
Richard Broersma Jr
Date:
--- On Mon, 11/26/07, Joshua D. Drake <jd@commandprompt.com> wrote:

> In "theory" the item that would be a natural key
> in this instance is the VIN. You would of course have
> to make some kind of allowance for cars that don't
> have a VIN (nothing in the last what...
> 50 years?).

So this is why the service stations always record my car's VIN when I show up
for oil changes. ;)  Of course, there is a whole industry built around auto
theft where they restamp the stolen car with a different VIN. I wonder if
these stolen cars end up with duplicated VINs or if the VINs they are given
do not pass the VIN check-sum (if such a thing exists).
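
For North American VINs the check-sum does exist: position 9 is a check digit defined in 49 CFR 565 (letters are transliterated to digits, each position is weighted, and the weighted sum mod 11 gives the check digit, with 10 written as 'X'). A sketch of the computation in SQL, using a sample VIN whose check digit works out to 'X':

```sql
-- Check-digit computation for a 17-character North American VIN.
WITH v AS (
    SELECT translate('1M8GDM9AXKP042788',
                     'ABCDEFGHJKLMNPRSTUVWXYZ',
                     '12345678123457923456789') AS digits  -- transliteration
)
SELECT CASE WHEN s.r = 10 THEN 'X' ELSE s.r::text END AS check_digit
FROM (
    SELECT sum(substr(digits, i, 1)::int
               * (ARRAY[8,7,6,5,4,3,2,10,0,9,8,7,6,5,4,3,2])[i]) % 11 AS r
    FROM v, generate_series(1, 17) AS i
) s;
-- Returns 'X', matching position 9 of the sample VIN above.
```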

Regards,
Richard Broersma Jr.

Re: Primary Key

From
"Garber, Mikhail"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> Richard Broersma Jr
> Sent: Monday, November 26, 2007 10:28 AM
> To: Joshua D. Drake
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Primary Key
>
> --- On Mon, 11/26/07, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> > In "theory" the item that would be a natural key in this
> instance is
> > the VIN. You would of course have to make some kind of
> allowance for
> > cars that don't have a VIN (nothing in the last what...
> > 50 years?).
>
> So this is why the service stations always record my cars VIN
> number when I show up for oil changes. ;)  Ofcourse, there is
> a whole industry built around auto theft where they restamp
> the stolen car with a differnt vin number.  I wonder if these
> stolen cars end up with duplicated VIN numbers or if the
> VIN's they are given do not pass the the VIN check-sum (if
> such a think exists).
>
> Regards,
> Richard Broersma Jr.
>

VIN encoding is covered here

http://en.wikipedia.org/wiki/Vehicle_Identification_Number

Looks like a poor choice for a primary key: too many conflicting,
"meaningful", evolving-over-time digits that can be misinterpreted by your
customers.

Re: Primary Key

From
"Joshua D. Drake"
Date:

On Mon, 26 Nov 2007 10:28:03 -0800 (PST)
Richard Broersma Jr <rabroersma@yahoo.com> wrote:

> --- On Mon, 11/26/07, Joshua D. Drake <jd@commandprompt.com> wrote:
> 
> > In "theory" the item that would be a natural key
> > in this instance is the VIN. You would of course have
> > to make some kind of allowance for cars that don't
> > have a VIN (nothing in the last what...
> > 50 years?).
> 
> So this is why the service stations always record my cars VIN number
> when I show up for oil changes. ;)  Ofcourse, there is a whole
> industry built around auto theft where they restamp the stolen car
> with a differnt vin number.  I wonder if these stolen cars end up
> with duplicated VIN numbers or if the VIN's they are given do not
> pass the the VIN check-sum (if such a think exists).

They may end up with duplicate VINs, but that isn't really relevant: at
that point we know that one of the two (or six, or however many) cars is
invalid, thus the "natural" key is still valid and designed to alert us
if there is any potential problem :)

Sincerely,

Joshua D. Drake


> 
> Regards,
> Richard Broersma Jr.
> 



Re: Primary Key

From
Scott Ribe
Date:
> It's worse than that.

It's even worse than that. Decades ago, Florida used to issue multiple
plates with the same number, differentiated by color.

There are other cases of states having multiple types of license plates,
with overlapping numbers.


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Primary Key

From
Steve Crawford
Date:
Joshua D. Drake wrote:
> In "theory" the item that would be a natural key in this instance is
> the VIN. You would of course have to make some kind of allowance for
> cars that don't have a VIN (nothing in the last what... 50 years?).
And some kind of allowance for Title 49, Sec. 565.4, subsection (d):

"The VINs of any two vehicles manufactured within a 30-year period shall
not be identical."

After 30 years, all bets are off. And the manufacturers have been
tinkering with the VIN format due to its inadequacies. Since VINs were
first standardized in 1980, does this mean we are approaching a Y2010
problem?

I'm sure someone has defined a "vehicle", but I don't know what number
applies when you've pieced together a rebuilt engine, salvaged
transmission, junkyard hood and so-on to get a working car. I think
custom builders end up applying for a new VIN but I don't think that
applies to repairs. Of course defining when a vehicle becomes a
different vehicle is a problem that remains regardless of choice of key.

VINs are apparently not required on directly imported vehicles (i.e.
those imported by someone other than the manufacturer).

The above applies to the US. I'm sure the laws of other countries vary.

Cheers,
Steve


Re: Primary Key

From
"Scott Marlowe"
Date:
On Nov 26, 2007 1:30 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:

> I'm sure someone has defined a "vehicle", but I don't know what number
> applies when you've pieced together a rebuilt engine, salvaged
> transmission, junkyard hood and so-on to get a working car. I think
> custom builders end up applying for a new VIN but I don't think that
> applies to repairs. Of course defining when a vehicle becomes a
> different vehicle is a problem that remains regardless of choice of key.

There's a small car-building company in the west that buys the old
VINs / chassis of Shelby 350 / 500 GTs and builds whole new cars around
those VINs.  Of course, the older ones are in the junkyard by then.

Re: Primary Key

From
Ron Mayer
Date:
Joshua D. Drake wrote:
> On Mon, 26 Nov 2007 10:28:03 -0800 (PST)
> Richard Broersma Jr <rabroersma@yahoo.com> wrote:
>> --- On Mon, 11/26/07, Joshua D. Drake <jd@commandprompt.com> wrote:
>>> In "theory" the item that would be a natural key
>>> in this instance is the VIN.

And then you need to deal with cars that have mismatched body VINs and
engine VINs?  It's not uncommon on older or crash-damaged cars.


Re: Primary Key

From
Ron Johnson
Date:

On 11/26/07 12:11, Steve Crawford wrote:
[snip]
>
> If we presume that the plate is a key to a vehicle, then we immediately
> run into problems as a vehicle can, over time, have several plates
> (lost, stolen, changed to vanity...) and a plate can belong,
> sequentially, to several vehicles (especially when vanity plates are
> transferred to new cars).
>
> And when you have your char(6) plate-number column, they run out of
> numbers and switch to 7-characters requiring changes to all tables that
> used the plate as a key. Or you realize that ABC123 could be
> ABC123-California, ABC123-Nevada or ABC123-New York (I'm assuming that
> AAA999 is a valid format in those states).

We use this as a *non*-unique index:
    PLATE_NUMBER   CHAR(10)
    PLATE_STATE    CHAR(2)
    PLATE_COUNTRY  CHAR(4)

The country field could be dropped and Canada/USA differentiated by
the state/province code, but with NAFTA it's possible that Mexican
plates will turn up "soon", and there's always the off chance that a
European car will show up.

(We used to have PLATE_STATE first, but then discovered how many
northeasterners don't know what state their vehicle is registered in.)
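
In PostgreSQL terms that layout might look like the following (a sketch: the table name and the timestamp column are invented; the plate columns are from the message above):

```sql
-- Hypothetical DDL for the non-unique plate lookup described above.
CREATE TABLE sighting (
    plate_number  char(10)  NOT NULL,
    plate_state   char(2)   NOT NULL,
    plate_country char(4)   NOT NULL,
    seen_at       timestamp NOT NULL
);

-- Deliberately NOT unique: the same plate can legitimately recur.
CREATE INDEX sighting_plate_idx
    ON sighting (plate_number, plate_state, plate_country);
```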

--
Ron Johnson, Jr.
Jefferson LA  USA

%SYSTEM-F-FISH, my hovercraft is full of eels

Re: Primary Key

From
"Merlin Moncure"
Date:
On Nov 26, 2007 1:11 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> It's worse than that.
>
> If we presume that the plate is a key to a vehicle, then we immediately
> run into problems as a vehicle can, over time, have several plates
> (lost, stolen, changed to vanity...) and a plate can belong,
> sequentially, to several vehicles (especially when vanity plates are
> transferred to new cars).
>
> And when you have your char(6) plate-number column, they run out of
> numbers and switch to 7-characters requiring changes to all tables that
> used the plate as a key. Or you realize that ABC123 could be
> ABC123-California, ABC123-Nevada or ABC123-New York (I'm assuming that
> AAA999 is a valid format in those states).
>
> Although I haven't seen it much, recently, semi-trucks used to regularly
> have with numerous plates - one for each state in which they operated.
> And some states such as Texas allow you to have the same amateur-radio
> plate number on multiple vehicles.
>
> I won't argue that there are no reasonable natural keys. But I have sure
> seen plenty of cases where what appeared to be a natural key was
> discovered, generally at a very inopportune time in the development
> process, to be not-so-natural after all.

if you miss the key and blow it, you fix it.  yes, there are tons of
examples of this particular number not exactly lining up with
something, like a person, vehicle, etc. of course this all means that
the number in question is simply not enough information by itself, and
so is either a partial definition or defines something else.

i will concede that changing a key across 10 tables is harder than
redefining a constraint on one table.  this is why the compromise
mentioned way upthread by josh drake (namely, to define the natural key
but use a surrogate for joining) is good in certain cases like this,
especially when you have a complex key that is used in many tables.

the problem is that, because surrogates allow skipping the problem
without defining a proper key at all, the vague data relationships you
mention never get properly defined in the database and end up being
caught in code or by the user because the id is trusted to express the
relationship when in fact it doesn't.  this causes much worse problems
than redefining keys by the way, and helps create the messy databases
that those of us who know how to do things both ways complain about.

in other words, if you create tables by defining the id p-key,
throwing a bunch of fields on it that approximately describe the item,
plus maybe some indexes for performance, you have already loaded the
gun to shoot yourself in the foot.  many of the safeguards the
database can provide in keeping your data organized have been
removed...
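
concretely, the anti-pattern and its repair might look like this (hypothetical names):

```sql
-- Anti-pattern: surrogate key only, no declared natural key.  Nothing
-- stops two rows describing the same real-world account.
CREATE TABLE account (
    id     serial PRIMARY KEY,
    branch text NOT NULL,
    number text NOT NULL
);

-- Repair: keep the surrogate for joining, but still declare the natural
-- key so the database, not application code, enforces the relationship.
ALTER TABLE account ADD CONSTRAINT account_natural_key
    UNIQUE (branch, number);
```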

merlin