Thread: Primary Key
Hello,

How many fields are recommended for a primary key? I'm thinking of creating one with 6 fields; is that too many? Thanks!
On Fri, Nov 16, 2007 at 03:50:20PM -0300, João Paulo Zavanela wrote:
> How many fields are recommended for a primary key?
> I'm thinking of creating one with 6 fields; is that too many?

Normally a primary key would be just a single column. When you start going to that many, I'd probably have a serial column as the primary key and a UNIQUE index on those six fields. It depends on what you're doing, though; unless you've got a few years' experience I'd be tempted to stay away from primary keys of more than a single column.

Sam
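A minimal sketch of the layout Sam describes (the table and column names here are invented, since the original poster never showed his schema):

    -- Surrogate primary key plus a UNIQUE constraint covering the six
    -- "real" columns; foreign keys then reference orders(id).
    CREATE TABLE orders (
        id serial PRIMARY KEY,
        c1 text NOT NULL, c2 text NOT NULL, c3 text NOT NULL,
        c4 text NOT NULL, c5 text NOT NULL, c6 text NOT NULL,
        UNIQUE (c1, c2, c3, c4, c5, c6)   -- still enforces natural uniqueness
    );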
On Fri, 16 Nov 2007 18:54:22 +0000, Sam Mason <sam@samason.me.uk> wrote:
> Normally a primary key would be just a single column.

Uhmm, no. Normally a primary key is defined by the number of columns it takes to determine naturally distinct values. I would agree that if he is looking at 6 then he probably needs to normalize further.

Sincerely,
Joshua D. Drake
On 11/16/07 12:50, João Paulo Zavanela wrote:
> How many fields are recommended for a primary key?
> I'm thinking of creating one with 6 fields; is that too many?

The number of recommended fields is the *minimum* number required for uniqueness. 1 or 6 or 24. Doesn't matter.

Ron Johnson, Jr.
Jefferson LA USA
%SYSTEM-F-FISH, my hovercraft is full of eels
On 11/16/07 12:54, Sam Mason wrote:
> Normally a primary key would be just a single column. When you start
> going to that many, I'd probably have a serial column as the primary
> key and a UNIQUE index on those six fields.

Fie on you evil synthetic key lovers. Long live the Natural Key!

Ron Johnson, Jr.
Jefferson LA USA
On Fri, 16 Nov 2007 13:03:23 -0600, Ron Johnson <ron.l.johnson@cox.net> wrote:
> Fie on you evil synthetic key lovers. Long live the Natural Key!

Right with you there, buddy. Let's get the pitchforks!

Joshua D. Drake
( Fi Fie Foe Fum, I smell the blood of a religious war )

On Fri, Nov 16, 2007 at 01:03:23PM -0600, Ron Johnson wrote:
> Fie on you evil synthetic key lovers. Long live the Natural Key!

Really? I started off with everything using sequences and everything was good. Then I found I wanted to do more complicated things, so I started to transition to natural keys and things were better. Then I took things too far and wanted something artificial back in my life. I'm back to almost never using natural keys now, mainly because interfacing with the outside world gets too complicated. When I'm just doing stuff inside the database I can use natural keys and all is good; otherwise things just get too complicated.

I'll probably look back in another few years and remember how young and naive I was back now.

Sam
On Fri, 16 Nov 2007 20:00:29 +0000, Sam Mason <sam@samason.me.uk> wrote:
> Really? I started off with everything using sequences and everything
> was good. Then I found I wanted to do more complicated things, so I
> started to transition to natural keys and things were better. Then I
> took things too far and wanted something artificial back in my life.

When that is needed I do this:

    create table foo(id serial unique, a text, b text, primary key (a, b));

Joshua D. Drake
On Fri, Nov 16, 2007 at 12:06:46PM -0800, Joshua D. Drake wrote:
> When that is needed I do this:
>
>     create table foo(id serial unique, a text, b text, primary key (a, b));

Humm, so the other way around from what I've ended up doing. I'll need to think about the implications of changing things around like this. There are lots of things that seem as though they'll be pretty awkward to do; I'm sure it's just because I haven't thought about it enough.

Sam
On Nov 16, 2007 3:21 PM, Sam Mason <sam@samason.me.uk> wrote:
> Humm, so the other way around from what I've ended up doing. I'll need
> to think about the implications of changing things around like this.

There is a lot of nuance to this debate and tons of hyperbole on both sides. There are many side effects, pro and con, to choosing 'natural' keys vs. surrogates. Josh's suggestion is the most reasonable compromise, because it allows you to get the performance benefits (which are sometimes overrated) when you need them, but still forces you to think about how your data is _truly_ organized and what makes each record unique. The sad fact is that sequences have made developers lazy; they give little thought to proper normalization strategies, which in turn often produces lousy databases. If you know how to do things properly, you will know what we mean.

merlin
On Fri, Nov 16, 2007 at 12:06:46PM -0800, Joshua D. Drake wrote:
> When that is needed I do this:
>
>     create table foo(id serial unique, a text, b text, primary key (a, b));

By itself, this insufficiently restricts what's going on in the table. I'd recommend a TRIGGER that disallows updating the synthetic key.

Cheers,
David.
David Fetter <david@fetter.org> http://fetter.org/
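A sketch of the kind of trigger David suggests, written against Josh's foo table above; the function and trigger names are invented, and it assumes the plpgsql language is installed:

    CREATE OR REPLACE FUNCTION forbid_id_update() RETURNS trigger AS $$
    BEGIN
        -- Reject any UPDATE that tries to change the surrogate key.
        IF NEW.id IS DISTINCT FROM OLD.id THEN
            RAISE EXCEPTION 'updates to the synthetic key are not allowed';
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER foo_id_guard
        BEFORE UPDATE ON foo
        FOR EACH ROW EXECUTE PROCEDURE forbid_id_update();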
On Fri, Nov 16, 2007 at 04:41:42PM -0500, Merlin Moncure wrote:
> There is a lot of nuance to this debate and tons of hyperbole on both
> sides. There are many side effects, pro and con, to choosing 'natural'
> keys vs. surrogates. Josh's suggestion is the most reasonable
> compromise, because it allows you to get the performance benefits
> (which are sometimes overrated) when you need them,

I'm not sure if performance has ever really come into the decision about whether to use natural/surrogate keys with me. The main reason for using a surrogate key is simplicity; I don't trust myself to maintain a large database where every relationship is composed of multiple columns. If I could say somewhere that I want a set of properties to hold (i.e. there is a 1-to-1 relationship between these tables, there's at most one row in this table for each of these, etc.) and these constraints were then checked when I actually wrote my queries, I'd be much happier. For example, given the tables:

    CREATE TABLE foo (
        id INTEGER NOT NULL PRIMARY KEY,
        description TEXT
    );

    CREATE TABLE bar (
        key INTEGER NOT NULL PRIMARY KEY,
        fooid INTEGER NOT NULL REFERENCES foo
    );

I'd like to be able to write the query:

    SELECT b.key, f.description
    FROM bar b, foo f
    WHERE b.fooid = f.id;

and be able to say that I expect exactly one row for each bar.key. The database would be able to go back over the definitions and prove that this constraint holds (because bar.key and foo.id are UNIQUE, the FOREIGN KEY constraint has checked that bar.fooid always references a valid foo.id, and bar.fooid can never be NULL). Or is this the sort of thing that materialised views are good for, and I've always just been thinking about them as a performance hack?

> The sad fact is that sequences have made developers lazy; they give
> little thought to proper normalization strategies, which in turn often
> produces lousy databases. If you know how to do things properly, you
> will know what we mean.

"Properly" is very open ended. Most people will try to do their best job (given various external constraints) and we've all experienced bad design, if only from stuff that we did while learning. I think I've experienced this, but you've probably got a very different idea about what "properly" means than I do.

Sam
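PostgreSQL has no declarative way to state that expectation, but as a rough sanity check over Sam's tables, a query along these lines returns a row only when some bar.key would not map to exactly one foo row in the join above:

    -- Returns nothing while the constraints hold; useful only after
    -- manual data surgery has bypassed them.
    SELECT b.key, COUNT(f.id) AS matches
    FROM bar b LEFT JOIN foo f ON b.fooid = f.id
    GROUP BY b.key
    HAVING COUNT(f.id) <> 1;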
On Fri, 16 Nov 2007, Merlin Moncure wrote:
> the sad fact is that sequences have made developers lazy

Nah, developers were lazy long before that. If you ask Larry Wall, it's a virtue.

I gave up on this argument ten years ago after a long battle with well-known natural key zealot Joe Celko wore me out. He published one of his many articles making a case for them using an example from the automotive industry. Only problem was, the unique identifier he suggested wasn't. At the auto parts company I worked for, I had just spent many monotonous days contorting keys to work around a problem caused by the original designer there, who misunderstood some nuances of how the "Big Three" auto manufacturers assigned part numbers the same way Celko did.

He doesn't use that example anymore but still misses the point I tried to make. The ability of the world to invalidate the assumptions that go into natural key assignment is really impressive. I particularly enjoy that so many systems are built presuming the Social Security number for a person is involatile that this topic comes up in the SSA's FAQ about identity theft: http://www.socialsecurity.gov/pubs/10064.html

* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Please consider the following statement (it becomes obvious if you remember that the important thing about the table is that it has columns for each of stock_id, price_date, and price).

    (SELECT * FROM stockprices WHERE stock_id = 1
     ORDER BY price_date DESC LIMIT 1)
    UNION
    (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1
     ORDER BY price_date DESC LIMIT 22) AS T2
     ORDER BY T2.price_date ASC LIMIT 1)
    UNION
    (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1
     ORDER BY price_date DESC LIMIT 66) AS T3
     ORDER BY T3.price_date ASC LIMIT 1)
    UNION
    (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1
     ORDER BY price_date DESC LIMIT 132) AS T4
     ORDER BY T4.price_date ASC LIMIT 1)
    UNION
    (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1
     ORDER BY price_date DESC LIMIT 264) AS T5
     ORDER BY T5.price_date ASC LIMIT 1);

This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck.

First, while this statement gets me the correct data, I need to obtain a single record containing the stock_id, the current price (that obtained from the first select statement in the union), and, for each of the prices returned by the subsequent select statements, the current price minus the price at the previous date, divided by the price at the previous date and expressed as a percentage. I do not yet know how to do this using SQL (it would be trivial if I exported the data to Java or C++, but it isn't clear how to do it within SQL).

To make things more difficult, suppose I have another select statement that returns a set of stock_ids. How do I apply the SQL logic I require to only those stocks in the set returned by a statement like SELECT stock_id FROM someTable WHERE ...? The result of this extension would be that I have one record for each stock in the selected set of stocks. I do NOT want to have to recompute the set of stocks for each of the select statements in the above union (that would be a waste because the resulting set of stocks would always be the same for the given criteria). Nor do I want to apply the SQL logic I need for the prices to all the stocks in the database. There could be thousands, or even tens of thousands, of stocks represented in the database, and I'd need the gain/loss logic for only a few dozen at any given time!

How do I make the two extensions I require? I expect the SQL I get to eventually be placed in a stored procedure, which may then be used to construct a view, but that is the easy part.

Maybe I have been staring at this for too long to see the obvious solution, but I am exhausted and am not seeing the next step. If there IS an obvious next step, please at least give me a hint.

Thanks

Ted
On Nov 16, 2007 9:50 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> Nah, developers were lazy long before that. If you ask Larry Wall,
> it's a virtue.

Well, 'lazy' in the sense that it encourages easy solutions to difficult problems is arguably virtuous. Intellectual laziness (which I am most certainly not accusing you [or the OP] of) is another matter. Long years of wrestling with you and many other less talented individuals on this particular topic has imparted a little bit of weariness to me as well. Furthermore, I have myself surrogated a database to victory on various occasions, although usually for performance reasons... so I'm hardly a zealot. I do however think that being able to separate data into tables using unambiguous keys lifted directly from the data is a critical skill.

> I gave up on this argument ten years ago after a long battle with
> well-known natural key zealot Joe Celko wore me out.

Well, nobody's perfect...

> The ability of the world to invalidate the assumptions that go into
> natural key assignment is really impressive.

That just means that the SSN is only part of the key that unambiguously defines a person, should that be a requirement :)

Database design, like many engineering disciplines, is a series of trade-offs mixed in with a couple of helpings of artistry and the few bits of theory that the SQL standards committee was not able to snuff out. Like I said in my opening remarks, the issues at play are nuanced, without clear-cut answers.

merlin

p.s. no compilation of 80's albums is complete without 'Full Moon Fever'...
On 11/16/07 20:50, Greg Smith wrote:
> He doesn't use that example anymore but still misses the point I tried
> to make. The ability of the world to invalidate the assumptions that
> go into natural key assignment is really impressive.

Natural PKs are *not* set in stone, and only stubborn fools won't admit that they can't divine all situations. So, you add a new column to the PK and keep on going.

But still, there *are* some circumstances where natural PKs just don't work. After all, SSNs and credit card numbers are synthetic (just not generated sequentially by the RDBMS).

Ron Johnson, Jr.
Jefferson LA USA
Ted Byers wrote:
> Please consider the following statement ...
> This statement works flawlessly, and is blindingly fast relative to
> everything else I have tried. But I am stuck.

I would have these subselects as:

    UNION
    (SELECT * FROM stockprices WHERE stock_id = 1
     ORDER BY price_date DESC LIMIT 1 OFFSET 264)

I would expect that to give the same result but make the query plan a bit simpler and quicker, using less memory.

> First, while this statement gets me the correct data, I need to obtain
> a single record ...

I haven't tested this, but I would start with:

    CREATE VIEW stock_price_combined AS
    SELECT stock_id
    , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id
       ORDER BY price_date DESC LIMIT 1) AS orig_price
    , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id
       ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS price_two
    , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id
       ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS price_three
    , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id
       ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS price_four
    , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id
       ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS price_five
    FROM stockprices OT;

Then you can:

    SELECT orig_price
    , (orig_price - price_two) AS price_increase
    , ((orig_price - price_two)/price_two) AS percent_change
    ...
    FROM stock_price_combined
    WHERE stock_id IN (SELECT stock_id FROM someTable WHERE ...);

> To make things more difficult, suppose I have another select statement
> that returns a set of stock_ids. ...

    SELECT * FROM stockprices
    WHERE stock_id IN (SELECT stock_id FROM someTable WHERE ...);

If that isn't the answer you want, I hope it points you in the right direction...

Shane Ambler
pgSQL@Sheeky.Biz
--- Shane Ambler <pgsql@Sheeky.Biz> wrote:
> I would have these subselects as:
>
>     UNION
>     (SELECT * FROM stockprices WHERE stock_id = 1
>      ORDER BY price_date DESC LIMIT 1 OFFSET 264)
>
> I would expect that to give the same result but make the query plan a
> bit simpler and quicker, using less memory.

It gave apparently correct values, but for some reason it insisted on returning thousands upon thousands of identical records. There is something awry there, but I can't place what. Yes, I know I could use SELECT DISTINCT, but I worry that it may be doing a full table scan, as opposed to the relatively direct lookup I came up with after looking at your statement. I don't yet know how long it would take, because it is the slowest option I tried, and I gave up after it had returned over 10,000 rows and still showed no signs of finishing. I don't understand this, as EXPLAIN returned apparently much better results for yours than it did for mine.

My latest is as follows:

    SELECT A1.stock_id, A1.price_date, A1.adjusted,
        A2.price_date AS pd22,
        100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc,
        A3.price_date AS pd66,
        100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc,
        A4.price_date AS pd132,
        100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc,
        A5.price_date AS pd264,
        100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc
    FROM (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1) AS A1
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2
          ON A1.stock_id = A2.stock_id
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3
          ON A1.stock_id = A3.stock_id
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4
          ON A1.stock_id = A4.stock_id
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5
          ON A1.stock_id = A5.stock_id;

This still gives me the correct answer, but is faster still than anything I came up with before. Now that I have the correct result for one stock, I need to adapt it to apply to each stock individually, in some small selection from a large number of stocks.

Thanks again

Ted
"Ron Johnson" <ron.l.johnson@cox.net> writes: > On 11/16/07 12:50, João Paulo Zavanela wrote: >> Hello, >> >> How many fields is recomended to create a primary key? >> I'm thinking to create one with 6 fields, is much? > > The number of recommended fields is the *minimum* number required > for uniqueness. 1 or 6 or 24. Doesn't matter. Unless of course you care about every other table being 24x larger and slower due to having all these copies of the 24 fields. And of course unless you care about being able to handle the inevitable day when it turns out the 24 fields aren't unique and you need to consider adding a 25th column to the table *and every table referencing it* as well as changing every line of application code to use the new column. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
On 11/17/07 01:21, Gregory Stark wrote:
> Unless of course you care about every other table being 24x larger and
> slower due to having all these copies of the 24 fields. And of course
> unless you care about being able to handle the inevitable day when it
> turns out the 24 fields aren't unique and you need to consider adding
> a 25th column to the table *and every table referencing it*, as well
> as changing every line of application code to use the new column.

What's got to be done has got to be done.

On one of our systems, the natural PK of an electronic road toll is:

    ETC_ACCOUNT_ID   INTEGER
    FISCAL_PERIOD    INTEGER
    LANE_TX_ID       BIGINT
    TX_TYPE_ID       CHAR(1)
    TX_SUBTYPE_IND   CHAR(1)

On another, it's:

    ETC_ACCOUNT_ID   INTEGER
    FISCAL_PERIOD    INTEGER
    LANE_TX_ID       BIGINT
    DEVICE_NO        CHAR(12)   <<<< added column
    TX_TYPE_ID       CHAR(1)
    TX_SUBTYPE_IND   CHAR(1)

If the PK were synthetic and generated by the engine, then a (buggy) app could insert duplicate tolls and the system wouldn't utter a peep. But the customer sure would when he saw the duplicate entries.

Note the seemingly *synthetic* field LANE_TX_ID. Records coming in from the lane are inserted into the T_LANE_TX table, which has the PK LANE_TX_ID. However, that table also has a "natural" unique index of:

    LANE_ID, TX_DATE, TX_TIME, TX_SEQ_NUMBER

Likewise, T_LANE has the synthetic PK LANE_ID, but it is back-stopped by a natural unique index: PLAZA_ID, EXTERN_LANE_ID. And... T_PLAZA has the PK PLAZA_ID and is back-stopped by the natural unique index AGENCY_ID, EXTERN_PLAZA_ID.

Breaking the chain, T_AGENCY only has the synthetic key AGENCY_ID. But it only has 27 rows.

Ron Johnson, Jr.
Jefferson LA USA
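A hedged sketch of that "back-stopped" layout in DDL (the column types are guesses from the names above): the synthetic key keeps joins narrow, while the natural unique index is what actually stops a buggy app from inserting duplicates.

    CREATE TABLE t_lane_tx (
        lane_tx_id    bigserial PRIMARY KEY,  -- synthetic join key
        lane_id       integer NOT NULL,
        tx_date       date    NOT NULL,
        tx_time       time    NOT NULL,
        tx_seq_number integer NOT NULL,
        -- natural back-stop: duplicate lane records are rejected
        UNIQUE (lane_id, tx_date, tx_time, tx_seq_number)
    );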
Ted Byers wrote:
> It gave apparently correct values, but for some reason it insisted on
> returning thousands upon thousands of identical records. There is
> something awry there, but I can't place what.

Now that I look at it again today, I see that you would either need to use SELECT DISTINCT(stock_id) in the VIEW definition or select the stock_id from the stocks table instead of the stockprices table.

I set up a little test this time - this is the example I came up with:

    CREATE DATABASE stocktest;
    \c stocktest

    CREATE TABLE stocks (
        id serial PRIMARY KEY,
        description text
    );

    CREATE TABLE stockprices (
        id serial PRIMARY KEY,
        stock_id integer REFERENCES stocks (id),
        stock_price numeric,
        price_date date
    );

    CREATE INDEX idx_stockprices_date ON stockprices (price_date);
    CREATE INDEX idx_stockprices_stock_id ON stockprices (stock_id);

    CREATE VIEW stock_prices_combined AS
    SELECT id AS stock_id
    , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1) AS one_adjusted
    , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1) AS one_date
    , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS two_adjusted
    , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS two_date
    , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS three_adjusted
    , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS three_date
    , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS four_adjusted
    , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS four_date
    , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS five_adjusted
    , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
       ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS five_date
    FROM stocks ST;

    CREATE VIEW stock_price_history AS
    SELECT stock_id,
        one_date AS pd1,
        one_adjusted AS current_price,
        two_date AS pd22,
        100.0 * (one_adjusted - two_adjusted)/two_adjusted AS gl22pc,
        three_date AS pd66,
        100.0 * (one_adjusted - three_adjusted)/three_adjusted AS gl66pc,
        four_date AS pd132,
        100.0 * (one_adjusted - four_adjusted)/four_adjusted AS gl132pc,
        five_date AS pd264,
        100.0 * (one_adjusted - five_adjusted)/five_adjusted AS gl264pc
    FROM stock_prices_combined;

I INSERTed 500 stocks entries and 10,000 stockprices entries for each stock (that's 5,000,000 price rows), then from

    EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20

I got: Total runtime: 981.618 ms

    EXPLAIN ANALYSE SELECT * FROM stock_price_history
    WHERE stock_id IN (10,25,36,45,86,154,368,481)

I got: Total runtime: 8084.217 ms

So that's about 1 second per stock_id returned (on my old machine).
You can change that last query to be:

    SELECT * FROM stock_price_history
    WHERE stock_id IN (SELECT stock_id FROM sometable WHERE ....)

which gives you the range of stock_ids from a table, as you asked about.

Shane Ambler
pgSQL@Sheeky.Biz
On Sun, Nov 18, 2007 at 04:19:01AM +0000, Ron Johnson wrote:
> If the PK were synthetic and generated by the engine, then a (buggy)
> app could insert duplicate tolls and the system wouldn't utter a peep.
> But the customer sure would when he saw the duplicate entries.

You'd just need to put a UNIQUE constraint on an appropriate set of columns and the database would complain in all the same places.

One big difference I've been able to discern between synthetic and natural keys is that with synthetic keys you need to pull in every table up the hierarchy when you want to find out something lower down. An example would probably help; say we were recording samples from locations, with these samples being collected in groups on specific visits. I've been using synthetic keys more recently, so I'll start with them:

    CREATE TABLE locations (
        id SERIAL NOT NULL PRIMARY KEY,
        locntype TEXT
    );

    CREATE TABLE visits (
        id SERIAL PRIMARY KEY,
        locnid INTEGER NOT NULL REFERENCES locations,
        visitnum INTEGER NOT NULL,
        UNIQUE (locnid, visitnum),
        visitdate DATE
    );

    CREATE TABLE samples (
        id SERIAL PRIMARY KEY,
        visitid INTEGER NOT NULL REFERENCES visits,
        samplenum INTEGER NOT NULL,
        UNIQUE (visitid, samplenum),
        barcode TEXT UNIQUE
    );

Say I wanted to summarise the number of samples per location type; I'd be forced to join onto the visits table, like this:

    SELECT l.locntype, COUNT(*)
    FROM locations l, visits v, samples s
    WHERE l.id = v.locnid
      AND v.id = s.visitid
    GROUP BY l.locntype;

If we now organise the tables using natural keys (note that the location id is still pretty artificial, as it has to map back to something in the real world):

    CREATE TABLE locations (
        id TEXT NOT NULL PRIMARY KEY,
        locntype TEXT
    );

    CREATE TABLE visits (
        locnid TEXT NOT NULL REFERENCES locations,
        visitnum INTEGER NOT NULL,
        PRIMARY KEY (locnid, visitnum),
        visitdate DATE
    );

    CREATE TABLE samples (
        locnid TEXT NOT NULL,
        visitnum INTEGER NOT NULL,
        samplenum INTEGER NOT NULL,
        PRIMARY KEY (locnid, visitnum, samplenum),
        FOREIGN KEY (locnid, visitnum) REFERENCES visits,
        barcode TEXT UNIQUE
    );

this has a nice simplifying effect on the above query; I can directly refer to the location for each sample:

    SELECT l.locntype, COUNT(*) AS samples
    FROM locations l, samples s
    WHERE l.id = s.locnid
    GROUP BY l.locntype;

Is that a reasonable example of the different ways of laying things out in the two styles?

The main advantage of synthetic keys is that they introduce another layer of abstraction into the database, potentially allowing greater flexibility (of course this can also be a burden). Natural keys allow the above simplification of queries and ease the manual fix-up of the data should something go horribly wrong. It's reasonably easy to fix things if you have a sample referencing a non-existent visit with natural keys, but if you've got synthetic keys you're probably going to have to dump the sample as well. Of course, if you've got yourself into this state you've already failed, but some recourse is nice.

In summary, I see various advantages to either approach, but I don't see either as being fundamentally "better". You can express the same constraints in either style (I'm sure someone will think of a good counter-example though); it's the context in which it's used that will determine which is more suitable to the task at hand. Using one style exclusively is almost certainly bad, but having a preference for one or the other is probably good, as it'll make the database as a whole more cohesive and subsequently ease maintenance.

Comments?

Sam
Shane Ambler wrote:
> I INSERTed 500 stocks entries and 10,000 stockprices entries for each
> stock (that's 5,000,000 price rows), then from
>
>     EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
>
> I got: Total runtime: 981.618 ms

Actually I found a better way. After you run the example I gave you before:

    DROP INDEX idx_stockprices_date;
    DROP INDEX idx_stockprices_stock_id;

    CREATE INDEX idx_stockprices_id_date
        ON stockprices (stock_id, price_date);

With the same data (5,000,000 price rows) I then get, from

    EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20

Total runtime: 6.397 ms

    EXPLAIN ANALYSE SELECT * FROM stock_price_history
    WHERE stock_id IN (10,25,36,45,86,154,368,481)

Total runtime: 36.265 ms

Which is probably the speed you want ;-)

Shane Ambler
pgSQL@Sheeky.Biz
Thanks Shane,

It works reasonably well. It gets the right answer, but I guess my data set is much larger than your test. Please consider the appended data.

The first two SQL statements are directly comparable. My LEFT JOIN is marginally simpler, as shown by EXPLAIN, and runs to completion in about 3 seconds (elapsed real time), as opposed to about 30 seconds for the two views. It makes a little sense, though, in that according to EXPLAIN the LEFT JOIN needs to consider a dramatically smaller number of rows. What I find puzzling, though, is that it takes less time to get the report for 28 stocks at a given time than it does to get the report for 1. (Both take about 30 seconds, but for 28 stocks it takes about 0.005 seconds less time ;-)

This is a case where LEFT JOINs appear to be much faster than subqueries.

I appreciate all your help, but I am struggling to figure out how best to adapt my LEFT JOINs to your VIEWs, so that the latter benefit from the speed of the JOINs. The heart of my problem is to figure out how to use a stock_id in the WHERE clause. One thing I am not certain of: is there a way to preserve the logic of the WHERE clauses, which I use to sample the time series at 22 days ago, 66 days ago, 132 days ago, &c., by replacing them with a GROUP BY clause grouping by stock_id? If so, might that, along with an additional LEFT JOIN, get me the result I am after?

I created a stored procedure that takes an id argument (and can usefully invoke it on any stock_id in the database), but the problem remains as to how to construct a record set by applying the procedure to each id in a set of ids returned, e.g., by SELECT stock_id FROM stocks;

Ted

=========== test data =============

    EXPLAIN SELECT A1.stock_id, A1.price_date, A1.adjusted,
        A2.price_date AS pd22,
        100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc,
        A3.price_date AS pd66,
        100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc,
        A4.price_date AS pd132,
        100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc,
        A5.price_date AS pd264,
        100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc
    FROM (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1) AS A1
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2
          ON A1.stock_id = A2.stock_id
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3
          ON A1.stock_id = A3.stock_id
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4
          ON A1.stock_id = A4.stock_id
    LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1
          ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5
          ON A1.stock_id = A5.stock_id;

    | id | select_type | table       | type   | possible_keys | key     | key_len | ref  | rows  | Extra       |
    |  1 | PRIMARY     | <derived2>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
    |  1 | PRIMARY     | <derived3>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
    |  1 | PRIMARY     | <derived4>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
    |  1 | PRIMARY     | <derived5>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
    |  1 | PRIMARY     | <derived6>  | system | NULL          | NULL    | NULL    | NULL |     1 |             |
    |  6 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
    |  5 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
    |  4 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
    |  3 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
    |  2 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where |
    10 rows in set (0.08 sec)

    EXPLAIN SELECT * FROM stock_price_history WHERE stock_id = 1;

    | id | select_type        | table       | type  | possible_keys | key     | key_len | ref               | rows   | Extra          |
    |  1 | PRIMARY            | <derived3>  | ALL   | NULL          | NULL    | NULL    | NULL              |    494 | Using where    |
    |  3 | DERIVED            | ST          | index | NULL          | PRIMARY | 4       | NULL              |    496 | Using index    |
    | 13 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id |      1 | Using filesort |
    | 12 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    | 11 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    | 10 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    |  9 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    |  8 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    |  7 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    |  6 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    |  5 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    |  4 | DEPENDENT SUBQUERY | stockprices | ALL   | PRIMARY       | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort |
    12 rows in set (30.52 sec)

    EXPLAIN SELECT * FROM stock_price_history
    WHERE stock_id IN (SELECT stock_id FROM etf_stocks WHERE etf_id = 397);

    | id | select_type        | table       | type   | possible_keys    | key     | key_len | ref               | rows   | Extra                    |
    |  1 | PRIMARY            | <derived4>  | ALL    | NULL             | NULL    | NULL    | NULL              |    494 | Using where              |
    |  4 | DERIVED            | ST          | index  | NULL             | PRIMARY | 4       | NULL              |    496 | Using index              |
    | 14 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id |      1 | Using filesort           |
    | 13 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    | 12 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    | 11 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    | 10 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    |  9 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    |  8 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    |  7 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    |  6 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    |  5 | DEPENDENT SUBQUERY | stockprices | ALL    | PRIMARY          | PRIMARY | 4       | yohan.ST.stock_id | 137560 | Using filesort           |
    |  2 | DEPENDENT SUBQUERY | etf_stocks  | eq_ref | PRIMARY,stock_id | PRIMARY | 8       | const,func        |      1 | Using where; Using index |
    13 rows in set (30.34 sec)

Running the LEFT JOIN statement itself (the same one shown above, without EXPLAIN) gives:

    | stock_id | price_date | adjusted | pd22       | gl22pc     | pd66       | gl66pc     | pd132      | gl132pc    | pd264      | gl264pc   |
    | 1        | 2007-11-13 | 46.10    | 2007-10-12 | -5.2025499 | 2007-08-10 | -3.5564854 | 2007-05-08 | -8.4409136 | 2006-10-25 | 5.0353156 |
    1 row in set (0.03 sec)

    SELECT * FROM stock_price_history WHERE stock_id = 1;

    | stock_id | pd1        | current_price | pd22       | gl22pc     | pd66       | gl66pc     | pd132      | gl132pc    | pd264      | gl264pc   |
    | 1        | 2007-11-13 | 46.10         | 2007-10-12 | -5.2025499 | 2007-08-10 | -3.5564854 | 2007-05-08 | -8.4409136 | 2006-10-25 | 5.0353156 |
    1 row in set (30.44 sec)

    SELECT * FROM stock_price_history
    WHERE stock_id IN (SELECT stock_id FROM etf_stocks WHERE etf_id = 397);

    | stock_id | pd1        | current_price | pd22       | gl22pc      | pd66       | gl66pc      | pd132      | gl132pc     | pd264      | gl264pc     |
    | 1        | 2007-11-13 | 46.10         | 2007-10-12 | -5.2025499  | 2007-08-10 | -3.5564854  | 2007-05-08 | -8.4409136  | 2006-10-25 | 5.0353156   |
    | 2        | 2007-11-13 | 41.26         | 2007-10-12 | -11.2688172 | 2007-08-10 | -0.6740491  | 2007-05-08 | -7.4680422  | 2006-10-25 | 6.5599174   |
    | 3        | 2007-11-13 | 93.70         | 2007-10-12 | 0.1710498   | 2007-08-10 | 37.5513799  | 2007-05-08 | 59.7340607  | 2006-10-25 | 117.8563125 |
    | 4        | 2007-11-13 | 101.38        | 2007-10-12 | -12.9187425 | 2007-08-10 | 17.5692914  | 2007-05-08 | 42.6079617  | 2006-10-25 | 72.4442932  |
    | 5        | 2007-11-13 | 36.63         | 2007-10-12 | -3.5291019  | 2007-08-10 | 6.0509554   | 2007-05-08 | -6.4367816  | 2006-10-25 | 35.3658537  |
    | 6        | 2007-11-13 | 82.08         | 2007-10-12 | -1.7829365  | 2007-08-10 | 2.6641651   | 2007-05-08 | 25.0266565  | 2006-10-25 | 40.3316806  |
    | 7        | 2007-11-13 | 97.13         | 2007-10-12 | -0.1233933  | 2007-08-10 | 10.1122322  | 2007-05-08 | 25.6532988  | 2006-10-25 | 44.1525675  |
    | 8        | 2007-11-13 | 50.92         | 2007-10-12 | 7.0422535   | 2007-08-10 | 21.7890457  | 2007-05-08 | 22.4627225  | 2006-10-25 | 15.1515152  |
    | 9        | 2007-11-13 | 53.30         | 2007-10-12 | -5.1770148  | 2007-08-10 | 2.8957529   | 2007-05-08 | -17.4539260 | 2006-10-25 | -6.1289186  |
    | 10       | 2007-11-13 | 34.53         | 2007-10-12 | -3.2230942  | 2007-08-10 | 3.1362007   | 2007-05-08 | -8.3841868  | 2006-10-25 | 6.4426634   |
    | 11       | 2007-11-13 | 73.99         | 2007-10-12 | 1.6346154   | 2007-08-10 | 12.3102611  | 2007-05-08 | -8.0983729  | 2006-10-25 | 21.1163857  |
    | 12       | 2007-11-13 | 68.60         | 2007-10-12 | -12.6114650 | 2007-08-10 | -7.1969697  | 2007-05-08 | -7.0712544  | 2006-10-25 | 3.4378770   |
    | 13       | 2007-11-13 | 90.85         | 2007-10-12 | -14.1709967 | 2007-08-10 | 5.2967084   | 2007-05-08 | -17.3715325 | 2006-10-25 | 32.9382499  |
    | 14       | 2007-11-13 | 90.91         | 2007-10-12 | -9.1263495  | 2007-08-10 | -8.6698815  | 2007-05-08 | -19.6695237 | 2006-10-25 | 11.7242227  |
    | 15       | 2007-11-13 | 83.82         | 2007-10-12 | -9.2071057  | 2007-08-10 | -12.3588457 | 2007-05-08 | -28.4812287 | 2006-10-25 | 6.4110702   |
    | 16       | 2007-11-13 | 48.82         | 2007-10-12 | -10.5697014 | 2007-08-10 | -12.7279228 | 2007-05-08 | -5.8074474  | 2006-10-25 | -2.2231124  |
    | 17       | 2007-11-13 | 46.68         | 2007-10-12 | -0.8917197  | 2007-08-10 | 11.7013640  | 2007-05-08 | 11.5145724  | 2006-10-25 | 5.0168729   |
    | 18       | 2007-11-13 | 52.00         | 2007-10-12 | 2.4832479   | 2007-08-10 | 9.9598224   | 2007-05-08 | 22.0943884  | 2006-10-25 | 36.6982124  |
    | 19       | 2007-11-13 | 48.87         | 2007-10-12 | -12.9032258 | 2007-08-10 | -12.9497684 | 2007-05-08 | -19.8195242 | 2006-10-25 | 28.5038128  |
    | 20       | 2007-11-13 | 31.72         | 2007-10-12 | 0.2211690   | 2007-08-10 | 1.9607843   | 2007-05-08 | -0.1887980  | 2006-10-25 | 21.6724204  |
    | 21       | 2007-11-13 | 64.25         | 2007-10-12 | -5.5424875  | 2007-08-10 | -4.4467579  | 2007-05-08 | -1.7584098  | 2006-10-25 | 14.9579531  |
    | 22       | 2007-11-13 | 46.99         | 2007-10-12 | -9.9635946  | 2007-08-10 | -9.3382211  | 2007-05-08 | -9.0926678  | 2006-10-25 | 13.8599467  |
    | 23       | 2007-11-13 | 24.56         | 2007-10-12 | -7.4604371  | 2007-08-10 | -6.7223699  | 2007-05-08 | -24.9618087 | 2006-10-25 | -15.1933702 |
    | 24       | 2007-11-13 | 49.80         | 2007-10-12 | -6.0554612  | 2007-08-10 | 1.6949153   | 2007-05-08 | -2.9807130  | 2006-10-25 | 20.3189176  |
    | 25       | 2007-11-13 | 26.65         | 2007-10-12 | -14.3362263 | 2007-08-10 | -9.9966228  | 2007-05-08 | -22.2578763 | 2006-10-25 | -12.1332015 |
    | 26       | 2007-11-13 | 53.25         | 2007-10-12 | -19.9248120 | 2007-08-10 | -3.3399891  | 2007-05-08 | -9.5157179  | 2006-10-25 | -20.8649131 |
    | 27       | 2007-11-13 | 29.19         | 2007-10-12 | 2.9266573   | 2007-08-10 | -2.9587766  | 2007-05-08 | -11.1685940 | 2006-10-25 | -10.1293103 |
    | 28       | 2007-11-13 | 19.22         | 2007-10-12 | -10.2707750 | 2007-08-10 | -2.4365482  | 2007-05-08 | -0.6204757  | 2006-10-25 | 10.0801833  |
    28 rows in set (30.39 sec)
Ted Byers wrote:
> Thanks Shane,
>
> It works reasonably well. It gets the right answer, but I guess my
> data set is much larger than your test.

What indexes have you got? Using this index on the sample I sent gets the response time to about 5 ms per stock_id (as opposed to 900 ms with these columns indexed separately):

    CREATE INDEX idx_stockprices_id_date
        ON stockprices (stock_id, price_date);

> I appreciate all your help, but I am struggling to figure out how best
> to adapt my LEFT JOINs to your VIEWs, so that the latter benefit from
> the speed of the JOINs. The heart of my problem is to figure out how
> to use a stock_id in the WHERE clause.

That is where I have moved away from your select - the way you are joining makes it hard to adapt to where you want it to end up (more than one stock_id per query). By using the view that generates the rows you want, you make the query sent from the client much simpler and make it easy to get any single stock_id, or list of them, that you want.

The second view will add little overhead and can be part of the select sent from the client if you wish. I separated them into two views to prevent duplicating the same selects for the calculations. You can merge them into one view if you wish - the first view would become a subselect for the second view (a sketch follows below).

Shane Ambler
pgSQL@Sheeky.Biz
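A reduced sketch of that merge, using a hypothetical stock_price_history2 name and only the current and 22-day-old prices to keep it short: the first view's body simply becomes a subselect in the second.

    CREATE VIEW stock_price_history2 AS
    SELECT stock_id,
           one_date AS pd1,
           one_adjusted AS current_price,
           two_date AS pd22,
           100.0 * (one_adjusted - two_adjusted)/two_adjusted AS gl22pc
    FROM (SELECT id AS stock_id
          , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
             ORDER BY price_date DESC LIMIT 1) AS one_adjusted
          , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
             ORDER BY price_date DESC LIMIT 1) AS one_date
          , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id
             ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS two_adjusted
          , (SELECT price_date FROM stockprices WHERE stock_id = ST.id
             ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS two_date
          FROM stocks ST) AS combined;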
> Fie on you evil synthetic key lovers. Long live the Natural Key!

Think of a foreign key referencing a primary key with 6 columns... *urgs*, never!
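What that objection looks like in DDL, as a hypothetical illustration (every name here is invented):

CREATE TABLE orders (
    region text,
    plant  text,
    line   text,
    shift  text,
    batch  text,
    seq    integer,
    PRIMARY KEY (region, plant, line, shift, batch, seq)
);

-- Every referencing table must carry, index, and join on all six columns:
CREATE TABLE order_notes (
    region text, plant text, line text, shift text, batch text, seq integer,
    note   text,
    FOREIGN KEY (region, plant, line, shift, batch, seq)
        REFERENCES orders (region, plant, line, shift, batch, seq)
);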
On 21/11/2007, Sascha Bohnenkamp <asbohnenkamp@gmx.de> wrote:
> > Fie on you evil synthetic key lovers. Long live the Natural Key!
> Think of a foreign key referencing a primary key with 6 columns... *urgs*, never!

The worst thing I meet is people who think primary keys need to be single-field unique integer serial columns.

I tend to agree that primary keys should be single fields if they need to be referenced, but they should also be natural if at all possible, i.e. use car number plates rather than some serial int.

Peter.
On Fri, Nov 23, 2007 at 09:33:13AM +0000, Peter Childs wrote:
> I tend to agree that primary keys should be single fields if they need to be referenced, but they should also be natural if at all possible, i.e. use car number plates rather than some serial int.

Car number plates are unique over time? I didn't think so...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy
On Fri, 2007-11-23 at 11:37 +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 23, 2007 at 09:33:13AM +0000, Peter Childs wrote:
> > I tend to agree that primary keys should be single fields if they need to be referenced, but they should also be natural if at all possible, i.e. use car number plates rather than some serial int.
>
> Car number plates are unique over time? I didn't think so...

I'm not sure how it is in the US, but here in Germany I just reused a car plate from the owner it had before me... so now the plate is uniquely associated at most with the car, not the owner... and I'm pretty sure that's not unique either.

And what do you do when things shift meaning in your natural key? That's a very common thing to happen to natural keys, and suddenly what was unique becomes not unique anymore... and the headaches begin...

You're better off using synthetic keys for references between tables, and you can still keep your natural keys for lookup; just don't use them as unique join criteria, only search/filter criteria.

Cheers,
Csaba.
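A minimal sketch of that compromise, with invented table and column names (not code from the thread):

-- Synthetic key for references, natural key kept for lookups:
CREATE TABLE vehicle (
    vehicle_id  serial PRIMARY KEY,   -- synthetic: used by all foreign keys
    plate       text NOT NULL,
    plate_state text NOT NULL,
    UNIQUE (plate, plate_state)       -- natural: enforced, but never joined on
);

CREATE TABLE service_record (
    record_id   serial PRIMARY KEY,
    vehicle_id  integer NOT NULL REFERENCES vehicle,
    serviced_on date NOT NULL
);

-- Lookups use the natural key; joins use the stable synthetic one:
SELECT r.serviced_on
  FROM vehicle v
  JOIN service_record r USING (vehicle_id)
 WHERE v.plate = 'ABC123' AND v.plate_state = 'CA';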
On Fri, Nov 23, 2007 at 12:00:18PM +0100, Csaba Nagy wrote:
> You're better off using synthetic keys for references between tables, and you can still keep your natural keys for lookup; just don't use them as unique join criteria, only search/filter criteria.

To me, that just confirms that using natural keys for tracking data outside the database is wrong. For the abstractions inside the database, natural keys make a lot of sense.

Sam
On 11/23/07 03:33, Peter Childs wrote:
> The worst thing I meet is people who think primary keys need to be single-field unique integer serial columns.
>
> I tend to agree that primary keys should be single fields if they need to be referenced, but they should also be natural if at all possible, i.e. use car number plates rather than some serial int.

I wouldn't trust plate number to be unique over time, since the format "ABC 123" only has a capacity of 17,576,000 vehicles.

--
Ron Johnson, Jr.
Jefferson LA USA

%SYSTEM-F-FISH, my hovercraft is full of eels
Ron Johnson <ron.l.johnson@cox.net> writes:
> I wouldn't trust plate number to be unique over time, since the format "ABC 123" only has a capacity of 17,576,000 vehicles.

There's a worse problem than that: what if you find out that you mis-entered the value? Now you have to change the record's primary key in order to stay consistent with the real world, which is exactly what a primary key should never have to do, at least in the opinion of those who recommend synthetic primary keys.

regards, tom lane
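If a natural primary key is used despite that risk, one partial mitigation is to let corrections cascade through referencing tables. A sketch with invented names (this is a workaround, not what Tom is recommending):

CREATE TABLE car (
    plate text PRIMARY KEY
);

CREATE TABLE ticket (
    ticket_id serial PRIMARY KEY,
    plate     text NOT NULL REFERENCES car ON UPDATE CASCADE
);

-- Correcting a mis-entered plate now ripples through every ticket:
UPDATE car SET plate = 'ABC124' WHERE plate = 'ABC123';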
On Fri, 23 Nov 2007 09:33:13 +0000, "Peter Childs" <peterachilds@gmail.com> wrote:
> The worst thing I meet is people who think primary keys need to be single-field unique integer serial columns.
>
> I tend to agree that primary keys should be single fields if they need to be referenced, but they should also be natural if at all possible, i.e. use car number plates rather than some serial int.

Why is this desire not better satisfied by an index rather than a key? Any key into a relation is, in the final analysis, an arbitrary value. Why is a data value considered intrinsically superior to a sequence?

I am converting a system from HP TurboIMAGE to PostgreSQL, and the framework selected, Ruby on Rails, essentially depends upon the existence of an integer-sequenced primary key for each row. Originally I had a deep distaste for the artificiality of integer keys, but now I really do not consider them any more, or less, coercive than many other programming conventions.

Regards,
James B. Byrne <ByrneJB@Harte-Lyne.ca>
Harte & Lyne Limited, Hamilton, Ontario
On Nov 23, 2007, at 17:35, James B. Byrne wrote:
> Why is this desire not better satisfied by an index rather than a key?

What's your distinction between an index and a key? For what it's worth, both UNIQUE and PRIMARY KEY constrain a column (or set of columns) to be unique, i.e. a key in the logical sense. Uniqueness is implemented in terms of btree indexes in PostgreSQL. It seems to me you're conflating logical (key) and physical (index) concepts here.

Michael Glaesemann
grzm seespotcode net
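To make the logical/physical split concrete, a small sketch (the table and names are invented):

-- UNIQUE and PRIMARY KEY are logical statements about the data; PostgreSQL
-- enforces each with an implicitly created btree index (named account_pkey
-- and account_email_key by default).
CREATE TABLE account (
    account_id serial PRIMARY KEY,
    email      text NOT NULL UNIQUE
);

-- A plain index, by contrast, is purely physical: it can speed up lookups,
-- but it declares nothing about the data.
CREATE INDEX account_email_lower_idx ON account (lower(email));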
Martijn van Oosterhout wrote:
> Car number plates are unique over time? I didn't think so...

It's worse than that. If we presume that the plate is a key to a vehicle, then we immediately run into problems: a vehicle can, over time, have several plates (lost, stolen, changed to vanity...), and a plate can belong, sequentially, to several vehicles (especially when vanity plates are transferred to new cars).

And when you have your char(6) plate-number column, they run out of numbers and switch to 7 characters, requiring changes to all tables that used the plate as a key. Or you realize that ABC123 could be ABC123-California, ABC123-Nevada or ABC123-New York (I'm assuming that AAA999 is a valid format in those states).

Although I haven't seen it much recently, semi-trucks used to regularly carry numerous plates, one for each state in which they operated. And some states such as Texas allow you to have the same amateur-radio plate number on multiple vehicles.

I won't argue that there are no reasonable natural keys. But I have sure seen plenty of cases where what appeared to be a natural key was discovered, generally at a very inopportune time in the development process, to be not-so-natural after all.

Cheers,
Steve
On Mon, 26 Nov 2007 10:11:37 -0800, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> I won't argue that there are no reasonable natural keys. But I have sure seen plenty of cases where what appeared to be a natural key was discovered, generally at a very inopportune time in the development process, to be not-so-natural after all.

In "theory" the item that would be a natural key in this instance is the VIN. You would of course have to make some kind of allowance for cars that don't have a VIN (nothing in the last what... 50 years?).

Joshua D. Drake
--- On Mon, 11/26/07, Joshua D. Drake <jd@commandprompt.com> wrote:
> In "theory" the item that would be a natural key in this instance is the VIN. You would of course have to make some kind of allowance for cars that don't have a VIN (nothing in the last what... 50 years?).

So this is why the service stations always record my car's VIN when I show up for oil changes. ;) Of course, there is a whole industry built around auto theft where they restamp the stolen car with a different VIN. I wonder if these stolen cars end up with duplicate VINs, or if the VINs they are given do not pass the VIN checksum (if such a thing exists).

Regards,
Richard Broersma Jr.
Richard Broersma Jr wrote:
> I wonder if these stolen cars end up with duplicate VINs, or if the VINs they are given do not pass the VIN checksum (if such a thing exists).

VIN encoding is covered here: http://en.wikipedia.org/wiki/Vehicle_Identification_Number

It looks like a poor choice for a primary key: too many conflicting, "meaningful", evolving-over-time digits that can be misinterpreted by your customers.
On Mon, 26 Nov 2007 10:28:03 -0800 (PST), Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> So this is why the service stations always record my car's VIN when I show up for oil changes. ;) Of course, there is a whole industry built around auto theft where they restamp the stolen car with a different VIN. I wonder if these stolen cars end up with duplicate VINs, or if the VINs they are given do not pass the VIN checksum (if such a thing exists).

They may end up with duplicate VINs, but that isn't really relevant: at that point we know that one of the two (or six, or whatever) cars is invalid. Thus the "natural" key is still valid, and designed to alert us if there is any potential problem. :)

Sincerely,
Joshua D. Drake
> It's worse than that.

It's even worse than that. Decades ago, Florida used to issue multiple plates with the same number, differentiated by color. There are other cases of states having multiple types of license plates with overlapping numbers.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
Joshua D. Drake wrote:
> In "theory" the item that would be a natural key in this instance is the VIN. You would of course have to make some kind of allowance for cars that don't have a VIN (nothing in the last what... 50 years?).

And some kind of allowance for Title 49, Sec. 565.4, subsection (d): "The VINs of any two vehicles manufactured within a 30-year period shall not be identical." After 30 years, all bets are off. And the manufacturers have been tinkering with the VIN due to its inadequacies. Since VINs were first standardized in 1980, does this mean we are approaching a Y2010 problem?

I'm sure someone has defined a "vehicle", but I don't know what number applies when you've pieced together a rebuilt engine, salvaged transmission, junkyard hood and so on to get a working car. I think custom builders end up applying for a new VIN, but I don't think that applies to repairs. Of course, defining when a vehicle becomes a different vehicle is a problem that remains regardless of the choice of key.

VINs are apparently not required on directly imported vehicles (i.e. those imported by other than the manufacturer).

The above applies to the US. I'm sure the laws of other countries vary.

Cheers,
Steve
On Nov 26, 2007 1:30 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> I'm sure someone has defined a "vehicle", but I don't know what number applies when you've pieced together a rebuilt engine, salvaged transmission, junkyard hood and so on to get a working car. I think custom builders end up applying for a new VIN, but I don't think that applies to repairs. Of course, defining when a vehicle becomes a different vehicle is a problem that remains regardless of the choice of key.

There's a small car-building company in the west that buys the old VINs / chassis of Shelby 350 / 500GTs and builds whole new cars with those VINs. Of course, the older ones are in the junkyard by then.
Joshua D. Drake wrote:
> In "theory" the item that would be a natural key in this instance is the VIN.

And you then need to deal with cars that have mismatched body VINs and engine VINs? It's not uncommon on older or crash-damaged cars.
On 11/26/07 12:11, Steve Crawford wrote:
> If we presume that the plate is a key to a vehicle, then we immediately run into problems: a vehicle can, over time, have several plates (lost, stolen, changed to vanity...), and a plate can belong, sequentially, to several vehicles (especially when vanity plates are transferred to new cars).
>
> And when you have your char(6) plate-number column, they run out of numbers and switch to 7 characters, requiring changes to all tables that used the plate as a key. Or you realize that ABC123 could be ABC123-California, ABC123-Nevada or ABC123-New York.

We use this as a *non*-unique index:

    PLATE_NUMBER  CHAR(10)
    PLATE_STATE   CHAR(2)
    PLATE_COUNTRY CHAR(4)

The country field could be dropped and Canada/USA differentiated by the state/province code, but with NAFTA it's possible that Mexican plates will turn up "soon", and there's always the off chance that a European car will show up.

(We used to have PLATE_STATE first, but then discovered how many northeasterners don't know what state their vehicle is registered in.)

--
Ron Johnson, Jr.
Jefferson LA USA
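Ron's columns, recast as a sketch of a plain non-unique lookup index (the surrounding table is invented, not from his schema):

CREATE TABLE sighting (
    sighting_id   serial PRIMARY KEY,
    plate_number  char(10) NOT NULL,
    plate_state   char(2)  NOT NULL,
    plate_country char(4)  NOT NULL,
    seen_at       timestamp NOT NULL
);

-- Non-unique by design: the same plate can legitimately appear many times.
CREATE INDEX sighting_plate_idx
    ON sighting (plate_number, plate_state, plate_country);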
On Nov 26, 2007 1:11 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> I won't argue that there are no reasonable natural keys. But I have sure seen plenty of cases where what appeared to be a natural key was discovered, generally at a very inopportune time in the development process, to be not-so-natural after all.

If you miss the key and blow it, you fix it. Yes, there are tons of examples of this particular number not exactly lining up with something, like a person, vehicle, etc. Of course, this all means that the number in question is simply not enough information by itself, and so is either a partial definition or defines something else.

I will concede that redefining a constraint on one table is easier than changing a key across 10 tables. This is why the compromise mentioned way upthread by Josh Drake (namely, to define the natural key but use the surrogate for joining) is good in certain cases like this, especially when you have a complex key that is used in many tables.

The problem is that, because surrogates allow skipping the problem without defining a proper key at all, the vague data relationships you mention never get properly defined in the database, and end up being caught in code or by the user, because the id is trusted to express the relationship when in fact it doesn't. This causes much worse problems than redefining keys, by the way, and helps create the messy databases that those of us who know how to do things both ways complain about.

In other words, if you create tables by defining the id p-key, throwing a bunch of fields on it that approximately describe the item, plus maybe some indexes for performance, you have already loaded the gun to shoot yourself in the foot. Many of the safeguards the database can provide in keeping your data organized have been removed.

merlin
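A minimal illustration of that loaded gun, and of the compromise that defuses it (a sketch; the names are invented):

-- The "loaded gun": a surrogate key and nothing else, so nothing stops
-- two rows from describing the same vehicle.
CREATE TABLE vehicle_loose (
    id  serial PRIMARY KEY,
    vin text
);

-- The compromise: keep the surrogate for joins, but still declare the
-- natural key so the database itself enforces it.
CREATE TABLE vehicle_keyed (
    id  serial PRIMARY KEY,
    vin text NOT NULL UNIQUE
);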