Thread: Questions about my strategy
I am writing a stock system, and am looking for either a confirmation of my plan or a "No, don't do that!" before I get in too deep.. I have a table, called stock_transactions. It has primary key and timestamp set on insert. It has these columns: Count, product_key, branch_key, size_key, colour_key and type_key. The keys obviously are used to link to other tables, for example the product table has a description, the type key is a transaction type etc. The idea is that when I want toknow how many items at a branch I select all rows matching product_key and branch_key then SUM(count) the count column. The problem as I see it is after a length of time the table will reach a large size and then getting counts of stock on hand will become quite slow, so I plan to have a stock take date in the system, and then limit it to all rows after the last stocktake. This is guess will require an index on teh timestamp column. Does any one see a problem here or does it all sound ok? Thanks -- * * Rob Brown-Bayliss *
On Mon, 2002-07-29 at 22:49, Rob Brown-Bayliss wrote: > The problem as I see it is after a length of time the table will reach a > large size and then getting counts of stock on hand will become quite > slow, so I plan to have a stock take date in the system, and then limit > it to all rows after the last stocktake. > > This is guess will require an index on teh timestamp column. > Bear in mind that the PostgreSQL query optimiser is unlikely to use an index if more than a few percent of the rows will be returned. If the tuning parameters are set correctly, the index lookup should kick in only when it would be faster[*]. If your stocktakes are equally distributed amongst your transactions, then I suspect there would have to be about 30 stocktakes (i.e. transactions partioned into about 30 sets) before the index would be valuable. How frequent are stocktake entries going to be compared to transactions? Of course, the only harm in creating an index is that it will slow inserts down slightly. There have also been some suggestions that the default btree index implementation in PG is not so efficient with continuously increasing keys e.g. timestamps. Also, I'm sure there are more knowledgable folks round here who may have deeper insights that they can offer. Regards John [*] Sequential scan is faster than index scan for large setsb being returned because a) it exploits the readahead behaviour of your OS and b) the tuple visibility information is stored in the heap (the main table) and has to be looked up anyway (though this won't really cause a problem if you rarely/never UPDATE the table) and c) when seen via the index, the heap will also be in random order. Apologies if you knew this all already. -- John Gray Azuli IT www.azuli.co.uk
On Tue, 2002-07-30 at 10:19, John Gray wrote: > Bear in mind that the PostgreSQL query optimiser is unlikely to use an > index if more than a few percent of the rows will be returned. If the > tuning parameters are set correctly, the index lookup should kick in > only when it would be faster[*]. If your stocktakes are equally > distributed amongst your transactions, then I suspect there would have > to be about 30 stocktakes (i.e. transactions partioned into about 30 > sets) before the index would be valuable. How frequent are stocktake > entries going to be compared to transactions? Basically they wont, after the stock has been counted a single transaction will be entered for each product, the count column being a correction, ie count = -4 if there are 4 items less tha nthere should be, 3 if more or 0 if the same. then the stocktake timestamp will be created and from then when looking up how many items in stock I will ignore all transactions older than the stocktake timestamp. > Of course, the only harm in creating an index is that it will slow > inserts down slightly. There have also been some suggestions that the > default btree index implementation in PG is not so efficient with > continuously increasing keys e.g. timestamps. I would like to hear more about index types, also what about clustering, I havn't readinto it yet, but understand it orders a table by the index, but then items inserted with a time stamp of the inestion will be automaitacally cluseterd by their time stamp wont they? (assuming I neve delete or update any rows) -- * * Rob Brown-Bayliss *
On Tue, 2002-07-30 at 13:42, Chris Albertson wrote: > 2) The clasic design for an inventory system keeps a count > of the widgets of each type on hand so you don't actually need > to compute (count(widgets_bought)-count(widgets_sold)) to > know how many are left. I think it is OK to keep some infomation > pre-computed if it is needed frequently. Recovering the count > from a datestamped transaction log seems like a lot of work. The reason I thought this might be the way top go is the complaint from the customer about every other system he has looked at. The business is shoe retail, so he buys a shoe, say it's called "Trek" made by "Joes Shoe makers". When He buys 400 of these, they might be 200 white, 100 black, 100 red, but also spread accross 10 different sizes. On other systems he has had to have aproduct code for each possible combo, one for wihte size 35, one for white size 35.5 etc. Then he cant get a count of how many "treks" he has in stock. I fuggured this way I can give a simple "389 total treks in stock" answer, or a "137 white treks" or a "24 white size 35 treks". As you say it's quite fast, I have entered 25,000 random transactions and getting a query run in 0.21 seconds (from a python interface). It seems like a good plan, but I am not experienced in these things... -- * * Rob Brown-Bayliss *
OK, you've re-discovered another rule of thumb: If the queries are unpredictable you can't keep them precomputed. So if you want to answer ad hoc queries like "How many Brown, men's Trecks sold in the last two weeks" and then ask "how many whites in 10 1/2" Your way might work. The more traditional method would be to model the physical objects. and track how many white size 8s you have then use something like sum(QuantityOnHand) ... where size in (10, 10.5, 11)...; Or sum(QuantityOnHand) ... where maker like 'Shoeworld' ... ; --- Rob Brown-Bayliss <rob@zoism.org> wrote: > On Tue, 2002-07-30 at 13:42, Chris Albertson wrote: > > > 2) The clasic design for an inventory system keeps a count > > of the widgets of each type on hand so you don't actually need > > to compute (count(widgets_bought)-count(widgets_sold)) to > > know how many are left. I think it is OK to keep some infomation > > pre-computed if it is needed frequently. Recovering the count > > from a datestamped transaction log seems like a lot of work. > > The reason I thought this might be the way top go is the complaint > from > the customer about every other system he has looked at. > > The business is shoe retail, so he buys a shoe, say it's called > "Trek" > made by "Joes Shoe makers". When He buys 400 of these, they might be > 200 white, 100 black, 100 red, but also spread accross 10 different > sizes. > > On other systems he has had to have aproduct code for each possible > combo, one for wihte size 35, one for white size 35.5 etc. > > Then he cant get a count of how many "treks" he has in stock. I > fuggured this way I can give a simple "389 total treks in stock" > answer, > or a "137 white treks" or a "24 white size 35 treks". > > As you say it's quite fast, I have entered 25,000 random transactions > and getting a query run in 0.21 seconds (from a python interface). > > It seems like a good plan, but I am not experienced in these > things... > > > -- > > * > * Rob Brown-Bayliss > * > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com
Just a few comments 1) For your purposes a million or so rows is not "a lot" and queries will still be fast. 2) The clasic design for an inventory system keeps a count of the widgets of each type on hand so you don't actually need to compute (count(widgets_bought)-count(widgets_sold)) to know how many are left. I think it is OK to keep some infomation pre-computed if it is needed frequently. Recovering the count from a datestamped transaction log seems like a lot of work. --- Rob Brown-Bayliss <rob@zoism.org> wrote: > > I am writing a stock system, and am looking for either a confirmation > of > my plan or a "No, don't do that!" before I get in too deep.. > > I have a table, called stock_transactions. > > It has primary key and timestamp set on insert. It has these columns: > > Count, product_key, branch_key, size_key, colour_key and type_key. > > The keys obviously are used to link to other tables, for example the > product table has a description, the type key is a transaction type > etc. > > The idea is that when I want toknow how many items at a branch I > select > all rows matching product_key and branch_key then SUM(count) the > count > column. > > The problem as I see it is after a length of time the table will > reach a > large size and then getting counts of stock on hand will become quite > slow, so I plan to have a stock take date in the system, and then > limit > it to all rows after the last stocktake. > > This is guess will require an index on teh timestamp column. > > Does any one see a problem here or does it all sound ok? > > Thanks > ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com
On Tue, Jul 30, 2002 at 01:58:33PM +1200, Rob Brown-Bayliss wrote: > The business is shoe retail, so he buys a shoe, say it's called "Trek" > made by "Joes Shoe makers". When He buys 400 of these, they might be > 200 white, 100 black, 100 red, but also spread accross 10 different > sizes. > > On other systems he has had to have aproduct code for each possible > combo, one for wihte size 35, one for white size 35.5 etc. > > Then he cant get a count of how many "treks" he has in stock. I > fuggured this way I can give a simple "389 total treks in stock" answer, > or a "137 white treks" or a "24 white size 35 treks". Why not three codes: model, colour, and size? Then you can query all by model, and limit by size, colour, or both. I think your strategy will work, but it has the potential to degrade seriously over time, or else to require some sort of regular maintenance. Nothing wrong with that, of course, just a potential pitfall. A -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote: > Why not three codes: model, colour, and size? Then you can query all > by model, and limit by size, colour, or both. How do you mean three codes? If you mean having more than one product code for a type of shoe thats what I am trying to avoid. I worked for acompany that did that a long time ago, the product was steel, and having 15 or more different lengths of a steel chanl meant having 15 product codes. It was often easier to walk out to the store and look for the product than to query each product code at the counter... > I think your strategy will work, but it has the potential to degrade > seriously over time, or else to require some sort of regular > maintenance. Nothing wrong with that, of course, just a potential > pitfall. One thing I am thinking is having a duplicate table called history_transactions, and at a stock take move all the transaction rows into the history table. That way we have the total history, but ding a search for products in stock is useing a smaler table. Taht said, maybe do the history thing at the end of each financial year rather then each month or quater depending on how many transactions they are getting (at the moment it's less than 50 per store per day, so less 400 a day on a good day, but other potential customes could well be busier) -- * * Rob Brown-Bayliss *
You are confusing database design with user interfece design. Making a user enter any kind of code is stupid user interface period. Ifthe user is looking for size 11 1/2 shoues he should nothave to remember to enter SizeCode='23W-Z47' He shoul be able to either select from a drop down menue or enter any of "10 1/2", "10.5", or even "10.3 .. 10 3/4" Here is an example of an inventory search page that is even a bit more complex than the shoe shor example. They have tens of thousands of one of a kind items but still searching by ad-hoc catagory is easy. Try it http://www.yachtworld.com/listing/yw_advanced_search_form.jsp See, you can find boats built by "catalina" that are between 30 and 40 feet long that are currently in the UK. Now if you want to storecodes in your DBMS fine. but just don't show them to the user Use a translation table called maybe code2human if you ned to save space. --- Rob Brown-Bayliss <rob@zoism.org> wrote: > On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote: > > Why not three codes: model, colour, and size? Then you can query > all > > by model, and limit by size, colour, or both. > > How do you mean three codes? > > If you mean having more than one product code for a type of shoe > thats > what I am trying to avoid. I worked for acompany that did that a > long > time ago, the product was steel, and having 15 or more different > lengths > of a steel chanl meant having 15 product codes. It was often easier > to > walk out to the store and look for the product than to query each > product code at the counter... > > > > I think your strategy will work, but it has the potential to > degrade > > seriously over time, or else to require some sort of regular > > maintenance. Nothing wrong with that, of course, just a potential > > pitfall. > > > One thing I am thinking is having a duplicate table called > history_transactions, and at a stock take move all the transaction > rows > into the history table. That way we have the total history, but ding > a > search for products in stock is useing a smaler table. > > Taht said, maybe do the history thing at the end of each financial > year > rather then each month or quater depending on how many transactions > they > are getting (at the moment it's less than 50 per store per day, so > less > 400 a day on a good day, but other potential customes could well be > busier) > -- > > * > * Rob Brown-Bayliss > * > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com
On Wed, 2002-07-31 at 11:10, Chris Albertson wrote: > You are confusing database design with user interfece design. > > Making a user enter any kind of code is stupid user interface > period. Ifthe user is looking for size 11 1/2 shoues he should > nothave to remember to enter SizeCode='23W-Z47' He shoul be > able to either select from a drop down menue or enter any of > "10 1/2", "10.5", or even "10.3 .. 10 3/4" The query interface will use dropdowns and some autocompletion (like tabs to complete part names etc) Sales of course by scanning a bar code... I am only interested in t\ke storage of the data and would like to know what Andrew meant by 3 codes. -- * * Rob Brown-Bayliss *
On 31 Jul 2002, Rob Brown-Bayliss wrote: > On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote: > > Why not three codes: model, colour, and size? Then you can query all > > by model, and limit by size, colour, or both. > > How do you mean three codes? > > If you mean having more than one product code for a type of shoe thats > what I am trying to avoid. I worked for acompany that did that a long > time ago, the product was steel, and having 15 or more different lengths > of a steel chanl meant having 15 product codes. It was often easier to > walk out to the store and look for the product than to query each > product code at the counter... What Andrew is saying, I believe, is that you need to determine what defines a shoe. What defines a shoe is it's model, it's size and it's colour. Therefore you absolutely need a unique code to describe each variant. Either that or you say the combination of (model, size, colour) is the unique code. Personally one of the two designs I immediately thought of (the second one actually but the first has more relations) that I would therefore consider strongly goes something like: shoe_model: mid primary key, name, description, manufacturer, ... shoe_stock: mid references shoe_model(mid), colour, size, count That structure gives you everything you want whether count is a difference or an absolute, at least I think it does. I believe it's also more of a correct solution than the convoluted way you seem to be thinking. Some database expert will now correct me on those points. :) -- Nigel J. Andrews Director --- Logictree Systems Limited Computer Consultants
On Wed, Jul 31, 2002 at 10:46:24AM +1200, Rob Brown-Bayliss wrote: > On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote: > > Why not three codes: model, colour, and size? Then you can query all > > by model, and limit by size, colour, or both. > > How do you mean three codes? > > If you mean having more than one product code for a type of shoe thats > what I am trying to avoid. Nope. A code for the product, and then other (optional) codes for size and colour. So, you can query by maker, and by shoe type, and by size, and by colour: Show me all the things by bigcorp in style hotnow that are red and size ten is something like SELECT DISTINCT (shoename) FROM shoecodes WHERE maker = 'bigcorp' AND shoename = 'hotnow' AND size = 10 AND colour = 'red'; That's a crappy example, but you get the picture? This way you can also get all the shoes in size ten, all the red shoes, or all the shoes by so-and-so, or whatever. A -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
On Wed, 2002-07-31 at 22:47, Nigel J. Andrews wrote: Now you have me confused and paranoid, but I learnt something new so thats good (the constraint when creating a table). > That structure gives you everything you want whether count is a difference or > an absolute, at least I think it does. I believe it's also more of a > correct solution than the convoluted way you seem to be thinking. Some database > expert will now correct me on those points. :) What I am doing is I thought what you just presented, from pg_dump I have: CREATE TABLE "stock_products" ( "loc_seq_pkey" text DEFAULT set_unique_key() NOT NULL, "timestamp" timestamp with time zone DEFAULT 'now()', "version" integer DEFAULT 0, "f_new" character varying, "f_update" character varying, "product" character varying, "category_key" text, "brand_key" text, "sizegrp_key" text, "code" character varying, "biz_key" text, "value" numeric(12,2), Constraint "stock_products_pkey" Primary Key ("loc_seq_pkey") ); CREATE TABLE "stock_transactions" ( "loc_seq_pkey" text DEFAULT set_unique_key() NOT NULL, "timestamp" timestamp with time zone DEFAULT 'now()', "version" integer DEFAULT 0, "f_new" character varying, "f_update" character varying, "product_key" text, "branch_key" text, "size_key" text, "colour_key" text, "transaction_key" text, "type_key" text, "count" integer, "value" numeric(12,2), Constraint "stock_transactions_pkey" Primary Key ("loc_seq_pkey") ); all the *_key fields link to other tables, I do this so that I can easily add colours etc... product_key likes to the stock_products table. count is the number of titems in the transaction, negative for stock leaving, pos for incomming, type_key is type of movement, eg: sale or transfer to another branch etc. Value is the value of the transaction at that point in time. I keep this as next month the shos might have cost the store more than this month and is taken from teh stock_products table. Transaction_key is so that I can group them togeather, ie: a link to a stock_order table (ties in with type,ie if type is ionwards then links to orders table). -- * * Rob Brown-Bayliss *