Thread: Questions about my strategy

Questions about my strategy

From
Rob Brown-Bayliss
Date:
I am writing a stock system, and am looking for either a confirmation of
my plan or a "No, don't do that!" before I get in too deep..

I have a table, called stock_transactions.

It has primary key and timestamp set on insert. It has these columns:
Count, product_key, branch_key, size_key, colour_key and type_key.

The keys obviously are used to link to other tables, for example the
product table has a description, the type key is a transaction type etc.

The idea is that when I want toknow how many items at a branch I select
all rows matching product_key and branch_key then SUM(count) the count
column.

The problem as I see it is after a length of time the table will reach a
large size and then getting counts of stock on hand will become quite
slow, so I plan to have a stock take date in the system, and then limit
it to all rows after the last stocktake.

This is guess will require an index on teh timestamp column.

Does any one see a problem here or does it all sound ok?

Thanks



--

*
*  Rob Brown-Bayliss
*

Re: Questions about my strategy

From
John Gray
Date:
On Mon, 2002-07-29 at 22:49, Rob Brown-Bayliss wrote:

> The problem as I see it is after a length of time the table will reach a
> large size and then getting counts of stock on hand will become quite
> slow, so I plan to have a stock take date in the system, and then limit
> it to all rows after the last stocktake.
>
> This is guess will require an index on teh timestamp column.
>

Bear in mind that the PostgreSQL query optimiser is unlikely to use an
index if more than a few percent of the rows will be returned. If the
tuning parameters are set correctly, the index lookup should kick in
only when it would be faster[*]. If your stocktakes are equally
distributed amongst your transactions, then I suspect there would have
to be about 30 stocktakes (i.e. transactions partioned into about 30
sets) before the index would be valuable. How frequent are stocktake
entries going to be compared to transactions?

Of course, the only harm in creating an index is that it will slow
inserts down slightly. There have also been some suggestions that the
default btree index implementation in PG is not so efficient with
continuously increasing keys e.g. timestamps.

Also, I'm sure there are more knowledgable folks round here who may have
deeper insights that they can offer.

Regards

John

[*] Sequential scan is faster than index scan for large setsb being
returned because a) it exploits the readahead behaviour of your OS and
b) the tuple visibility information is stored in the heap (the main
table) and has to be looked up anyway (though this won't really cause a
problem if you rarely/never UPDATE the table) and c) when seen via the
index, the heap will also be in random order. Apologies if you knew this
all already.

--
John Gray
Azuli IT
www.azuli.co.uk



Re: Questions about my strategy

From
Rob Brown-Bayliss
Date:
On Tue, 2002-07-30 at 10:19, John Gray wrote:

> Bear in mind that the PostgreSQL query optimiser is unlikely to use an
> index if more than a few percent of the rows will be returned. If the
> tuning parameters are set correctly, the index lookup should kick in
> only when it would be faster[*]. If your stocktakes are equally
> distributed amongst your transactions, then I suspect there would have
> to be about 30 stocktakes (i.e. transactions partioned into about 30
> sets) before the index would be valuable. How frequent are stocktake
> entries going to be compared to transactions?

Basically they wont, after the stock has been counted a single
transaction will be entered for each product, the count column being a
correction, ie count = -4 if there are 4 items less tha nthere should
be, 3 if more or 0 if the same.  then the stocktake timestamp will be
created and from then when looking up how many items in stock I will
ignore all transactions older than the stocktake timestamp.

> Of course, the only harm in creating an index is that it will slow
> inserts down slightly. There have also been some suggestions that the
> default btree index implementation in PG is not so efficient with
> continuously increasing keys e.g. timestamps.

I would like to hear more about index types, also what about clustering,
I havn't readinto it yet, but understand it orders a table by the index,
but then items inserted with a time stamp of the inestion will be
automaitacally cluseterd by their time stamp wont they? (assuming I neve
delete or update any rows)

--

*
*  Rob Brown-Bayliss
*

Re: Questions about my strategy

From
Rob Brown-Bayliss
Date:
On Tue, 2002-07-30 at 13:42, Chris Albertson wrote:

> 2) The clasic design for an inventory system keeps a count
> of the widgets of each type on hand so you don't actually need
> to compute (count(widgets_bought)-count(widgets_sold)) to
> know how many are left.  I think it is OK to keep some infomation
> pre-computed if it is needed frequently.  Recovering the count
> from a datestamped transaction log seems like a lot of work.

The reason I thought this might be the way top go is the complaint from
the customer about every other system he has looked at.

The business is shoe retail, so he buys a shoe, say it's called "Trek"
made by "Joes Shoe makers".  When He buys 400 of these, they might be
200 white, 100 black, 100 red, but also spread accross 10 different
sizes.

On other systems he has had to have aproduct code for each possible
combo, one for wihte  size 35, one for white size 35.5 etc.

Then he cant get a count of how many "treks" he has in stock.  I
fuggured this way I can give a simple "389 total treks in stock" answer,
or a "137 white treks" or a "24 white size 35 treks".

As you say it's quite fast, I have entered 25,000 random transactions
and getting a query run in 0.21 seconds (from a python interface).

It seems like a good plan, but I am not experienced in these things...


--

*
*  Rob Brown-Bayliss
*

Re: Questions about my strategy

From
Chris Albertson
Date:
OK, you've re-discovered another rule of thumb:
If the queries are unpredictable you can't keep them
precomputed.  So if you want to answer ad hoc queries
like "How many Brown, men's Trecks sold in the last two
weeks" and then ask "how many whites in 10 1/2"
Your way might work.

The more traditional method would be to model the physical
objects.  and track how many white size 8s you have then
use something like sum(QuantityOnHand) ... where
size in (10, 10.5, 11)...;

Or sum(QuantityOnHand) ... where maker like 'Shoeworld' ... ;

--- Rob Brown-Bayliss <rob@zoism.org> wrote:
> On Tue, 2002-07-30 at 13:42, Chris Albertson wrote:
>
> > 2) The clasic design for an inventory system keeps a count
> > of the widgets of each type on hand so you don't actually need
> > to compute (count(widgets_bought)-count(widgets_sold)) to
> > know how many are left.  I think it is OK to keep some infomation
> > pre-computed if it is needed frequently.  Recovering the count
> > from a datestamped transaction log seems like a lot of work.
>
> The reason I thought this might be the way top go is the complaint
> from
> the customer about every other system he has looked at.
>
> The business is shoe retail, so he buys a shoe, say it's called
> "Trek"
> made by "Joes Shoe makers".  When He buys 400 of these, they might be
> 200 white, 100 black, 100 red, but also spread accross 10 different
> sizes.
>
> On other systems he has had to have aproduct code for each possible
> combo, one for wihte  size 35, one for white size 35.5 etc.
>
> Then he cant get a count of how many "treks" he has in stock.  I
> fuggured this way I can give a simple "389 total treks in stock"
> answer,
> or a "137 white treks" or a "24 white size 35 treks".
>
> As you say it's quite fast, I have entered 25,000 random transactions
> and getting a query run in 0.21 seconds (from a python interface).
>
> It seems like a good plan, but I am not experienced in these
> things...
>
>
> --
>
> *
> *  Rob Brown-Bayliss
> *
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org


=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

Re: Questions about my strategy

From
Chris Albertson
Date:
Just a few comments

1) For your purposes a million or so rows is not "a lot" and
queries will still be fast.

2) The clasic design for an inventory system keeps a count
of the widgets of each type on hand so you don't actually need
to compute (count(widgets_bought)-count(widgets_sold)) to
know how many are left.  I think it is OK to keep some infomation
pre-computed if it is needed frequently.  Recovering the count
from a datestamped transaction log seems like a lot of work.


--- Rob Brown-Bayliss <rob@zoism.org> wrote:
>
> I am writing a stock system, and am looking for either a confirmation
> of
> my plan or a "No, don't do that!" before I get in too deep..
>
> I have a table, called stock_transactions.
>
> It has primary key and timestamp set on insert. It has these columns:
>
> Count, product_key, branch_key, size_key, colour_key and type_key.
>
> The keys obviously are used to link to other tables, for example the
> product table has a description, the type key is a transaction type
> etc.
>
> The idea is that when I want toknow how many items at a branch I
> select
> all rows matching product_key and branch_key then SUM(count) the
> count
> column.
>
> The problem as I see it is after a length of time the table will
> reach a
> large size and then getting counts of stock on hand will become quite
> slow, so I plan to have a stock take date in the system, and then
> limit
> it to all rows after the last stocktake.
>
> This is guess will require an index on teh timestamp column.
>
> Does any one see a problem here or does it all sound ok?
>
> Thanks
>

=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

Re: Questions about my strategy

From
Andrew Sullivan
Date:
On Tue, Jul 30, 2002 at 01:58:33PM +1200, Rob Brown-Bayliss wrote:
> The business is shoe retail, so he buys a shoe, say it's called "Trek"
> made by "Joes Shoe makers".  When He buys 400 of these, they might be
> 200 white, 100 black, 100 red, but also spread accross 10 different
> sizes.
>
> On other systems he has had to have aproduct code for each possible
> combo, one for wihte  size 35, one for white size 35.5 etc.
>
> Then he cant get a count of how many "treks" he has in stock.  I
> fuggured this way I can give a simple "389 total treks in stock" answer,
> or a "137 white treks" or a "24 white size 35 treks".

Why not three codes: model, colour, and size?  Then you can query all
by model, and limit by size, colour, or both.

I think your strategy will work, but it has the potential to degrade
seriously over time, or else to require some sort of regular
maintenance.  Nothing wrong with that, of course, just a potential
pitfall.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: Questions about my strategy

From
Rob Brown-Bayliss
Date:
On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote:
> Why not three codes: model, colour, and size?  Then you can query all
> by model, and limit by size, colour, or both.

How do you mean three codes?

If you mean having more than one product code for a type of shoe thats
what I am trying to avoid.  I worked for acompany that did that a long
time ago, the product was steel, and having 15 or more different lengths
of a steel chanl meant having 15 product codes.  It was often easier to
walk out to the store and look for the product than to query each
product code at the counter...


> I think your strategy will work, but it has the potential to degrade
> seriously over time, or else to require some sort of regular
> maintenance.  Nothing wrong with that, of course, just a potential
> pitfall.


One thing I am thinking is having a duplicate table called
history_transactions, and at a stock take move all the transaction rows
into the history table.  That way we have the total history, but ding a
search for products in stock is useing a smaler table.

Taht said, maybe do the history thing at the end of each financial year
rather then each month or quater depending on how many transactions they
are getting (at the moment it's less than 50 per store per day, so less
400 a day on a good day, but other potential customes could well be
busier)
--

*
*  Rob Brown-Bayliss
*

Re: Questions about my strategy

From
Chris Albertson
Date:
You are confusing database design with user interfece design.

Making a user enter any kind of code is stupid user interface
period.  Ifthe user is looking for size 11 1/2 shoues he should
nothave to remember to enter SizeCode='23W-Z47'  He shoul be
able to either select from a drop down menue or enter any of
"10 1/2", "10.5", or even "10.3 .. 10 3/4"

Here is an example of an inventory search page that is
even a bit more complex than the shoe shor example.  They
have tens of thousands of one of a kind items but still
searching by ad-hoc catagory is easy.  Try it

http://www.yachtworld.com/listing/yw_advanced_search_form.jsp

See, you can find boats built by "catalina" that are between
30 and 40 feet long that are currently in the UK.

Now if you want to storecodes in your DBMS fine. but just
don't show them to the user Use a translation table called
maybe code2human if you ned to save space.


--- Rob Brown-Bayliss <rob@zoism.org> wrote:
> On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote:
> > Why not three codes: model, colour, and size?  Then you can query
> all
> > by model, and limit by size, colour, or both.
>
> How do you mean three codes?
>
> If you mean having more than one product code for a type of shoe
> thats
> what I am trying to avoid.  I worked for acompany that did that a
> long
> time ago, the product was steel, and having 15 or more different
> lengths
> of a steel chanl meant having 15 product codes.  It was often easier
> to
> walk out to the store and look for the product than to query each
> product code at the counter...
>
>
> > I think your strategy will work, but it has the potential to
> degrade
> > seriously over time, or else to require some sort of regular
> > maintenance.  Nothing wrong with that, of course, just a potential
> > pitfall.
>
>
> One thing I am thinking is having a duplicate table called
> history_transactions, and at a stock take move all the transaction
> rows
> into the history table.  That way we have the total history, but ding
> a
> search for products in stock is useing a smaler table.
>
> Taht said, maybe do the history thing at the end of each financial
> year
> rather then each month or quater depending on how many transactions
> they
> are getting (at the moment it's less than 50 per store per day, so
> less
> 400 a day on a good day, but other potential customes could well be
> busier)
> --
>
> *
> *  Rob Brown-Bayliss
> *
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html


=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

Re: Questions about my strategy

From
Rob Brown-Bayliss
Date:
On Wed, 2002-07-31 at 11:10, Chris Albertson wrote:
> You are confusing database design with user interfece design.
>
> Making a user enter any kind of code is stupid user interface
> period.  Ifthe user is looking for size 11 1/2 shoues he should
> nothave to remember to enter SizeCode='23W-Z47'  He shoul be
> able to either select from a drop down menue or enter any of
> "10 1/2", "10.5", or even "10.3 .. 10 3/4"

The query interface will use dropdowns and some autocompletion (like
tabs to complete part names etc)

Sales of course by scanning a bar code...  I am only interested in t\ke
storage of the data and would like to know what Andrew meant by 3 codes.

--

*
*  Rob Brown-Bayliss
*

Re: Questions about my strategy

From
"Nigel J. Andrews"
Date:
On 31 Jul 2002, Rob Brown-Bayliss wrote:

> On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote:
> > Why not three codes: model, colour, and size?  Then you can query all
> > by model, and limit by size, colour, or both.
>
> How do you mean three codes?
>
> If you mean having more than one product code for a type of shoe thats
> what I am trying to avoid.  I worked for acompany that did that a long
> time ago, the product was steel, and having 15 or more different lengths
> of a steel chanl meant having 15 product codes.  It was often easier to
> walk out to the store and look for the product than to query each
> product code at the counter...

What Andrew is saying, I believe, is that you need to determine what defines a
shoe. What defines a shoe is it's model, it's size and it's colour. Therefore
you absolutely need a unique code to describe each variant. Either that or you
say the combination of (model, size, colour) is the unique code. Personally one
of the two designs I immediately thought of (the second one actually but the
first has more relations) that I would therefore consider strongly goes
something like:

shoe_model:
    mid primary key,
    name,
    description,
    manufacturer,
    ...

shoe_stock:
    mid references shoe_model(mid),
    colour,
    size,
    count


That structure gives you everything you want whether count is a difference or
an absolute, at least I think it does. I believe it's also more of a
correct solution than the convoluted way you seem to be thinking. Some database
expert will now correct me on those points. :)


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


Re: Questions about my strategy

From
Andrew Sullivan
Date:
On Wed, Jul 31, 2002 at 10:46:24AM +1200, Rob Brown-Bayliss wrote:
> On Wed, 2002-07-31 at 02:34, Andrew Sullivan wrote:
> > Why not three codes: model, colour, and size?  Then you can query all
> > by model, and limit by size, colour, or both.
>
> How do you mean three codes?
>
> If you mean having more than one product code for a type of shoe thats
> what I am trying to avoid.

Nope.  A code for the product, and then other (optional) codes for
size and colour.  So, you can query by maker, and by shoe type, and
by size, and by colour:

Show me all the things by bigcorp in style hotnow that are red and
size ten is something like

    SELECT DISTINCT (shoename)
    FROM shoecodes
    WHERE maker = 'bigcorp'
    AND shoename = 'hotnow'
    AND size = 10
    AND colour = 'red';

That's a crappy example, but you get the picture?  This way you can
also get all the shoes in size ten, all the red shoes, or all the
shoes by so-and-so, or whatever.

A
--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: Questions about my strategy

From
Rob Brown-Bayliss
Date:
On Wed, 2002-07-31 at 22:47, Nigel J. Andrews wrote:

Now you have me confused and paranoid, but I learnt something new so
thats good (the constraint when creating a table).

> That structure gives you everything you want whether count is a difference or
> an absolute, at least I think it does. I believe it's also more of a
> correct solution than the convoluted way you seem to be thinking. Some database
> expert will now correct me on those points. :)

What I am doing is I thought what you just presented, from pg_dump I
have:

CREATE TABLE "stock_products" (
    "loc_seq_pkey" text DEFAULT set_unique_key() NOT NULL,
    "timestamp" timestamp with time zone DEFAULT 'now()',
    "version" integer DEFAULT 0,
    "f_new" character varying,
    "f_update" character varying,
    "product" character varying,
    "category_key" text,
    "brand_key" text,
    "sizegrp_key" text,
    "code" character varying,
    "biz_key" text,
    "value" numeric(12,2),
    Constraint "stock_products_pkey" Primary Key ("loc_seq_pkey")
);

CREATE TABLE "stock_transactions" (
    "loc_seq_pkey" text DEFAULT set_unique_key() NOT NULL,
    "timestamp" timestamp with time zone DEFAULT 'now()',
    "version" integer DEFAULT 0,
    "f_new" character varying,
    "f_update" character varying,
    "product_key" text,
    "branch_key" text,
    "size_key" text,
    "colour_key" text,
    "transaction_key" text,
    "type_key" text,
    "count" integer,
    "value" numeric(12,2),
    Constraint "stock_transactions_pkey" Primary Key ("loc_seq_pkey")
);

all the *_key fields link to other tables, I do this so that I can
easily add colours etc...

product_key likes to the stock_products table.  count is the number of
titems in the transaction, negative  for stock leaving, pos for
incomming, type_key is type of movement, eg: sale or transfer to another
branch etc.  Value is the value of the transaction at that point in
time.  I keep this as next month the shos might have cost the store more
than this month and is taken from teh stock_products table.
Transaction_key is so that I can group them togeather, ie: a link to a
stock_order table (ties in with type,ie if type is ionwards then links
to orders table).



--

*
*  Rob Brown-Bayliss
*