Many joins: monthly summaries S-L--O--W - Mailing list pgsql-novice

From Michael Glaesmann
Subject Many joins: monthly summaries S-L--O--W
Date
Msg-id 7C513675-03BF-11D8-B460-0005029FC1A7@myrealbox.com
Whole thread Raw
Responses Re: Many joins: monthly summaries S-L--O--W  (Josh Berkus <josh@agliodbs.com>)
Re: Many joins: monthly summaries S-L--O--W  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-novice
Hello all!

I've been working with PostgreSQL now for about a year and have really
enjoyed it. And I've been able to impress the boss with very simple
PHP/PostgreSQL web applications, so that's been great too!

Nearly all I've done has been very straightforward and basic: simple
selects from a couple of different tables joined by where clauses. But
now I'd like to do something a bit more complex: show sales per month,
current inventory, and expected duration of inventory at present sales
rates for each item we have. All in one relation :)

Here are the relevant relations

products (code TEXT PK, name TEXT)

orders (product_code TEXT FK products(id), date DATE, branch_id INTEGER
FK branches(id), qty INTEGER)

inventory (product_code INTEGER FK products(id), date DATE, qty INTEGER)

What I'm trying to make is a table with columns
product_code,
product_name,
jan03_qty,...,
half1_qty,
jul03_qty, ...
sep03_qty,
half2_qty,
total_qty,
inv_qty,
inv_date,
est_inv_qty,
months_remaining

where product_name is products(name), jan03_qty is the quantity of
sales in January, 2003, half1_qty is quantity of sales from January
through June, 2003, inv_qty is the latest inventory data we have,
inv_date is the date of that inventory data, est_inv_qty is the
estimated current inventory based on inv_qty and sales since inv_date,
and months_remaining is an estimate of how many months the estimated
inventory will last at average sales rates (calculated using the
average monthly sales for the previous months).

I've got something that works, but it's *extremely* slow. It takes
about 10 minutes running on a 500MHz G4 Mac with 1GB RAM, running OS
10.2.8 and PostgreSQL 7.3.3. Here's what I've done:

select
    products.code as product_code,
    products.name as product_name,
    jan03.qty as jan03_qty,
    feb03.qty as feb03_qty ...
    inv.qty as inv_qty,
    est_inv.qty as est_inv_qty,
    months_remaining::numeric(8,1)
from products
    left join (
        select products.code as product_code, sum(qty) as qty
        from orders, products
        where products.code = orders.id and
        date between '2003-01-01' and '2003-01-31'
        group by product_code) as
    jan03 on (jan03.product_code = products.code)
    left join (
        select products.code as product_code, sum(qty) as qty
        from orders, products
        where products.code = orders.id and
        date between '2003-02-01' and '2003-02-28'
        group by product_code) as
    feb03 on (feb03.product_code = products.code)
    left join
-- repeat through total_qty
    total on (total.product_code = products.code)
    left join (

-- this is where it get's hairy

        select
                     est_inventory.product_code as product_code,
            est_inventory.qty,
                     monthly_averages.monthly_average,
                     (est_inventory.qty/monthly_average)::numeric(10,1)
                as months_remaining
                 from (
                     select
                             inventory.product_code as product_code,
                inventory.qty - coalesce(orders.qty,0) as qty
                         from (
                                 select product_code, date, qty
                                 from current_inventory_view
                                 ) as
                inventory
                             left outer join (
                                 select orders.product_code as product_code,
                    sum(orders.qty) as qty
                                 from (
                                         select product_code, date, qty
                                         from current_inventory_view
                                         ) as
                    inventory,
                    orders
                                 where
                                     orders.date > inventory.date and
                                     orders.product_code = inventory.product_code
                                 group by orders.product_code
                    ) as
                orders on (inventory.product_code = orders.product_code)
                         ) as
            est_inventory
                     left outer join (
                         select
                             product_code as product_code,
                             sum(qty)/ageinmonths(timestamp '9/30/2003',
timestamp '1/1/2003')                 as monthly_average
                         from orders
                         where date between '1/1/2003' and '9/30/2003'
                         group by product_code
                         ) as
            monthly_averages on
                (est_inventory.product_code = monthly_averages.product_code)
                     where monthly_average > 0
                     ) as
    remaining on (remaining.product_code = products.code)
             left join (
               select distinct product_code, date, qty
                 from current_inventory_view order by date desc
                 ) as
    inventory on (inventory.product_code = products.code)
    ; -- finally

ageinmonths is an sql function that returns a double precision float,
the number of months between $1 and $2, defined as follows
    select
    12 * date_part ('year',age($1, $2)) +
    date_part('month',age($1,$2)) +
    date_part('day'), age($1, $2))/30;

current_inventory_view, showing the most recent inventory qty and date,
is defined as
    SELECT inv.date, inv.product_code, inv.qty FROM inventory inv, (SELECT
max(inventory.date) AS date, inventory.product_code FROM inventory
GROUP BY inventory.product_code) curr_inv WHERE ((inv.date =
curr_inv.date) AND (inv.product_code = curr_inv.product_code));

orders is indexed on date and product_code (and branch_id, but that
shouldn't matter)
inventory is indexed on date
products is unindexed besides the order_pkey index that's
auto-generated.

I'm using left joins because not all items sell each month. Items that
have zero sales would fall out if I used WHERE clauses.

Now, I've done an EXPLAIN on the monstrosity and it comes to just over
3 pages printed at 8 pt on A3 landscape. It's much more than I can take
in. I've looked through the EXPLAIN documentation and am willing to
learn (which is good, because I definitely need to).

I tried a version of this, truncating the 'hairy' part. It definitely
ran faster, taking less than 5 minutes.

Also, this query requires more upkeep than I'd like. I modify it every
time I want to add a new month. It seems like I should be able to use a
custom function so I can just put in the date span I'd like to cover.

I've started to work on a function and have run into some problems (see
separate post: Custom function problems if you're interested )

Any suggestion would be appreciated. It seems like this should be
something that's easy, and I'm approaching it wrong. If I'm completely
on the wrong track, I'd love to know! Also, places I should/could look
for ways to accomplish this, that'd be great.

If you've made it to here, thanks for your perseverance!  : )

Regards,
Michael Glaesemann
grzm myrealbox com



pgsql-novice by date:

Previous
From: Michael Glaesmann
Date:
Subject: Custom function problems
Next
From: Stephan Szabo
Date:
Subject: Re: Custom function problems