Thread: problem with subqueries
Hi Any help appreciated - I have spent 2 days trying to get this query to work! I have an orders database and a customer database. The orders database has a date field for each order. Because I want to obtain a monthly breakdown, I created a view called monthcustomer as this select: select orders.ord_date, customer.cname,date_part('month',orders.ord_date) AS "month", date_part('year',orders.ord_date) AS"year", orders.number_of_items; Each month will have multiple numbers of items, so to get a monthly breakdown I tried this: select distinct year,month, (select sum(monthcustomer.number_of_items) from monthcustomer where monthcustomer.year=m.yearand monthcustomer.month=m.month) as NumPotsfrom monthcustomer m; This goes off and never comes back - CPU is hitting the top of the chart! I have to ^C to interrupt it, as it runs for ages (I've left this run for 10-20 minutes and it is still running). I have indexes on the columns involved from the original tables. Any help appreciated. Pete PS: Using pgsql 7.2-70 from Suse distribution.
pete@phillipsfamily.freeserve.co.uk writes: > select distinct year,month, > (select sum(monthcustomer.number_of_items) from monthcustomer where > monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots > from monthcustomer m; > This goes off and never comes back - No surprise, considering the sub-select is going to be evaluated separately for every row of monthcustomer --- and then most of those evaluations will be thrown away by the DISTINCT :-( A straightforward way of reducing the redundant computations would be to do the DISTINCT first: select year,month, (select sum(monthcustomer.number_of_items) from monthcustomer where monthcustomer.year=m.year and monthcustomer.month=m.month)as NumPotsfrom (select distinct year, month from monthcustomer) as m; But it appears to me that you are reinventing the wheel. Isn't this query the equivalent of a grouped aggregation --- viz, select year, month, sum(number_of_items) as NumPotsfrom monthcustomergroup by year, month regards, tom lane
On Sat, 5 Oct 2002 pete@phillipsfamily.freeserve.co.uk wrote: > Because I want to obtain a monthly breakdown, I created a view called > monthcustomer as this select: > > select orders.ord_date, customer.cname, > date_part('month',orders.ord_date) AS "month", > date_part('year',orders.ord_date) AS "year", > orders.number_of_items; > > Each month will have multiple numbers of items, so to get a monthly > breakdown I tried this: > > select distinct year,month, > (select sum(monthcustomer.number_of_items) from monthcustomer where > monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots > from monthcustomer m; > > This goes off and never comes back - CPU is hitting the top of the > chart! I have to ^C to interrupt it, as it runs for ages (I've left That's going to run that inner select once for every row in monthcustomer probably. Would select year, month, sum(monthcustomer.number_of_items) as NumPots frmo monthcustomer m group by year, month; have the same effect, get the sum of the items for each year/month combination along with which year and month?
Hi guys. Thanks for the rapid replies so far. To answer some of the questions: >you did not indicate an explicit join - or even a "from" clause for that >matter- in the example of your create view statement. My original post was a simplified version. Here is the actual view creating statement: create view monthord as select ord_date, extract (month from ord_date)as month, extract (year from ord_date) as year,r_region,number_of_itemsfrom orders,customer where ccode = codenum; >But it appears to me that you are reinventing the wheel. Isn't this >query the equivalent of a grouped aggregation Yes - but again I was simplifying - I want to run a sub query for each region, so I get output like this: year month Reg1 Reg2 Reg3 Reg4 ----- ----- ---- ---- ----- ---- 1999 Jan 20 45 10 27 1999 Feb 30 43 18 37 ... 2002 Oct 7 89 60 17 The subquery I have tried to run is actually this (there is probably a way to do this all in SQL, but at present I would like to just understand why my subqueries take so long). -----start of script --------------- #!/bin/sh # prepare a query as shell variables # set up timeframe SDATE=`date --date '2 years ago' +'%Y-%m-01'` EDATE=`date --date 'next month' +'%Y-%m-01'` # use the QUERY env variable to build the full query QUERY="select distinct year,month, " for reg in `psql -U postgres -d product_db -c "select distinct r_code from regionlist order by r_code;"` do QUERY="$QUERY (select sum(monthcustomer.number_of_items) from monthcustomer where monthcustomer.year=m.year and monthcustomer.month=m.monthand r_region like '$reg') as $reg," done # now add a total column QUERY="$QUERY (select sum(monthcustomer.number_of_items) from monthcustomer where ord_date >= '$SDATE' and ord_date < '$EDATE'and monthcustomer.year=m.year and monthcustomer.month=m.month) as TOTAL from monthcustomer m;" # execute the query psql -U postgres -d newmaggot -c "$QUERY" -----end of script --------------- As you can see, I get all the regions, and loop through them building up a bunch of subqueries as I go along. Hopefully this will answer some of your questions. I will try some of the suggestions later on today (such as placing the distinct clause at the end) but first I have to go and get some decorating done :-( (yes, that sound you can hear is the crack of my wife's whip!). Regards, Pete -- I do whatever the Voices tell me to do
On Sun, 6 Oct 2002 pete@phillipsfamily.freeserve.co.uk wrote: > Hi guys. Thanks for the rapid replies so far. > > To answer some of the questions: > > >you did not indicate an explicit join - or even a "from" clause for that > >matter- in the example of your create view statement. > > My original post was a simplified version. Here is the actual view > creating statement: > > create view monthord as select ord_date, extract (month from ord_date) > as month, extract (year from ord_date) as year,r_region, > number_of_items from orders,customer where ccode = codenum; > > > >But it appears to me that you are reinventing the wheel. Isn't this > >query the equivalent of a grouped aggregation > > Yes - but again I was simplifying - I want to run a sub query for each > region, so I get output like this: > > year month Reg1 Reg2 Reg3 Reg4 > ----- ----- ---- ---- ----- ---- > 1999 Jan 20 45 10 27 > 1999 Feb 30 43 18 37 > ... > 2002 Oct 7 89 60 17 > > The subquery I have tried to run is actually this (there is probably a > way to do this all in SQL, but at present I would like to just > understand why my subqueries take so long). Well, you're running <n> subqueries for each row in monthcustomer because the distinct happens afterwards in your query. So if you've got 4 regions and 1 total and 100,000 rows in monthcustomer, you're looking at something on the order of 500,000 subqueries. Doing the distinct before that step should lower the number to ((#year/month combinations) * (#regions+1)). In any case, you may be better off with one of: a) Doing something programatic to turn a result set like:year|month|region|value1999|Jan |1 |201999|Jan |2 |45... into the form you want. The above can be gotten by group by probably and would require no subqueries. b) Keeping a summary table that you update via triggers. Thisrequires a bit of work to get the triggers, but it probablymakesthe query faster.