Thread: SQL Count Magic Required....
i have the following query that yields a series of true or false results: -- distinct on is pgsql extension -- SELECT DISTINCT ON (t_inspect_result.inspect_id) t_inspect_result.inspect_result_id, t_inspect_result.inspect_result_pass, t_inspect_area.inspect_area, t_inspect_result.inspect_result_timestamp --,t_inspect.serial_number_id, t_inspect.inspect_id FROM t_inspect_result, t_inspect, t_inspect_area, t_serial_number, t_link_contract_number_job_number, t_job_number, t_product WHERE t_inspect.inspect_area_id = t_inspect_area.inspect_area_id AND t_inspect.serial_number_id = t_serial_number.serial_number_id AND t_serial_number.link_contract_number_job_number_id = t_link_contract_number_job_number.link_contract_number_job_number_id AND t_link_contract_number_job_number.job_number_id = t_job_number.job_number_id AND t_product.product_id = t_job_number.product_id AND t_inspect.inspect_id = t_inspect_result.inspect_id AND t_inspect.serial_number_id = '200' ORDER BY t_inspect_result.inspect_id DESC, t_inspect_result.inspect_result_timestamp ASC -- used to get first pass yield pass / fail (true / false) data. -- inspect_id desc impacts end result. time desc impacts the groups prior to being distinctly listed the simplified output may look like f,t,t,f,f,f,t,f,t,f the COUNT magic comes into play b/c i want to count the result set's "t"s and total, but i have no clue how to get this done. #ts: 4 #total: 10 when i have this data, i can apply some math and come up with a 40% yield. any help is, as always, much appreciated. tia... ps - this query looks to be brutal when there is a lot of data. is it? if so, how can i optimize it? __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On 5/31/06 7:32 PM, "operationsengineer1@yahoo.com" <operationsengineer1@yahoo.com> wrote: > i have the following query that yields a series of > true or false results: > > -- distinct on is pgsql extension -- > SELECT DISTINCT ON (t_inspect_result.inspect_id) > t_inspect_result.inspect_result_id, > > t_inspect_result.inspect_result_pass, > t_inspect_area.inspect_area, > > t_inspect_result.inspect_result_timestamp > --,t_inspect.serial_number_id, > t_inspect.inspect_id > FROM t_inspect_result, t_inspect, t_inspect_area, > t_serial_number, > t_link_contract_number_job_number, t_job_number, > t_product > WHERE t_inspect.inspect_area_id = > t_inspect_area.inspect_area_id > AND t_inspect.serial_number_id = > t_serial_number.serial_number_id > AND t_serial_number.link_contract_number_job_number_id > = > > t_link_contract_number_job_number.link_contract_number_job_number_id > AND t_link_contract_number_job_number.job_number_id = > t_job_number.job_number_id > AND t_product.product_id = > t_job_number.product_id > AND t_inspect.inspect_id = > t_inspect_result.inspect_id > AND t_inspect.serial_number_id = '200' > ORDER BY t_inspect_result.inspect_id DESC, > t_inspect_result.inspect_result_timestamp ASC > -- used to get first pass yield pass / fail (true / > false) data. > -- inspect_id desc impacts end result. time desc > impacts the groups prior to being distinctly listed > > the simplified output may look like > > f,t,t,f,f,f,t,f,t,f > > the COUNT magic comes into play b/c i want to count > the result set's "t"s and total, but i have no clue > how to get this done. > > #ts: 4 > #total: 10 > > when i have this data, i can apply some math and come > up with a 40% yield. You can do a query like (untested, and needs to be translated into your monster query): select a.id,a.total,b.failed,(a.total::numeric)/b.total as yield from (select count(test_result) as total from table) as a, (select count(test_result) as failed from table where test_result='f') as b where a.id = b.id; The point is to do the queries separately as subqueries and join them on some primary key so that you get the count "total" and the count "failed". Then you can do the math as above. Note that you have to cast at least one of the integers to numeric if you want a numeric result. Sean
> > i have the following query that yields a series of > > true or false results: > > > > -- distinct on is pgsql extension -- > > SELECT DISTINCT ON (t_inspect_result.inspect_id) > > t_inspect_result.inspect_result_id, > > > > t_inspect_result.inspect_result_pass, > > t_inspect_area.inspect_area, > > > > t_inspect_result.inspect_result_timestamp > > --,t_inspect.serial_number_id, > > t_inspect.inspect_id > > FROM t_inspect_result, t_inspect, t_inspect_area, > > t_serial_number, > > t_link_contract_number_job_number, t_job_number, > > t_product > > WHERE t_inspect.inspect_area_id = > > t_inspect_area.inspect_area_id > > AND t_inspect.serial_number_id = > > t_serial_number.serial_number_id > > AND > t_serial_number.link_contract_number_job_number_id > > = > > > > > t_link_contract_number_job_number.link_contract_number_job_number_id > > AND > t_link_contract_number_job_number.job_number_id = > > t_job_number.job_number_id > > AND t_product.product_id = > > t_job_number.product_id > > AND t_inspect.inspect_id = > > t_inspect_result.inspect_id > > AND t_inspect.serial_number_id = '200' > > ORDER BY t_inspect_result.inspect_id DESC, > > t_inspect_result.inspect_result_timestamp ASC > > -- used to get first pass yield pass / fail (true > / > > false) data. > > -- inspect_id desc impacts end result. time desc > > impacts the groups prior to being distinctly > listed > > > > the simplified output may look like > > > > f,t,t,f,f,f,t,f,t,f > > > > the COUNT magic comes into play b/c i want to > count > > the result set's "t"s and total, but i have no > clue > > how to get this done. > > > > #ts: 4 > > #total: 10 > > > > when i have this data, i can apply some math and > come > > up with a 40% yield. > > You can do a query like (untested, and needs to be > translated into your > monster query): > > select > a.id,a.total,b.failed,(a.total::numeric)/b.total as > yield > from (select count(test_result) as total from > table) as a, > (select count(test_result) as failed from > table where > test_result='f') as b where a.id = b.id; > > The point is to do the queries separately as > subqueries and join them on > some primary key so that you get the count "total" > and the count "failed". > Then you can do the math as above. Note that you > have to cast at least one > of the integers to numeric if you want a numeric > result. > > Sean Sean, t_test_result can get quite large quite quickly. iow, this query could be running on a million or more rows in a year or two. is this a performance killer query? the reason i ask is i could add a column to t_inspect called first_pass and store the pass/fail data there and then just complete a much simpler query counting the t_inspect.first_pass true values and then the getting the total. i'd imagine that would be much faster and much simpler to conceptualize. the downside is that i would have repeating information. tia... __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On 6/1/06 11:47 AM, "operationsengineer1@yahoo.com" <operationsengineer1@yahoo.com> wrote: >>> i have the following query that yields a series of >>> true or false results: >>> >>> -- distinct on is pgsql extension -- >>> SELECT DISTINCT ON (t_inspect_result.inspect_id) >>> t_inspect_result.inspect_result_id, >>> >>> t_inspect_result.inspect_result_pass, >>> t_inspect_area.inspect_area, >>> >>> t_inspect_result.inspect_result_timestamp >>> --,t_inspect.serial_number_id, >>> t_inspect.inspect_id >>> FROM t_inspect_result, t_inspect, t_inspect_area, >>> t_serial_number, >>> t_link_contract_number_job_number, t_job_number, >>> t_product >>> WHERE t_inspect.inspect_area_id = >>> t_inspect_area.inspect_area_id >>> AND t_inspect.serial_number_id = >>> t_serial_number.serial_number_id >>> AND >> t_serial_number.link_contract_number_job_number_id >>> = >>> >>> >> > t_link_contract_number_job_number.link_contract_number_job_number_id >>> AND >> t_link_contract_number_job_number.job_number_id = >>> t_job_number.job_number_id >>> AND t_product.product_id = >>> t_job_number.product_id >>> AND t_inspect.inspect_id = >>> t_inspect_result.inspect_id >>> AND t_inspect.serial_number_id = '200' >>> ORDER BY t_inspect_result.inspect_id DESC, >>> t_inspect_result.inspect_result_timestamp ASC >>> -- used to get first pass yield pass / fail (true >> / >>> false) data. >>> -- inspect_id desc impacts end result. time desc >>> impacts the groups prior to being distinctly >> listed >>> >>> the simplified output may look like >>> >>> f,t,t,f,f,f,t,f,t,f >>> >>> the COUNT magic comes into play b/c i want to >> count >>> the result set's "t"s and total, but i have no >> clue >>> how to get this done. >>> >>> #ts: 4 >>> #total: 10 >>> >>> when i have this data, i can apply some math and >> come >>> up with a 40% yield. >> >> You can do a query like (untested, and needs to be >> translated into your >> monster query): >> >> select >> a.id,a.total,b.failed,(a.total::numeric)/b.total as >> yield >> from (select count(test_result) as total from >> table) as a, >> (select count(test_result) as failed from >> table where >> test_result='f') as b where a.id = b.id; >> >> The point is to do the queries separately as >> subqueries and join them on >> some primary key so that you get the count "total" >> and the count "failed". >> Then you can do the math as above. Note that you >> have to cast at least one >> of the integers to numeric if you want a numeric >> result. >> >> Sean > > Sean, t_test_result can get quite large quite quickly. > iow, this query could be running on a million or more > rows in a year or two. > > is this a performance killer query? > > the reason i ask is i could add a column to t_inspect > called first_pass and store the pass/fail data there > and then just complete a much simpler query counting > the t_inspect.first_pass true values and then the > getting the total. > > i'd imagine that would be much faster and much simpler > to conceptualize. the downside is that i would have > repeating information. I don't know how fast this will be. You will have to benchmark some. An alternative is to use a "historic" table that includes infrequently accessed historical data (for serial numbers of products no longer produced, for example). Or you could create a materialized view of the summarized data. Or put in a trigger that updates counts to another table when you insert into the main table. But, I would probably prove to yourself that your query performance is unacceptable, first. If this is going to be used to generate a "report" once a month, then if it takes 15 minutes to run, no big deal. Also, if you are looking up only one item at a time (by serial number, for example, for a web interface) so that indexing can be used, things will be very fast, I would imagine. You will have to benchmark. Sean
> On 5/31/06 7:32 PM, "operationsengineer1@yahoo.com" > <operationsengineer1@yahoo.com> wrote: > > > i have the following query that yields a series of > > true or false results: > > > > -- distinct on is pgsql extension -- > > SELECT DISTINCT ON (t_inspect_result.inspect_id) > > t_inspect_result.inspect_result_id, > > > > t_inspect_result.inspect_result_pass, > > t_inspect_area.inspect_area, > > > > t_inspect_result.inspect_result_timestamp > > --,t_inspect.serial_number_id, > > t_inspect.inspect_id > > FROM t_inspect_result, t_inspect, t_inspect_area, > > t_serial_number, > > t_link_contract_number_job_number, t_job_number, > > t_product > > WHERE t_inspect.inspect_area_id = > > t_inspect_area.inspect_area_id > > AND t_inspect.serial_number_id = > > t_serial_number.serial_number_id > > AND > t_serial_number.link_contract_number_job_number_id > > = > > > > > t_link_contract_number_job_number.link_contract_number_job_number_id > > AND > t_link_contract_number_job_number.job_number_id = > > t_job_number.job_number_id > > AND t_product.product_id = > > t_job_number.product_id > > AND t_inspect.inspect_id = > > t_inspect_result.inspect_id > > AND t_inspect.serial_number_id = '200' > > ORDER BY t_inspect_result.inspect_id DESC, > > t_inspect_result.inspect_result_timestamp ASC > > -- used to get first pass yield pass / fail (true > / > > false) data. > > -- inspect_id desc impacts end result. time desc > > impacts the groups prior to being distinctly > listed > > > > the simplified output may look like > > > > f,t,t,f,f,f,t,f,t,f > > > > the COUNT magic comes into play b/c i want to > count > > the result set's "t"s and total, but i have no > clue > > how to get this done. > > > > #ts: 4 > > #total: 10 > > > > when i have this data, i can apply some math and > come > > up with a 40% yield. > > You can do a query like (untested, and needs to be > translated into your > monster query): > > select > a.id,a.total,b.failed,(a.total::numeric)/b.total as > yield > from (select count(test_result) as total from > table) as a, > (select count(test_result) as failed from > table where > test_result='f') as b where a.id = b.id; > > The point is to do the queries separately as > subqueries and join them on > some primary key so that you get the count "total" > and the count "failed". > Then you can do the math as above. Note that you > have to cast at least one > of the integers to numeric if you want a numeric > result. for those following on (probably just yours truly ;-), the first iteration (tested and works): SELECT a.total, b.passed, b.passed/(a.total::numeric) as yield FROM (SELECT count(inspect_result_pass) as total FROM t_inspect_result) AS a, (SELECT count(inspect_result_pass) as passed FROM t_inspect_result WHERE inspect_result_pass = 't') AS b this generates results based on the whole table (not limited to first entry and not limited by unique product/serial combo. in my test case, i have 5 passes and 9 total and 5/9 is displayed as 0.55555555555555... should i format this in my application code, or does pgsql allow for formatting? i want to display 55.5%. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
> in my test case, i have 5 passes and 9 total and 5/9 > is displayed as > > 0.55555555555555... > > should i format this in my application code, or does > pgsql allow for formatting? i want to display 55.5%. I would do this on the client side, but it doesn't make any particular difference. Allows you to maintain a separation of your model from your view. http://www.postgresql.org/docs/8.1/static/functions-math.html See the trunc function.
On Thu, Jun 01, 2006 at 02:28:29PM -0400, Sean Davis wrote: > > > in my test case, i have 5 passes and 9 total and 5/9 > > is displayed as > > > > 0.55555555555555... > > > > should i format this in my application code, or does > > pgsql allow for formatting? i want to display 55.5%. > > I would do this on the client side, but it doesn't make any particular > difference. Allows you to maintain a separation of your model from your > view. > > http://www.postgresql.org/docs/8.1/static/functions-math.html > > See the trunc function. Or how about casting the answer to numeric(3,1) ( or whatever precision is desired)?
thanks to help from Sean and Michael, this is the working query... SELECT 1.0 * sum(CASE WHEN inspect_result_pass THEN 1 ELSE 0 END) / count(total), sum(CASE WHEN inspect_result_pass THEN 1 ELSE 0 END), count(total) FROM (SELECT DISTINCT ON (t_inspect.inspect_id) t_inspect_result.inspect_result_pass FROM t_inspect_result, t_inspect, t_inspect_area, t_serial_number, t_link_contract_number_job_number, t_job_number, t_product WHERE t_inspect_result.inspect_id = t_inspect.inspect_id AND t_inspect.serial_number_id = t_serial_number.serial_number_id AND t_inspect_result.inspect_result_pass = 't' AND t_serial_number.link_contract_number_job_number_id = t_link_contract_number_job_number.link_contract_number_job_number_id AND t_link_contract_number_job_number.job_number_id = t_job_number.job_number_id AND t_product.product_id = t_job_number.product_id AND t_inspect.serial_number_id = 200 AND t_product.product_number = 7214118000 AND t_inspect_result.inspect_result_timestamp > '2005-06-01' AND t_inspect_result.inspect_result_timestamp < '2006-06-01' ORDER BY t_inspect.inspect_id, inspect_result_timestamp ASC) AS total of course, i use bind variable notation (?) for serial_number_id and product_number. thanks again. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com