Thread: SQL Count Magic Required....

SQL Count Magic Required....

From

Date:

31 May 2006, 20:32:51

i have the following query that yields a series of
true or false results:

-- distinct on is pgsql extension --
SELECT DISTINCT ON (t_inspect_result.inspect_id)
t_inspect_result.inspect_result_id,

t_inspect_result.inspect_result_pass,
                   t_inspect_area.inspect_area,

t_inspect_result.inspect_result_timestamp
                   --,t_inspect.serial_number_id,
t_inspect.inspect_id
FROM t_inspect_result, t_inspect, t_inspect_area,
     t_serial_number,
t_link_contract_number_job_number, t_job_number,
t_product
WHERE  t_inspect.inspect_area_id =
       t_inspect_area.inspect_area_id
AND t_inspect.serial_number_id =
    t_serial_number.serial_number_id
AND t_serial_number.link_contract_number_job_number_id
=

t_link_contract_number_job_number.link_contract_number_job_number_id
AND t_link_contract_number_job_number.job_number_id =
    t_job_number.job_number_id
AND t_product.product_id =
    t_job_number.product_id
AND t_inspect.inspect_id =
    t_inspect_result.inspect_id
AND t_inspect.serial_number_id = '200'
ORDER BY t_inspect_result.inspect_id DESC,
t_inspect_result.inspect_result_timestamp ASC
-- used to get first pass yield pass / fail (true /
false) data.
-- inspect_id desc impacts end result.  time desc
impacts the groups prior to being distinctly listed

the simplified output may look like

f,t,t,f,f,f,t,f,t,f

the COUNT magic comes into play b/c i want to count
the result set's "t"s and total, but i have no clue
how to get this done.

#ts: 4
#total: 10

when i have this data, i can apply some math and come
up with a 40% yield.

any help is, as always, much appreciated.

tia...

ps - this query looks to be brutal when there is a lot
of data.  is it?  if so, how can i optimize it?

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: SQL Count Magic Required....

From

Sean Davis

Date:

01 June 2006, 07:56:46



On 5/31/06 7:32 PM, "operationsengineer1@yahoo.com"
<operationsengineer1@yahoo.com> wrote:

> i have the following query that yields a series of
> true or false results:
>
> -- distinct on is pgsql extension --
> SELECT DISTINCT ON (t_inspect_result.inspect_id)
> t_inspect_result.inspect_result_id,
>
> t_inspect_result.inspect_result_pass,
>                    t_inspect_area.inspect_area,
>
> t_inspect_result.inspect_result_timestamp
>                    --,t_inspect.serial_number_id,
> t_inspect.inspect_id
> FROM t_inspect_result, t_inspect, t_inspect_area,
>      t_serial_number,
> t_link_contract_number_job_number, t_job_number,
> t_product
> WHERE  t_inspect.inspect_area_id =
>        t_inspect_area.inspect_area_id
> AND t_inspect.serial_number_id =
>     t_serial_number.serial_number_id
> AND t_serial_number.link_contract_number_job_number_id
> =
>
> t_link_contract_number_job_number.link_contract_number_job_number_id
> AND t_link_contract_number_job_number.job_number_id =
>     t_job_number.job_number_id
> AND t_product.product_id =
>     t_job_number.product_id
> AND t_inspect.inspect_id =
>     t_inspect_result.inspect_id
> AND t_inspect.serial_number_id = '200'
> ORDER BY t_inspect_result.inspect_id DESC,
> t_inspect_result.inspect_result_timestamp ASC
> -- used to get first pass yield pass / fail (true /
> false) data.
> -- inspect_id desc impacts end result.  time desc
> impacts the groups prior to being distinctly listed
>
> the simplified output may look like
>
> f,t,t,f,f,f,t,f,t,f
>
> the COUNT magic comes into play b/c i want to count
> the result set's "t"s and total, but i have no clue
> how to get this done.
>
> #ts: 4
> #total: 10
>
> when i have this data, i can apply some math and come
> up with a 40% yield.

You can do a query like (untested, and needs to be translated into your
monster query):

 select a.id,a.total,b.failed,(a.total::numeric)/b.total as yield
   from (select count(test_result) as total from table) as a,
        (select count(test_result) as failed from table where
test_result='f') as b where a.id = b.id;

The point is to do the queries separately as subqueries and join them on
some primary key so that you get the count "total" and the count "failed".
Then you can do the math as above.  Note that you have to cast at least one
of the integers to numeric if you want a numeric result.

Sean

Re: SQL Count Magic Required....

From

Date:

01 June 2006, 12:48:00

> > i have the following query that yields a series of
> > true or false results:
> >
> > -- distinct on is pgsql extension --
> > SELECT DISTINCT ON (t_inspect_result.inspect_id)
> > t_inspect_result.inspect_result_id,
> >
> > t_inspect_result.inspect_result_pass,
> >                    t_inspect_area.inspect_area,
> >
> > t_inspect_result.inspect_result_timestamp
> >                    --,t_inspect.serial_number_id,
> > t_inspect.inspect_id
> > FROM t_inspect_result, t_inspect, t_inspect_area,
> >      t_serial_number,
> > t_link_contract_number_job_number, t_job_number,
> > t_product
> > WHERE  t_inspect.inspect_area_id =
> >        t_inspect_area.inspect_area_id
> > AND t_inspect.serial_number_id =
> >     t_serial_number.serial_number_id
> > AND
> t_serial_number.link_contract_number_job_number_id
> > =
> >
> >
>
t_link_contract_number_job_number.link_contract_number_job_number_id
> > AND
> t_link_contract_number_job_number.job_number_id =
> >     t_job_number.job_number_id
> > AND t_product.product_id =
> >     t_job_number.product_id
> > AND t_inspect.inspect_id =
> >     t_inspect_result.inspect_id
> > AND t_inspect.serial_number_id = '200'
> > ORDER BY t_inspect_result.inspect_id DESC,
> > t_inspect_result.inspect_result_timestamp ASC
> > -- used to get first pass yield pass / fail (true
> /
> > false) data.
> > -- inspect_id desc impacts end result.  time desc
> > impacts the groups prior to being distinctly
> listed
> >
> > the simplified output may look like
> >
> > f,t,t,f,f,f,t,f,t,f
> >
> > the COUNT magic comes into play b/c i want to
> count
> > the result set's "t"s and total, but i have no
> clue
> > how to get this done.
> >
> > #ts: 4
> > #total: 10
> >
> > when i have this data, i can apply some math and
> come
> > up with a 40% yield.
>
> You can do a query like (untested, and needs to be
> translated into your
> monster query):
>
>  select
> a.id,a.total,b.failed,(a.total::numeric)/b.total as
> yield
>    from (select count(test_result) as total from
> table) as a,
>         (select count(test_result) as failed from
> table where
> test_result='f') as b where a.id = b.id;
>
> The point is to do the queries separately as
> subqueries and join them on
> some primary key so that you get the count "total"
> and the count "failed".
> Then you can do the math as above.  Note that you
> have to cast at least one
> of the integers to numeric if you want a numeric
> result.
>
> Sean

Sean, t_test_result can get quite large quite quickly.
 iow, this query could be running on a million or more
rows in a year or two.

is this a performance killer query?

the reason i ask is i could add a column to t_inspect
called first_pass and store the pass/fail data there
and then just complete a much simpler query counting
the t_inspect.first_pass true values and then the
getting the total.

i'd imagine that would be much faster and much simpler
to conceptualize.  the downside is that i would have
repeating information.

tia...

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: SQL Count Magic Required....

From

Sean Davis

Date:

01 June 2006, 13:03:11



On 6/1/06 11:47 AM, "operationsengineer1@yahoo.com"
<operationsengineer1@yahoo.com> wrote:

>>> i have the following query that yields a series of
>>> true or false results:
>>>
>>> -- distinct on is pgsql extension --
>>> SELECT DISTINCT ON (t_inspect_result.inspect_id)
>>> t_inspect_result.inspect_result_id,
>>>
>>> t_inspect_result.inspect_result_pass,
>>>                    t_inspect_area.inspect_area,
>>>
>>> t_inspect_result.inspect_result_timestamp
>>>                    --,t_inspect.serial_number_id,
>>> t_inspect.inspect_id
>>> FROM t_inspect_result, t_inspect, t_inspect_area,
>>>      t_serial_number,
>>> t_link_contract_number_job_number, t_job_number,
>>> t_product
>>> WHERE  t_inspect.inspect_area_id =
>>>        t_inspect_area.inspect_area_id
>>> AND t_inspect.serial_number_id =
>>>     t_serial_number.serial_number_id
>>> AND
>> t_serial_number.link_contract_number_job_number_id
>>> =
>>>
>>>
>>
> t_link_contract_number_job_number.link_contract_number_job_number_id
>>> AND
>> t_link_contract_number_job_number.job_number_id =
>>>     t_job_number.job_number_id
>>> AND t_product.product_id =
>>>     t_job_number.product_id
>>> AND t_inspect.inspect_id =
>>>     t_inspect_result.inspect_id
>>> AND t_inspect.serial_number_id = '200'
>>> ORDER BY t_inspect_result.inspect_id DESC,
>>> t_inspect_result.inspect_result_timestamp ASC
>>> -- used to get first pass yield pass / fail (true
>> /
>>> false) data.
>>> -- inspect_id desc impacts end result.  time desc
>>> impacts the groups prior to being distinctly
>> listed
>>>
>>> the simplified output may look like
>>>
>>> f,t,t,f,f,f,t,f,t,f
>>>
>>> the COUNT magic comes into play b/c i want to
>> count
>>> the result set's "t"s and total, but i have no
>> clue
>>> how to get this done.
>>>
>>> #ts: 4
>>> #total: 10
>>>
>>> when i have this data, i can apply some math and
>> come
>>> up with a 40% yield.
>>
>> You can do a query like (untested, and needs to be
>> translated into your
>> monster query):
>>
>>  select
>> a.id,a.total,b.failed,(a.total::numeric)/b.total as
>> yield
>>    from (select count(test_result) as total from
>> table) as a,
>>         (select count(test_result) as failed from
>> table where
>> test_result='f') as b where a.id = b.id;
>>
>> The point is to do the queries separately as
>> subqueries and join them on
>> some primary key so that you get the count "total"
>> and the count "failed".
>> Then you can do the math as above.  Note that you
>> have to cast at least one
>> of the integers to numeric if you want a numeric
>> result.
>>
>> Sean
>
> Sean, t_test_result can get quite large quite quickly.
>  iow, this query could be running on a million or more
> rows in a year or two.
>
> is this a performance killer query?
>
> the reason i ask is i could add a column to t_inspect
> called first_pass and store the pass/fail data there
> and then just complete a much simpler query counting
> the t_inspect.first_pass true values and then the
> getting the total.
>
> i'd imagine that would be much faster and much simpler
> to conceptualize.  the downside is that i would have
> repeating information.

I don't know how fast this will be.  You will have to benchmark some.  An
alternative is to use a "historic" table that includes infrequently accessed
historical data (for serial numbers of products no longer produced, for
example).  Or you could create a materialized view of the summarized data.
Or put in a trigger that updates counts to another table when you insert
into the main table.  But, I would probably prove to yourself that your
query performance is unacceptable, first.  If this is going to be used to
generate a "report" once a month, then if it takes 15 minutes to run, no big
deal.  Also, if you are looking up only one item at a time (by serial
number, for example, for a web interface) so that indexing can be used,
things will be very fast, I would imagine.  You will have to benchmark.

Sean

Re: SQL Count Magic Required.... First Iteration...

From

Date:

01 June 2006, 15:04:22

> On 5/31/06 7:32 PM, "operationsengineer1@yahoo.com"
> <operationsengineer1@yahoo.com> wrote:
>
> > i have the following query that yields a series of
> > true or false results:
> >
> > -- distinct on is pgsql extension --
> > SELECT DISTINCT ON (t_inspect_result.inspect_id)
> > t_inspect_result.inspect_result_id,
> >
> > t_inspect_result.inspect_result_pass,
> >                    t_inspect_area.inspect_area,
> >
> > t_inspect_result.inspect_result_timestamp
> >                    --,t_inspect.serial_number_id,
> > t_inspect.inspect_id
> > FROM t_inspect_result, t_inspect, t_inspect_area,
> >      t_serial_number,
> > t_link_contract_number_job_number, t_job_number,
> > t_product
> > WHERE  t_inspect.inspect_area_id =
> >        t_inspect_area.inspect_area_id
> > AND t_inspect.serial_number_id =
> >     t_serial_number.serial_number_id
> > AND
> t_serial_number.link_contract_number_job_number_id
> > =
> >
> >
>
t_link_contract_number_job_number.link_contract_number_job_number_id
> > AND
> t_link_contract_number_job_number.job_number_id =
> >     t_job_number.job_number_id
> > AND t_product.product_id =
> >     t_job_number.product_id
> > AND t_inspect.inspect_id =
> >     t_inspect_result.inspect_id
> > AND t_inspect.serial_number_id = '200'
> > ORDER BY t_inspect_result.inspect_id DESC,
> > t_inspect_result.inspect_result_timestamp ASC
> > -- used to get first pass yield pass / fail (true
> /
> > false) data.
> > -- inspect_id desc impacts end result.  time desc
> > impacts the groups prior to being distinctly
> listed
> >
> > the simplified output may look like
> >
> > f,t,t,f,f,f,t,f,t,f
> >
> > the COUNT magic comes into play b/c i want to
> count
> > the result set's "t"s and total, but i have no
> clue
> > how to get this done.
> >
> > #ts: 4
> > #total: 10
> >
> > when i have this data, i can apply some math and
> come
> > up with a 40% yield.
>
> You can do a query like (untested, and needs to be
> translated into your
> monster query):
>
>  select
> a.id,a.total,b.failed,(a.total::numeric)/b.total as
> yield
>    from (select count(test_result) as total from
> table) as a,
>         (select count(test_result) as failed from
> table where
> test_result='f') as b where a.id = b.id;
>
> The point is to do the queries separately as
> subqueries and join them on
> some primary key so that you get the count "total"
> and the count "failed".
> Then you can do the math as above.  Note that you
> have to cast at least one
> of the integers to numeric if you want a numeric
> result.

for those following on (probably just yours truly ;-),
the first iteration (tested and works):

SELECT a.total, b.passed, b.passed/(a.total::numeric)
as yield
FROM (SELECT count(inspect_result_pass) as total
      FROM t_inspect_result)
      AS a,
     (SELECT count(inspect_result_pass) as passed
      FROM t_inspect_result
      WHERE inspect_result_pass = 't')
      AS b

this generates results based on the whole table (not
limited to first entry and not limited by unique
product/serial combo.

in my test case, i have 5 passes and 9 total and 5/9
is displayed as

0.55555555555555...

should i format this in my application code, or does
pgsql allow for formatting?  i want to display 55.5%.

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: SQL Count Magic Required.... First Iteration...

From

Sean Davis

Date:

01 June 2006, 15:28:36

> in my test case, i have 5 passes and 9 total and 5/9
> is displayed as
>
> 0.55555555555555...
>
> should i format this in my application code, or does
> pgsql allow for formatting?  i want to display 55.5%.

I would do this on the client side, but it doesn't make any particular
difference. Allows you to maintain a separation of your model from your
view.

http://www.postgresql.org/docs/8.1/static/functions-math.html

See the trunc function.

Re: SQL Count Magic Required.... First Iteration...

From

David

Date:

01 June 2006, 23:11:38

On Thu, Jun 01, 2006 at 02:28:29PM -0400, Sean Davis wrote:
>
> > in my test case, i have 5 passes and 9 total and 5/9
> > is displayed as
> >
> > 0.55555555555555...
> >
> > should i format this in my application code, or does
> > pgsql allow for formatting?  i want to display 55.5%.
>
> I would do this on the client side, but it doesn't make any particular
> difference. Allows you to maintain a separation of your model from your
> view.
>
> http://www.postgresql.org/docs/8.1/static/functions-math.html
>
> See the trunc function.

Or how about casting the answer to numeric(3,1) ( or whatever precision
is desired)?

Re: SQL Count Magic Required.... First Iteration...

From

Date:

02 June 2006, 16:38:34

thanks to help from Sean and Michael, this is the
working query...

SELECT 1.0 * sum(CASE WHEN inspect_result_pass THEN 1
ELSE 0 END) / count(total),
       sum(CASE WHEN inspect_result_pass THEN 1 ELSE 0
END),
       count(total)
FROM (SELECT DISTINCT ON (t_inspect.inspect_id)
t_inspect_result.inspect_result_pass
      FROM t_inspect_result, t_inspect,
t_inspect_area, t_serial_number,
           t_link_contract_number_job_number,
t_job_number, t_product
      WHERE t_inspect_result.inspect_id =
t_inspect.inspect_id
        AND t_inspect.serial_number_id =
t_serial_number.serial_number_id
        AND t_inspect_result.inspect_result_pass = 't'
        AND
t_serial_number.link_contract_number_job_number_id =

t_link_contract_number_job_number.link_contract_number_job_number_id
        AND
t_link_contract_number_job_number.job_number_id =
            t_job_number.job_number_id
        AND t_product.product_id =
t_job_number.product_id
        AND t_inspect.serial_number_id = 200
        AND t_product.product_number = 7214118000
        AND t_inspect_result.inspect_result_timestamp
> '2005-06-01'
        AND t_inspect_result.inspect_result_timestamp
< '2006-06-01'
      ORDER BY t_inspect.inspect_id,
inspect_result_timestamp ASC) AS total

of course, i use bind variable notation (?) for
serial_number_id and product_number.

thanks again.

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com