rewrite count distinct query - Mailing list pgsql-general

From Chris Smith
Subject rewrite count distinct query
Date
Msg-id 41B403A2.20000@interspire.com
Whole thread Raw
List pgsql-general
Hi all,

'Scuse the long post :) I'm trying to include all relevant info..

I'm trying to work out a better way to approach a query, any tips are
greatly appreciated.

The relevant tables:

db=# \d tp_conversions
                 Table "public.tp_conversions"
     Column     |          Type          |     Modifiers
---------------+------------------------+--------------------
  conversionid  | integer                | not null default 0
  type          | character varying(10)  |
  name          | character varying(255) |
  amount        | double precision       |
  cookieid      | character varying(32)  |
  currtime      | integer                |
  ip            | character varying(20)  |
  origintype    | character varying(20)  |
  originfrom    | character varying(255) |
  origindetails | character varying(255) |
  userid        | integer                |
Indexes:
     "tp_conversions_pkey" primary key, btree (conversionid)
     "conv_origindetails" btree (origindetails)
     "conv_originfrom" btree (originfrom)
     "conv_origintype" btree (origintype)
     "conv_time" btree (currtime)
     "conv_userid" btree (userid)


trackpoint=# SELECT count(*) from tp_conversions;
  count
-------
    261
(1 row)


db=# \d tp_search
                     Table "public.tp_search"
       Column      |          Type          |     Modifiers
------------------+------------------------+--------------------
  searchid         | integer                | not null default 0
  searchenginename | character varying(255) |
  keywords         | character varying(255) |
  currtime         | integer                |
  ip               | character varying(20)  |
  landingpage      | character varying(255) |
  cookieid         | character varying(32)  |
  userid           | integer                |
Indexes:
     "tp_search_pkey" primary key, btree (searchid)
     "search_cookieid" btree (cookieid)
     "search_keywords" btree (keywords)
     "search_searchenginename" btree (searchenginename)
     "search_userid" btree (userid)


trackpoint=# SELECT count(*) from tp_search;
  count
-------
   5086
(1 row)



What I'm trying to do...

Work out the number of conversions for each search origin.


This query works:

select
count(distinct conversionid) as convcount,
count(distinct searchid) as searchcount,
(count(distinct conversionid) / count(distinct searchid)) as perc,
s.searchenginename
from tp_conversions c, tp_search s
where
c.origintype='search' and s.searchenginename=c.originfrom and
s.userid=c.userid and c.userid=1
group by searchenginename
order by convcount desc;

  convcount | searchcount | perc | searchenginename
-----------+-------------+------+------------------
         15 |        2884 |    0 | Google
          1 |         110 |    0 | Google AU
          2 |         308 |    0 | Google CA
          1 |          25 |    0 | Google CL
          1 |         143 |    0 | Google DE
          1 |         117 |    0 | Google IN
          1 |          26 |    0 | Google NZ
          3 |          49 |    0 | Google RO
          1 |          60 |    0 | Google TH
          2 |         174 |    0 | Yahoo
(10 rows)


However the percentage is wrong.

I can cast one to a float:

(count(distinct conversionid) / count(distinct searchid)::float)

and it'll give me a better percentage:

  convcount | searchcount |        perc         | searchenginename
-----------+-------------+---------------------+------------------
         15 |        2884 | 0.00520110957004161 | Google
          3 |          49 |  0.0612244897959184 | Google RO
          2 |         308 | 0.00649350649350649 | Google CA
          2 |         174 |  0.0114942528735632 | Yahoo
          1 |         110 | 0.00909090909090909 | Google AU
          1 |          25 |                0.04 | Google CL
          1 |         143 | 0.00699300699300699 | Google DE
          1 |         117 | 0.00854700854700855 | Google IN
          1 |          26 |  0.0384615384615385 | Google NZ
          1 |          60 |  0.0166666666666667 | Google TH
(10 rows)

(I think the answer to this is 'no' but I'm going to ask anyway :P)
Is there an easier way to get the more-detailed percentage (it's meant
to work in multiple databases - so casting to a float won't work for
other db's) ?


More importantly... Is there a better way to write the query (I don't
like the count(distinct...) but it works and gives the right info) ?

I tried to do it with a union:

SELECT
count(searchid),
searchenginename
from tp_search s
where userid=1
group by searchenginename
union
select
count(conversionid),
originfrom
from tp_conversions c
where c.userid=1
group by originfrom;

but then realised that getting the data out with php would be a
nightmare (plus I can't get the percentages).

Lastly:

db=# SELECT version();
                             version
---------------------------------------------------------------
  PostgreSQL 7.4.3 on i386-pc-linux-gnu, compiled by GCC 2.95.4
(1 row)

(I know it's a little out of date, upgrading later this week).

Any suggestions/hints/tips welcome :)

Thanks,
Chris.

pgsql-general by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: [HACKERS] DBD::PgSPI 0.02
Next
From: "Rolf Østvik"
Date:
Subject: Re: 3rd RFD: comp.databases.postgresql (was: