Re: count of occurences PLUS optimisation - Mailing list pgsql-hackers
From | Haller Christoph |
---|---|
Subject | Re: count of occurences PLUS optimisation |
Date | |
Msg-id | 200109131244.OAA19247@rodos Whole thread Raw |
In response to | Re: count of occurences PLUS optimisation ("Thurstan R. McDougle" <trmcdougle@my-deja.com>) |
List | pgsql-hackers |
Adam, try this select distinct job_num, count (job_num) from search_record group by job_num ; don't worry about an ordered list - group by does it for you. Regards, Christoph > > HACKERS: see the end of this message about a possible optimisation for > ORDER BY+LIMIT cases (the normal use of LIMIT?) > > Adam wrote: > > > > I help run a job database and have a table of search records. I want > > a query that will return the top 10 jobs by search frequency. I'm > > familiar with ORDER BY and LIMIT, so I basically need this: > > > > Given a table search_records: > > job_num > > ------- > > 1 > > 2 > > 2 > > 3 > > 4 > > 4 > > 4 > > > > I want a query that will return: > > job_num | count > > --------+------ > > 1 |1 > > 2 |2 > > 3 |1 > > 4 |3 > > > > I tried > > > > select distinct job_num, (select count(*) from search_records j where > > j.job_num=k.job_num) from search_records k > > > > but it is horribly slow (it takes several minutes on a table of about > > 25k rows!). I assume it scans the entire table for every job_num in > > order to count the number of occurences of that job_num, taking order > > n^2 time. Since I can easily use job_num as an index (being integers > > from 0 to roughly 400 so far) I could just do a "select * from > > search_records" and do the counting in PHP (our HTML pre-processor) in > > order n time. However, I don't know how to do an order n*log(n) sort > > in PHP, just n^2, so there would still be an efficiency problem. > > I have Postgresql 7.0.3. > > Help is of course greatly appreciated. > > I have not tried it but how about:- > > select job_num from > (select job_num, count(*) as c from search_records group by job_num) > order by c limit 10; > > I am not sure if count(*) would work in this context, if not try count() > on some field that is in every record. > > > If you can be sure that the top 10 will have at least a certain > threshold of searches (perhaps >1!) then it MIGHT be faster, due to less > data being sorted for the outer selects order by, (experiment) to do:- > > select job_num from > (select job_num, count(*) as c from search_records group by job_num > HAVING c>1) > order by c limit 10; > > It would depend on how efficient the ORDER BY and LIMIT work together. > (The ORDER BY could build a list of LIMIT n items and just replace items > in that list...a lot more efficient both of memory and comparisons than > building the full list and then keeping the top n) > > HACKERS: If it does not do this it might be a usefull optimisation. > There would probably need to be a cutoff limit on whether to apply this > method or sort and keep n. Also for LIMIT plus OFFSET it would need to > build a list of the the total of the LIMIT and OFFSET figures. > > -- > This is the identity that I use for NewsGroups. Email to > this will just sit there. If you wish to email me replace > the domain with knightpiesold . co . uk (no spaces). > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
pgsql-hackers by date: