Thread: Keeping top N records of a group

Keeping top N records of a group

From
Alex Magnum
Date:
Hi,
i want to archive data and am looking for a query to replace a rather slow function i am currently using.

The idea is that for every sym and doc_key I want to keep the records of the top 2 sources. Eg. in below table I want to archive 
sym 1022 of source 3000 but keep the 4 records from sources 4045 and 4081.

Any help on this would be appreciated.


 source    | sym          | doc_id  | doc_key
-----------+--------------+---------+--------------
      3000 | 1022         |  551008 | 23834363  <-- Archive
      3000 | 1022         |  551008 | 23834363  <-- Archive
      4045 | 1022         | 3699290 | 23834363
      4045 | 1022         | 3699290 | 23834363
      4081 | 1022         | 3811196 | 23834363
      4081 | 1022         | 3811196 | 23834363
      3000 | 1028         |  550997 | 23834363
      3468 | 1085         | 1740526 | 23834363
      3000 | 1149         |  551017 | 23834363
      4045 | 1149         | 3699291 | 23834363
      3000 | 12           |  551015 | 23834363
      3000 | 12           |  551015 | 23834363
      3951 | 12           | 3147700 | 23834363
      3951 | 12           | 3147700 | 23834363
      3000 | 13           |  551000 | 23834363
      4045 | 13           | 3699283 | 23834363
      3000 | 1327         |  551010 | 23834363
      3971 | 1327         | 3394469 | 23834363
      3000 | 15           |  551001 | 23834363
      4045 | 15           | 3699284 | 23834363
      3000 | 16           |  551002 | 23834363
      4045 | 16           | 3699285 | 23834363
      3000 | 18           |  551013 | 23834363
      4045 | 18           | 3699286 | 23834363
      3000 | 257          |  551005 | 23834363
      3951 | 257          | 3147701 | 23834363
      3000 | 2795         |  551011 | 23834363 <-- Archive
      3459 | 2795         | 1710571 | 23834363 <-- Archive
      3905 | 2795         | 2994791 | 23834363
      4045 | 2795         | 3699292 | 23834363
      3060 | 2913         |  856199 | 23834363 
      3000 | 2954         |  551012 | 23834363 <-- Archive
      3971 | 2954         | 3394470 | 23834363
      4212 | 2954         | 4650870 | 23834363
      3183 | 3427         | 1055492 | 23834363 <-- Archive
      3971 | 3427         | 3394471 | 23834363
      4248 | 3427         | 4763105 | 23834363
      3188 | 594          | 1062642 | 23834363 <-- Archive
      3188 | 594          | 1062642 | 23834363 <-- Archive
      3191 | 594          | 1067501 | 23834363 <-- Archive
      3191 | 594          | 1067501 | 23834363 <-- Archive
      3192 | 594          | 1068391 | 23834363 <-- Archive
      3192 | 594          | 1068391 | 23834363 <-- Archive
      3199 | 594          | 1096070 | 23834363 <-- Archive
      3199 | 594          | 1096070 | 23834363 <-- Archive
      3303 | 594          | 1305467 | 23834363
      3303 | 594          | 1305467 | 23834363
      4117 | 594          | 4000987 | 23834363

Thanks a lot for any ideas.
Alex

Re: Keeping top N records of a group

From
Andreas Kretschmer
Date:
Alex Magnum <magnum11200@gmail.com> wrote:

> Hi,
> i want to archive data and am looking for a query to replace a rather slow
> function i am currently using.
>
> The idea is that for every sym and doc_key I want to keep the records of the
> top 2 sources. Eg. in below table I want to archive 

sounds like a case for window-functions here (maybe row_number() or
rank()), please read that for the start:

http://stackoverflow.com/questions/7613785/postgresql-top-n-entries-per-item-in-same-table



Regards, Andreas Kretschmer
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services