Thread: Keeping top N records of a group
Hi,
i want to archive data and am looking for a query to replace a rather slow function i am currently using.
The idea is that for every sym and doc_key I want to keep the records of the top 2 sources. Eg. in below table I want to archive
sym 1022 of source 3000 but keep the 4 records from sources 4045 and 4081.
Any help on this would be appreciated.
source | sym | doc_id | doc_key
-----------+--------------+---------+--------------
3000 | 1022 | 551008 | 23834363 <-- Archive
3000 | 1022 | 551008 | 23834363 <-- Archive
4045 | 1022 | 3699290 | 23834363
4045 | 1022 | 3699290 | 23834363
4081 | 1022 | 3811196 | 23834363
4081 | 1022 | 3811196 | 23834363
3000 | 1028 | 550997 | 23834363
3468 | 1085 | 1740526 | 23834363
3000 | 1149 | 551017 | 23834363
4045 | 1149 | 3699291 | 23834363
3000 | 12 | 551015 | 23834363
3000 | 12 | 551015 | 23834363
3951 | 12 | 3147700 | 23834363
3951 | 12 | 3147700 | 23834363
3000 | 13 | 551000 | 23834363
4045 | 13 | 3699283 | 23834363
3000 | 1327 | 551010 | 23834363
3971 | 1327 | 3394469 | 23834363
3000 | 15 | 551001 | 23834363
4045 | 15 | 3699284 | 23834363
3000 | 16 | 551002 | 23834363
4045 | 16 | 3699285 | 23834363
3000 | 18 | 551013 | 23834363
4045 | 18 | 3699286 | 23834363
3000 | 257 | 551005 | 23834363
3951 | 257 | 3147701 | 23834363
3000 | 2795 | 551011 | 23834363 <-- Archive
3459 | 2795 | 1710571 | 23834363 <-- Archive
3905 | 2795 | 2994791 | 23834363
4045 | 2795 | 3699292 | 23834363
3060 | 2913 | 856199 | 23834363
3000 | 2954 | 551012 | 23834363 <-- Archive
3971 | 2954 | 3394470 | 23834363
4212 | 2954 | 4650870 | 23834363
3183 | 3427 | 1055492 | 23834363 <-- Archive
3971 | 3427 | 3394471 | 23834363
4248 | 3427 | 4763105 | 23834363
3188 | 594 | 1062642 | 23834363 <-- Archive
3188 | 594 | 1062642 | 23834363 <-- Archive
3191 | 594 | 1067501 | 23834363 <-- Archive
3191 | 594 | 1067501 | 23834363 <-- Archive
3192 | 594 | 1068391 | 23834363 <-- Archive
3192 | 594 | 1068391 | 23834363 <-- Archive
3199 | 594 | 1096070 | 23834363 <-- Archive
3199 | 594 | 1096070 | 23834363 <-- Archive
3303 | 594 | 1305467 | 23834363
3303 | 594 | 1305467 | 23834363
4117 | 594 | 4000987 | 23834363
Thanks a lot for any ideas.
Alex
Alex Magnum <magnum11200@gmail.com> wrote: > Hi, > i want to archive data and am looking for a query to replace a rather slow > function i am currently using. > > The idea is that for every sym and doc_key I want to keep the records of the > top 2 sources. Eg. in below table I want to archive sounds like a case for window-functions here (maybe row_number() or rank()), please read that for the start: http://stackoverflow.com/questions/7613785/postgresql-top-n-entries-per-item-in-same-table Regards, Andreas Kretschmer -- Andreas Kretschmer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services