Thread: Batching up data into groups of n rows
Hi,

I need to write a query that batches the rows of a table into groups of n records. I feel like this should be possible using the existing window functions, but I can't figure out how to do it from the examples.

So I have some table, let's say:

    create table addresses (
      line_1 text,
      line_2 text,
      city   text,
      state  text,
      zip    text
    );

...and I want to select the data from that table, adding a "group_id" column and a "record_id" column. The "group_id" would start at 1 and increment by 1 every 100 rows, and the "record_id" would increment by 1 every row, but restart at 1 for each new group_id.

Thanks,
Andy
On Mon, Sep 26, 2011 at 18:59, Andy Chambers <achambers@mcna.net> wrote:
> ...and I want to select the data from that table, adding a "group_id"
> column, and a "record_id" column. The "group_id" would start at 1,
> and increment by 1 every 100 rows, and the "record_id" would increment
> by 1 every row, but restart at 1 for each new group_id.

I can't see why you would want this, but regardless...

Since you didn't list a primary key for the table, I'm using "ctid" as a substitute. But that's a PostgreSQL-specific hack; normally you would use the real primary key instead.

    update addresses
    set group_id  = (nr / 100) + 1,
        record_id = (nr % 100) + 1
    from (select ctid, row_number() over () - 1 as nr
          from addresses) as subq
    where subq.ctid = addresses.ctid;

This isn't going to be fast for a large table, as it will effectively make three passes over the table, but it will get the job done. If you want the numbers to be ordered by certain columns, you'd use row_number() OVER (ORDER BY col1, col2).

Regards,
Marti
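[The same numbering can also be produced on the fly in a plain SELECT, with no group_id/record_id columns stored. A minimal sketch of that variant, assuming the addresses table above and that ordering by (zip, line_1) is an acceptable stable ordering (any ORDER BY works):]

```sql
select (nr / 100) + 1 as group_id,   -- increments every 100 rows
       (nr % 100) + 1 as record_id,  -- 1..100, restarts per group
       line_1, line_2, city, state, zip
from (select *,
             row_number() over (order by zip, line_1) - 1 as nr
      from addresses) as numbered;
```

[Dividing a zero-based row number by the batch size is what makes group_id advance only once per 100 rows, while the remainder restarts record_id within each group.]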
On Mon, Sep 26, 2011 at 4:22 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Mon, Sep 26, 2011 at 18:59, Andy Chambers <achambers@mcna.net> wrote:
>> ...and I want to select the data from that table, adding a "group_id"
>> column, and a "record_id" column. The "group_id" would start at 1,
>> and increment by 1 every 100 rows, and the "record_id" would increment
>> by 1 every row, but restart at 1 for each new group_id.
>
> I can't see why you would want this, but regardless...

The addresses need to be sent to a 3rd-party web service for canonicalization. The web service accepts batches of <100 addresses, and I was wondering how I'd get Postgres to generate the XML for sending 100 addresses at a time to this web service.

> Since you didn't list a primary key for the table, I'm using "ctid" as
> a substitute. But that's a PostgreSQL-specific hack, normally you
> would use the real primary key instead.
>
> update addresses set group_id=(nr/100)+1, record_id=(nr%100)+1 from
> (select ctid, row_number() over () -1 as nr from addresses) as subq
> where subq.ctid=addresses.ctid;

Cool! I don't need to actually store these ids in the database; they just need to be generated on the fly and forgotten. But I think I can adapt the example to do what I need.

Thanks,
Andy
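[For the XML-batching use case Andy describes, the on-the-fly numbering can be combined with PostgreSQL's SQL/XML functions (xmlelement, xmlagg) to emit one XML document per batch. A sketch, assuming the addresses table above; the element names "batch" and "address" are made up here and would need to match whatever schema the web service actually expects:]

```sql
select group_id,
       xmlelement(name "batch",
         xmlagg(
           xmlelement(name "address",
             xmlelement(name "line_1", line_1),
             xmlelement(name "line_2", line_2),
             xmlelement(name "city",   city),
             xmlelement(name "state",  state),
             xmlelement(name "zip",    zip))))
from (select *,
             (row_number() over (order by zip, line_1) - 1) / 100 + 1
               as group_id
      from addresses) as numbered
group by group_id
order by group_id;
```

[Each output row is then one <batch> payload of up to 100 <address> elements, ready to hand to the web service.]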