Thread: PostgreSQL add id column that increments based on data
I am relatively new to using Postgres, coming from a MySQL background. I am using Postgres 9.3.4 on Windows x64. We are being supplied data in multiple fixed length text files. The first digit on each line is a number between 1 and 4 that indicates the record type of the data in that row. The rows are grouped sequentially such that there will always first be a row of type 1 followed by zero or more rows of the other types. data_x.txt --------------------- 1data01 2data02 4data03 4data04 1data05 1data06 3data07 To import this into Postgres I have used the following SQL commands: CREATE TABLE data_raw ( raw_data TEXT ); COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for eachfile ALTER TABLE data_raw ADD COLUMN indicator integer; UPDATE data_raw SET indicator = CAST(substr(raw_data,1, 1) AS integer), raw_data = substr(raw_data, 2); I then create tables for each of the 4 record types: CREATE TABLE table_1 SELECT raw_data FROM data_raw WHERE indicator = 1; CREATE TABLE table_2 SELECT raw_data FROM data_rawWHERE indicator = 2; CREATE TABLE table_3 SELECT raw_data FROM data_raw WHERE indicator = 3; CREATE TABLE table_4SELECT raw_data FROM data_raw WHERE indicator = 4; What I need to do, but am unsure how, is to also add an "id" column for each group where the indicator starts with 1. We will be getting weekly updates so I need to specify the initial id for each batch. So if this batch starts at id = 225, then I want to get the following tables from the sample data: table_1 id | raw_data -------------------- 225 | data01 226 | data05 227 | data06 table_2 id | raw_data -------------------- 225 | data02 table_3 id | raw_data -------------------- 227 | data07 table_4 id | raw_data -------------------- 225 | data03 225 | data04 -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-add-id-column-that-increments-based-on-data-tp5813514.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
You can try the *sequence* : http://www.postgresql.org/docs/9.3/static/functions-sequence.html Am 01.08.2014 11:15, schrieb DerekW: > I am relatively new to using Postgres, coming from a MySQL background. I am > using Postgres 9.3.4 on Windows x64. > > We are being supplied data in multiple fixed length text files. The first > digit on each line is a number between 1 and 4 that indicates the record > type of the data in that row. The rows are grouped sequentially such that > there will always first be a row of type 1 followed by zero or more rows of > the other types. > > data_x.txt > --------------------- > 1data01 > 2data02 > 4data03 > 4data04 > 1data05 > 1data06 > 3data07 > > To import this into Postgres I have used the following SQL commands: > > CREATE TABLE data_raw ( > raw_data TEXT > ); > > COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file > > ALTER TABLE data_raw > ADD COLUMN indicator integer; > > UPDATE data_raw SET > indicator = CAST(substr(raw_data, 1, 1) AS integer), > raw_data = substr(raw_data, 2); > > I then create tables for each of the 4 record types: > > CREATE TABLE table_1 SELECT raw_data FROM data_raw WHERE indicator = 1; > CREATE TABLE table_2 SELECT raw_data FROM data_raw WHERE indicator = 2; > CREATE TABLE table_3 SELECT raw_data FROM data_raw WHERE indicator = 3; > CREATE TABLE table_4 SELECT raw_data FROM data_raw WHERE indicator = 4; > > What I need to do, but am unsure how, is to also add an "id" column for each > group where the indicator starts with 1. We will be getting weekly updates > so I need to specify the initial id for each batch. So if this batch starts > at id = 225, then I want to get the following tables from the sample data: > > table_1 > id | raw_data > -------------------- > 225 | data01 > 226 | data05 > 227 | data06 > > table_2 > id | raw_data > -------------------- > 225 | data02 > > table_3 > id | raw_data > -------------------- > 227 | data07 > > table_4 > id | raw_data > -------------------- > 225 | data03 > 225 | data04 > > > > > > > -- > View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-add-id-column-that-increments-based-on-data-tp5813514.html > Sent from the PostgreSQL - sql mailing list archive at Nabble.com. > >
On 08/01/2014 02:15 AM, DerekW wrote: > I am relatively new to using Postgres, coming from a MySQL background. I am > using Postgres 9.3.4 on Windows x64. > > We are being supplied data in multiple fixed length text files. The first > digit on each line is a number between 1 and 4 that indicates the record > type of the data in that row. The rows are grouped sequentially such that > there will always first be a row of type 1 followed by zero or more rows of > the other types. > > data_x.txt > --------------------- > 1data01 > 2data02 > 4data03 > 4data04 > 1data05 > 1data06 > 3data07 > > To import this into Postgres I have used the following SQL commands: > > CREATE TABLE data_raw ( > raw_data TEXT > ); > > COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file > > ALTER TABLE data_raw > ADD COLUMN indicator integer; > > UPDATE data_raw SET > indicator = CAST(substr(raw_data, 1, 1) AS integer), > raw_data = substr(raw_data, 2); > > I then create tables for each of the 4 record types: > > CREATE TABLE table_1 SELECT raw_data FROM data_raw WHERE indicator = 1; > CREATE TABLE table_2 SELECT raw_data FROM data_raw WHERE indicator = 2; > CREATE TABLE table_3 SELECT raw_data FROM data_raw WHERE indicator = 3; > CREATE TABLE table_4 SELECT raw_data FROM data_raw WHERE indicator = 4; > > What I need to do, but am unsure how, is to also add an "id" column for each > group where the indicator starts with 1. We will be getting weekly updates > so I need to specify the initial id for each batch. So if this batch starts > at id = 225, then I want to get the following tables from the sample data: > > table_1 > id | raw_data > -------------------- > 225 | data01 > 226 | data05 > 227 | data06 > > table_2 > id | raw_data > -------------------- > 225 | data02 > > table_3 > id | raw_data > -------------------- > 227 | data07 > > table_4 > id | raw_data > -------------------- > 225 | data03 > 225 | data04 I am not following the logic of the id numbering scheme. I see the sequential numbering for indicator 1. Not why indicator 2 uses 225 again when indicator 3 starts with 227 and indicator 4 has the same id for both rows. My guess, whatever the logic is, it is going to involve triggers and possibly another table to keep track of batch numbers > > > > > > > -- > View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-add-id-column-that-increments-based-on-data-tp5813514.html > Sent from the PostgreSQL - sql mailing list archive at Nabble.com. > > -- Adrian Klaver adrian.klaver@aklaver.com
DerekW wrote > I am relatively new to using Postgres, coming from a MySQL background. I > am using Postgres 9.3.4 on Windows x64. > > We are being supplied data in multiple fixed length text files. The first > digit on each line is a number between 1 and 4 that indicates the record > type of the data in that row. The rows are grouped sequentially such that > there will always first be a row of type 1 followed by zero or more rows > of the other types. > > data_x.txt > --------------------- > 1data01 > 2data02 > 4data03 > 4data04 > 1data05 > 1data06 > 3data07 > > To import this into Postgres I have used the following SQL commands: > > CREATE TABLE data_raw ( > raw_data TEXT > ); > > COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file > 1. Add a serial column to data_raw so that relative order can be recorded during import. 2. Alter the COPY command to explicitly list only the raw_data column - so the serial column uses its default. At this point if you need to re-assign the sequence numbers on the production tables based upon some business rules you can simply apply your logic and feed in the raw data in the previously established order of import. The general logic would be to query all of the 1 rows and get their import index. Using lead/lag over this set you can, for each 1 record, get the valid range of child indexes. You can then join the non-1 data by use of a between predicate. In the same 1 query as the lead/lag you can assign base group numbers using the row_number function and simply add some base offset. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-add-id-column-that-increments-based-on-data-tp5813514p5813543.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Thank you for the input. I have come up with the following solution: CREATE TABLE data_raw ( raw_data TEXT ); COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file ALTER TABLE data_raw ADD COLUMN pk_id serial, ADD COLUMN id integer, ADD COLUMN indicator integer; UPDATE data_raw SET indicator = CAST(substr(raw_data, 1, 1) AS integer), raw_data = substr(raw_data, 2); CREATE TABLE id_base AS SELECT pk_id, sum(CASE WHEN indicator = 1 THEN 1 ELSE 0 END) OVER (ORDER BY pk_id) AS rec_id FROM data_raw; CREATE INDEX id_base_pk ON id_base USING btree(pk_id); UPDATE data_raw r SET id = (SELECT rec_id FROM id_base b WHERE b.pk_id = r.pk_id); DROP TABLE id_base; -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-add-id-column-that-increments-based-on-data-tp5813514p5813661.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.