Thread: Inserting streamed data
Hello everyone, I have data that is streamed to my server and stored in a text file. I need to get that data into my database as fast as possible. There are approximately 160,000 rows in this text file. I understand I can use the COPY command to insert large chunks of data from a text file, but I can't use it in this situation. Each record in the text file has 502 "fields". I pull out 50 of those. I haven't found a way to manipulate the COPY command to pull out the values I need. So that solution would be out. I have a perl script that goes through the file and pulls out the 50 fields, then inserts them into the database, but it seems to be very slow. I think I just need some minor performance tuning, but dont' know which variables to set in the postgresql.conf file that would help with the speed of the inserts. Here's my postgresql.conf file now: max_connections = 10 shared_buffers = 20 I'm running a Solaris 2.7 with 2GB RAM. Also, saw this at http://developer.postgresql.org/docs/postgres/kernel-resources.html [snip...] Solaris At least in version 2.6, the default maximum size of a shared memory segments is too low for PostgreSQL. The relevant settings can be changed in /etc/system, for example: set shmsys:shminfo_shmmax=0x2000000 set shmsys:shminfo_shmmin=1 set shmsys:shminfo_shmmni=256 set shmsys:shminfo_shmseg=256 set semsys:seminfo_semmap=256 set semsys:seminfo_semmni=512 set semsys:seminfo_semmns=512 set semsys:seminfo_semmsl=32 [snip...] Should I do this? Thanks, Kevin -- Kevin Old <kold@carolina.rr.com>
Kevin Old <kold@carolina.rr.com> writes: > I have data that is streamed to my server and stored in a text file. I > need to get that data into my database as fast as possible. There are > approximately 160,000 rows in this text file. I understand I can use > the COPY command to insert large chunks of data from a text file, but I > can't use it in this situation. Each record in the text file has 502 > "fields". I pull out 50 of those. I haven't found a way to manipulate > the COPY command to pull out the values I need. So that solution would > be out. > > I have a perl script that goes through the file and pulls out the 50 > fields, then inserts them into the database, but it seems to be very > slow. I think I just need some minor performance tuning, but dont' know > which variables to set in the postgresql.conf file that would help with > the speed of the inserts. First: are you batching up multiple INSERTS in a transaction? If you don't it will be very slow indeed. Second, why not have the Perl script pull out the fields you want, paste them together and feed them to COPY? That should eliminate the parse overhead of multiple INSERTS. -Doug
Why not use use your perl or awk or sed to rebuild the text file with the columns you want in the order that you want then copy in. This is the only way we have found to get large amount of data inserted quickly. David Blood Matraex, Inc -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Kevin Old Sent: Thursday, October 31, 2002 11:12 AM To: pgsql Subject: [GENERAL] Inserting streamed data Hello everyone, I have data that is streamed to my server and stored in a text file. I need to get that data into my database as fast as possible. There are approximately 160,000 rows in this text file. I understand I can use the COPY command to insert large chunks of data from a text file, but I can't use it in this situation. Each record in the text file has 502 "fields". I pull out 50 of those. I haven't found a way to manipulate the COPY command to pull out the values I need. So that solution would be out. I have a perl script that goes through the file and pulls out the 50 fields, then inserts them into the database, but it seems to be very slow. I think I just need some minor performance tuning, but dont' know which variables to set in the postgresql.conf file that would help with the speed of the inserts. Here's my postgresql.conf file now: max_connections = 10 shared_buffers = 20 I'm running a Solaris 2.7 with 2GB RAM. Also, saw this at http://developer.postgresql.org/docs/postgres/kernel-resources.html [snip...] Solaris At least in version 2.6, the default maximum size of a shared memory segments is too low for PostgreSQL. The relevant settings can be changed in /etc/system, for example: set shmsys:shminfo_shmmax=0x2000000 set shmsys:shminfo_shmmin=1 set shmsys:shminfo_shmmni=256 set shmsys:shminfo_shmseg=256 set semsys:seminfo_semmap=256 set semsys:seminfo_semmni=512 set semsys:seminfo_semmns=512 set semsys:seminfo_semmsl=32 [snip...] Should I do this? Thanks, Kevin -- Kevin Old <kold@carolina.rr.com> ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
Does your table have an index ?? -- You can probably speed it up significantly by Preparing the datafile... Beginning a transaction... Dropping the index... Doing the 160,000 insert(s)... Rebuilding the index... Committing the transaction... Ending the transaction "Kevin Old" <kold@carolina.rr.com> wrote in message news:1036087909.3123.54.camel@oc... > Hello everyone, > > I have data that is streamed to my server and stored in a text file. I > need to get that data into my database as fast as possible. There are > approximately 160,000 rows in this text file. I understand I can use > the COPY command to insert large chunks of data from a text file, but I > can't use it in this situation. Each record in the text file has 502 > "fields". I pull out 50 of those. I haven't found a way to manipulate > the COPY command to pull out the values I need. So that solution would > be out. > > I have a perl script that goes through the file and pulls out the 50 > fields, then inserts them into the database, but it seems to be very > slow. I think I just need some minor performance tuning, but dont' know > which variables to set in the postgresql.conf file that would help with > the speed of the inserts. > > Here's my postgresql.conf file now: > > max_connections = 10 > shared_buffers = 20 > > > I'm running a Solaris 2.7 with 2GB RAM. > > Also, saw this at > http://developer.postgresql.org/docs/postgres/kernel-resources.html > > [snip...] > > Solaris > > At least in version 2.6, the default maximum size of a shared > memory segments is too low for PostgreSQL. The relevant settings > can be changed in /etc/system, for example: > > set shmsys:shminfo_shmmax=0x2000000 > set shmsys:shminfo_shmmin=1 > set shmsys:shminfo_shmmni=256 > set shmsys:shminfo_shmseg=256 > > set semsys:seminfo_semmap=256 > set semsys:seminfo_semmni=512 > set semsys:seminfo_semmns=512 > set semsys:seminfo_semmsl=32 > > [snip...] > > Should I do this? > > Thanks, > Kevin > > -- > Kevin Old <kold@carolina.rr.com> > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
Why don't you pull out the fields with the perl script and write them to a temprary table, and use COPY to import from that one ? Perl should be fast with the files, Postgres with the COPY... Regards, Csaba. -----Ursprüngliche Nachricht----- Von: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]Im Auftrag von Greg Patnude Gesendet: Samstag, 2. November 2002 18:08 An: pgsql-general@postgresql.org Betreff: Re: [GENERAL] Inserting streamed data Does your table have an index ?? -- You can probably speed it up significantly by Preparing the datafile... Beginning a transaction... Dropping the index... Doing the 160,000 insert(s)... Rebuilding the index... Committing the transaction... Ending the transaction "Kevin Old" <kold@carolina.rr.com> wrote in message news:1036087909.3123.54.camel@oc... > Hello everyone, > > I have data that is streamed to my server and stored in a text file. I > need to get that data into my database as fast as possible. There are > approximately 160,000 rows in this text file. I understand I can use > the COPY command to insert large chunks of data from a text file, but I > can't use it in this situation. Each record in the text file has 502 > "fields". I pull out 50 of those. I haven't found a way to manipulate > the COPY command to pull out the values I need. So that solution would > be out. > > I have a perl script that goes through the file and pulls out the 50 > fields, then inserts them into the database, but it seems to be very > slow. I think I just need some minor performance tuning, but dont' know > which variables to set in the postgresql.conf file that would help with > the speed of the inserts. > > Here's my postgresql.conf file now: > > max_connections = 10 > shared_buffers = 20 > > > I'm running a Solaris 2.7 with 2GB RAM. > > Also, saw this at > http://developer.postgresql.org/docs/postgres/kernel-resources.html > > [snip...] > > Solaris > > At least in version 2.6, the default maximum size of a shared > memory segments is too low for PostgreSQL. The relevant settings > can be changed in /etc/system, for example: > > set shmsys:shminfo_shmmax=0x2000000 > set shmsys:shminfo_shmmin=1 > set shmsys:shminfo_shmmni=256 > set shmsys:shminfo_shmseg=256 > > set semsys:seminfo_semmap=256 > set semsys:seminfo_semmni=512 > set semsys:seminfo_semmns=512 > set semsys:seminfo_semmsl=32 > > [snip...] > > Should I do this? > > Thanks, > Kevin > > -- > Kevin Old <kold@carolina.rr.com> > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster