Thread: Inserting streamed data

Inserting streamed data

From

Kevin Old

Date:

31 October 2002, 13:13:32

Hello everyone,

I have data that is streamed to my server and stored in a text file.  I
need to get that data into my database as fast as possible.  There are
approximately 160,000 rows in this text file.  I understand I can use
the COPY command to insert large chunks of data from a text file, but I
can't use it in this situation.  Each record in the text file has 502
"fields".  I pull out 50 of those.  I haven't found a way to manipulate
the COPY command to pull out the values I need.  So that solution would
be out.

I have a perl script that goes through the file and pulls out the 50
fields, then inserts them into the database, but it seems to be very
slow.  I think I just need some minor performance tuning, but dont' know
which variables to set in the postgresql.conf file that would help with
the speed of the inserts.

Here's my postgresql.conf file now:

max_connections = 10
shared_buffers = 20


I'm running a Solaris 2.7 with 2GB RAM.

Also, saw this at
http://developer.postgresql.org/docs/postgres/kernel-resources.html

[snip...]

Solaris

        At least in version 2.6, the default maximum size of a shared
        memory segments is too low for PostgreSQL. The relevant settings
        can be changed in /etc/system, for example:

        set shmsys:shminfo_shmmax=0x2000000
        set shmsys:shminfo_shmmin=1
        set shmsys:shminfo_shmmni=256
        set shmsys:shminfo_shmseg=256

        set semsys:seminfo_semmap=256
        set semsys:seminfo_semmni=512
        set semsys:seminfo_semmns=512
        set semsys:seminfo_semmsl=32

[snip...]

Should I do this?

Thanks,
Kevin

--
Kevin Old <kold@carolina.rr.com>

Re: Inserting streamed data

From

Doug McNaught

Date:

31 October 2002, 13:19:50

Kevin Old <kold@carolina.rr.com> writes:

> I have data that is streamed to my server and stored in a text file.  I
> need to get that data into my database as fast as possible.  There are
> approximately 160,000 rows in this text file.  I understand I can use
> the COPY command to insert large chunks of data from a text file, but I
> can't use it in this situation.  Each record in the text file has 502
> "fields".  I pull out 50 of those.  I haven't found a way to manipulate
> the COPY command to pull out the values I need.  So that solution would
> be out.
>
> I have a perl script that goes through the file and pulls out the 50
> fields, then inserts them into the database, but it seems to be very
> slow.  I think I just need some minor performance tuning, but dont' know
> which variables to set in the postgresql.conf file that would help with
> the speed of the inserts.

First: are you batching up multiple INSERTS in a transaction?  If you
don't it will be very slow indeed.

Second, why not have the Perl script pull out the fields you want,
paste them together and feed them to COPY?  That should eliminate the
parse overhead of multiple INSERTS.

-Doug

Re: Inserting streamed data

From

"David Blood"

Date:

31 October 2002, 13:26:26

Why not use use your perl or awk or sed to rebuild the text file with
the columns you want in the order that you want then copy in.  This is
the only way we have found to get large amount of data inserted quickly.

David Blood
Matraex, Inc


-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Kevin Old
Sent: Thursday, October 31, 2002 11:12 AM
To: pgsql
Subject: [GENERAL] Inserting streamed data

Hello everyone,

I have data that is streamed to my server and stored in a text file.  I
need to get that data into my database as fast as possible.  There are
approximately 160,000 rows in this text file.  I understand I can use
the COPY command to insert large chunks of data from a text file, but I
can't use it in this situation.  Each record in the text file has 502
"fields".  I pull out 50 of those.  I haven't found a way to manipulate
the COPY command to pull out the values I need.  So that solution would
be out.

I have a perl script that goes through the file and pulls out the 50
fields, then inserts them into the database, but it seems to be very
slow.  I think I just need some minor performance tuning, but dont' know
which variables to set in the postgresql.conf file that would help with
the speed of the inserts.

Here's my postgresql.conf file now:

max_connections = 10
shared_buffers = 20


I'm running a Solaris 2.7 with 2GB RAM.

Also, saw this at
http://developer.postgresql.org/docs/postgres/kernel-resources.html

[snip...]

Solaris

        At least in version 2.6, the default maximum size of a shared
        memory segments is too low for PostgreSQL. The relevant settings
        can be changed in /etc/system, for example:

        set shmsys:shminfo_shmmax=0x2000000
        set shmsys:shminfo_shmmin=1
        set shmsys:shminfo_shmmni=256
        set shmsys:shminfo_shmseg=256

        set semsys:seminfo_semmap=256
        set semsys:seminfo_semmni=512
        set semsys:seminfo_semmns=512
        set semsys:seminfo_semmsl=32

[snip...]

Should I do this?

Thanks,
Kevin

--
Kevin Old <kold@carolina.rr.com>


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: Inserting streamed data

From

"Greg Patnude"

Date:

07 November 2002, 09:19:50

Does your table have an index ?? -- You can probably speed it up
significantly by

Preparing the datafile...
Beginning a transaction...

    Dropping the index...
    Doing the 160,000 insert(s)...
    Rebuilding the index...
    Committing the transaction...

Ending the transaction


"Kevin Old" <kold@carolina.rr.com> wrote in message
news:1036087909.3123.54.camel@oc...
> Hello everyone,
>
> I have data that is streamed to my server and stored in a text file.  I
> need to get that data into my database as fast as possible.  There are
> approximately 160,000 rows in this text file.  I understand I can use
> the COPY command to insert large chunks of data from a text file, but I
> can't use it in this situation.  Each record in the text file has 502
> "fields".  I pull out 50 of those.  I haven't found a way to manipulate
> the COPY command to pull out the values I need.  So that solution would
> be out.
>
> I have a perl script that goes through the file and pulls out the 50
> fields, then inserts them into the database, but it seems to be very
> slow.  I think I just need some minor performance tuning, but dont' know
> which variables to set in the postgresql.conf file that would help with
> the speed of the inserts.
>
> Here's my postgresql.conf file now:
>
> max_connections = 10
> shared_buffers = 20
>
>
> I'm running a Solaris 2.7 with 2GB RAM.
>
> Also, saw this at
> http://developer.postgresql.org/docs/postgres/kernel-resources.html
>
> [snip...]
>
> Solaris
>
>         At least in version 2.6, the default maximum size of a shared
>         memory segments is too low for PostgreSQL. The relevant settings
>         can be changed in /etc/system, for example:
>
>         set shmsys:shminfo_shmmax=0x2000000
>         set shmsys:shminfo_shmmin=1
>         set shmsys:shminfo_shmmni=256
>         set shmsys:shminfo_shmseg=256
>
>         set semsys:seminfo_semmap=256
>         set semsys:seminfo_semmni=512
>         set semsys:seminfo_semmns=512
>         set semsys:seminfo_semmsl=32
>
> [snip...]
>
> Should I do this?
>
> Thanks,
> Kevin
>
> --
> Kevin Old <kold@carolina.rr.com>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Re: Inserting streamed data

From

Csaba Nagy

Date:

07 November 2002, 11:36:34

Why don't you pull out the fields with the perl script and write them to a
temprary table, and use COPY to import from that one ?
Perl should be fast with the files, Postgres with the COPY...

Regards,
Csaba.

-----Ursprüngliche Nachricht-----
Von: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]Im Auftrag von Greg Patnude
Gesendet: Samstag, 2. November 2002 18:08
An: pgsql-general@postgresql.org
Betreff: Re: [GENERAL] Inserting streamed data


Does your table have an index ?? -- You can probably speed it up
significantly by

Preparing the datafile...
Beginning a transaction...

    Dropping the index...
    Doing the 160,000 insert(s)...
    Rebuilding the index...
    Committing the transaction...

Ending the transaction


"Kevin Old" <kold@carolina.rr.com> wrote in message
news:1036087909.3123.54.camel@oc...
> Hello everyone,
>
> I have data that is streamed to my server and stored in a text file.  I
> need to get that data into my database as fast as possible.  There are
> approximately 160,000 rows in this text file.  I understand I can use
> the COPY command to insert large chunks of data from a text file, but I
> can't use it in this situation.  Each record in the text file has 502
> "fields".  I pull out 50 of those.  I haven't found a way to manipulate
> the COPY command to pull out the values I need.  So that solution would
> be out.
>
> I have a perl script that goes through the file and pulls out the 50
> fields, then inserts them into the database, but it seems to be very
> slow.  I think I just need some minor performance tuning, but dont' know
> which variables to set in the postgresql.conf file that would help with
> the speed of the inserts.
>
> Here's my postgresql.conf file now:
>
> max_connections = 10
> shared_buffers = 20
>
>
> I'm running a Solaris 2.7 with 2GB RAM.
>
> Also, saw this at
> http://developer.postgresql.org/docs/postgres/kernel-resources.html
>
> [snip...]
>
> Solaris
>
>         At least in version 2.6, the default maximum size of a shared
>         memory segments is too low for PostgreSQL. The relevant settings
>         can be changed in /etc/system, for example:
>
>         set shmsys:shminfo_shmmax=0x2000000
>         set shmsys:shminfo_shmmin=1
>         set shmsys:shminfo_shmmni=256
>         set shmsys:shminfo_shmseg=256
>
>         set semsys:seminfo_semmap=256
>         set semsys:seminfo_semmni=512
>         set semsys:seminfo_semmns=512
>         set semsys:seminfo_semmsl=32
>
> [snip...]
>
> Should I do this?
>
> Thanks,
> Kevin
>
> --
> Kevin Old <kold@carolina.rr.com>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster