Thread: poor performance of loading data

poor performance of loading data

From

"Zhang, Anna"

Date:

19 December 2001, 13:10:37

I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing
research to see how postgres doing, I used copy utility to import data from
a text file which contains 32 mils rows, it has been 26 hours passed, but
still running. My question is how postgres handles such data loading? it
commited every row? or commit point is adjustable? How? Does postgres
provide direct load to disk files like oracle? Other ways to speed up? If
loading performance can't be improved significantly, we have to go back to
oracle. Anybody can help? Thanks!

Anna Zhang

Re: poor performance of loading data

From

"Mitch Vincent"

Date:

19 December 2001, 13:40:48

    Are the rows huge? What kind of machine hardware-wise are we talking
about? Did you start the postmaster with fsync disabled? I generally turn
fsync off for importing, the improvement is amazing :-)

Good luck!

-Mitch

----- Original Message -----
From: "Zhang, Anna" <azhang@verisign.com>
To: <pgsql-admin@postgresql.org>
Sent: Wednesday, December 19, 2001 10:57 AM
Subject: [ADMIN] poor performance of loading data


>
> I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing
> research to see how postgres doing, I used copy utility to import data
from
> a text file which contains 32 mils rows, it has been 26 hours passed, but
> still running. My question is how postgres handles such data loading? it
> commited every row? or commit point is adjustable? How? Does postgres
> provide direct load to disk files like oracle? Other ways to speed up? If
> loading performance can't be improved significantly, we have to go back to
> oracle. Anybody can help? Thanks!
>
> Anna Zhang
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

Re: poor performance of loading data

From

bangh

Date:

19 December 2001, 17:23:07

To create an index, to do ordering is a basic operation, it is time consumed
job. Generally there is only relative better algorithm to be used to do
ordering. Some cases, it might be pretty good, but this is sure, not for each
one. If you look the typical ordering algorithms, different initial conditions
an algorithm behaves total differently. If your object is numbers, it must be
fast to order them.

In you case, you might have no choice to create a pure numeric index.

You really have 4 indexes for this table? or you just have one index, but it is
from 4 fields.

If you really have 3 indexes, it is too many. Not only slow down when you add a
new record, also take your space.

In you case, I still suggest:

1. import from file without index
2. create index after import

Banghe

"Zhang, Anna" wrote:

> I am loading data to one table. This table has 9 columns which are all text,
> and 4 indexes. If all indexes dropped before loading I am sure loading will
> be speed up, but recreate those indexes still time consuming, overall still
> a problem.
>
> Thanks for information.
>
> Anna Zhang
>
> -----Original Message-----
> From: bangh [mailto:banghe@baileylink.net]
> Sent: Wednesday, December 19, 2001 4:33 PM
> To: Zhang, Anna
> Subject: Re: [ADMIN] poor performance of loading data
>
> Just wonder to how many tables does the data go from your text file?
>
> If just one table (simple case), how is your table defined?
>
> I think the index may be the key that will affect the speed.
>
> Have you defined the index in that table? As some guys said, if the index is
> numeric, it could be better than chars.
>
> Banghe
>
> "Zhang, Anna" wrote:
>
> > I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing
> > research to see how postgres doing, I used copy utility to import data
> from
> > a text file which contains 32 mils rows, it has been 26 hours passed, but
> > still running. My question is how postgres handles such data loading? it
> > commited every row? or commit point is adjustable? How? Does postgres
> > provide direct load to disk files like oracle? Other ways to speed up? If
> > loading performance can't be improved significantly, we have to go back to
> > oracle. Anybody can help? Thanks!
> >
> > Anna Zhang
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: poor performance of loading data

From

Chris Ruprecht

Date:

19 December 2001, 20:10:22

Anna,

At 12:57 -0500 12/19/2001, Zhang, Anna wrote:
>I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing
>research to see how postgres doing, I used copy utility to import data from
>a text file which contains 32 mils rows, it has been 26 hours passed, but
>still running. My question is how postgres handles such data loading? it
>commited every row? or commit point is adjustable? How? Does postgres
>provide direct load to disk files like oracle? Other ways to speed up? If
>loading performance can't be improved significantly, we have to go back to
>oracle. Anybody can help? Thanks!

Did you edit the PostGreSQL startup parameters? if not, you have a
very sluggish system. You should have an ample supply of shared
memory (2000 - 8000 blocks of 8K each). Also, do you have an index
defined on your table? If so, drop the index, load the records, then,
re-create the index(es) for the table. Speed will improve
tremendously for copy commands with no index to update. creating an
index for all 32M records is much faster than having to insert index
values.

You might find your file loaded in a fraction of the time is has used
already with these optimizations applied. Without knowing more
specifics, this is all I can recommend.

Best regards,
Chris
--
Chris Ruprecht
Network grunt and bit pusher extraordinaíre

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Postgres RPMS for RedHat 7.2

From

Oleg Lebedev

Date:

28 December 2001, 18:43:01

Hi everybody,
I just installed RedHat 7.2 on one of my machines. I was trying to locate latest
postgres RPMS for this version of linux, but couldn't find any. When I try to
use RPMS for RedHat 7.1, I get the following error:
error: failed dependencies
    libpq.so.2    is needed by postgresql-7.1.3-1PGDG
    libreadline.so.4.1    is needed by postgresql-7.1.3-1PGDG
I found out that the first file is in $PGSQL/lib directory, but I don't have
libreadline.so.4.1, but libreadline.so.4 and libreadline.so.4.2 in /usr/lib. Is
it possible at all to use RedHat 7.1 RPMS on RedHat 7.2?

I can avoid using RPMS if smbd. can tell he how I can do a FULL installation of
postgres. In other words, I want all packages in $PG_SRC/src/include to be
compiled and place in $PGSQL/include directory, where $PG_SRC and $PGSQL are
postgres source and installation directories respectively. The thing is that
when I configure it with PHP, PHP can't find some of the packages in
$PGSQL/include, so I have to configure PHP with $PG_SRC/src in order to compile
it, which causes some other problems later.
thanks,

Oleg

Re: Postgres RPMS for RedHat 7.2

From

Lamar Owen

Date:

28 December 2001, 21:43:09

On Friday 28 December 2001 06:55 pm, Oleg Lebedev wrote:
> Hi everybody,
> I just installed RedHat 7.2 on one of my machines. I was trying to locate
> latest postgres RPMS for this version of linux, but couldn't find any. When
> I try to use RPMS for RedHat 7.1, I get the following error:

RedHat 7.2 includes PostgreSQL 7.1.3 as part of the base OS install.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Re: Postgres RPMS for RedHat 7.2

From

"James Kelty"

Date:

03 January 2002, 09:07:02

When you compile the postgres, you can make it so that other progs can find
the included when you need them to by putting the $PGSQL/lib in
/etc/ld.so.conf and running /sbin/ldconfig

You can always specify where the includes and libs are in php using
the --with-pgsql=$PGSQL

-James

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Oleg Lebedev
Sent: Friday, December 28, 2001 3:55 PM
To: Chris Ruprecht
Cc: 'pgsql-admin@postgresql.org'
Subject: [ADMIN] Postgres RPMS for RedHat 7.2


Hi everybody,
I just installed RedHat 7.2 on one of my machines. I was trying to locate
latest
postgres RPMS for this version of linux, but couldn't find any. When I try
to
use RPMS for RedHat 7.1, I get the following error:
error: failed dependencies
    libpq.so.2    is needed by postgresql-7.1.3-1PGDG
    libreadline.so.4.1    is needed by postgresql-7.1.3-1PGDG
I found out that the first file is in $PGSQL/lib directory, but I don't have
libreadline.so.4.1, but libreadline.so.4 and libreadline.so.4.2 in /usr/lib.
Is
it possible at all to use RedHat 7.1 RPMS on RedHat 7.2?

I can avoid using RPMS if smbd. can tell he how I can do a FULL installation
of
postgres. In other words, I want all packages in $PG_SRC/src/include to be
compiled and place in $PGSQL/include directory, where $PG_SRC and $PGSQL are
postgres source and installation directories respectively. The thing is that
when I configure it with PHP, PHP can't find some of the packages in
$PGSQL/include, so I have to configure PHP with $PG_SRC/src in order to
compile
it, which causes some other problems later.
thanks,

Oleg



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org