Thread: poor performance of loading data
I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing research to see how postgres doing, I used copy utility to import data from a text file which contains 32 mils rows, it has been 26 hours passed, but still running. My question is how postgres handles such data loading? it commited every row? or commit point is adjustable? How? Does postgres provide direct load to disk files like oracle? Other ways to speed up? If loading performance can't be improved significantly, we have to go back to oracle. Anybody can help? Thanks! Anna Zhang
Are the rows huge? What kind of machine hardware-wise are we talking about? Did you start the postmaster with fsync disabled? I generally turn fsync off for importing, the improvement is amazing :-) Good luck! -Mitch ----- Original Message ----- From: "Zhang, Anna" <azhang@verisign.com> To: <pgsql-admin@postgresql.org> Sent: Wednesday, December 19, 2001 10:57 AM Subject: [ADMIN] poor performance of loading data > > I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing > research to see how postgres doing, I used copy utility to import data from > a text file which contains 32 mils rows, it has been 26 hours passed, but > still running. My question is how postgres handles such data loading? it > commited every row? or commit point is adjustable? How? Does postgres > provide direct load to disk files like oracle? Other ways to speed up? If > loading performance can't be improved significantly, we have to go back to > oracle. Anybody can help? Thanks! > > Anna Zhang > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
To create an index, to do ordering is a basic operation, it is time consumed job. Generally there is only relative better algorithm to be used to do ordering. Some cases, it might be pretty good, but this is sure, not for each one. If you look the typical ordering algorithms, different initial conditions an algorithm behaves total differently. If your object is numbers, it must be fast to order them. In you case, you might have no choice to create a pure numeric index. You really have 4 indexes for this table? or you just have one index, but it is from 4 fields. If you really have 3 indexes, it is too many. Not only slow down when you add a new record, also take your space. In you case, I still suggest: 1. import from file without index 2. create index after import Banghe "Zhang, Anna" wrote: > I am loading data to one table. This table has 9 columns which are all text, > and 4 indexes. If all indexes dropped before loading I am sure loading will > be speed up, but recreate those indexes still time consuming, overall still > a problem. > > Thanks for information. > > Anna Zhang > > -----Original Message----- > From: bangh [mailto:banghe@baileylink.net] > Sent: Wednesday, December 19, 2001 4:33 PM > To: Zhang, Anna > Subject: Re: [ADMIN] poor performance of loading data > > Just wonder to how many tables does the data go from your text file? > > If just one table (simple case), how is your table defined? > > I think the index may be the key that will affect the speed. > > Have you defined the index in that table? As some guys said, if the index is > numeric, it could be better than chars. > > Banghe > > "Zhang, Anna" wrote: > > > I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing > > research to see how postgres doing, I used copy utility to import data > from > > a text file which contains 32 mils rows, it has been 26 hours passed, but > > still running. My question is how postgres handles such data loading? it > > commited every row? or commit point is adjustable? How? Does postgres > > provide direct load to disk files like oracle? Other ways to speed up? If > > loading performance can't be improved significantly, we have to go back to > > oracle. Anybody can help? Thanks! > > > > Anna Zhang > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Anna, At 12:57 -0500 12/19/2001, Zhang, Anna wrote: >I just installed Postgres 7.1.3 on my Red Hat 7.2 linux box. We are doing >research to see how postgres doing, I used copy utility to import data from >a text file which contains 32 mils rows, it has been 26 hours passed, but >still running. My question is how postgres handles such data loading? it >commited every row? or commit point is adjustable? How? Does postgres >provide direct load to disk files like oracle? Other ways to speed up? If >loading performance can't be improved significantly, we have to go back to >oracle. Anybody can help? Thanks! Did you edit the PostGreSQL startup parameters? if not, you have a very sluggish system. You should have an ample supply of shared memory (2000 - 8000 blocks of 8K each). Also, do you have an index defined on your table? If so, drop the index, load the records, then, re-create the index(es) for the table. Speed will improve tremendously for copy commands with no index to update. creating an index for all 32M records is much faster than having to insert index values. You might find your file loaded in a fraction of the time is has used already with these optimizations applied. Without knowing more specifics, this is all I can recommend. Best regards, Chris -- Chris Ruprecht Network grunt and bit pusher extraordinaíre _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Hi everybody, I just installed RedHat 7.2 on one of my machines. I was trying to locate latest postgres RPMS for this version of linux, but couldn't find any. When I try to use RPMS for RedHat 7.1, I get the following error: error: failed dependencies libpq.so.2 is needed by postgresql-7.1.3-1PGDG libreadline.so.4.1 is needed by postgresql-7.1.3-1PGDG I found out that the first file is in $PGSQL/lib directory, but I don't have libreadline.so.4.1, but libreadline.so.4 and libreadline.so.4.2 in /usr/lib. Is it possible at all to use RedHat 7.1 RPMS on RedHat 7.2? I can avoid using RPMS if smbd. can tell he how I can do a FULL installation of postgres. In other words, I want all packages in $PG_SRC/src/include to be compiled and place in $PGSQL/include directory, where $PG_SRC and $PGSQL are postgres source and installation directories respectively. The thing is that when I configure it with PHP, PHP can't find some of the packages in $PGSQL/include, so I have to configure PHP with $PG_SRC/src in order to compile it, which causes some other problems later. thanks, Oleg
On Friday 28 December 2001 06:55 pm, Oleg Lebedev wrote: > Hi everybody, > I just installed RedHat 7.2 on one of my machines. I was trying to locate > latest postgres RPMS for this version of linux, but couldn't find any. When > I try to use RPMS for RedHat 7.1, I get the following error: RedHat 7.2 includes PostgreSQL 7.1.3 as part of the base OS install. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11
When you compile the postgres, you can make it so that other progs can find the included when you need them to by putting the $PGSQL/lib in /etc/ld.so.conf and running /sbin/ldconfig You can always specify where the includes and libs are in php using the --with-pgsql=$PGSQL -James -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Oleg Lebedev Sent: Friday, December 28, 2001 3:55 PM To: Chris Ruprecht Cc: 'pgsql-admin@postgresql.org' Subject: [ADMIN] Postgres RPMS for RedHat 7.2 Hi everybody, I just installed RedHat 7.2 on one of my machines. I was trying to locate latest postgres RPMS for this version of linux, but couldn't find any. When I try to use RPMS for RedHat 7.1, I get the following error: error: failed dependencies libpq.so.2 is needed by postgresql-7.1.3-1PGDG libreadline.so.4.1 is needed by postgresql-7.1.3-1PGDG I found out that the first file is in $PGSQL/lib directory, but I don't have libreadline.so.4.1, but libreadline.so.4 and libreadline.so.4.2 in /usr/lib. Is it possible at all to use RedHat 7.1 RPMS on RedHat 7.2? I can avoid using RPMS if smbd. can tell he how I can do a FULL installation of postgres. In other words, I want all packages in $PG_SRC/src/include to be compiled and place in $PGSQL/include directory, where $PG_SRC and $PGSQL are postgres source and installation directories respectively. The thing is that when I configure it with PHP, PHP can't find some of the packages in $PGSQL/include, so I have to configure PHP with $PG_SRC/src in order to compile it, which causes some other problems later. thanks, Oleg ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org