Thread: Disk Performance Problem on Large DB
Hello,
I have a RHEL 5 box: 4 GB RAM, a single SATA hard drive, Intel Core 2 Duo 2.4 GHz. A basic workstation.
I have a simple database, with one table for now. It has five columns:
anid serial primary key unique,
time timestamp,
source varchar(5),
unitid varchar(15),
guid varchar(32)
There is a btree index on each.
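[In DDL form, that schema is roughly the following; the table name loadtable is a stand-in, since the thread never names the real table:]

    CREATE TABLE loadtable (
        anid   serial PRIMARY KEY,   -- PRIMARY KEY already implies UNIQUE
        "time" timestamp,
        source varchar(5),
        unitid varchar(15),
        guid   varchar(32)
    );
    CREATE INDEX loadtable_time_idx   ON loadtable ("time");
    CREATE INDEX loadtable_source_idx ON loadtable (source);
    CREATE INDEX loadtable_unitid_idx ON loadtable (unitid);
    CREATE INDEX loadtable_guid_idx   ON loadtable (guid);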
I am loading data 1,000,000 (1M) rows at a time using psql and a COPY command. Once I hit 2M rows, my performance just drops out, and the next 1M never finishes. It takes 7 minutes for 1M rows to load. Once 2M are in there, I’ve waited an hour, and nothing. It doesn’t seem to matter which 1M rows I try to load next, none ever finish. Each 1M rows is about 70MB on disk in the raw input file.
I have "atop" installed, and it reports the drive at 100% busy, which it also reported during the first 1M rows. Write throughput goes from 20+ MB/s on the first 2M rows down to about 4 MB/s or less now. Processor usage (by postgres) is at about 2 to 8% during this time.
I have even waited for 1M rows to load, then done a vacuum for no good reason, then even restarted postgresql. I’ve made sure no disk or proc activity is happening before I start the next 1M rows. None of that seems to matter.
I have a total of about 70M rows to load, but am at a standstill. I’ve read up on whatever performance docs I can find online, but I am not getting anywhere.
I’ve increased shared_buffers to 256MB, and I’ve tried it with fsync commented out as per the default config. I’ve also tried it with fsync=off. No difference.
Ideas? Thanks in advance,
Jon
On Thu, Nov 04, 2010 at 01:42:49PM -0700, Jonathan Hoover wrote:
> Once I hit 2M rows, my performance just drops out, and the next 1M never
> finishes. It takes 7 minutes for 1M rows to load.

The initial load generates no WAL logging if the table has just been created or truncated. You can boost maintenance_work_mem to increase index creation/update performance. You are severely I/O limited and would be better off dropping your indexes during the load and re-creating them afterwards. If you are starting with an empty table, truncate it and then load all the data in a single transaction, one COPY per input file. Then COMMIT and build the indexes. Your question is also missing key information, like config details and PostgreSQL version.

Cheers,
Ken
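[A minimal sketch of the load phase Ken describes, run through psql; the table name loadtable and the file names are illustrative, since the thread never gives the real ones:]

    -- load phase: no primary key, no indexes, one transaction
    BEGIN;
    TRUNCATE loadtable;   -- per Ken, a freshly created/truncated table avoids WAL overhead
    \copy loadtable ("time", source, unitid, guid) from 'rows_001.dat'
    \copy loadtable ("time", source, unitid, guid) from 'rows_002.dat'
    -- ... one \copy per 1M-row input file ...
    COMMIT;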
"Jonathan Hoover" <jhoover@yahoo-inc.com> wrote: > I have a simple database, with one table for now. It has 4 > columns: > > anid serial primary key unique, > time timestamp, > source varchar(5), > unitid varchar(15), > guid varchar(32) > > There is a btree index on each. > > I am loading data 1,000,000 (1M) rows at a time using psql and a > COPY command. Once I hit 2M rows, my performance just drops out Drop the indexes and the primary key before you copy in. Personally, I strongly recommend a VACUUM FREEZE ANALYZE after the bulk load. Then use ALTER TABLE to restore the primary key, and create all the other indexes. Also, if you don't mind starting over from initdb if it crashes partway through you can turn fsync off. You want a big maintenance_work_mem setting during the index builds -- at least 200 MB. -Kevin
1. I have now set maintenance_work_mem to 256 MB (it was previously commented out in the default config).
2. The version is PostgreSQL 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46).
3. What would be the best and cheapest thing I could do for IO performance?
4. I need to read up on TRUNCATE, which I have not used before. Care to give a quick overview before I RTFM?

Thanks,
jon
Just FYI, I removed the PK and the indexes for now. Load time for 1M rows is now 7 SECONDS instead of 7 MINUTES (or just never finishing). Granted, I made the change in #1 above too, but WOW! So, question: what helped the most: 1) no PK, 2) no indexes, 3) maintenance_work_mem being uncommented? I will test myself when I have time, but I'd like to know everyone's thoughts.

Jon
TRUNCATE removes all data from the table, leaving the schema structure in place.

What helped the most was probably dropping the indexes.

Sam
TRUNCATE doesn't generate any rollback data, which makes it lightning fast. It just de-allocates the data pages used by the table.

Sam

Jonathan Hoover wrote:
> How does TRUNCATE differ from DELETE FROM <table>? Sorry, probably an easy RTFM question, but I'll ask anyhow.
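[A quick illustration of the difference, using the stand-in loadtable:]

    -- DELETE marks each row dead one by one; the space is only reclaimed by a later VACUUM
    DELETE FROM loadtable;
    -- TRUNCATE just drops the table's data files; the space comes back immediately
    TRUNCATE loadtable;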
On Thu, Nov 4, 2010 at 9:03 PM, Jonathan Hoover <jhoover@yahoo-inc.com> wrote:
> 1. I have now set maintenance_work_mem to 256 MB (it was previously commented out in the default config)
> 2. The version is PostgreSQL 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)

If performance matters, look at upgrading to at LEAST 8.3.latest or 8.4.latest. Preferably 9.0.latest.

> 3. What would be the best and cheapest thing I could do for IO performance?

Get an SSD drive, or a RAID controller with battery-backed cache. If your data isn't valuable (i.e. you can reproduce it at will), then turning off things like fsync and full page writes can help a lot (though I don't think 8.1 has the ability to turn off full page writes).

> 4. I need to read up on TRUNCATE, which I have not used before. Care to give a quick overview before I RTFM?

Truncate is basically a DDL (data definition language) command as opposed to a DML (data manipulation language) command. It basically drops the table underneath the table def and recreates it as empty.
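[The unsafe-but-fast settings Scott mentions would look roughly like this in postgresql.conf; only sensible when the data can be reloaded at will:]

    fsync = off              # skip WAL sync waits; a crash means re-initdb and reload
    full_page_writes = off   # per Scott's caveat, this may be ignored on 8.1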
Actually, this is a better definition of TRUNCATE than my previous post. From the doco:

TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.

Sam
On Thu, Nov 4, 2010 at 9:33 PM, Samuel Stearns <SStearns@internode.com.au> wrote:
> TRUNCATE doesn't generate any rollback data, which makes it lightning fast. It just de-allocates the data pages used by the table.

Also, truncate, like nearly everything in pgsql, can be rolled back. I still remember my oracle co-dbas at my last job freaking out when I showed them things like:

begin;
update table yada ...
truncate table;
rollback;

or

begin;
drop index xyz;
explain select ...
rollback;

Transactable everything is pretty cool. (Note that database and tablespace create/drop are the only things that aren't transactable, which makes some sense.)
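[A concrete version of the first demo, using the stand-in loadtable from earlier:]

    BEGIN;
    TRUNCATE loadtable;
    SELECT count(*) FROM loadtable;   -- 0 inside this transaction
    ROLLBACK;
    SELECT count(*) FROM loadtable;   -- all the rows are back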
Yep! Coming from a previous Oracle job into Postgres, discovering the transactable stuff is indeed pretty cool.

Sam
I am in the middle of re-creating the indexes now, and what is interesting is that atop is no longer reporting heavy use of the hard drive. Instead, I see postgres using 80% of the proc (instead of 8% earlier), and drive usage is 20+ MBr/s and 16+ MBw/s now (instead of 0.1 and 3.0 respectively earlier). Could it really be the PK causing the delay, or is it really the maintenance_work_mem, or simply creating the indexes afterwards? Good info; hopefully I can do some testing on these ideas over the next few days. For now, I'm hoping I can just get things moving enough.

As I finished this up, I noticed disk performance is down to 4+ MBw/s and MBr/s, but it is not "red" in atop any longer, and proc usage now seems to be the limiting factor.

Good stuff...
Jon
Your biggest benefit was dropping the indexes before the load, most likely.

Sam
On Fri, Nov 05, 2010 at 02:19:41PM +1030, Samuel Stearns wrote:
> Your biggest benefit was dropping the indexes before the load, most likely.

Correct. With a single drive and no write cache, once you get more than one I/O stream running simultaneously -- i.e. one writing the data and one writing each index, five I/Os at once -- you effectively devolve to your drive's random I/O rate, which can be an order of magnitude slower than its sequential I/O rate. You can use bonnie or some other disk speed test to get those numbers for your system. When you build the indexes after the load, each step can use sequential I/O much more of the time, which is why it runs so much faster.

Cheers,
Ken
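[One rough way to get the sequential figures Ken mentions is GNU dd with direct I/O (bonnie/bonnie++ also reports random-seek rates); the test path is illustrative:]

    # sequential write rate, bypassing the OS page cache
    dd if=/dev/zero of=/var/lib/pgsql/ddtest bs=1M count=2048 oflag=direct
    # sequential read rate
    dd if=/var/lib/pgsql/ddtest of=/dev/null bs=1M iflag=direct
    rm /var/lib/pgsql/ddtest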