Re: Disk Performance Problem on Large DB - Mailing list pgsql-admin

From Samuel Stearns
Subject Re: Disk Performance Problem on Large DB
Date
Msg-id 68B59BEDCD36854AADBDF17E91B2937A0783A71E38@EXCHMAIL.staff.internode.com.au
In response to Disk Performance Problem on Large DB  ("Jonathan Hoover" <jhoover@yahoo-inc.com>)
Responses Re: Disk Performance Problem on Large DB
List pgsql-admin
TRUNCATE doesn't generate any rollback data, which makes it lightning fast. It just de-allocates the data pages used by the table.
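For example, a minimal sketch of the difference ("mytable" is a stand-in name, not from your schema):

    DELETE FROM mytable;  -- removes rows one at a time under MVCC; the dead rows
                          -- still occupy disk until VACUUM reclaims them
    TRUNCATE mytable;     -- deallocates the table's data pages in one step and
                          -- leaves the table (and its indexes) defined but empty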

Sam

-----Original Message-----
From: Jonathan Hoover [mailto:jhoover@yahoo-inc.com]
Sent: Friday, 5 November 2010 1:59 PM
To: Samuel Stearns; pgsql-admin@postgresql.org
Cc: Kenneth Marshall
Subject: RE: [ADMIN] Disk Performance Problem on Large DB

How does TRUNCATE differ from DELETE FROM <table>? Sorry, probably an easy RTFM question, but I'll ask anyhow.

-----Original Message-----
From: Samuel Stearns [mailto:SStearns@internode.com.au]
Sent: Thursday, November 04, 2010 10:27 PM
To: Jonathan Hoover; pgsql-admin@postgresql.org
Cc: Kenneth Marshall
Subject: RE: [ADMIN] Disk Performance Problem on Large DB

TRUNCATE removes all data from the table leaving the schema structure in place.

What helped the most was probably the drop of the indexes.
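Roughly, with hypothetical constraint and index names (yours weren't shown):

    ALTER TABLE mytable DROP CONSTRAINT mytable_pkey;  -- drops the PK and its index
    DROP INDEX mytable_time_idx;                       -- repeat for each btree index
    -- ... run the bulk COPY loads here ...
    CREATE INDEX mytable_time_idx ON mytable (time);
    ALTER TABLE mytable ADD PRIMARY KEY (anid);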

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jonathan Hoover
Sent: Friday, 5 November 2010 1:53 PM
To: pgsql-admin@postgresql.org
Cc: Kenneth Marshall
Subject: Re: [ADMIN] Disk Performance Problem on Large DB

Just FYI, I removed the PK and the indexes for now. Load time for 1M rows is now 7 SECONDS instead of 7 MINUTES (or just never happening). Granted, I made the changes in #1 below, but WOW! So, question: which helped the most: 1) no PK, 2) no indexes, or 3) the maintenance_work_mem being uncommented? I will test myself when I have time, but I'd like to know everyone's thoughts.

Jon

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jonathan Hoover
Sent: Thursday, November 04, 2010 10:03 PM
To: pgsql-admin@postgresql.org
Cc: Kenneth Marshall
Subject: Re: [ADMIN] Disk Performance Problem on Large DB

1. I have now set maintenance_work_mem to 256 MB (it was previously commented out in the default config).
2. The version is PostgreSQL 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46).
3. What would be the best and cheapest thing I could do for I/O performance?
4. I need to read up on TRUNCATE, which I have not used before. Care to give a quick overview before I RTFM?
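For reference, the corresponding postgresql.conf lines would look roughly like this (values illustrative; 8.1 predates the 'MB' unit syntax added in 8.2, so maintenance_work_mem is a plain integer in kilobytes and shared_buffers is a count of 8 KB buffers):

    shared_buffers = 32768           # 32768 x 8 KB = 256 MB; requires a restart
    maintenance_work_mem = 262144    # 262144 KB = 256 MB; can also be set per
                                     # session: SET maintenance_work_mem = 262144;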

Thanks,
jon

-----Original Message-----
From: Kenneth Marshall [mailto:ktm@rice.edu]
Sent: Thursday, November 04, 2010 4:03 PM
To: Jonathan Hoover
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Disk Performance Problem on Large DB

On Thu, Nov 04, 2010 at 01:42:49PM -0700, Jonathan Hoover wrote:
> Hello,
>
> I have a RHEL 5 box, 4 GB RAM, single hard drive SATA, Intel Core 2 Duo 2.4. A basic workstation.
>
> I have a simple database, with one table for now. It has 4 columns:
>
> anid serial primary key unique,
> time timestamp,
> source varchar(5),
> unitid varchar(15),
> guid varchar(32)
>
> There is a btree index on each.
>
> I am loading data 1,000,000 (1M) rows at a time using psql and a COPY command. Once I hit 2M rows, my performance
> just drops out, and the next 1M never finishes. It takes 7 minutes for 1M rows to load. Once 2M are in there, I've
> waited an hour, and nothing. It doesn't seem to matter which 1M rows I try to load next; none ever finish. Each 1M
> rows is about 70MB on disk in the raw input file.
>
> I have "atop" installed, and it reports the drives at 100%, which it reports for the first 1M rows too. The MBw/s
goesfrom 20+ on the first 2M rows, down to about 4 MBw/s or less now. The processor usage is at about 2 to 8% at this
time(used by postgres). 
>
> I have even waited for 1M rows to load, then done a vacuum for no good reason, then even restarted postgresql. I've
> made sure no disk or proc activity is happening before I start the next 1M rows. None of that seems to matter.
>
> I have a total of about 70M rows to load, but am at a standstill. I've read up on whatever performance docs I can
> find online, but I am not getting anywhere.
>
> I've increased shared_buffers to 256MB, and I've tried it with fsync commented out as per the default config. I've
> also tried it with fsync=off. No difference.
>
> Ideas? Thanks in advance,
> Jon

If the table has just been created or truncated, the initial load generates no WAL logging. You can boost
maintenance_work_mem to speed up index creation and updates. You are severely I/O limited and would be better
off dropping your indexes during the load and re-creating them afterwards. If you are starting with an empty
table, truncate it and then load all the data in a single transaction, all 7 COPY commands. Then COMMIT and
build the indexes. Your question is also missing key information such as config details, PostgreSQL version,
...
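A sketch of that sequence in psql (file, table, and index names here are illustrative):

    BEGIN;
    TRUNCATE mytable;
    \copy mytable FROM 'rows_part1.dat'
    \copy mytable FROM 'rows_part2.dat'
    -- ... remaining COPY commands ...
    COMMIT;
    CREATE INDEX mytable_time_idx ON mytable (time);
    ALTER TABLE mytable ADD PRIMARY KEY (anid);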

Cheers,
Ken

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

