Re: Advice for optimizing queries using Large Tables - Mailing list pgsql-general

From Francisco Reyes
Subject Re: Advice for optimizing queries using Large Tables
Date
Msg-id 20020310131915.O42725-100000@zoraida.natserv.net
Whole thread Raw
In response to Advice for optimizing queries using Large Tables  ("Shaun Grannis" <shaun_grannis@hotmail.com>)
List pgsql-general
On Sat, 9 Mar 2002, Shaun Grannis wrote:

> I'm working with a table containing over 65 million records in Postgres v
> 7.1.3. The machine is a dual-processor Athlon MP1900 (Tyan Tiger board) with
> 3GB of PC2100 DDR RAM, and 3-80GB IBM 120GXP hard drives configured in a
> software RAID 0 Array running under RedHat Linux v. 7.2. Queries don't seem
> to be running as fast as "they should".


Have you considered moving to a SCSI setup?

>     SELECT count(*) FROM table WHERE value=1999;
> takes approximately 45 minutes to execute, and returns a count of approx 2.2
> million records. My "instinct" is that this is much too slow for a query of
> an indexed column running on this hardware.

As was already suggested, you may want to consider upgrading to 7.2.
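
Before anything else, it may be worth confirming that the planner is
actually using the index on value and that its statistics are current.
A minimal check, using the placeholder table name from your message
(quoted here, since "table" is a reserved word):

    -- Refresh planner statistics; on 65 million rows this takes a while.
    VACUUM ANALYZE "table";

    -- Then look at the plan.  "Index Scan using ..." means the index is
    -- being used; "Seq Scan" means all 65 million rows are being read.
    EXPLAIN SELECT count(*) FROM "table" WHERE value = 1999;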

> Here's the table schema:
> Table "table"
>
> Attribute  | Type          | Modifier
> -----------+---------------+----------
> col01      | character(9)  |
> col02      | character(15) |
> col03      | character(15) |
> col04      | character(1)  |
> col05      | character(15) |
> col06      | character(15) |
> col07      | character(15) |
> col08      | character(15) |
> col09      | character(1)  |
> col10      | integer       |
> col11      | integer       |
> col12      | integer       |
> col13      | integer       |
> col14      | integer       |
> value      | integer       |


> Does anyone have any advice for optimizing the SELECT query listed above?

Another "optimization"/trick we do here is to split tables into the most
needed info and the least needed info. If you have a part of the data
which is used often you put it in the "main" table. If you have data which
is big and not used often you put it in a second table. In our case we
have a few tables where the size of the data which is not used often can
be from 2 to 4 times the size of the data used often. This helps a lot
with joins and sequential scans.
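
A rough sketch of what I mean, with made-up table and column names
(only value and a couple of your wide columns shown):

    -- "Hot" table: just the key and the columns most queries touch.
    CREATE TABLE main_tbl (
        id    integer PRIMARY KEY,
        value integer
    );
    CREATE INDEX main_tbl_value_idx ON main_tbl (value);

    -- "Cold" table: the wide character columns that are rarely needed.
    CREATE TABLE extra_tbl (
        id    integer PRIMARY KEY,
        col01 character(9),
        col02 character(15)
        -- ... and the rest of the wide columns
    );

    -- The frequent query now scans a much narrower table and index:
    SELECT count(*) FROM main_tbl WHERE value = 1999;

    -- The full row is still available with a join when you need it:
    SELECT m.value, e.col01
      FROM main_tbl m JOIN extra_tbl e ON e.id = m.id
     WHERE m.value = 1999;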


> Is this as fast as Postgresql will perform? Any good pointers on working
> with large tables in Postgres?

I would suspect this is not as fast as PostgreSQL can perform. Although
my data set is nowhere near 60+ million records, a somewhat complex
aggregate + join over 5+ million records takes on the order of 30
minutes here.

