Re: Slow queries on big table - Mailing list pgsql-performance

From: Scott Marlowe
Subject: Re: Slow queries on big table
Date:
Msg-id: 464E0036.3030804@g2switchworks.com
In response to: Slow queries on big table  ("Tyrrill, Ed" <tyrrill_ed@emc.com>)
Responses: Re: Slow queries on big table
List: pgsql-performance
Tyrrill, Ed wrote:
> I have a two-column table with over 160 million rows in it.  As the size
> of the table grows, queries on this table get exponentially slower.  I am
> using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
> is an Intel 3 GHz Xeon with 4GB RAM, and 6 disks in a RAID 5
> configuration.  For current testing I am running a single database
> connection with no other applications running on the machine, and the
> swap is not being used at all.
>
> Here is the table definition:
>
> mdsdb=# \d backup_location
>  Table "public.backup_location"
>   Column   |  Type   | Modifiers
> -----------+---------+-----------
>  record_id | bigint  | not null
>  backup_id | integer | not null
> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)
> Foreign-key constraints:
>     "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE
>
> Here is the table size:
>
> mdsdb=# select count(*) from backup_location;
>    count
> -----------
>  162101296
> (1 row)
>
> And here is a simple query on this table that takes nearly 20 minutes to
> return less than 3000 rows.  I ran an analyze immediately before I ran
> this query:
>
> mdsdb=# explain analyze select record_id from backup_location where
> backup_id = 1070;
>
> QUERY PLAN
> ------------------------------------------------------------------------------
>  Index Scan using backup_location_pkey on backup_location  (cost=0.00..1475268.53 rows=412394 width=8) (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)
>
I've got a few points.  Firstly, is your data amenable to partitioning?
If so, that might be a big winner.
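
On 8.1 partitioning means table inheritance plus CHECK constraints,
since declarative partitioning doesn't exist yet.  A rough sketch,
assuming ranges of backup_id fit your data; the child table name and
the boundaries here are made up:

create table backup_location_p1 (
    check (backup_id >= 1000 and backup_id < 2000)
) inherits (backup_location);
-- children don't inherit indexes, so each child needs its own
create index backup_location_p1_bid on backup_location_p1 (backup_id);
-- lets the planner skip children whose CHECK rules out the query
set constraint_exclusion = on;

With that in place, a query like your backup_id = 1070 one only has to
touch the one child that can hold it.  (Routing inserts to the right
child with rules or triggers, and migrating the existing 160M rows,
are left out of this sketch.)
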
Secondly, backup_id isn't the leading column of that composite primary
key index, so the index scan has to crawl a huge stretch of the index
to pull out 2752 rows.  The planner might actually do better with a
different plan than the composite primary key index.  You can test
that theory with this cool pg trick:

begin;
alter table backup_location drop constraint backup_location_pkey;
explain analyze select record_id from backup_location where backup_id = 1070;
rollback;

to see if it's faster.
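
If whatever plan it falls back to wins, then an index that leads with
backup_id is worth testing too (this one isn't in your posted schema,
so the name is made up):

-- hypothetical index; gives the planner a direct path to backup_id
create index backup_location_bid on backup_location (backup_id);
analyze backup_location;
explain analyze select record_id from backup_location where backup_id = 1070;

Note that 8.1 has no CREATE INDEX CONCURRENTLY (that arrived in 8.2),
so the build will block writes to the table while it runs.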

> Obviously at this point the application is not usable.  If possible we
> would like to grow this table to the 3-5 billion row range, but I don't
> know if that is realistic.
>
> Any guidance would be greatly appreciated.
>

Without knowing more about your usage patterns, it's hard to say.  But
partitioning seems like your best choice at the moment.
