Thread: Slow queries on big table

From: "Tyrrill, Ed"
I have a two column table with over 160 million rows in it.  As the size
of the table grows queries on this table get exponentially slower.  I am
using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
is an Intel 3 GHz Xeon with 4GB RAM, and 6 disks in a RAID 5
configuration.  For current testing I am running a single database
connection with no other applications running on the machine, and the
swap is not being used at all.

Here is the table definition:

mdsdb=# \d backup_location
 Table "public.backup_location"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 record_id | bigint  | not null
 backup_id | integer | not null
Indexes:
    "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
    "backup_location_rid" btree (record_id)
Foreign-key constraints:
    "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
backups(backup_id) ON DELETE CASCADE

Here is the table size:

mdsdb=# select count(*) from backup_location;
   count
-----------
 162101296
(1 row)

And here is a simple query on this table that takes nearly 20 minutes to
return less than 3000 rows.  I ran an analyze immediately before I ran
this query:

mdsdb=# explain analyze select record_id from backup_location where backup_id = 1070;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using backup_location_pkey on backup_location  (cost=0.00..1475268.53 rows=412394 width=8) (actual time=3318.057..1196723.915 rows=2752 loops=1)
   Index Cond: (backup_id = 1070)
 Total runtime: 1196725.617 ms
(3 rows)

Obviously at this point the application is not usable.  If possible we
would like to grow this table to the 3-5 billion row range, but I don't
know if that is realistic.

Any guidance would be greatly appreciated.

Thanks,
Ed

Re: Slow queries on big table

From: Scott Marlowe
Tyrrill, Ed wrote:
> I have a two column table with over 160 million rows in it.  As the size
> of the table grows queries on this table get exponentially slower.  I am
> using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
> is an Intel 3 GHz Xeon with 4GB RAM, and 6 disks in a RAID 5
> configuration.  For current testing I am running a single database
> connection with no other applications running on the machine, and the
> swap is not being used at all.
>
> Here is the table definition:
>
> mdsdb=# \d backup_location
>  Table "public.backup_location"
>   Column   |  Type   | Modifiers
> -----------+---------+-----------
>  record_id | bigint  | not null
>  backup_id | integer | not null
> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)
> Foreign-key constraints:
>     "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE
>
> Here is the table size:
>
> mdsdb=# select count(*) from backup_location;
>    count
> -----------
>  162101296
> (1 row)
>
> And here is a simple query on this table that takes nearly 20 minutes to
> return less than 3000 rows.  I ran an analyze immediately before I ran
> this query:
>
> mdsdb=# explain analyze select record_id from backup_location where backup_id = 1070;
>                                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using backup_location_pkey on backup_location  (cost=0.00..1475268.53 rows=412394 width=8) (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)
>
I've got a few points.  Firstly, is your data amenable to partitioning?
If so, that might be a big winner.
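
For reference, a minimal sketch of what 8.1-era inheritance partitioning could look like here; the child-table names and backup_id ranges are made up for illustration, and rows would still have to be routed into the children by the application or a rule/trigger (not shown):

-- hypothetical child tables, each covering a range of backup_id
create table backup_location_p1 (
    check (backup_id >= 0 and backup_id < 1000)
) inherits (backup_location);
create table backup_location_p2 (
    check (backup_id >= 1000 and backup_id < 2000)
) inherits (backup_location);

create index backup_location_p1_bid on backup_location_p1 (backup_id);
create index backup_location_p2_bid on backup_location_p2 (backup_id);

-- with constraint exclusion enabled, a query such as
--   select record_id from backup_location where backup_id = 1070;
-- only has to visit backup_location_p2
set constraint_exclusion = on;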
Secondly, it might be more efficient for the planner to choose the
backup_location_rid index than the combination primary key index.  You
can test this theory with this cool pg trick:

begin;
alter table backup_location drop constraint backup_location_pkey;
explain analyze select ....
rollback;

to see if it's faster.

> Obviously at this point the application is not usable.  If possible we
> would like to grow this table to the 3-5 billion row range, but I don't
> know if that is realistic.
>
> Any guidance would be greatly appreciated.
>

Without knowing more about your usage patterns, it's hard to say.  But
partitioning seems like your best choice at the moment.

Re: Slow queries on big table

From: Tom Lane
"Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
>  Index Scan using backup_location_pkey on backup_location  (cost=0.00..1475268.53 rows=412394 width=8) (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms

If we take that at face value it says the indexscan is requiring 434
msec per actual row fetched (1196723.915 ms / 2752 rows ≈ 434.9 ms).
Which is just not very credible; the worst case should be about 1 disk
seek per row fetched.  So there's something going on that doesn't meet
the eye.

What I'm wondering about is whether the table is heavily updated and
seldom vacuumed, leading to lots and lots of dead tuples being fetched
and then rejected (hence they'd not show in the actual-rows count).
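
If that's the suspicion, one quick check (a sketch, not something from the thread) is to let VACUUM VERBOSE report how many dead row versions it finds:

-- a large "removable row versions" count here would confirm bloat
vacuum verbose backup_location;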

The other thing that seems pretty odd is that it's not using a bitmap
scan --- for such a large estimated rowcount I'd have expected a bitmap
scan not a plain indexscan.  What do you get from EXPLAIN ANALYZE if
you force a bitmap scan?  (Set enable_indexscan off, and enable_seqscan
too if you have to.)
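
Concretely, that experiment might look like this (a sketch; the settings only affect the current session and can be reset afterwards):

set enable_indexscan = off;
-- set enable_seqscan = off;   -- only needed if the planner switches to a seqscan
explain analyze select record_id from backup_location where backup_id = 1070;
reset enable_indexscan;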

            regards, tom lane

Re: Slow queries on big table

From: Andrew Kroeger
Tyrrill, Ed wrote:
> mdsdb=# \d backup_location
>  Table "public.backup_location"
>   Column   |  Type   | Modifiers
> -----------+---------+-----------
>  record_id | bigint  | not null
>  backup_id | integer | not null
> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)
> Foreign-key constraints:
>     "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE

[snip]

> mdsdb=# explain analyze select record_id from backup_location where backup_id = 1070;
>                                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using backup_location_pkey on backup_location  (cost=0.00..1475268.53 rows=412394 width=8) (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)

The "backup_location_rid" index on your table is not necessary.  The
primary key index on (record_id, backup_id) can be used by Postgres,
even if the query is only constrained by record_id.  See
http://www.postgresql.org/docs/8.2/interactive/indexes-multicolumn.html
for details.

The explain plan indicates that your query is filtered on backup_id, but
is using the primary key index on (record_id, backup_id).  Based on the
table definition, you do not have any good index for filtering on backup_id.

The explain plan also seems way off, as I would expect a sequential scan
would be used without a good index for backup_id.  Did you disable
sequential scans before running this query?  Have you altered any other
configuration or planner parameters?

As your "backup_location_rid" is not necessary, I would recommend
dropping that index and creating a new one on just backup_id.  This
should be a net wash on space, and the new index should make for a
straight index scan for the query you presented.  Don't forget to
analyze after changing the indexes.
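
In SQL, the suggested change might look like this (a sketch; backup_location_bid is an illustrative name, though it matches the index Ed reports using later in the thread):

drop index backup_location_rid;
create index backup_location_bid on backup_location (backup_id);
analyze backup_location;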

Hope this helps.

Andrew


Re: Slow queries on big table

From: Tom Lane
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> Secondly, it might be more efficient for the planner to choose the
> backup_location_rid index than the combination primary key index.

Oh, I'm an idiot; I didn't notice the way the index was set up.  Yeah,
that index pretty well sucks for a query on backup_id --- it has to scan
the entire index, since there's no constraint on the leading column.
So that's where the time is going.

This combination of indexes:

> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)

is really just silly.  You should have the pkey and then an index on
backup_id alone.  See the discussion of multiple indexes in the fine
manual:
http://www.postgresql.org/docs/8.2/static/indexes-multicolumn.html
http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html

            regards, tom lane

Re: Slow queries on big table

From: "Tyrrill, Ed"
Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > Secondly, it might be more efficient for the planner to choose the
> > backup_location_rid index than the combination primary key index.
>
> Oh, I'm an idiot; I didn't notice the way the index was set up.
> Yeah, that index pretty well sucks for a query on backup_id ---
> it has to scan the entire index, since there's no constraint on the
> leading column.
> So that's where the time is going.
>
> This combination of indexes:
>
> > Indexes:
> >     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
> >     "backup_location_rid" btree (record_id)
>
> is really just silly.  You should have the pkey and then an index on
> backup_id alone.  See the discussion of multiple indexes in the fine
> manual:
> http://www.postgresql.org/docs/8.2/static/indexes-multicolumn.html
> http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html
>
>             regards, tom lane

Thanks for the help guys!  That was my problem.  I actually need the
backup_location_rid index for a different query so I am going to keep
it.  Here is the result with the new index:

mdsdb=# explain analyze select record_id from backup_location where backup_id = 1070;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using backup_location_bid on backup_location  (cost=0.00..9573.07 rows=415897 width=8) (actual time=0.106..3.486 rows=2752 loops=1)
   Index Cond: (backup_id = 1070)
 Total runtime: 4.951 ms
(3 rows)

Re: Slow queries on big table

From: "Steinar H. Gunderson"
On Fri, May 18, 2007 at 02:22:52PM -0700, Tyrrill, Ed wrote:
>  Total runtime: 4.951 ms

Going from 1197 seconds to 5 milliseconds. That has to be some sort of
record, I think :-)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Slow queries on big table

From: Tom Lane
"Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> This combination of indexes:
>>
>>> Indexes:
>>> "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>>> "backup_location_rid" btree (record_id)
>>
>> is really just silly.  You should have the pkey and then an index on
>> backup_id alone.

> Thanks for the help guys!  That was my problem.  I actually need the
> backup_location_rid index for a different query so I am going to keep
> it.

Well, you don't really *need* it; the two-column index on (record_id,
backup_id) will serve perfectly well for queries on its leading column
alone.  It'll be physically bigger and hence slightly slower to scan
than a single-column index; but unless the table is almost completely
read-only, the update overhead of maintaining all three indexes is
probably going to cost more than you can save with it.  Try that other
query with and without backup_location_rid and see how much you're
really saving.
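
One way to run that comparison without committing to anything (a sketch that reuses Scott's transaction trick from earlier, with the query Ed posts in his next message):

begin;
drop index backup_location_rid;
explain analyze select backupobjects.record_id
    from backupobjects left outer join backup_location using (record_id)
    where backup_id is null;
rollback;  -- the index comes back as though nothing happened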

>  Index Scan using backup_location_bid on backup_location  (cost=0.00..9573.07 rows=415897 width=8) (actual time=0.106..3.486 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 4.951 ms

That's more like it ;-)

            regards, tom lane

Re: Slow queries on big table

From: "Tyrrill, Ed"
Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Thanks for the help guys!  That was my problem.  I actually need the
>> backup_location_rid index for a different query so I am going to keep
>> it.
>
> Well, you don't really *need* it; the two-column index on (record_id,
> backup_id) will serve perfectly well for queries on its leading column
> alone.  It'll be physically bigger and hence slightly slower to scan
> than a single-column index; but unless the table is almost completely
> read-only, the update overhead of maintaining all three indexes is
> probably going to cost more than you can save with it.  Try that other
> query with and without backup_location_rid and see how much you're
> really saving.

Well, the query that got me to add backup_location_rid took 105 minutes
using only the primary key index.  After I added backup_location_rid
the query was down to about 45 minutes.  Still not very good, and I am
still fiddling around with it.  The query is:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using(record_id) where backup_id is null;
                                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..21408455.06 rows=11790970 width=8) (actual time=2784967.410..2784967.410 rows=0 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".backup_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..443484.31 rows=11790970 width=8) (actual time=0.073..47865.957 rows=11805996 loops=1)
   ->  Index Scan using backup_location_rid on backup_location  (cost=0.00..20411495.21 rows=162435366 width=12) (actual time=0.110..2608485.437 rows=162426837 loops=1)
 Total runtime: 2784991.612 ms
(6 rows)

It is of course the same backup_location, but backupobjects is:

mdsdb=# \d backupobjects
               Table "public.backupobjects"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 record_id      | bigint                      | not null
 dir_record_id  | integer                     |
 name           | text                        |
 extension      | character varying(64)       |
 hash           | character(40)               |
 mtime          | timestamp without time zone |
 size           | bigint                      |
 user_id        | integer                     |
 group_id       | integer                     |
 meta_data_hash | character(40)               |
Indexes:
    "backupobjects_pkey" PRIMARY KEY, btree (record_id)
    "backupobjects_meta_data_hash_key" UNIQUE, btree (meta_data_hash)
    "backupobjects_extension" btree (extension)
    "backupobjects_hash" btree (hash)
    "backupobjects_mtime" btree (mtime)
    "backupobjects_size" btree (size)

record_id in backupobjects has a many-to-many relationship to record_id
in backup_location.
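
One hypothetical rewrite to experiment with (not suggested in the thread): spelling the anti-join as NOT EXISTS, which on 8.1 becomes a correlated subplan that probes backup_location once per backupobjects row rather than walking all 162 million index entries. It may or may not beat the merge join, so it's only worth an explain analyze:

explain analyze
select bo.record_id
from backupobjects bo
where not exists (select 1
                  from backup_location bl
                  where bl.record_id = bo.record_id);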

Ed