Thread: Index Performance Help

Index Performance Help

From: "Damien Dougan"
Hi All,

I've been seeing very slow read performance on a database of 1 million
indexed subscribers, which I believe has nothing to do with the data
itself, but rather with delays in processing the index.

If I make a random jump into the index (say X), it can take about 50ms
to read the subscriber. If I then make a "close by" lookup (say X+10),
it takes only about 0.5ms to read the subscriber. Making another lookup
to a "far away" entry (say X+1000), it again takes about 50ms to read.

From the analyze output, it looks like most of the work is being done in
the index scan of the subscriber table - reading the actual data from
the PublicView is quite fast.

Am I correct in my analysis? Is there anything I can do to improve the
performance of the index lookups?

(The indexes in question are all created as B-TREE.)

I've tried increasing the index memory and making a number of queries
around the index range, but straying several hundred index entries from a
cached entry always results in a major lookup delay.

I've also increased the shared memory available to Postgres to 80MB
in case the index was being paged in and out, but it hasn't seemed to have
any effect.
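
(For reference, the change was along these lines in postgresql.conf - the
value shown is only illustrative; on this version shared_buffers is counted
in 8kB pages, so roughly 10240 pages is about 80MB:)

    # postgresql.conf - illustrative value only
    shared_buffers = 10240    # 8kB pages, i.e. roughly 80MB of shared buffer cache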




Sample analyze output for an initial query:

hydradb=# explain analyze select * from pvsubscriber where actorid =
'b3432-asdas-232-Subscriber793500';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..13.19 rows=1 width=100) (actual time=49.688..49.699 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..10.16 rows=1 width=69) (actual time=49.679..49.689 rows=1 loops=1)
         Join Filter: ("inner".mc_childactor_id = "outer".id)
         ->  Nested Loop  (cost=0.00..10.15 rows=1 width=69) (actual time=49.669..49.677 rows=1 loops=1)
               ->  Nested Loop  (cost=0.00..7.12 rows=1 width=73) (actual time=43.969..43.974 rows=1 loops=1)
                     ->  Index Scan using mc_actor_key on mc_actor  (cost=0.00..4.08 rows=1 width=69) (actual time=39.497..39.499 rows=1 loops=1)
                           Index Cond: ((actorid)::text = 'b3432-asdas-232-Subscriber793500'::text)
                     ->  Index Scan using rel_actor_has_subscriber_idx1 on rel_actor_has_subscriber rel_sub  (cost=0.00..3.02 rows=1 width=8) (actual time=4.458..4.460 rows=1 loops=1)
                           Index Cond: ("outer".id = rel_sub.mc_actor_id)
               ->  Index Scan using mc_subscriber_id_idx on mc_subscriber sub  (cost=0.00..3.02 rows=1 width=4) (actual time=5.689..5.691 rows=1 loops=1)
                     Index Cond: (sub.id = "outer".mc_subscriber_id)
         ->  Seq Scan on rel_actor_has_actor rel_parent  (cost=0.00..0.00 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using mc_actor_id_idx on mc_actor  (cost=0.00..3.02 rows=1 width=39) (actual time=0.002..0.002 rows=0 loops=1)
         Index Cond: ("outer".mc_parentactor_id = mc_actor.id)
 Total runtime: 49.845 ms
(15 rows)



And the analyze output for a "nearby" subscriber (10 indexes away):

hydradb=# explain analyze select * from pvsubscriber where actorid =
'b3432-asdas-232-Subscriber793510';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..13.19 rows=1 width=100) (actual time=0.278..0.288 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..10.16 rows=1 width=69) (actual time=0.271..0.280 rows=1 loops=1)
         Join Filter: ("inner".mc_childactor_id = "outer".id)
         ->  Nested Loop  (cost=0.00..10.15 rows=1 width=69) (actual time=0.264..0.272 rows=1 loops=1)
               ->  Nested Loop  (cost=0.00..7.12 rows=1 width=73) (actual time=0.246..0.251 rows=1 loops=1)
                     ->  Index Scan using mc_actor_key on mc_actor  (cost=0.00..4.08 rows=1 width=69) (actual time=0.220..0.221 rows=1 loops=1)
                           Index Cond: ((actorid)::text = 'b3432-asdas-232-Subscriber793510'::text)
                     ->  Index Scan using rel_actor_has_subscriber_idx1 on rel_actor_has_subscriber rel_sub  (cost=0.00..3.02 rows=1 width=8) (actual time=0.017..0.018 rows=1 loops=1)
                           Index Cond: ("outer".id = rel_sub.mc_actor_id)
               ->  Index Scan using mc_subscriber_id_idx on mc_subscriber sub  (cost=0.00..3.02 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
                     Index Cond: (sub.id = "outer".mc_subscriber_id)
         ->  Seq Scan on rel_actor_has_actor rel_parent  (cost=0.00..0.00 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using mc_actor_id_idx on mc_actor  (cost=0.00..3.02 rows=1 width=39) (actual time=0.001..0.001 rows=0 loops=1)
         Index Cond: ("outer".mc_parentactor_id = mc_actor.id)
 Total runtime: 0.428 ms
(15 rows)




Many thanks,

Damien

----------------------------------------------------------------------
Damien Dougan, Software Architect
Mobile Cohesion - http://www.mobilecohesion.com
Email: damien.dougan@mobilecohesion.com
Mobile: +44 7766477997
----------------------------------------------------------------------



Re: Index Performance Help

From: Richard Huxton
On Thursday 05 February 2004 12:13, Damien Dougan wrote:
> Hi All,
>
> I've been seeing very slow read performance on a database of 1 million
> indexed subscribers, which I believe has nothing to do with the data
> itself, but rather with delays in processing the index.
>
> If I make a random jump into the index (say X), it can take about 50ms
> to read the subscriber. If I then make a "close by" lookup (say X+10),
> it takes only about 0.5ms to read the subscriber. Making another lookup
> to a "far away" entry (say X+1000), it again takes about 50ms to read.

The first time, it has to fetch a block from disk. The second time that disk
block is already in RAM so it's much faster. The third time it needs a
different disk block.
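
If you want to confirm that, the block-level statistics views show how often
reads of the index are satisfied from the buffer cache rather than disk (this
assumes the stats collector is running with block-level stats enabled; the
index name below is just taken from your plan):

    -- How often reads of this index hit the buffer cache vs. went to disk
    SELECT indexrelname, idx_blks_read, idx_blks_hit
      FROM pg_statio_user_indexes
     WHERE indexrelname = 'mc_actor_key';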

> Am I correct in my analysis? Is there anything I can do to improve the
> performance of the index lookups?

Make sure you have enough RAM to buffer your disks. Buy faster disks.
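
As a very rough guide to how much RAM that means, relpages (in 8kB blocks,
accurate as of the last VACUUM/ANALYZE) gives the on-disk size of the tables
and indexes involved - for example:

    -- Approximate on-disk size of the table and its index
    SELECT relname, relpages, relpages * 8 / 1024 AS approx_mb
      FROM pg_class
     WHERE relname IN ('mc_actor', 'mc_actor_key');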

> I've tried increasing the index memory and making a number of queries
> around the index range, but straying several hundred index entries from a
> cached entry always results in a major lookup delay.

Yep, that'll be your disks.

> I've also increased the shared memory available to Postgres to 80MB
> in case the index was being paged in and out, but it hasn't seemed to have
> any effect.

Probably the wrong thing to do (although you don't mention what hardware
you've got). Read the tuning document at:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
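
The settings that usually matter most for a read-heavy load like this are
shared_buffers, effective_cache_size and random_page_cost - something along
these lines (the numbers are purely illustrative; pick real values from the
guide based on your RAM and workload):

    # postgresql.conf - illustrative starting points, not recommendations
    effective_cache_size = 65536   # 8kB pages; how much the OS is caching for you
    random_page_cost = 3           # lower it if most random reads come from cache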

> Sample analyze output for an initial query:
>
> hydradb=# explain analyze select * from pvsubscriber where actorid =
> 'b3432-asdas-232-Subscriber793500';
...
>                      ->  Index Scan using mc_actor_key on mc_actor
> (cost=0.00..4.08 rows=1 width=69) (actual time=39.497..39.499 rows=1
> loops=1)
...
>  Total runtime: 49.845 ms

> And the analyze output for a "nearby" subscriber (10 indexes away):
>
> hydradb=# explain analyze select * from pvsubscriber where actorid =
> 'b3432-asdas-232-Subscriber793510';
>
...
>                      ->  Index Scan using mc_actor_key on mc_actor
> (cost=0.00..4.08 rows=1 width=69) (actual time=0.220..0.221 rows=1
> loops=1)
>  Total runtime: 0.428 ms
> (15 rows)

That certainly seems to be the big change - the only way to consistently get
sub-millisecond timings is going to be to make sure all your data is cached.
Try the tuning guide above and see what difference that makes. If that's no
good, post again with details of your config settings, hardware, number of
clients etc...

--
  Richard Huxton
  Archonet Ltd

Re: Index Performance Help

From: "Damien Dougan"
Thanks Richard.

It certainly does appear to be memory related (on a smaller data set of
250K subscribers, all accesses are < 1ms).


We're going to try increasing the RAM on the machine, and applying the
optimisations from the page you recommended.

(We're also running on a hardware-RAID-controlled SCSI set - mirrored
disks, so reads should be very fast.)


Cheers,

Damien



Re: Index Performance Help

From: Josh Berkus
Damien,

Also, if there have been a lot of updates to the table, you may need to run a
REINDEX on it. A bloated index would be slow to scan because of the
number of largely empty disk blocks.
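
For example (index/table names taken from your plan; note that REINDEX takes
an exclusive lock, so run it in a quiet period):

    REINDEX INDEX mc_actor_key;   -- rebuild just the index the slow scan uses
    REINDEX TABLE mc_actor;       -- or rebuild every index on the table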

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Index Performance Help

From: Greg Stark
"Damien Dougan" <damien.dougan@mobilecohesion.com> writes:

> Sample analyze output for an initial query:
>
> hydradb=# explain analyze select * from pvsubscriber where actorid =
> 'b3432-asdas-232-Subscriber793500';

I take it pvsubscriber is a view? What's the definition of your view?
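
In psql, something like either of these will show it:

    \d pvsubscriber
    SELECT pg_get_viewdef('pvsubscriber');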

> ->  Index Scan using mc_actor_key on mc_actor
>        (cost=0.00..4.08 rows=1 width=69)
>        (actual time=39.497..39.499 rows=1 loops=1)

Is this table regularly vacuumed? Is it possible it has lots of dead records
with this value for actorid? Try running vacuum full, or better "vacuum full
verbose", and keep the output - it might explain what's going on.

What version of postgres is this? You might try reindexing all your indexes
(but particularly this one). Older versions of postgres were prone to index
bloat problems.


--
greg