Re: index usage (and foreign keys/triggers) - Mailing list pgsql-general

From scott.marlowe
Subject Re: index usage (and foreign keys/triggers)
Msg-id Pine.LNX.4.33.0302261619090.18487-100000@css120.ihs.com
In response to index usage (and foreign keys/triggers)  (Patrik Kudo <kudo@pingpong.net>)
Responses Re: index usage (and foreign keys/triggers)  (Patrik Kudo <kudo@pingpong.net>)
List pgsql-general
On Wed, 26 Feb 2003, Patrik Kudo wrote:

> Hi gurus et al ;)
>
> I have a database with a couple of hundred tables. In this database one
> table, "person", represents a user. person.userid is a primary key. To
> this key there are about a hundred foreign keys spread out over
> approximately equally many tables. When deleting a user I noticed quite
> a big difference in time depending on how much data there is in the
> foreign key tables. After some investigation I concluded that for some
> users and some tables the indices weren't used when deleting, resulting
> in longer run-times.
>
> Here's an example:
>
> select count(*) from login;
>   count
> -------
>   96824
>
> select count(*) from login where userid = 'patrik';
>   count
> -------
>     608
>
> select count(*) from login where userid = 'jennie';
>   count
> -------
>    4211
>
> explain delete from login where userid = 'patrik';
>                                     QUERY PLAN
>
> ---------------------------------------------------------------------------------
>   Index Scan using login_userid_idx on login  (cost=0.00..237.06 rows=61
> width=6)
>     Index Cond: (userid = 'patrik'::text)
>
> explain delete from login where userid = 'jennie';
>                          QUERY PLAN
> -----------------------------------------------------------
>   Seq Scan on login  (cost=0.00..2045.30 rows=3421 width=6)
>     Filter: (userid = 'jennie'::text)
>
>
> What makes the planner choose a seq scan for 'jennie', but not for
> 'patrik'? I also tested the following:
>
> delete from login where userid = 'jennie';
> DELETE 4211
> Time: 508.94 ms
>
> set enable_seqscan = false;
>
> delete from login where userid = 'jennie';
> DELETE 4211
> Time: 116.92 ms
>
> As you can see the index scan is almost 5 times faster, but still
> postgres chooses to seq scan... Am I doing something wrong or is
> postgres being stupid on me?

Postgresql is being smart, just not smart enough.

Imagine that one of your queries needed to delete 99.9% of all the tuples.
Would an index scan help then?  Of course not, since you'd be visiting
nearly every row in the table anyway.

The planner uses several settings to estimate the cost of sequentially
scanning a table versus index access, and it doesn't always get things
right.

Take a look at random_page_cost.  It defaults to 4, which means that
postgresql makes its index-versus-seq-scan decisions assuming that
fetching a random individual page costs 4 times as much as reading the
same page as part of a sequential scan.
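A quick way to see the effect is to override the setting for a single
session and re-run the plan (a sketch, reusing the login table and the
'jennie' value from the example above; the 1.5 is just an illustrative
trial value, not a recommendation):

```sql
-- Check the current setting (defaults to 4)
SHOW random_page_cost;

-- Lower it for this session only and see whether the plan
-- flips from a seq scan to an index scan
SET random_page_cost = 1.5;
EXPLAIN DELETE FROM login WHERE userid = 'jennie';
```

SET only affects the current session, so you can experiment freely
without changing behavior for other connections.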

On most modern machines the difference in cost is quite low, what with
disk caching and all.  This is especially true for smaller tables that
fit in memory.  Once a table is in buffer memory, along with its index,
the effective random page cost is about 1: a seq scan in memory and an
index op are both quite fast.  In fact, for certain percentages of your
table a random page access may in practice cost LESS than 1, because
linear access in memory yields little if any gain over random access.
The only overhead is reading the index blocks.

So, try tuning random_page_cost down to somewhere between 1.0 and 2.0
for best performance on these kinds of queries.  Our database at work
runs on a machine with 1.5 GB of RAM, of which 800 MB is cache, and
postgresql has a 256 MB shared buffer.  It hits the drives on only about
5% of reads, so random_page_cost is set to 1.2 there and works well.
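To make a value stick across all sessions, set it in postgresql.conf and
reload the server (a sketch; 1.2 is the value that happens to suit the
machine described above, not a universal recommendation):

```
# postgresql.conf
random_page_cost = 1.2
```

Then signal the postmaster to pick up the change, e.g. with
pg_ctl reload.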

Welcome to the wonderful world of performance tuning...

