Re: performance on selecting a row in large tables - Mailing list pgsql-admin

From Tino Schwarze
Subject Re: performance on selecting a row in large tables
Date
Msg-id 20080204214653.GC17045@easy2.in-chemnitz.de
Whole thread Raw
In response to Re: performance on selecting a row in large tables  ("Rainer Spittel" <rainer.spittel@terralink.co.nz>)
List pgsql-admin
Hi Rainer,

On Tue, Feb 05, 2008 at 10:24:11AM +1300, Rainer Spittel wrote:

>   You are right, this query is not the right approach for performance
> testing. I thought that this will give an indication about the
> performance of a select statement on that table.

Only do benchmarking with real-world queries. Or you'll end up
optimizing for artificial workloads which never occur.

>   One of those slow queries are running on col02 which has a btree
> index. But I use the 'in' expression to get a set of matching rows:
>
>   select * from table where col02 in ('...',[...],'...')
>
>   This query gets sometimes really slow, I guess it depends on the size
> of the set used by 'in'.

How much in-clauses are there? It might be faster to use a subselect if
the in clauses come from DB anyway.

>   Would the query perform better when I cluster the index on col02 and
> force to order the set for the in clause?

>   Is there a way to disable the caching for testing? Once I ran the
> query, the result set seems to be cached and the second run of the query
> is fast. This makes a testing a little difficult ;-)

Use "EXPLAIN query" first to get an estimate how the query will perform.
Get used to reading EXPLAIN output. You already seem to have tuned the
DB to your server (setting memory sizes etc.), now optimize queries
first, then the DB server (look at the settings like random_page_cost,
effective_cache_size etc. if the planner performs sequential scans and
you think it should use indices).

If you really want to rule out caching, you'd need to keep trashing your
OS's cache, e.g. by running a "while dd if=/dev/some-drive of=/dev/null
bs=1m ; do : ; done" in the background. But again, you'll optimize for a
artifical workload - in real live, the cache will play a significant
role. There's no use in optimizing queries for no-cache scenario if in
practice, the cache will have the data. I know of real-live applications
where the first query takes some time and all subsequent are fast.
That's what caches are for.

HTH,

Tino.

--
www.craniosacralzentrum.de
www.spiritualdesign-chemnitz.de

Tino Schwarze * Lortzingstraße 21 * 09119 Chemnitz

pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: performance on selecting a row in large tables
Next
From: Kevin Kempter
Date:
Subject: last analyze / vacuum