Thread: turn off caching for performance test

turn off caching for performance test

From
Willy-Bas Loos
Date:
Hi,

I have a colleague that is convinced that the website is faster if enable_seqscan is turned OFF.
I'm convinced of the opposite (better to leave it ON), but i would like to show it, prove it to him.
Now the first query we tried, would do a bitmap heap scan instead of a seqscan when the latter were disabled, to exclude about 50% of the records (18K of 37K records).
The bitmap heap scan is 3% faster, so that didn't really plea my case.
The thing is that by the time we tried it, the data had been cached, so there is no penalty for the use of the index (HDD retention on random access). So it's logical that the index lookup is faster, it looks up less records.

Now i'm looking for a way to turn off the caching, so that we'll have a fair test.

It makes no sense to me to set shared_buffers really low. Any tips?

Cheers,

WBL


--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: turn off caching for performance test

From
Arjen van der Meijden
Date:
Isn't it more fair to just flush the cache before doing each of the
queries? In real-life, you'll also have disk caching... Flushing the
buffer pool is easy, just restart PostgreSQL (or perhaps there is a
admin command for it too?). Flushing the OS-disk cache is obviously
OS-dependent, for linux its trivial: http://linux-mm.org/Drop_Caches

Best regards,

Arjen

On 26-8-2010 12:32 Willy-Bas Loos wrote:
> Hi,
>
> I have a colleague that is convinced that the website is faster if
> enable_seqscan is turned OFF.
> I'm convinced of the opposite (better to leave it ON), but i would like
> to show it, prove it to him.
> Now the first query we tried, would do a bitmap heap scan instead of a
> seqscan when the latter were disabled, to exclude about 50% of the
> records (18K of 37K records).
> The bitmap heap scan is 3% faster, so that didn't really plea my case.
> The thing is that by the time we tried it, the data had been cached, so
> there is no penalty for the use of the index (HDD retention on random
> access). So it's logical that the index lookup is faster, it looks up
> less records.
>
> Now i'm looking for a way to turn off the caching, so that we'll have a
> fair test.
>
> It makes no sense to me to set shared_buffers really low. Any tips?
>
> Cheers,
>
> WBL
>
>
> --
> "Patriotism is the conviction that your country is superior to all
> others because you were born in it." -- George Bernard Shaw

Re: turn off caching for performance test

From
"Pierre C"
Date:
> The bitmap heap scan is 3% faster,

3% isn't really significant. Especially if the new setting makes one query
100 times slower... Like a query which will, by bad luck, get turned into
a nested loop index scan for a lot of rows, on a huge table which isn't in
cache...

Re: turn off caching for performance test

From
Willy-Bas Loos
Date:
@Pierre: i know.. but first i'd have to find such a query from real-life. And also, i'm convinced that this query would be faster with a seqscan if the data wenen't cached.


@Arjen: thanks, that helps.
But that's only the OS cache. There's also the shared_buffers, which are a postgres specific thing.
I've found DISCARD in the manual, but that only influences a single session, not the shared buffers.

I reckon restarting the cluster should help, would it wipe out the cache? (pg_ctlcluster 8.3 main restart)
Or is there a more graceful way?

Cheers,

WBL


On Thu, Aug 26, 2010 at 7:37 PM, Pierre C <lists@peufeu.com> wrote:

The bitmap heap scan is 3% faster,

3% isn't really significant. Especially if the new setting makes one query 100 times slower... Like a query which will, by bad luck, get turned into a nested loop index scan for a lot of rows, on a huge table which isn't in cache...



--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw

Re: turn off caching for performance test

From
Greg Smith
Date:
Willy-Bas Loos wrote:
But that's only the OS cache. There's also the shared_buffers, which are a postgres specific thing.
I've found DISCARD in the manual, but that only influences a single session, not the shared buffers.

I reckon restarting the cluster should help, would it wipe out the cache? (pg_ctlcluster 8.3 main restart)
Or is there a more graceful way?

Stop the cluster; flush the OS cache; start the cluster again.  Now you have a clean cache to retest again.  No easier way that's reliable.  If you try to clear out the database by doing things like scanning large tables not involved in the query, you'll discover features in PostgreSQL will specifically defeat that from using more than a small portion of the cache.  Better to just do a full shutdown.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: turn off caching for performance test

From
Scott Marlowe
Date:
On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
> Hi,
>
> I have a colleague that is convinced that the website is faster if
> enable_seqscan is turned OFF.
> I'm convinced of the opposite (better to leave it ON), but i would like to
> show it, prove it to him.

Stop, you're both doing it wrong.  The issue isn't whether or not
turning off seq scans will make a few things faster here and there,
it's why is the query planner choosing sequential scans when it should
be choosing index scans.

So, what are your non-default settings in postgresql.conf?
Have you increased effective_cache_size yet?
Lowered random_page_cost?
Raised default stats target and re-analyzed?

Have you been looking at the problem queries with explain analyze?
What does it have to say about the planners choices?

Re: turn off caching for performance test

From
Merlin Moncure
Date:
On Thu, Aug 26, 2010 at 6:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
> I have a colleague that is convinced that the website is faster if
> enable_seqscan is turned OFF.
> I'm convinced of the opposite (better to leave it ON), but i would like to
> show it, prove it to him.
> Now the first query we tried, would do a bitmap heap scan instead of a
> seqscan when the latter were disabled, to exclude about 50% of the records
> (18K of 37K records).
> The bitmap heap scan is 3% faster, so that didn't really plea my case.
> The thing is that by the time we tried it, the data had been cached, so
> there is no penalty for the use of the index (HDD retention on random
> access). So it's logical that the index lookup is faster, it looks up less
> records.
>
> Now i'm looking for a way to turn off the caching, so that we'll have a fair
> test.
>
> It makes no sense to me to set shared_buffers really low. Any tips?

setting shared_buffers low or high is not going to flush the cache. it
only controls whether the o/s cache or the pg buffer cache is used.

Disabling sequential scans is going to un-optimize a large class of
operations where a sequential scan is really the best choice of
action.   In the old days of postgres, where the planner wasn't as
smart as it is today and some of the plan invalidation mechanics
weren't there, it wasn't that uncommon to disable them.  Today, it's
really not a good idea unless you have a very specific reason to, and
even then I'd advise temporarily setting it and then setting it back
when your operation is done.

merlin

Re: turn off caching for performance test

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 1:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
>> Hi,
>>
>> I have a colleague that is convinced that the website is faster if
>> enable_seqscan is turned OFF.
>> I'm convinced of the opposite (better to leave it ON), but i would like to
>> show it, prove it to him.
>
> Stop, you're both doing it wrong.  The issue isn't whether or not
> turning off seq scans will make a few things faster here and there,
> it's why is the query planner choosing sequential scans when it should
> be choosing index scans.
>
> So, what are your non-default settings in postgresql.conf?
> Have you increased effective_cache_size yet?
> Lowered random_page_cost?
> Raised default stats target and re-analyzed?
>
> Have you been looking at the problem queries with explain analyze?
> What does it have to say about the planners choices?

[a bit behind on my email]

This was exactly my thought on first reading this post.  If the
indexes are faster and PG thinks they are slower, it's a good bet that
there are some parameters that need tuning.  Specifically,
effective_cache_size may be too low, and random_page_cost and
seq_page_cost are almost certainly too high.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: turn off caching for performance test

From
Willy-Bas Loos
Date:
Hi,

Sorry for the late answer.
I found the query i was looking for in the log (duration) and could
prove that the seqscan is faster if the data were not cached.
This particular one was 22% faster.
It is "a query which will get turned into a nested loop index scan for
a lot of rows, on a huge table", but it's only 22% slower without a
seqscan.
(there's no advantage with seqscans off, as long as the cache is empty)

I found few queries that did sequential scans in the normal mode on
tables that matter.
I found one query that did a seqscan anyway(with enable_seqscan off),
because doing an index scan would be more than 1M points more
expensive (to the planner).

$ grep ^[^#] /etc/postgresql/8.3/main/postgresql.conf|grep -e ^[^[:space:]]
data_directory =<blah>        # use data in another directory
hba_file = <blah>    # host-based authentication file
ident_file = <blah>    # ident configuration file
external_pid_file = <blah>        # write an extra PID file
listen_addresses = '*'        # what IP address(es) to listen on;
port = 5432                # (change requires restart)
max_connections = 200            # (change requires restart)
unix_socket_directory = '/var/run/postgresql'        # (change requires restart)
tcp_keepalives_idle = 120        # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 0        # TCP_KEEPCNT;
shared_buffers = 2GB            # min 128kB or max_connections*16kB
temp_buffers = 24MB            # min 800kB
work_mem = 100MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
max_fsm_pages = 600000            # min max_fsm_relations*16, 6 bytes each
synchronous_commit = off        # immediate fsync at commit
checkpoint_segments = 16        # in logfile segments, min 1, 16MB each
effective_cache_size = 4GB
log_min_duration_statement = 2000    # -1 is disabled, 0 logs all
statements --> milliseconds
log_line_prefix = '%t '            # special values:
autovacuum = on             # Enable autovacuum subprocess?  'on'
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 128        # min 10

We have 15K rpm SAS disks in RAID10.
We have 16 GB of RAM and 4 modern processor cores (i think xeons,
might also be opteron)
We run Debian Lenny.
It's a dedicated DB server, there is one other cluster on it without
very much data and with few connections to it daily.
df -h on the data dir gives me 143G
we're growing
there are many queries that should be optimized
the seqscan option is in the connection string, not in the postgresql.conf

Cheers,

On Fri, Aug 27, 2010 at 7:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
>> Hi,
>>
>> I have a colleague that is convinced that the website is faster if
>> enable_seqscan is turned OFF.
>> I'm convinced of the opposite (better to leave it ON), but i would like to
>> show it, prove it to him.
>
> Stop, you're both doing it wrong.  The issue isn't whether or not
> turning off seq scans will make a few things faster here and there,
> it's why is the query planner choosing sequential scans when it should
> be choosing index scans.
>
> So, what are your non-default settings in postgresql.conf?
> Have you increased effective_cache_size yet?
> Lowered random_page_cost?
> Raised default stats target and re-analyzed?
>
> Have you been looking at the problem queries with explain analyze?
> What does it have to say about the planners choices?
>



--
"Patriotism is the conviction that your country is superior to all
others because you were born in it." -- George Bernard Shaw

Re: turn off caching for performance test

From
Willy-Bas Loos
Date:
> I found one query that did a seqscan anyway(with enable_seqscan off),
> because doing an index scan would be more than 1M points more
> expensive (to the planner).

Hmm, i guess that says it all :)
--
"Patriotism is the conviction that your country is superior to all
others because you were born in it." -- George Bernard Shaw