Thread: turn off caching for performance test
Hi,
I have a colleague that is convinced that the website is faster if enable_seqscan is turned OFF.
I'm convinced of the opposite (better to leave it ON), but i would like to show it, prove it to him.
Now the first query we tried, would do a bitmap heap scan instead of a seqscan when the latter were disabled, to exclude about 50% of the records (18K of 37K records).
The bitmap heap scan is 3% faster, so that didn't really plea my case.
The thing is that by the time we tried it, the data had been cached, so there is no penalty for the use of the index (HDD retention on random access). So it's logical that the index lookup is faster, it looks up less records.
Now i'm looking for a way to turn off the caching, so that we'll have a fair test.
It makes no sense to me to set shared_buffers really low. Any tips?
Cheers,
WBL
--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
I have a colleague that is convinced that the website is faster if enable_seqscan is turned OFF.
I'm convinced of the opposite (better to leave it ON), but i would like to show it, prove it to him.
Now the first query we tried, would do a bitmap heap scan instead of a seqscan when the latter were disabled, to exclude about 50% of the records (18K of 37K records).
The bitmap heap scan is 3% faster, so that didn't really plea my case.
The thing is that by the time we tried it, the data had been cached, so there is no penalty for the use of the index (HDD retention on random access). So it's logical that the index lookup is faster, it looks up less records.
Now i'm looking for a way to turn off the caching, so that we'll have a fair test.
It makes no sense to me to set shared_buffers really low. Any tips?
Cheers,
WBL
--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
Isn't it more fair to just flush the cache before doing each of the queries? In real-life, you'll also have disk caching... Flushing the buffer pool is easy, just restart PostgreSQL (or perhaps there is a admin command for it too?). Flushing the OS-disk cache is obviously OS-dependent, for linux its trivial: http://linux-mm.org/Drop_Caches Best regards, Arjen On 26-8-2010 12:32 Willy-Bas Loos wrote: > Hi, > > I have a colleague that is convinced that the website is faster if > enable_seqscan is turned OFF. > I'm convinced of the opposite (better to leave it ON), but i would like > to show it, prove it to him. > Now the first query we tried, would do a bitmap heap scan instead of a > seqscan when the latter were disabled, to exclude about 50% of the > records (18K of 37K records). > The bitmap heap scan is 3% faster, so that didn't really plea my case. > The thing is that by the time we tried it, the data had been cached, so > there is no penalty for the use of the index (HDD retention on random > access). So it's logical that the index lookup is faster, it looks up > less records. > > Now i'm looking for a way to turn off the caching, so that we'll have a > fair test. > > It makes no sense to me to set shared_buffers really low. Any tips? > > Cheers, > > WBL > > > -- > "Patriotism is the conviction that your country is superior to all > others because you were born in it." -- George Bernard Shaw
> The bitmap heap scan is 3% faster, 3% isn't really significant. Especially if the new setting makes one query 100 times slower... Like a query which will, by bad luck, get turned into a nested loop index scan for a lot of rows, on a huge table which isn't in cache...
@Pierre: i know.. but first i'd have to find such a query from real-life. And also, i'm convinced that this query would be faster with a seqscan if the data wenen't cached.
@Arjen: thanks, that helps.
But that's only the OS cache. There's also the shared_buffers, which are a postgres specific thing.
I've found DISCARD in the manual, but that only influences a single session, not the shared buffers.
I reckon restarting the cluster should help, would it wipe out the cache? (pg_ctlcluster 8.3 main restart)
Or is there a more graceful way?
Cheers,
WBL
--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
@Arjen: thanks, that helps.
But that's only the OS cache. There's also the shared_buffers, which are a postgres specific thing.
I've found DISCARD in the manual, but that only influences a single session, not the shared buffers.
I reckon restarting the cluster should help, would it wipe out the cache? (pg_ctlcluster 8.3 main restart)
Or is there a more graceful way?
Cheers,
WBL
On Thu, Aug 26, 2010 at 7:37 PM, Pierre C <lists@peufeu.com> wrote:
3% isn't really significant. Especially if the new setting makes one query 100 times slower... Like a query which will, by bad luck, get turned into a nested loop index scan for a lot of rows, on a huge table which isn't in cache...The bitmap heap scan is 3% faster,
--
"Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
Willy-Bas Loos wrote:
Stop the cluster; flush the OS cache; start the cluster again. Now you have a clean cache to retest again. No easier way that's reliable. If you try to clear out the database by doing things like scanning large tables not involved in the query, you'll discover features in PostgreSQL will specifically defeat that from using more than a small portion of the cache. Better to just do a full shutdown.
But that's only the OS cache. There's also the shared_buffers, which are a postgres specific thing.
I've found DISCARD in the manual, but that only influences a single session, not the shared buffers.
I reckon restarting the cluster should help, would it wipe out the cache? (pg_ctlcluster 8.3 main restart)
Or is there a more graceful way?
Stop the cluster; flush the OS cache; start the cluster again. Now you have a clean cache to retest again. No easier way that's reliable. If you try to clear out the database by doing things like scanning large tables not involved in the query, you'll discover features in PostgreSQL will specifically defeat that from using more than a small portion of the cache. Better to just do a full shutdown.
-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote: > Hi, > > I have a colleague that is convinced that the website is faster if > enable_seqscan is turned OFF. > I'm convinced of the opposite (better to leave it ON), but i would like to > show it, prove it to him. Stop, you're both doing it wrong. The issue isn't whether or not turning off seq scans will make a few things faster here and there, it's why is the query planner choosing sequential scans when it should be choosing index scans. So, what are your non-default settings in postgresql.conf? Have you increased effective_cache_size yet? Lowered random_page_cost? Raised default stats target and re-analyzed? Have you been looking at the problem queries with explain analyze? What does it have to say about the planners choices?
On Thu, Aug 26, 2010 at 6:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote: > I have a colleague that is convinced that the website is faster if > enable_seqscan is turned OFF. > I'm convinced of the opposite (better to leave it ON), but i would like to > show it, prove it to him. > Now the first query we tried, would do a bitmap heap scan instead of a > seqscan when the latter were disabled, to exclude about 50% of the records > (18K of 37K records). > The bitmap heap scan is 3% faster, so that didn't really plea my case. > The thing is that by the time we tried it, the data had been cached, so > there is no penalty for the use of the index (HDD retention on random > access). So it's logical that the index lookup is faster, it looks up less > records. > > Now i'm looking for a way to turn off the caching, so that we'll have a fair > test. > > It makes no sense to me to set shared_buffers really low. Any tips? setting shared_buffers low or high is not going to flush the cache. it only controls whether the o/s cache or the pg buffer cache is used. Disabling sequential scans is going to un-optimize a large class of operations where a sequential scan is really the best choice of action. In the old days of postgres, where the planner wasn't as smart as it is today and some of the plan invalidation mechanics weren't there, it wasn't that uncommon to disable them. Today, it's really not a good idea unless you have a very specific reason to, and even then I'd advise temporarily setting it and then setting it back when your operation is done. merlin
On Fri, Aug 27, 2010 at 1:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote: >> Hi, >> >> I have a colleague that is convinced that the website is faster if >> enable_seqscan is turned OFF. >> I'm convinced of the opposite (better to leave it ON), but i would like to >> show it, prove it to him. > > Stop, you're both doing it wrong. The issue isn't whether or not > turning off seq scans will make a few things faster here and there, > it's why is the query planner choosing sequential scans when it should > be choosing index scans. > > So, what are your non-default settings in postgresql.conf? > Have you increased effective_cache_size yet? > Lowered random_page_cost? > Raised default stats target and re-analyzed? > > Have you been looking at the problem queries with explain analyze? > What does it have to say about the planners choices? [a bit behind on my email] This was exactly my thought on first reading this post. If the indexes are faster and PG thinks they are slower, it's a good bet that there are some parameters that need tuning. Specifically, effective_cache_size may be too low, and random_page_cost and seq_page_cost are almost certainly too high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Hi, Sorry for the late answer. I found the query i was looking for in the log (duration) and could prove that the seqscan is faster if the data were not cached. This particular one was 22% faster. It is "a query which will get turned into a nested loop index scan for a lot of rows, on a huge table", but it's only 22% slower without a seqscan. (there's no advantage with seqscans off, as long as the cache is empty) I found few queries that did sequential scans in the normal mode on tables that matter. I found one query that did a seqscan anyway(with enable_seqscan off), because doing an index scan would be more than 1M points more expensive (to the planner). $ grep ^[^#] /etc/postgresql/8.3/main/postgresql.conf|grep -e ^[^[:space:]] data_directory =<blah> # use data in another directory hba_file = <blah> # host-based authentication file ident_file = <blah> # ident configuration file external_pid_file = <blah> # write an extra PID file listen_addresses = '*' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 200 # (change requires restart) unix_socket_directory = '/var/run/postgresql' # (change requires restart) tcp_keepalives_idle = 120 # TCP_KEEPIDLE, in seconds; tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds; tcp_keepalives_count = 0 # TCP_KEEPCNT; shared_buffers = 2GB # min 128kB or max_connections*16kB temp_buffers = 24MB # min 800kB work_mem = 100MB # min 64kB maintenance_work_mem = 256MB # min 1MB max_fsm_pages = 600000 # min max_fsm_relations*16, 6 bytes each synchronous_commit = off # immediate fsync at commit checkpoint_segments = 16 # in logfile segments, min 1, 16MB each effective_cache_size = 4GB log_min_duration_statement = 2000 # -1 is disabled, 0 logs all statements --> milliseconds log_line_prefix = '%t ' # special values: autovacuum = on # Enable autovacuum subprocess? 'on' datestyle = 'iso, mdy' lc_messages = 'en_US.UTF-8' # locale for system error message lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8' # locale for time formatting default_text_search_config = 'pg_catalog.english' max_locks_per_transaction = 128 # min 10 We have 15K rpm SAS disks in RAID10. We have 16 GB of RAM and 4 modern processor cores (i think xeons, might also be opteron) We run Debian Lenny. It's a dedicated DB server, there is one other cluster on it without very much data and with few connections to it daily. df -h on the data dir gives me 143G we're growing there are many queries that should be optimized the seqscan option is in the connection string, not in the postgresql.conf Cheers, On Fri, Aug 27, 2010 at 7:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos <willybas@gmail.com> wrote: >> Hi, >> >> I have a colleague that is convinced that the website is faster if >> enable_seqscan is turned OFF. >> I'm convinced of the opposite (better to leave it ON), but i would like to >> show it, prove it to him. > > Stop, you're both doing it wrong. The issue isn't whether or not > turning off seq scans will make a few things faster here and there, > it's why is the query planner choosing sequential scans when it should > be choosing index scans. > > So, what are your non-default settings in postgresql.conf? > Have you increased effective_cache_size yet? > Lowered random_page_cost? > Raised default stats target and re-analyzed? > > Have you been looking at the problem queries with explain analyze? > What does it have to say about the planners choices? > -- "Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
> I found one query that did a seqscan anyway(with enable_seqscan off), > because doing an index scan would be more than 1M points more > expensive (to the planner). Hmm, i guess that says it all :) -- "Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw