Thread: First query very slow. Solutions: memory, or settings, or SQL?
Hi. I have a query that should be very fast because it's using all indexes, but it takes a lot of time.

explain analyze select * from sites where user_id = 'phoenix' order by id desc limit 10;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=344.85..344.87 rows=10 width=262) (actual time=5879.069..5879.167 rows=10 loops=1)
   ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual time=5879.060..5879.093 rows=10 loops=1)
         Sort Key: id
         ->  Index Scan using new_idx_sites_userid on sites  (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557 rows=2178 loops=1)
               Index Cond: ((user_id)::text = 'phoenix'::text)
 Total runtime: 5879.414 ms
(6 rows)

Time: 5885.928 ms

This query should really not be taking 6 seconds!

So my theories:

1. Somehow the sorting is taking a lot of time. Even though it's on the primary key, to find "id desc" the query has to see all the rows.

2. Or the vacuums that I am doing are not very efficient? (Autovacuum is on, and there's nothing in the pg log!)

3. Or the shared buffers or other settings that I have are not up to the mark?

I am on CentOS, running PG 8.2.9, with 4GB of RAM. The hard disk is SATA II. All other queries are pretty fast, and this query repeated with different offsets is also very fast, perhaps because the results are cached?

My PG conf settings are:

#==================
max_connections = 300
shared_buffers = 330MB              # Not much more than 20k...http://www.revsys.com/writigs/postgresql-performance.html
effective_cache_size = 512000       # http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm
max_fsm_relations = 100
max_fsm_pages = 300000
work_mem = 20MB
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
checkpoint_warning = 3600
random_page_cost = 1
autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
vacuum_cost_limit = 600             # http://sn.im/b86nd - Vacuums taking forever
autovacuum_naptime = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 75
autovacuum_analyze_threshold = 25
#==================

And my table is as follows:

#==================
                              Table "public.sites"
        Column         |            Type             |          Modifiers
-----------------------+-----------------------------+------------------------------
 id                    | bigint                      | not null
 link_id               | character varying(10)      | not null
 alias                 | character varying(35)      | not null
 aliasentered          | character(1)               | default 'N'::bpchar
 url                   | text                       | not null
 user_known            | smallint                   | not null default 0
 user_id               | character varying(45)      | not null
 url_encrypted         | character(40)              | default ''::bpchar
 title                 | character varying(500)     |
 private               | character(1)               |
 private_key           | character varying(6)       |
 status                | character(1)               | default 'Y'::bpchar
 create_date           | timestamp without time zone | default now()
 modify_date           | timestamp without time zone |
 disable_in_statistics | character(1)               | not null default 'N'::bpchar
Indexes:
    "sites2_pkey" PRIMARY KEY, btree (id)
    "sites2_alias_key" UNIQUE, btree (alias) WITH (fillfactor=75)
    "idx_unique_user_urlenc" btree (user_id, url_encrypted) WITH (fillfactor=70)
    "new_idx_sites_userid" btree (user_id) WITH (fillfactor=70)
    "new_idx_modify_date" btree (modify_date) WITH (fillfactor=75)
    "new_idx_userknown" btree (user_id) WITH (fillfactor=70) WHERE user_known = 1
Check constraints:
    "sites2_id_check" CHECK (id > 0)
    "sites2_url_check" CHECK (url <> ''::text)
    "sites2_user_id_check" CHECK (user_id::text <> ''::text)
    "sites_alias_check" CHECK (alias::text ~ '[-.~a-z0-9_]'::text)
#==================
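[For what it's worth, a query shaped like "WHERE user_id = ? ORDER BY id DESC LIMIT 10" can usually skip the sort entirely with a composite index, an option none of the replies below pick up. A minimal sketch, assuming the schema above; the index name is made up:

#==================
-- Sketch: a composite index for "WHERE user_id = ? ORDER BY id DESC LIMIT 10".
-- CREATE INDEX CONCURRENTLY (available since 8.2) avoids blocking writes
-- while the index builds.
CREATE INDEX CONCURRENTLY idx_sites_userid_id ON sites (user_id, id);

-- With equality on user_id, a backward scan of this index returns rows
-- already in "id DESC" order, so the planner can stop after 10 rows and
-- the Sort node disappears from the plan.
EXPLAIN ANALYZE
SELECT * FROM sites WHERE user_id = 'phoenix' ORDER BY id DESC LIMIT 10;
#==================

The tradeoff is one more index to maintain on every write, on top of the three indexes that already lead with user_id in this schema.]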
On Sun, Jul 19, 2009 at 9:45 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi. I have a query that should be very fast because it's using all
> indexes, but it takes a lot of time.
>
> explain analyze select * from sites where user_id = 'phoenix' order by
> id desc limit 10;
>
> [...]
>   ->  Index Scan using new_idx_sites_userid on sites
>       (cost=0.00..331.39 rows=323 width=262)
>       (actual time=44.408..5867.557 rows=2178 loops=1)
> [...]
>
> This query should really not be taking 6 seconds!
>
> So my theories:
>
> 1. Somehow the sorting is taking a lot of time. Even though it's on
> the primary key, to find "id desc" the query has to see all the rows.

Look again. The time is being taken up by the index scan. The index scan
is having to go through 2k+ entries before it finds the 10 you want.
Could be index bloat. Hard to say. Definitely faster once it's all
cached, but the first time it's hitting 2k index entries followed by 2k
table entries. Given that it takes 6 seconds to run, that's 333 disk
entries scanned per second, not bad really.

> 2. Or the vacuums that I am doing are not very efficient? (Autovacuum
> is on, and there's nothing in the pg log!)

Try reindexing to see if that helps.
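[If index bloat is the culprit, a rebuild might look like the sketch below. Note that REINDEX locks out writes to the table while it runs; on 8.2 a lower-impact alternative is to build a replacement index concurrently and swap it in. "new_idx_sites_userid2" is an illustrative name:

#==================
-- Direct rebuild (blocks writes to the table for the duration):
REINDEX INDEX new_idx_sites_userid;

-- Lower-impact alternative: build a fresh copy concurrently, then swap.
-- The fillfactor matches the existing index definition above.
CREATE INDEX CONCURRENTLY new_idx_sites_userid2
    ON sites (user_id) WITH (fillfactor = 70);
DROP INDEX new_idx_sites_userid;
ALTER INDEX new_idx_sites_userid2 RENAME TO new_idx_sites_userid;
#==================
]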
On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
> explain analyze select * from sites where user_id = 'phoenix' order by
> id desc limit 10;
>
> [...]
>   ->  Index Scan using new_idx_sites_userid on sites
>       (cost=0.00..331.39 rows=323 width=262)
>       (actual time=44.408..5867.557 rows=2178 loops=1)
> [...]

The row estimate for the index scan is off. Try reanalyzing, or increase
the statistics target.
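[Carrying out that suggestion might look like this sketch; 300 is an arbitrary illustrative target (8.2 allows up to 1000):

#==================
-- Re-collect statistics with the current target first:
ANALYZE sites;

-- If the estimate is still far off, raise the target on the filtered
-- column and reanalyze:
ALTER TABLE sites ALTER COLUMN user_id SET STATISTICS 300;
ANALYZE sites;
#==================
]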
On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
>> explain analyze select * from sites where user_id = 'phoenix' order by
>> id desc limit 10;
>> [...]
>
> The row estimate for the index scan is off. Try reanalyzing, or
> increase the statistics target.

How did you tell that the row estimate is off? I mean, which numbers?

Also, my statistics are already set pretty high. On "USER_ID" they are
at 100. Which columns should I increase the stats on, those in the
WHERE clause or those in the SELECT bit? The USER_ID and the USER_KNOWN,
which are in the WHERE clause, both have statistics of 100! And
USER_KNOWN is just a binary value (0 or 1), so I wonder what purpose
stats would serve?
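[One way to see what the planner currently believes about these columns is to query the system catalogs directly. A sketch; note that attstattarget is -1 for columns still using the default target:

#==================
-- Per-column statistics targets:
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'sites'::regclass
  AND attname IN ('user_id', 'user_known');

-- The collected distribution the planner uses for its row estimate:
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'sites' AND attname = 'user_id';
#==================
]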
On Tuesday 21 July 2009 04:36:41 Phoenix Kiula wrote:
> On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> The row estimate for the index scan is off. Try reanalyzing, or
>> increase the statistics target.
>
> How did you tell that the row estimate is off? I mean, which numbers?

Compare the rows=N figures for the estimate and for the actual time: the
index scan was estimated at rows=323 but actually produced rows=2178.

> Also, my statistics are already set pretty high. On "USER_ID" they are
> at 100. Which columns should I increase the stats on, those in the
> WHERE clause or those in the SELECT bit? The USER_ID and the USER_KNOWN,
> which are in the WHERE clause, both have statistics of 100! And
> USER_KNOWN is just a binary value (0 or 1), so I wonder what purpose
> stats would serve?

Well then that should be OK.
On Tue, Jul 21, 2009 at 6:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> [...]
>
>> How did you tell that the row estimate is off? I mean, which numbers?
>
> Compare the rows=N figures for the estimate and for the actual time.
>
>> Also, my statistics are already set pretty high. [...]
>
> Well then that should be OK.

So what can I do to speed this up? I now have about 8GB of memory
(upgraded yesterday). Would it help to increase the following?

    shared_buffers = 512MB
    effective_cache_size = 3GB

Both of these are conservative, I think? My data size is about 30GB
right now. Vacuuming is all autovacuum, as you can see from the settings
I shared.

Any other ideas?
In response to Phoenix Kiula <phoenix.kiula@gmail.com>:
> [...]
>
> So what can I do to speed this up? I now have about 8GB of memory
> (upgraded yesterday). Would it help to increase the following?
>
>     shared_buffers = 512MB
>     effective_cache_size = 3GB
>
> Both of these are conservative, I think? My data size is about 30GB
> right now. Vacuuming is all autovacuum, as you can see from the
> settings I shared.

With that much RAM, you should see improvement bumping shared_buffers up
to 2G or so (assuming that you're using a POSIX OS).

> Any other ideas?

Does a manual ANALYZE cause the query to run faster for a while? If so,
your autovacuum settings may not be analyzing aggressively enough. I
came late to the thread, so I don't know what your autovac settings were.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
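[Sketching both suggestions concretely, with the caveat that these numbers are rule-of-thumb guesses for an 8GB machine, not tuned values:

#==================
# postgresql.conf sketch for ~8GB RAM.
# shared_buffers requires a server restart to change.
shared_buffers = 2GB            # roughly 25% of RAM, per the advice above
effective_cache_size = 6GB      # rough estimate of RAM left for OS caching

# Then, in psql, test whether fresher statistics alone change the plan:
#   ANALYZE VERBOSE sites;
#==================
]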