Thread: select query takes 13 seconds to run with index

From
mark
Date:
Hi, is there any way this can be made faster?  id is the primary key,
and there is an index on uid..

thanks


EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
DESC limit 6;
                                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
time=13612.247..13612.247 rows=0 loops=1)
   ->  Index Scan Backward using pokes_pkey on pokes
(cost=0.00..5182270.69 rows=3333 width=135) (actual
time=13612.245..13612.245 rows=0 loops=1)
         Filter: (uid = 578439028)
 Total runtime: 13612.369 ms
(4 rows)

Re: select query takes 13 seconds to run with index

From
Justin
Date:

mark wrote:
> Hi, is there anyway this can be made faster?  id is the primary key,
> and there is an index on uid..
>
> thanks
>
>
> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
> DESC limit 6;
>                                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
> time=13612.247..13612.247 rows=0 loops=1)
>    ->  Index Scan Backward using pokes_pkey on pokes
> (cost=0.00..5182270.69 rows=3333 width=135) (actual
> time=13612.245..13612.245 rows=0 loops=1)
>          Filter: (uid = 578439028)
>  Total runtime: 13612.369 ms
> (4 rows)
>
>
First, this should be posted on the performance list.

How many records are in this table?  Assuming uid 578,439,028 comes from
an auto-incremented key that started at 1, that would mean there are 578
million records in the table.

The estimate is way off. When was the last time VACUUM was run on the table?

What version of PostgreSQL are you running?
What is the size of the table?
What is the table layout?
What is the load on the database?




Re: select query takes 13 seconds to run with index

From
mark
Date:
On Mon, May 26, 2008 at 4:26 PM, Justin <justin@emproshunts.com> wrote:
> mark wrote:
>> Hi, is there anyway this can be made faster?  id is the primary key,
>> and there is an index on uid..
>> thanks
>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
>> DESC limit 6;
>>                                                                     QUERY
>> PLAN
>>
>>
----------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
>> time=13612.247..13612.247 rows=0 loops=1)
>>   ->  Index Scan Backward using pokes_pkey on pokes
>> (cost=0.00..5182270.69 rows=3333 width=135) (actual
>> time=13612.245..13612.245 rows=0 loops=1)
>>         Filter: (uid = 578439028)
>>  Total runtime: 13612.369 ms
>> (4 rows)
> First this should be posted on performance list.
sorry about this.

> how many records are in this table?
22334262, 22 million records.

> The estimate is way off, when was the last time Vaccum was on the table?
about a week ago i ran this VACUUM VERBOSE ANALYZE;
this table is never updated or deleted, rows are just inserted...


> What verison of Postgresql are you running
8.3.1

> Size of the Table
22 million rows approximately

> Table layout
CREATE TABLE pokes
(
  id serial NOT NULL,
  uid integer,
  action_id integer,
  created timestamp without time zone DEFAULT now(),
  friend_id integer,
  message text,
  pic text,
  "name" text,
  CONSTRAINT pokes_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE pokes OWNER TO postgres;

-- Index: idx_action_idx

-- DROP INDEX idx_action_idx;

CREATE INDEX idx_action_idx
  ON pokes
  USING btree
  (action_id);

-- Index: idx_friend_id

-- DROP INDEX idx_friend_id;

CREATE INDEX idx_friend_id
  ON pokes
  USING btree
  (friend_id);

-- Index: idx_pokes_uid

-- DROP INDEX idx_pokes_uid;

CREATE INDEX idx_pokes_uid
  ON pokes
  USING btree
  (uid);


> Load on the database
how do i measure load on database?

Re: select query takes 13 seconds to run with index

From
Justin
Date:


mark wrote:
> On Mon, May 26, 2008 at 4:26 PM, Justin <justin@emproshunts.com> wrote:
>> mark wrote:
>>> Hi, is there anyway this can be made faster?  id is the primary key,
>>> and there is an index on uid..
>>> thanks
>>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
>>> DESC limit 6;
>>>                                                                     QUERY PLAN
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual time=13612.247..13612.247 rows=0 loops=1)
>>>    ->  Index Scan Backward using pokes_pkey on pokes  (cost=0.00..5182270.69 rows=3333 width=135) (actual time=13612.245..13612.245 rows=0 loops=1)
>>>          Filter: (uid = 578439028)
>>>  Total runtime: 13612.369 ms
>>> (4 rows)
>> First this should be posted on performance list.
> sorry about this.
>> how many records are in this table?
> 22334262, 22 million records.
>> The estimate is way off, when was the last time Vaccum was on the table?
> about a week ago i ran this VACUUM VERBOSE ANALYZE;
> this table is never updated or deleted, rows are just inserted...
>> What verison of Postgresql are you running
> 8.3.1
>> Size of the Table
> 22 million rows approximately
I have no experience with large datasets, so people with more experience in this area are going to have to chime in.
My gut feel is that 13 seconds for PostgreSQL to sort through an index and table of that size is not bad.

You may need to take a look at hardware and postgresql.conf settings to improve the performance of this query.

This query is very simple; rewriting it or adding an index is not going to produce massive improvements in this case.
 
>> Table layout
> CREATE TABLE pokes
> (
>   id serial NOT NULL,
>   uid integer,
>   action_id integer,
>   created timestamp without time zone DEFAULT now(),
>   friend_id integer,
>   message text,
>   pic text,
>   "name" text,
>   CONSTRAINT pokes_pkey PRIMARY KEY (id)
> )
> WITH (OIDS=FALSE);
> ALTER TABLE pokes OWNER TO postgres;
>
> -- Index: idx_action_idx
> -- DROP INDEX idx_action_idx;
> CREATE INDEX idx_action_idx ON pokes USING btree (action_id);
>
> -- Index: idx_friend_id
> -- DROP INDEX idx_friend_id;
> CREATE INDEX idx_friend_id ON pokes USING btree (friend_id);
>
> -- Index: idx_pokes_uid
> -- DROP INDEX idx_pokes_uid;
> CREATE INDEX idx_pokes_uid ON pokes USING btree (uid);
>
>> Load on the database
> how do i measure load on database?

How many users are attached to the server at any given time; how many inserts, deletes, and selects are being done on the server. In other words, the number of transactions per second (TPS) on the server.
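For what it's worth, one rough way to gauge this from inside the database (a sketch, assuming the standard pg_stat_database statistics view available in 8.3) is to sample the cumulative transaction counters twice and divide the difference by the interval:

```sql
-- Illustrative only, not from the thread: run this twice, N seconds apart;
-- (delta of total_xacts) / N approximates the server's TPS.
SELECT datname,
       xact_commit + xact_rollback AS total_xacts
FROM pg_stat_database
WHERE datname = current_database();
```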

Re: select query takes 13 seconds to run with index

From
mark
Date:
On Mon, May 26, 2008 at 4:49 PM, Justin <justin@emproshunts.com> wrote:
> mark wrote:
>
> On Mon, May 26, 2008 at 4:26 PM, Justin <justin@emproshunts.com> wrote:
> mark wrote:
> Hi, is there anyway this can be made faster?  id is the primary key,
> and there is an index on uid..
> thanks
> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
> DESC limit 6;
>                                                                     QUERY
> PLAN
>
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
> time=13612.247..13612.247 rows=0 loops=1)
>   ->  Index Scan Backward using pokes_pkey on pokes
> (cost=0.00..5182270.69 rows=3333 width=135) (actual
> time=13612.245..13612.245 rows=0 loops=1)
>         Filter: (uid = 578439028)
>  Total runtime: 13612.369 ms
> (4 rows)
>
>
> First this should be posted on performance list.
>
>
> sorry about this.
>
>
>
> how many records are in this table?
>
>
> 22334262, 22 million records.
>
>
>
> The estimate is way off, when was the last time Vaccum was on the table?
>
>
> about a week ago i ran this VACUUM VERBOSE ANALYZE;
> this table is never updated or deleted, rows are just inserted...
>
>
>
>
> What verison of Postgresql are you running
>
>
> 8.3.1
>
>
>
> Size of the Table
>
>
> 22 million rows approximately
>
>
> I have no experience  on large datasets so people with more experience in
> this area are going to have to chime in.
> My gut feel is 13 seconds for Postgresql to sort through an index of that
> size and table is not bad.
>
> you may need to take a look at hardware and postgresql.config settings to
> improve the performance for this query
>
> This query is very simple where changing it around or adding index results
> massive improvements is not going to help in this case.
the hardware is an E5405 dual quad-core machine with 16GB RAM, running 8.3.1
with default settings except that maximum connections was increased...

Re: select query takes 13 seconds to run with index

From
Justin
Date:
mark wrote:



>> Size of the Table
>> 22 million rows approximately
>>
>> I have no experience  on large datasets so people with more experience in
>> this area are going to have to chime in.
>> My gut feel is 13 seconds for Postgresql to sort through an index of that
>> size and table is not bad.
>>
>> you may need to take a look at hardware and postgresql.config settings to
>> improve the performance for this query
>>
>> This query is very simple where changing it around or adding index results
>> massive improvements is not going to help in this case.
> the hardware is e5405 dual quad core on a 16GB RAM machine, with 8.3.1
> default settings except maximum connections increased...
That could be the problem; PostgreSQL's default settings are very conservative.

You need to read http://www.postgresqldocs.org/wiki/Performance_Optimization
and tune your postgresql.conf settings.

What OS are you running?
What is the disk subsystem setup?
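For reference, a minimal sketch of the kind of postgresql.conf changes usually suggested for a dedicated 16GB box; these values are illustrative only, not from this thread, and should be adjusted to the workload:

```ini
# postgresql.conf -- illustrative starting points for a dedicated 16GB
# machine; the 8.3 defaults are far smaller than this.
shared_buffers = 1GB            # default is 32MB
effective_cache_size = 8GB      # planner hint: approximate size of the OS cache
work_mem = 16MB                 # per-sort/per-hash memory
```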

Re: select query takes 13 seconds to run with index

From
"Gregory Williamson"
Date:

Justin wrote:
>
>
> mark wrote:
>> On Mon, May 26, 2008 at 4:26 PM, Justin <justin@emproshunts.com> wrote:
>>  
>>> mark wrote:
>>>    
>>>> Hi, is there anyway this can be made faster?  id is the primary key,
>>>> and there is an index on uid..
>>>> thanks
>>>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
>>>> DESC limit 6;
>>>>                                                                     QUERY
>>>> PLAN
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
>>>> time=13612.247..13612.247 rows=0 loops=1)
>>>>   ->  Index Scan Backward using pokes_pkey on pokes
>>>> (cost=0.00..5182270.69 rows=3333 width=135) (actual
>>>> time=13612.245..13612.245 rows=0 loops=1)
>>>>         Filter: (uid = 578439028)
>>>>  Total runtime: 13612.369 ms
>>>> (4 rows)
>>>>      
>>> First this should be posted on performance list.
>>>    
>> sorry about this.
>>
>>  
>>> how many records are in this table?
>>>    
>> 22334262, 22 million records.
>>
>>  
>>> The estimate is way off, when was the last time Vaccum was on the table?
>>>    
>> about a week ago i ran this VACUUM VERBOSE ANALYZE;
>> this table is never updated or deleted, rows are just inserted...
>>
>>
>>  
>>> What verison of Postgresql are you running
>>>    
>> 8.3.1
>>
>>  
>>> Size of the Table
>>>    
>> 22 million rows approximately
>>  
> I have no experience  on large datasets so people with more experience in this area are going to have to chime in.
> My gut feel is 13 seconds for Postgresql to sort through an index of that size and table is not bad.
>
> you may need to take a look at hardware and postgresql.config settings to improve the performance for this query
>
> This query is very simple where changing it around or adding index results massive
> improvements is not going to help in this case.

I just ran a test on not particularly impressive hardware (8.2.6) on a table with 58980741 rows:
billing=# explain analyze select * from stats_asset_use where date = '2006-03-12' order by tracking_id desc limit 6;
                                                                      QUERY PLAN                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5.45 rows=6 width=38) (actual time=0.028..0.037 rows=6 loops=1)
   ->  Index Scan Backward using stats_day_ndx on stats_asset_use  (cost=0.00..61279.91 rows=67437 width=38) (actual time=0.026..0.032 rows=6 loops=1)
         Index Cond: (date = '2006-03-12'::date)
 Total runtime: 5.957 ms
(4 rows)

There is an index on date (only). A typical day might have anywhere from a few thousand entries to a few hundred thousand with the average in the low thousands. Inserts only, no deletes or updates.

This table gets analyzed daily (overkill) so the stats are up to date; I wonder if that's a problem in your case ?

>>  
>>> Table layout
>>>    
>> CREATE TABLE pokes
>> (
>>   id serial NOT NULL,
>>   uid integer,
>>   action_id integer,
>>   created timestamp without time zone DEFAULT now(),
>>   friend_id integer,
>>   message text,
>>   pic text,
>>   "name" text,
>>   CONSTRAINT pokes_pkey PRIMARY KEY (id)
>> )
>> WITH (OIDS=FALSE);
>> ALTER TABLE pokes OWNER TO postgres;
>>
>> -- Index: idx_action_idx
>>
>> -- DROP INDEX idx_action_idx;
>>
>> CREATE INDEX idx_action_idx
>>   ON pokes
>>   USING btree
>>   (action_id);
>>
>> -- Index: idx_friend_id
>>
>> -- DROP INDEX idx_friend_id;
>>
>> CREATE INDEX idx_friend_id
>>   ON pokes
>>   USING btree
>>   (friend_id);
>>
>> -- Index: idx_pokes_uid
>>
>> -- DROP INDEX idx_pokes_uid;
>>
>> CREATE INDEX idx_pokes_uid
>>   ON pokes
>>   USING btree
>>   (uid);
>>
>>
>>  
>>> Load on the database
>>>    
>> how do i measure load on database?
>>  
>
> How many users are attached to the server at any given time.  how many inserts, deletes
> selects are being done on the server.  Its number  TPS  on the server.

On Windoze I don't know; on *NIX variants the utility "top" can show useful information on load and active processes; iostat or vmstat can give detailed looks over time (use a period of between 1 and 5 seconds maybe and discard the first row as nonsense); they show disk i/o and context switching, etc.

HTH,

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)

Re: select query takes 13 seconds to run with index

From
"David Wilson"
Date:
On Mon, May 26, 2008 at 7:26 PM, Justin <justin@emproshunts.com> wrote:
> The estimate is way off, when was the last time Vaccum was on the table?

I'm going to second this: run "ANALYZE pokes;" and then test the query
again; I'll bet you'll get much better results.

It's not the VACUUM that matters so much as the ANALYZE, and it
definitely needs to be done on occasion if you're adding a lot of
records. Do you have the autovacuum daemon running? (And if not, why
not?)
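A quick way to check whether autovacuum's analyze has been keeping up (a sketch, assuming the standard pg_stat_user_tables view as shipped in 8.3):

```sql
-- Shows when the table was last analyzed manually and by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'pokes';
```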


--
- David T. Wilson
david.t.wilson@gmail.com

Re: select query takes 13 seconds to run with index

From
Justin
Date:
 

>
> How many users are attached to the server at any given time.  how many inserts, deletes
> selects are being done on the server.  Its number  TPS  on the server.

> On Windoze I don't know; on *NIX variants the utility "top" can show useful information on load and active processes; iostat or vmstat can give detailed looks over time (use a period of between 1 and 5 seconds maybe and discard the first row as nonsense); they show disk i/o and context switching, etc.
>
> HTH,
>
> Greg Williamson
> Senior DBA
> DigitalGlobe


On Windows there are several tools to do this: Process Explorer, which has to be downloaded, and Performance Monitor. The problem with Performance Monitor is that PostgreSQL keeps spawning new processes, which makes reading the results a real pain.

Re: select query takes 13 seconds to run with index

From
Tom Lane
Date:
mark <markkicks@gmail.com> writes:
> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
> DESC limit 6;
>                                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
> time=13612.247..13612.247 rows=0 loops=1)
>    ->  Index Scan Backward using pokes_pkey on pokes
> (cost=0.00..5182270.69 rows=3333 width=135) (actual
> time=13612.245..13612.245 rows=0 loops=1)
>          Filter: (uid = 578439028)
>  Total runtime: 13612.369 ms
> (4 rows)

The problem is the vast disconnect between the estimated and actual
rowcounts for the indexscan (3333 vs 0).  The planner thinks there
are three thousand rows matching uid = 578439028, and that encourages
it to try a plan that's only going to be fast if at least six such
rows show up fairly soon while scanning the index in reverse id order.
What you really want it to do here is scan on the uid index and then
sort the result by id ... but that will be slow in exactly the case
where this plan is fast, ie, when there are a lot of matching uids.

Bottom line: the planner cannot make the right choice between these
alternatives unless it's got decent statistics about the frequency
of uid values.  "I analyzed the table about a week ago" is not good
enough maintenance policy --- you need current stats, and you might need
to bump up the statistics target to get enough data about less-common
values of uid.

(Since it's 8.3, the autovac daemon might have been analyzing for you,
if you didn't turn off autovacuum.  In that case increasing the
statistics target is the first thing to try.)

            regards, tom lane

Re: select query takes 13 seconds to run with index

From
mark
Date:
On Mon, May 26, 2008 at 5:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> mark <markkicks@gmail.com> writes:
>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
>> DESC limit 6;
>>                                                                      QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
>> time=13612.247..13612.247 rows=0 loops=1)
>>    ->  Index Scan Backward using pokes_pkey on pokes
>> (cost=0.00..5182270.69 rows=3333 width=135) (actual
>> time=13612.245..13612.245 rows=0 loops=1)
>>          Filter: (uid = 578439028)
>>  Total runtime: 13612.369 ms
>> (4 rows)
>
> The problem is the vast disconnect between the estimated and actual
> rowcounts for the indexscan (3333 vs 0).  The planner thinks there
> are three thousand rows matching uid = 578439028, and that encourages
> it to try a plan that's only going to be fast if at least six such
> rows show up fairly soon while scanning the index in reverse id order.
> What you really want it to do here is scan on the uid index and then
> sort the result by id ... but that will be slow in exactly the case
> where this plan is fast, ie, when there are a lot of matching uids.
>
> Bottom line: the planner cannot make the right choice between these
> alternatives unless it's got decent statistics about the frequency
> of uid values.  "I analyzed the table about a week ago" is not good
> enough maintenance policy --- you need current stats, and you might need
> to bump up the statistics target to get enough data about less-common
> values of uid.
how do i do this? bump up the statistics target?

> (Since it's 8.3, the autovac daemon might have been analyzing for you,
> if you didn't turn off autovacuum.  In that case increasing the
> statistics target is the first thing to try.)
i did not turn it off..
and my OS is fedora 9

i ran vacuum verbose analyze pokes, and then ran the same query, and
there is no improvement..

EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id  limit 6;
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..8446.80 rows=6 width=130) (actual
time=12262.779..12262.779 rows=0 loops=1)
   ->  Index Scan using pokes_pkey on pokes  (cost=0.00..5149730.49
rows=3658 width=130) (actual time=12262.777..12262.777 rows=0 loops=1)
         Filter: (uid = 578439028)
 Total runtime: 12262.817 ms

VACUUM VERBOSE ANALYZE pokes ;
INFO:  vacuuming "public.pokes"
INFO:  index "pokes_pkey" now contains 22341026 row versions in 61258 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.24s/0.06u sec elapsed 1.61 sec.
INFO:  index "idx_action_idx" now contains 22341026 row versions in 61548 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.09u sec elapsed 7.21 sec.
INFO:  index "idx_friend_id" now contains 22341026 row versions in 60547 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.44s/0.11u sec elapsed 9.13 sec.
INFO:  index "idx_pokes_uid" now contains 22341026 row versions in 62499 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.41s/0.09u sec elapsed 7.44 sec.
INFO:  "pokes": found 0 removable, 22341026 nonremovable row versions
in 388144 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1923 pages contain useful free space.
0 pages are entirely empty.
CPU 3.02s/2.38u sec elapsed 29.21 sec.
INFO:  vacuuming "pg_toast.pg_toast_43415"
INFO:  index "pg_toast_43415_index" now contains 12 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_43415": found 0 removable, 12 nonremovable row
versions in 2 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
2 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.pokes"
INFO:  "pokes": scanned 3000 of 388144 pages, containing 172933 live
rows and 0 dead rows; 3000 rows in sample, 22374302 estimated total

Re: select query takes 13 seconds to run with index

From
mark
Date:
On Mon, May 26, 2008 at 7:58 PM, mark <markkicks@gmail.com> wrote:
> On Mon, May 26, 2008 at 5:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> mark <markkicks@gmail.com> writes:
>>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
>>> DESC limit 6;
>>>                                                                      QUERY PLAN
>>>
----------------------------------------------------------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual
>>> time=13612.247..13612.247 rows=0 loops=1)
>>>    ->  Index Scan Backward using pokes_pkey on pokes
>>> (cost=0.00..5182270.69 rows=3333 width=135) (actual
>>> time=13612.245..13612.245 rows=0 loops=1)
>>>          Filter: (uid = 578439028)
>>>  Total runtime: 13612.369 ms
>>> (4 rows)
>>
>> The problem is the vast disconnect between the estimated and actual
>> rowcounts for the indexscan (3333 vs 0).  The planner thinks there
>> are three thousand rows matching uid = 578439028, and that encourages
>> it to try a plan that's only going to be fast if at least six such
>> rows show up fairly soon while scanning the index in reverse id order.
>> What you really want it to do here is scan on the uid index and then
>> sort the result by id ... but that will be slow in exactly the case
>> where this plan is fast, ie, when there are a lot of matching uids.
>>
>> Bottom line: the planner cannot make the right choice between these
>> alternatives unless it's got decent statistics about the frequency
>> of uid values.  "I analyzed the table about a week ago" is not good
>> enough maintenance policy --- you need current stats, and you might need
>> to bump up the statistics target to get enough data about less-common
>> values of uid.
> how do i do this? bump up the statistics target?
>
>> (Since it's 8.3, the autovac daemon might have been analyzing for you,
>> if you didn't turn off autovacuum.  In that case increasing the
>> statistics target is the first thing to try.)
> i did not turn it off..
> and my OS is fedora 9
>
> i ran vacuum verbose analyze pokes, and then ran the same query, and
> there is no improvement..
>
> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id  limit 6;
>                                                                QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..8446.80 rows=6 width=130) (actual
> time=12262.779..12262.779 rows=0 loops=1)
>   ->  Index Scan using pokes_pkey on pokes  (cost=0.00..5149730.49
> rows=3658 width=130) (actual time=12262.777..12262.777 rows=0 loops=1)
>         Filter: (uid = 578439028)
>  Total runtime: 12262.817 ms

OK I did this

ALTER TABLE pokes ALTER uid set statistics 500;
ALTER TABLE

ANALYZE pokes;
ANALYZE

and then it became super fast!! thanks a lot!!!
my question:
-> is 500 too high? what all does this affect?
-> now increasing this number does it affect only when i am running
analyze commands, or will it slow down inserts and other operations?
EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
desc limit 6;
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=467.80..467.81 rows=6 width=134) (actual
time=0.016..0.016 rows=0 loops=1)
   ->  Sort  (cost=467.80..468.09 rows=117 width=134) (actual
time=0.016..0.016 rows=0 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 25kB
         ->  Index Scan using idx_pokes_uid on pokes
(cost=0.00..465.70 rows=117 width=134) (actual time=0.011..0.011
rows=0 loops=1)
               Index Cond: (uid = 578439028)
 Total runtime: 0.037 ms
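For what it's worth, what the planner now knows about uid can be inspected via the standard pg_stats view (a sketch; the actual numbers will vary with the data):

```sql
-- n_distinct is the estimated number of distinct uids; n_mcv is how many
-- "most common values" were collected (up to the statistics target, here 500).
SELECT n_distinct,
       array_length(most_common_vals, 1) AS n_mcv
FROM pg_stats
WHERE tablename = 'pokes' AND attname = 'uid';
```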

>> The estimate is way off, when was the last time Vaccum was on the table?
> about a week ago i ran this VACUUM VERBOSE ANALYZE;
> this table is never updated or deleted, rows are just inserted...

    You should analyze it more often, then... Postgres probably thinks the
table has the same data distribution as last week!
    Analyze just takes a couple of seconds...

>> Load on the database
> how do i measure load on database?

    Just look at vmstat.

    Also, if you very often do SELECT .. WHERE x = ... ORDER BY id DESC you'll
benefit from an index on (x,id) instead of just (x).
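That suggestion would look something like this (the index name is made up for illustration):

```sql
-- Composite index matching WHERE uid = ? ORDER BY id DESC: matching rows
-- come back already ordered by id, so no sort and no backward scan over
-- the whole primary key is needed.
CREATE INDEX idx_pokes_uid_id ON pokes USING btree (uid, id);
```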



Re: [PERFORM] select query takes 13 seconds to run with index

From
hubert depesz lubaczewski
Date:
On Mon, May 26, 2008 at 04:32:50PM -0700, mark wrote:
> >> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id
> >> DESC limit 6;
> > The estimate is way off, when was the last time Vaccum was on the table?
> about a week ago i ran this VACUUM VERBOSE ANALYZE;
> this table is never updated or deleted, rows are just inserted...

1. boost default_statistics_target
2. run analyze more often - daily job for example
3. create index q on pokes (uid, id); should help

depesz
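A sketch of suggestions 1 and 2 (values are illustrative; 8.3's default statistics target is 10):

```sql
-- Suggestion 1: in postgresql.conf, raise the global sampling target, e.g.:
--   default_statistics_target = 100
-- (or per column, as was done earlier in this thread with SET STATISTICS).
-- Suggestion 2: refresh the statistics regularly, e.g. a daily cron job
-- that runs:
ANALYZE pokes;
```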