Thread: First query very slow. Solutions: memory, or settings, or SQL?

First query very slow. Solutions: memory, or settings, or SQL?

From
Phoenix Kiula
Date:
Hi.

I have a query that should be very fast because it's using all
indexes, but it takes a lot of time.


explain analyze select * from sites where user_id = 'phoenix' order by
id desc limit 10;

                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=344.85..344.87 rows=10 width=262) (actual
time=5879.069..5879.167 rows=10 loops=1)
   ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
time=5879.060..5879.093 rows=10 loops=1)
         Sort Key: id
         ->  Index Scan using new_idx_sites_userid on sites
(cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
rows=2178 loops=1)
               Index Cond: ((user_id)::text = 'phoenix'::text)
 Total runtime: 5879.414 ms
(6 rows)

Time: 5885.928 ms


This query should really not be taking 6 seconds!

So my theories:

1. Somehow the sorting is taking a lot of time. Even though it's on
the primary key, to find "id desc" the query has to see all the rows.

2. Or the vacuums that I am doing are not very efficient? (Autovacuum
is on, and there's nothing in the pg log!)

3. Or the shared buffer or other settings that I have are not up to the mark?

I am on CentOS, running PG 8.2.9, with 4GB of RAM. The hard disk is
SATA II. All other queries are pretty fast, and this query repeated
with different offsets is also very fast, perhaps because the results
are cached?

My PG conf settings are:


#==================
max_connections              = 300
shared_buffers               = 330MB    # Not much more than
20k...http://www.revsys.com/writigs/postgresql-performance.html
effective_cache_size         = 512000   #
http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm
max_fsm_relations            = 100
max_fsm_pages                = 300000

work_mem                     = 20MB
temp_buffers                 = 4096
authentication_timeout       = 10s
ssl                          = off
checkpoint_warning           = 3600
random_page_cost             = 1

autovacuum                   = on
autovacuum_vacuum_cost_delay = 20

vacuum_cost_delay            = 20
vacuum_cost_limit            = 600    # http://sn.im/b86nd - Vacuums
taking forever

autovacuum_naptime           = 10
stats_start_collector        = on
stats_row_level              = on
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
#==================



And my table is as follows:




#==================
                                Table "public.sites"
        Column         |            Type             |
Modifiers
-----------------------+-----------------------------+------------------------------
 id                    | bigint                      | not null
 link_id               | character varying(10)       | not null
 alias                 | character varying(35)       | not null
 aliasentered          | character(1)                | default 'N'::bpchar
 url                   | text                        | not null
 user_known            | smallint                    | not null default 0
 user_id               | character varying(45)       | not null
 url_encrypted         | character(40)               | default ''::bpchar
 title                 | character varying(500)      |
 private               | character(1)                |
 private_key           | character varying(6)        |
 status                | character(1)                | default 'Y'::bpchar
 create_date           | timestamp without time zone | default now()
 modify_date           | timestamp without time zone |
 disable_in_statistics | character(1)                | not null
default 'N'::bpchar

Indexes:
    "sites2_pkey" PRIMARY KEY, btree (id)
    "sites2_alias_key" UNIQUE, btree (alias) WITH (fillfactor=75)
    "idx_unique_user_urlenc" btree (user_id, url_encrypted) WITH (fillfactor=70)
    "new_idx_sites_userid" btree (user_id) WITH (fillfactor=70)
    "new_idx_modify_date" btree (modify_date) WITH (fillfactor=75)
    "new_idx_userknown" btree (user_id) WITH (fillfactor=70) WHERE
user_known = 1
Check constraints:
    "sites2_id_check" CHECK (id > 0)
    "sites2_url_check" CHECK (url <> ''::text)
    "sites2_user_id_check" CHECK (user_id::text <> ''::text)
    "sites_alias_check" CHECK (alias::text ~ '[-.~a-z0-9_]'::text)
#==================

Re: First query very slow. Solutions: memory, or settings, or SQL?

From
Scott Marlowe
Date:
On Sun, Jul 19, 2009 at 9:45 PM, Phoenix Kiula<phoenix.kiula@gmail.com> wrote:
> Hi.
>
> I have a query that should be very fast because it's using all
> indexes, but it takes a lot of time.
>
>
> explain analyze select * from sites where user_id = 'phoenix' order by
> id desc limit 10;
>
>                                                        QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=344.85..344.87 rows=10 width=262) (actual
> time=5879.069..5879.167 rows=10 loops=1)
>   ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
> time=5879.060..5879.093 rows=10 loops=1)
>         Sort Key: id
>         ->  Index Scan using new_idx_sites_userid on sites
> (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
> rows=2178 loops=1)
>               Index Cond: ((user_id)::text = 'phoenix'::text)
>  Total runtime: 5879.414 ms
> (6 rows)
>
> Time: 5885.928 ms
>
>
> This query should really not be taking 6 seconds!
>
> So my theories:
>
> 1. Somehow the sorting is taking a lot of time. Even though it's on
> the primary key, to find "id desc" the query has to see all the rows.

Look again.  The time is being taken up by the index scan.  The index
scan is having to go through 2k+ entries before it finds the 10 you
want.  Could be index bloat. Hard to say.  Definitely faster once it's
all cached, but the first time it's hitting 2k index entries followed
by 2k table entries.  Given that it takes 6 seconds to run, that's 333
disk entries scanned per second, not bad really.

> 2. Or the vacuums that I am doing are not very efficient? (Autovacuum
> is on, and there's nothing in the pg log!)

Try reindexing to see if that helps

Re: First query very slow. Solutions: memory, or settings, or SQL?

From
Peter Eisentraut
Date:
On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
> explain analyze select * from sites where user_id = 'phoenix' order by
> id desc limit 10;
>
>                                                         QUERY PLAN
> ---------------------------------------------------------------------------
>----------------------------------------------------------- Limit
> (cost=344.85..344.87 rows=10 width=262) (actual
> time=5879.069..5879.167 rows=10 loops=1)
>    ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
> time=5879.060..5879.093 rows=10 loops=1)
>          Sort Key: id
>          ->  Index Scan using new_idx_sites_userid on sites
> (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
> rows=2178 loops=1)
>                Index Cond: ((user_id)::text = 'phoenix'::text)
>  Total runtime: 5879.414 ms
> (6 rows)

The row estimate for the index scan is off.  Try reanalyzing, or increase the
statistics target.


Re: First query very slow. Solutions: memory, or settings, or SQL?

From
Phoenix Kiula
Date:
On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
>> explain analyze select * from sites where user_id = 'phoenix' order by
>> id desc limit 10;
>>
>>                                                         QUERY PLAN
>> ---------------------------------------------------------------------------
>>----------------------------------------------------------- Limit
>> (cost=344.85..344.87 rows=10 width=262) (actual
>> time=5879.069..5879.167 rows=10 loops=1)
>>    ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
>> time=5879.060..5879.093 rows=10 loops=1)
>>          Sort Key: id
>>          ->  Index Scan using new_idx_sites_userid on sites
>> (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
>> rows=2178 loops=1)
>>                Index Cond: ((user_id)::text = 'phoenix'::text)
>>  Total runtime: 5879.414 ms
>> (6 rows)
>
> The row estimate for the index scan is off.  Try reanalyzing, or increase the
> statistics target.



How did you tell that the row estimate is off? I mean which numbers?

Also, my statistics are already set pretty high. On "USER_ID" they are
at 100. Which columns should I increase the stats on, those in the
WHERE clause or those in the SELECT bit? The USER_ID and the
USER_KNOWN, which are in the WHERE clause, both have statistics of
100! And USER_KNOWN is just a binary value (0 or 1) so I wonder what
purpose stats would serve?

Re: First query very slow. Solutions: memory, or settings, or SQL?

From
Peter Eisentraut
Date:
On Tuesday 21 July 2009 04:36:41 Phoenix Kiula wrote:
> On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> > On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
> >> explain analyze select * from sites where user_id = 'phoenix' order by
> >> id desc limit 10;
> >>
> >>                                                         QUERY PLAN
> >> ------------------------------------------------------------------------
> >>--- ----------------------------------------------------------- Limit
> >> (cost=344.85..344.87 rows=10 width=262) (actual
> >> time=5879.069..5879.167 rows=10 loops=1)
> >>    ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
> >> time=5879.060..5879.093 rows=10 loops=1)
> >>          Sort Key: id
> >>          ->  Index Scan using new_idx_sites_userid on sites
> >> (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
> >> rows=2178 loops=1)
> >>                Index Cond: ((user_id)::text = 'phoenix'::text)
> >>  Total runtime: 5879.414 ms
> >> (6 rows)
> >
> > The row estimate for the index scan is off.  Try reanalyzing, or increase
> > the statistics target.
>
> How did you tell that the row estimate is off? I mean which numbers?

Compare the rows=N figures for the estimate and the actual time.

> Also, my statistics are already set pretty high. On "USER_ID" they are
> at 100. Which columns should I increase the stats on, those in the
> WHERE clause or those in the SELECT bit? The USER_ID and the
> USER_KNOWN, which are in the WHERE clause, both have statistics of
> 100! And USER_KNOWN is just a binary value (0 or 1) so I wonder what
> purpose stats would serve?

Well then that should be OK.

Re: First query very slow. Solutions: memory, or settings, or SQL?

From
Phoenix Kiula
Date:
On Tue, Jul 21, 2009 at 6:12 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 21 July 2009 04:36:41 Phoenix Kiula wrote:
>> On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
>> > On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
>> >> explain analyze select * from sites where user_id = 'phoenix' order by
>> >> id desc limit 10;
>> >>
>> >>                                                         QUERY PLAN
>> >> ------------------------------------------------------------------------
>> >>--- ----------------------------------------------------------- Limit
>> >> (cost=344.85..344.87 rows=10 width=262) (actual
>> >> time=5879.069..5879.167 rows=10 loops=1)
>> >>    ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
>> >> time=5879.060..5879.093 rows=10 loops=1)
>> >>          Sort Key: id
>> >>          ->  Index Scan using new_idx_sites_userid on sites
>> >> (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
>> >> rows=2178 loops=1)
>> >>                Index Cond: ((user_id)::text = 'phoenix'::text)
>> >>  Total runtime: 5879.414 ms
>> >> (6 rows)
>> >
>> > The row estimate for the index scan is off.  Try reanalyzing, or increase
>> > the statistics target.
>>
>> How did you tell that the row estimate is off? I mean which numbers?
>
> Compare the rows=N figures for the estimate and the actual time.
>
>> Also, my statistics are already set pretty high. On "USER_ID" they are
>> at 100. Which columns should I increase the stats on, those in the
>> WHERE clause or those in the SELECT bit? The USER_ID and the
>> USER_KNOWN, which are in the WHERE clause, both have statistics of
>> 100! And USER_KNOWN is just a binary value (0 or 1) so I wonder what
>> purpose stats would serve?
>
> Well then that should be OK.
>



So what can I do to speed up? I now have about 8GB of memory (upgraded
yesterday). Would it help to increase the following:

shared_buffers               = 512MB
effective_cache_size         = 3GB

Both of these are conservative I think? My data size is about 30 GB
right now. Vacuum is all autovacuum as you see from settings I shared.

Any other ideas?

Re: First query very slow. Solutions: memory, or settings, or SQL?

From
Bill Moran
Date:
In response to Phoenix Kiula <phoenix.kiula@gmail.com>:

> On Tue, Jul 21, 2009 at 6:12 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> > On Tuesday 21 July 2009 04:36:41 Phoenix Kiula wrote:
> >> On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> >> > On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:
> >> >> explain analyze select * from sites where user_id = 'phoenix' order by
> >> >> id desc limit 10;
> >> >>
> >> >>                                                         QUERY PLAN
> >> >> ------------------------------------------------------------------------
> >> >>--- ----------------------------------------------------------- Limit
> >> >> (cost=344.85..344.87 rows=10 width=262) (actual
> >> >> time=5879.069..5879.167 rows=10 loops=1)
> >> >>    ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual
> >> >> time=5879.060..5879.093 rows=10 loops=1)
> >> >>          Sort Key: id
> >> >>          ->  Index Scan using new_idx_sites_userid on sites
> >> >> (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557
> >> >> rows=2178 loops=1)
> >> >>                Index Cond: ((user_id)::text = 'phoenix'::text)
> >> >>  Total runtime: 5879.414 ms
> >> >> (6 rows)
> >> >
> >> > The row estimate for the index scan is off.  Try reanalyzing, or increase
> >> > the statistics target.
> >>
> >> How did you tell that the row estimate is off? I mean which numbers?
> >
> > Compare the rows=N figures for the estimate and the actual time.
> >
> >> Also, my statistics are already set pretty high. On "USER_ID" they are
> >> at 100. Which columns should I increase the stats on, those in the
> >> WHERE clause or those in the SELECT bit? The USER_ID and the
> >> USER_KNOWN, which are in the WHERE clause, both have statistics of
> >> 100! And USER_KNOWN is just a binary value (0 or 1) so I wonder what
> >> purpose stats would serve?
> >
> > Well then that should be OK.
>
> So what can I do to speed up? I now have about 8GB of memory (upgraded
> yesterday). Would it help to increase the following:
>
> shared_buffers               = 512MB
> effective_cache_size         = 3GB
>
> Both of these are conservative I think? My data size is about 30 GB
> right now. Vacuum is all autovacuum as you see from settings I shared.

With that much RAM, you should see improvement bumping shared_buffers
up to 2G or so (assuming that you're using a POSIX os).

> Any other ideas?

Does a manual analyze cause the query to run faster for a while?  If so,
your autovacuum settings may not be analyzing aggressively enough.  I came
late to the thread, so I don't know what your autovac settings were.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/