Thread: LIMIT confuses the planner

LIMIT confuses the planner

From: Kouber Saparev
Hello,

I'm experiencing a strange issue. I have a table with around 11 million
records (11471762 to be exact) storing login attempts to a web site.
Thanks to the index I have created on username, looking up rows by
username is very fast:



db=# EXPLAIN ANALYZE
SELECT
   *
FROM
   login_attempt
WHERE
   username='kouber'
ORDER BY
   login_attempt_sid DESC;

                                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1415.15..1434.93 rows=7914 width=38) (actual time=0.103..0.104 rows=2 loops=1)
   Sort Key: login_attempt_sid
   Sort Method:  quicksort  Memory: 25kB
   ->  Index Scan using login_attempt_username_idx on login_attempt  (cost=0.00..902.71 rows=7914 width=38) (actual time=0.090..0.091 rows=2 loops=1)
         Index Cond: ((username)::text = 'kouber'::text)
 Total runtime: 0.140 ms
(6 rows)



As you can see, there are only 2 records for that particular username.

However, when I add a LIMIT clause to the same query, the planner no
longer uses that index, so the query becomes very slow:



db=# EXPLAIN ANALYZE
SELECT
   *
FROM
   login_attempt
WHERE
   username='kouber'
ORDER BY
   login_attempt_sid DESC LIMIT 20;

                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..770.45 rows=20 width=38) (actual time=0.064..3797.660 rows=2 loops=1)
   ->  Index Scan Backward using login_attempt_pkey on login_attempt  (cost=0.00..304866.46 rows=7914 width=38) (actual time=0.062..3797.657 rows=2 loops=1)
         Filter: ((username)::text = 'kouber'::text)
 Total runtime: 3797.691 ms
(4 rows)



Recently I altered some of the default parameters in order to get as
much as possible out of the hardware (12 GB of RAM, 8 processors), so I
guess I have done something wrong there that is leading the planner to
this bad decision. Here's what I have changed in postgresql.conf
(compared to the defaults):

max_connections = 200
shared_buffers = 256MB
work_mem = 64MB
maintenance_work_mem = 128MB
max_stack_depth = 6MB
max_fsm_pages = 100000
synchronous_commit = off
wal_buffers = 1MB
commit_delay = 100
commit_siblings = 5
checkpoint_segments = 10
checkpoint_timeout = 10min
random_page_cost = 0.1
effective_cache_size = 2048MB

Any idea what's wrong here?

Regards,
--
Kouber Saparev
http://kouber.saparev.com

Re: LIMIT confuses the planner

From: Richard Huxton
Kouber Saparev wrote:

> db=# EXPLAIN ANALYZE
> SELECT
>   *
> FROM
>   login_attempt
> WHERE
>   username='kouber'
> ORDER BY
>   login_attempt_sid DESC;
>
>                                                                        QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=1415.15..1434.93 rows=7914 width=38) (actual time=0.103..0.104 rows=2 loops=1)
>    Sort Key: login_attempt_sid
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Index Scan using login_attempt_username_idx on login_attempt  (cost=0.00..902.71 rows=7914 width=38) (actual time=0.090..0.091 rows=2 loops=1)
>          Index Cond: ((username)::text = 'kouber'::text)
>  Total runtime: 0.140 ms

It's expecting 7914 rows returned and is getting only 2. That is
probably the root of the problem.

> However when I add a LIMIT clause to the same query the planner no
> longer uses the right index, hence the query becomes very slow:
>
>
> db=# EXPLAIN ANALYZE
> SELECT
>   *
> FROM
>   login_attempt
> WHERE
>   username='kouber'
> ORDER BY
>   login_attempt_sid DESC LIMIT 20;

Since it's expecting 7914 rows for "kouber" it thinks it will find the
20 rows you want fairly quickly by just looking backward through the
login_attempt_pkey index.
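
(You can see that assumption in the plan arithmetic: the Limit node's
cost of 770.45 is simply 20/7914 of the full backward scan's 304866.46.
With only 2 matching rows in reality, the scan ends up walking the whole
index.)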

Try increasing the stats on the username column.

ALTER TABLE login_attempt ALTER COLUMN username SET STATISTICS 100;
ANALYZE login_attempt;

You can try different values of statistics up to 1000, but there's no
point in setting it too high.
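
If you want to double-check what you are starting from, and that the
per-column override took effect, something along these lines should work
(attstattarget is -1 while the column is still using the default):

-- default target used by ANALYZE when no per-column override is set
SHOW default_statistics_target;

-- per-column override set by the ALTER TABLE above (-1 = use the default)
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'login_attempt'::regclass
  AND attname = 'username';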

--
  Richard Huxton
  Archonet Ltd

Re: LIMIT confuses the planner

From: Robert Haas
On Mon, Feb 23, 2009 at 7:26 AM, Kouber Saparev <kouber@saparev.com> wrote:
> Recently I altered some of the default parameters in order to get as much
> as possible out of the hardware (12 GB of RAM, 8 processors), so I guess I
> have done something wrong there that is leading the planner to this bad
> decision. Here's what I have changed in postgresql.conf (compared to the
> defaults):
>
> max_connections = 200
> shared_buffers = 256MB
> work_mem = 64MB
> maintenance_work_mem = 128MB
> max_stack_depth = 6MB
> max_fsm_pages = 100000
> synchronous_commit = off
> wal_buffers = 1MB
> commit_delay = 100
> commit_siblings = 5
> checkpoint_segments = 10
> checkpoint_timeout = 10min
> random_page_cost = 0.1
> effective_cache_size = 2048MB
>
> Any idea what's wrong here?

If you left seq_page_cost (which isn't mentioned here) at the default
value but reduced random_page_cost to 0.1, then you have
random_page_cost < seq_page_cost.  That's probably Bad.
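
You can check and experiment with those settings in a single session
before touching postgresql.conf, roughly like this:

-- what the planner is using right now
SHOW seq_page_cost;
SHOW random_page_cost;

-- try conventional values for this session only
SET seq_page_cost = 1.0;
SET random_page_cost = 4.0;   -- or around 2 if the data is mostly cached

EXPLAIN ANALYZE
SELECT *
FROM login_attempt
WHERE username = 'kouber'
ORDER BY login_attempt_sid DESC
LIMIT 20;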

...Robert

Re: LIMIT confuses the planner

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> If you left seq_page_cost (which isn't mentioned here) at the default
> value but reduced random_page_cost to 0.1, then you have
> random_page_cost < seq_page_cost.  That's probably Bad.

... well, it's certainly going to push the planner to believe indexscans
are cheaper than sorts no matter what.

The previously noted rowcount estimation problem might be a bigger issue
in this particular case, but I agree this is a Bad Idea.

            regards, tom lane

Re: LIMIT confuses the planner

From: Kouber Saparev
Richard Huxton wrote:
> Since it's expecting 7914 rows for "kouber" it thinks it will find the
> 20 rows you want fairly quickly by just looking backward through the
> login_attempt_pkey index.
>
> Try increasing the stats on the username column.
>
> ALTER TABLE login_attempt ALTER COLUMN username SET STATISTICS 100;
> ANALYZE login_attempt;
>
> You can try different values of statistics up to 1000, but there's no
> point in setting it too high.
>

Hmmm, that did the trick, thank you. I raised the statistics target of
the column to 300, and now the query plan has changed to:


 Limit  (cost=127.65..127.70 rows=20 width=38) (actual time=0.085..0.086 rows=3 loops=1)
   ->  Sort  (cost=127.65..129.93 rows=910 width=38) (actual time=0.084..0.085 rows=3 loops=1)
         Sort Key: login_attempt_sid
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on login_attempt  (cost=7.74..103.44 rows=910 width=38) (actual time=0.075..0.078 rows=3 loops=1)
               Recheck Cond: ((username)::text = 'kouber'::text)
               ->  Bitmap Index Scan on login_attempt_username_idx  (cost=0.00..7.51 rows=910 width=0) (actual time=0.069..0.069 rows=3 loops=1)
                     Index Cond: ((username)::text = 'kouber'::text)
 Total runtime: 0.114 ms


Now the planner believes there're 910 rows, which is a bit closer to the
real data:

swing=# select avg(length) from (select username, count(*) as length
from login_attempt group by username) as freq;
          avg
----------------------
  491.6087310427555479
(1 row)


--
Kouber Saparev
http://kouber.saparev.com

Re: LIMIT confuses the planner

From: Tom Lane
Kouber Saparev <kouber@saparev.com> writes:
> Now the planner believes there're 910 rows, which is a bit closer to the
> real data:

> swing=# select avg(length) from (select username, count(*) as length
> from login_attempt group by username) as freq;
>           avg
> ----------------------
>   491.6087310427555479
> (1 row)

Hmph, that's still not real good.  Ideally it should be estimating
*less* than the average frequency, because the estimate is made after
excluding all the most-common-values, which evidently 'kouber' is not
one of.  I suppose there's quite a large number of infrequently-seen
usernames and the ndistinct estimate is much less than reality?  (Look
at the pg_stats row for this column.)  It might be worth going all the
way to stats target 1000 for this column.
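
Something like this should show what ANALYZE currently believes about
the column:

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'login_attempt'
  AND attname = 'username';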

            regards, tom lane

Re: LIMIT confuses the planner

From: Kouber Saparev
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If you left seq_page_cost (which isn't mentioned here) at the default
>> value but reduced random_page_cost to 0.1, then you have
>> random_page_cost < seq_page_cost.  That's probably Bad.
>
> ... well, it's certainly going to push the planner to believe indexscans
> are cheaper than sorts no matter what.
>
> The previously noted rowcount estimation problem might be a bigger issue
> in this particular case, but I agree this is a Bad Idea.

So I've set it wrong, I guess. :-)

Now I have set them to:

seq_page_cost = 1
random_page_cost = 2

Regards,
--
Kouber Saparev
http://kouber.saparev.com

Re: LIMIT confuses the planner

From: Kouber Saparev
Tom Lane wrote:
> Kouber Saparev <kouber@saparev.com> writes:
>> Now the planner believes there're 910 rows, which is a bit closer to the
>> real data:
>
>> swing=# select avg(length) from (select username, count(*) as length
>> from login_attempt group by username) as freq;
>>           avg
>> ----------------------
>>   491.6087310427555479
>> (1 row)
>
> Hmph, that's still not real good.  Ideally it should be estimating
> *less* than the average frequency, because the estimate is made after
> excluding all the most-common-values, which evidently 'kouber' is not
> one of.  I suppose there's quite a large number of infrequently-seen
> usernames and the ndistinct estimate is much less than reality?  (Look
> at the pg_stats row for this column.)  It might be worth going all the
> way to stats target 1000 for this column.


I altered the statistics target for that column to 1000, so now the planner
assumes exactly 492 rows for the aforementioned query, which is indeed
the average. It never went *below* that value; it was always higher,
e.g. for a statistics target of 600 the estimate was 588, and for 800 it
became 540.

The current value of n_distinct (given statistics=1000) is:

db=# SELECT n_distinct FROM pg_stats WHERE tablename='login_attempt' AND
attname='username';
  n_distinct
------------
        5605
(1 row)

db=# SELECT COUNT(DISTINCT username) FROM login_attempt;
  count
-------
  23391
(1 row)


In fact, what does n_distinct stand for, apart from the famous formula:

n*d / (n - f1 + f1*n/N)

;-)
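
(Playing with the formula using made-up but plausible numbers: for a
sample of n = 300000 rows out of N = 11471762, containing d = 5000
distinct usernames of which f1 = 3000 were seen only once, it gives
300000*5000 / (300000 - 3000 + 3000*300000/11471762) ~ 5049, i.e. barely
more than the distinct values actually seen in the sample. That would be
consistent with an estimate of 5605 against a real count of 23391.)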

Regards,
--
Kouber Saparev
http://kouber.saparev.com

Re: LIMIT confuses the planner

From: Tom Lane
Kouber Saparev <kouber@saparev.com> writes:
> Tom Lane wrote:
>> Hmph, that's still not real good.  Ideally it should be estimating
>> *less* than the average frequency, because the estimate is made after
>> excluding all the most-common-values, which evidently 'kouber' is not
>> one of.

> I altered the statistics target for that column to 1000, so now the planner
> assumes exactly 492 rows for the aforementioned query, which is indeed
> the average. It never went *below* that value; it was always higher,
> e.g. for a statistics target of 600 the estimate was 588, and for 800 it
> became 540.

I got some time to think more about this and experiment a bit further.
As far as I can tell there is no fundamental bug here --- given
reasonably accurate stats the rowcount estimate behaves as expected, ie,
you get an estimate that's less than the actual average number of values
if the target value is not one of the known MCVs.  However, as the
n_distinct estimate falls below the actual number of distinct values,
that rowcount estimate necessarily rises.  What had surprised me about
this report is that the estimate matched the true average number of rows
so closely; I wondered if there was some property of the way we estimate
n_distinct that would make that happen.  But I now think that that was
just chance: there doesn't seem to be any underlying behavior that would
cause it.  I did some experiments with random data matching a Zipfian
distribution (1/k law) and did not observe that the rowcount estimate
converged to the true average when the n_distinct value was too low.

So the bottom line here is just that the estimated n_distinct is too
low.  We've seen before that the equation we use tends to do that more
often than not.  I doubt that consistently erring on the high side would
be better though :-(.  Estimating n_distinct from a limited sample of
the population is known to be a statistically hard problem, so we'll
probably not ever have perfect answers, but doing better is on the
to-do list.

            regards, tom lane

Re: LIMIT confuses the planner

From: marcin mank
> So the bottom line here is just that the estimated n_distinct is too
> low.  We've seen before that the equation we use tends to do that more
> often than not.  I doubt that consistently erring on the high side would
> be better though :-(.  Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.
>

I hit an interesting paper on n_distinct calculation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) calculation per
value. Page 22 describes what to do with update streams.

This, I think (disclaimer: I know little about PG internals), means that
the n_distinct estimation could be done at vacuum time (it would
play well with the visibility map addition).

What do you think?

Greetings
Marcin

Re: LIMIT confuses the planner

From: marcin mank
> I hit an interesting paper on n_distinct calculation:
>
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf
>
> The PCSA algorithm described there requires O(1) calculation per
> value. Page 22 describes what to do with update streams.
>
> This, I think (disclaimer: I know little about PG internals), means that
> the n_distinct estimation could be done at vacuum time (it would
> play well with the visibility map addition).
>
> What do you think?

OK, if you think that calculating a hash function of every data field
for each insert or delete is prohibitive, just say so and don't bother
reading the paper :]

Greetings
Marcin

Re: LIMIT confuses the planner

From: Tom Lane
marcin mank <marcin.mank@gmail.com> writes:
> I hit an interesting paper on n_distinct calculation:
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

I don't think we're quite ready to make ANALYZE read every row of a
table in order to estimate n_distinct.  It is an interesting paper
in that it says that you have to do that in order to get *provably*
good estimates, but I've not abandoned the hope of getting *usually*
good estimates without so much work.

            regards, tom lane

Re: LIMIT confuses the planner

From: Kouber Saparev
Now I am experiencing a similar issue with another table, called
"message", for which there is a partial (conditional) index:

CREATE TABLE message (
   message_sid SERIAL PRIMARY KEY,
   from_profile_sid INT NOT NULL REFERENCES profile,
   to_profile_sid INT NOT NULL REFERENCES profile,
   sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
   receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
   datetime TIMESTAMP NOT NULL DEFAULT NOW(),
   body TEXT
);

CREATE INDEX message_from_profile_idx ON message (from_profile_sid)
WHERE NOT sender_has_deleted;


So, again... adding a LIMIT makes the planner choose the "wrong" index.


db=# EXPLAIN ANALYZE SELECT
          message_sid
FROM
   message
WHERE
   from_profile_sid = 1296 AND NOT sender_has_deleted
ORDER BY
   message_sid DESC;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2307.70..2310.74 rows=1215 width=4) (actual time=0.040..0.040 rows=2 loops=1)
   Sort Key: message_sid
   Sort Method:  quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on message  (cost=23.59..2245.45 rows=1215 width=4) (actual time=0.029..0.033 rows=2 loops=1)
         Recheck Cond: ((from_profile_sid = 1296) AND (NOT sender_has_deleted))
         ->  Bitmap Index Scan on message_from_profile_idx  (cost=0.00..23.28 rows=1215 width=0) (actual time=0.022..0.022 rows=2 loops=1)
               Index Cond: (from_profile_sid = 1296)
 Total runtime: 0.068 ms
(8 rows)




db=# EXPLAIN ANALYZE SELECT
   message_sid
FROM
   message
WHERE
   from_profile_sid = 1296 AND NOT sender_has_deleted
ORDER BY
   message_sid DESC LIMIT 20;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1461.12 rows=20 width=4) (actual time=0.817..932.398 rows=2 loops=1)
   ->  Index Scan Backward using message_pkey on message  (cost=0.00..88762.80 rows=1215 width=4) (actual time=0.816..932.395 rows=2 loops=1)
         Filter: ((NOT sender_has_deleted) AND (from_profile_sid = 1296))
 Total runtime: 932.432 ms
(4 rows)
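
(The Limit cost is again just a fraction of the full backward scan:
88762.80 * 20 / 1215 ~ 1461, so the planner expects to collect 20
matching rows almost immediately, while in reality there are only 2.)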



I had already increased STATISTICS to 1000 for both from_profile_sid and
sender_has_deleted, and ran VACUUM ANALYZE (and also REINDEX) afterwards,
but the statistical data still confuses me:


db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND
attname='from_profile_sid';

  n_distinct
------------
        4068
(1 row)

db=# select avg(length) from (select from_profile_sid, count(*) as
length from message group by from_profile_sid) as freq;

  avg
----------------------
  206.5117822008693663
(1 row)
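
One idea I have been toying with (not sure whether it is the right
approach; the index name below is just an example) is a composite partial
index that matches both the filter and the ORDER BY, so the planner would
not have to choose between the two:

CREATE INDEX message_from_profile_message_idx
   ON message (from_profile_sid, message_sid)
   WHERE NOT sender_has_deleted;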



Any ideas/thoughts?


--
Kouber Saparev
http://kouber.saparev.com