Re: massive performance hit when using "Limit 1" - Mailing list pgsql-general

From Rich Doughty
Subject Re: massive performance hit when using "Limit 1"
Date
Msg-id 43958B99.8090904@opusvl.com
In response to Re: massive performance hit when using "Limit 1"  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Richard Huxton wrote:
> Rich Doughty wrote:
>
>>
>> This one goes nuts and doesn't return. is there any way i can
>> force a query plan similar to the one above?
>>
>>   EXPLAIN SELECT _t.* FROM
>>        tokens.ta_tokens       _t INNER JOIN
>>        tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
>>   WHERE
>>        _s.retailer_id = '96599' AND
>>        _t.value       = '10'
>>   ORDER BY
>>        _t.number ASC
>>   LIMIT '1';
>>                                                     QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------
>>
>>  Limit  (cost=0.00..14967.39 rows=1 width=27)
>>    ->  Nested Loop  (cost=0.00..22316378.56 rows=1491 width=27)
>>          ->  Index Scan using ta_tokens_number_key on ta_tokens _t  (cost=0.00..15519868.33 rows=1488768 width=27)
>>                Filter: ((value)::numeric = 10::numeric)
>>          ->  Index Scan using ta_tokens_stock_pkey on ta_tokens_stock _s  (cost=0.00..4.55 rows=1 width=4)
>>                Index Cond: (("outer".token_id)::integer = (_s.token_id)::integer)
>>                Filter: ((retailer_id)::integer = 96599)
>
>
> I *think* what's happening here is that PG thinks it will use the index
> on _t.number (since you are going to sort by that anyway) and pretty
> soon find a row that will:
>   1. have value=10
>   2. join to a row in _s with the right retailer_id
> It turns out that isn't the case, and so the query takes forever.
> Without knowing what "value" and "number" mean it's difficult to be
> sure, but I'd guess it's the "token_id" join part that's the problem,
> since at a guess a high-numbered retailer will have tokens with
> high-numbered "token_id".
>
> If you'd posted EXPLAIN ANALYSE then we'd be able to see what actually
> did happen.

no chance. it takes far too long to return (days...).

> Try the same query but with a low retailer_id (100 or something) and see
> if it goes a lot quicker. If that is what the problem is, try changing
> the ORDER BY to something like "_s.retailer_id, _t.value, _t.number" and
> see if that gives the planner a nudge in the right direction.

the retailer_id would make no difference as there are only 4000-ish rows in
ta_tokens_stock and they all (for now) have the same retailer_id.
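
for reference, the rewrite you suggest would look something like this (my
reconstruction, untested; the result ordering is unchanged because
retailer_id and value are pinned by the WHERE clause):

   EXPLAIN SELECT _t.* FROM
        tokens.ta_tokens       _t INNER JOIN
        tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
   WHERE
        _s.retailer_id = '96599' AND
        _t.value       = '10'
   ORDER BY
        _s.retailer_id, _t.value, _t.number ASC
   LIMIT 1;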

> Failing that, a change to your indexes will almost certainly help.

i'm not sure that's the case. the exact same query, but with a LIMIT of 2 or
more, is fine.

I found this in the 8.0.4 relnotes. i reckon it's a good guess that this is
the problem:

* Fix mis-planning of queries with small LIMIT values due to poorly thought
   out "fuzzy" cost comparison
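
as a possible workaround until we can upgrade (my own idea, not something
from the relnotes): wrapping the join in a subquery with OFFSET 0, which the
planner won't flatten, should keep the LIMIT from driving the choice of the
inner plan. a sketch, untested:

   SELECT * FROM (
       SELECT _t.* FROM
            tokens.ta_tokens       _t INNER JOIN
            tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
       WHERE
            _s.retailer_id = '96599' AND
            _t.value       = '10'
       OFFSET 0
   ) sub
   ORDER BY sub.number ASC
   LIMIT 1;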



--

   - Rich Doughty
