Re: seq vs index scan in join query - Mailing list pgsql-general

From Laurenz Albe
Subject Re: seq vs index scan in join query
Date
Msg-id 1511975838.2309.12.camel@cybertec.at
Whole thread Raw
In response to seq vs index scan in join query  (Emanuel Alvarez <ema@abductedcow.com.ar>)
Responses Re: seq vs index scan in join query  (Andres Freund <andres@anarazel.de>)
List pgsql-general
Emanuel Alvarez wrote:
> the problematic query looks like this:
> 
> SELECT keywords.strategy_id, results.position, results.created_at FROM results
>   JOIN  keywords ON results.keyword_id = keywords.id
>   WHERE results.account_id = 1
>      AND results.created_at >= '2017-10-25 00:00:00.000000'
>      AND results.created_at <= '2017-11-10 23:59:59.999999';
> 
> 
> as you can see in the query plan [1] a sequential scan is preferred.
> as we understand it, this happens because the number of rows returned
> from results is too large. if we reduce this number by either
> selecting a smaller created_at range, or another account_id with fewer
> keywords, the planner falls back to an index scan, confirming that the
> number of rows returned from results has a direct influence in this
> choice.
> 
> on the other hand, if we disable sequential scans (SET enable_seqscan
> = 0), we see than not only the query runs faster but the cost seems to
> be lower, as seen in the query plan [2].

The optimizer is right here.

Even though your second execution without sequential scans ran faster,
it is worse.

That is because the execution with the sequential scan touched
26492  + 80492 = 106984 blocks, while the second execution touched
311301 + 48510 = 359811 blocks, more than three times as many.

The second execution was just lucky because most of these blocks were
already cached, and it had to read only half as many blocks from disk.

If you repeat the execution a couple of times, you should see that
the execution using the sequential scans becomes faster.


You can boost performance even more by increasing work_mem
so that the hash can be created in memory.

Yours,
Laurenz Albe







pgsql-general by date:

Previous
From: Steven Lembark
Date:
Subject: Re: large numbers of inserts out of memory strategy
Next
From: Andres Freund
Date:
Subject: Re: seq vs index scan in join query