Re: Slow SELECT - Mailing list pgsql-general

From Charles Clavadetscher
Subject Re: Slow SELECT
Date
Msg-id 40598c3956f865c69d04e8032305fb03@swisspug.org
Whole thread Raw
In response to Re: Slow SELECT  (Frank Millman <frank@chagford.com>)
Responses Re: Slow SELECT
Re: Slow SELECT
List pgsql-general
Hello

On 2020-05-26 10:38, Frank Millman wrote:
> On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
>> Hi Frank,
>> 
>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com 
>> <mailto:frank@chagford.com>> wrote:
>> 
>>     Hi all
>> 
>>     I have a SELECT that runs over 5 times slower on PostgreSQL 
>> compared
>>     with Sql Server and sqlite3. I am trying to understand why.
>> 
>>     I have a table that looks like this (simplified) -
>> 
>>     CREATE TABLE my_table (
>>           row_id SERIAL PRIMARY KEY,
>>           deleted_id INT DEFAULT 0,
>>           fld_1 INT REFERENCES table_1(row_id),
>>           fld_2 INT REFERENCES table_2(row_id),
>>           fld_3 INT REFERENCES table_3(row_id),
>>           fld_4 INT REFERENCES table_4(row_id),
>>           tran_date DATE,
>>           tran_total DEC(21,2)
>>           );
>> 
>>     CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
>>     fld_4, tran_date) WHERE deleted_id = 0;
>> 
>>     The table sizes are -
>>           my_table : 167 rows
>>           table_1 : 21 rows
>>           table_2 : 11 rows
>>           table_3 : 3 rows
>>           table_4 : 16 rows
>> 
>>     Therefore for each tran_date in my_table there are potentially
>>     21x11x3x16 = 11088 rows. Most will be null.
>> 
>>     I want to select the row_id for the last tran_date for each of 
>> those
>>     potential groups. This is my select -
>> 
>>           SELECT (
>>               SELECT a.row_id FROM my_table a
>>               WHERE a.fld_1 = b.row_id
>>               AND a.fld_2 = c.row_id
>>               AND a.fld_3 = d.row_id
>>               AND a.fld_4 = e.row_id
>>               AND a.deleted_id = 0
>>               ORDER BY a.tran_date DESC LIMIT 1
>>           )
>>           FROM table_1 b, table_2 c, table_3 d, table_4 e
>> 
>>     Out of 11088 rows selected, 103 are not null.
>> 
>>     On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>>     sqlite3, and 0.31 sec on PostgreSQL.
>> 
>> 
>> SQL Server does a good job at caching data in memory. PostgreSQL does 
>> too on consecutive calls to the same table. What execution time do you 
>> get if you issue the query a second time?
>> 
>> My first guess would be to add an index on my_table.tran_date and 
>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
>> 
>>     I have looked at the EXPLAIN, but I don't really know what to look 
>> for.
>>     I can supply it if that would help.
>> 
>>     Thanks for any advice.
>> 
> 
> Thanks Olivier. Unfortunately that did not help.
> 
> I was already running the query twice and only timing the second one.
> 
> I added the index on tran_date. The timing is the same, and EXPLAIN
> shows that it is using a SEQUENTIAL SCAN.
> 
> Here is the EXPLAIN -
> 
> 
>  Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
>    ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
>          ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
>                ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 
> width=4)
>                ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
>                      ->  Seq Scan on table_3 d  (cost=0.00..1.03 rows=3 
> width=4)
>          ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
>                ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 
> width=4)
>    ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
>          ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
>    SubPlan 1
>      ->  Limit  (cost=5.77..5.77 rows=1 width=8)
>            ->  Sort  (cost=5.77..5.77 rows=1 width=8)
>                  Sort Key: a.tran_date DESC
>                  ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 
> width=8)
>                        Filter: ((fld_1 = b.row_id) AND (fld_2 =
> c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND
> (deleted_id = 0))
> 
> 
> Frank

If I see it correct, the query runs sequential scans on all tables, i.e. 
table_1 to table_4.
Do you have an index on the referenced keys (row_id) in table_1 to 
table_4?

It happens often that referenced keys are not indexed, leading to poor 
execution plans.

Bye
Charles



pgsql-general by date:

Previous
From: Frank Millman
Date:
Subject: Re: Slow SELECT
Next
From: Charles Clavadetscher
Date:
Subject: Re: Slow SELECT