Re: Slow SELECT - Mailing list pgsql-general

From Charles Clavadetscher
Subject Re: Slow SELECT
Date
Msg-id 43655b03c50ff22bb076b5946c7cec76@swisspug.org
In response to Re: Slow SELECT  (Charles Clavadetscher <clavadetscher@swisspug.org>)
List pgsql-general
On 2020-05-26 11:10, Charles Clavadetscher wrote:
> Hello
> 
> On 2020-05-26 10:38, Frank Millman wrote:
>> On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
>>> Hi Frank,
>>> 
>>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com 
>>> <mailto:frank@chagford.com>> wrote:
>>> 
>>>     Hi all
>>> 
>>>     I have a SELECT that runs over 5 times slower on PostgreSQL
>>>     compared with SQL Server and sqlite3. I am trying to understand why.
>>> 
>>>     I have a table that looks like this (simplified) -
>>> 
>>>     CREATE TABLE my_table (
>>>           row_id SERIAL PRIMARY KEY,
>>>           deleted_id INT DEFAULT 0,
>>>           fld_1 INT REFERENCES table_1(row_id),
>>>           fld_2 INT REFERENCES table_2(row_id),
>>>           fld_3 INT REFERENCES table_3(row_id),
>>>           fld_4 INT REFERENCES table_4(row_id),
>>>           tran_date DATE,
>>>           tran_total DEC(21,2)
>>>           );
>>> 
>>>     CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
>>>     fld_4, tran_date) WHERE deleted_id = 0;
>>> 
>>>     The table sizes are -
>>>           my_table : 167 rows
>>>           table_1 : 21 rows
>>>           table_2 : 11 rows
>>>           table_3 : 3 rows
>>>           table_4 : 16 rows
>>> 
>>>     Therefore for each tran_date in my_table there are potentially
>>>     21x11x3x16 = 11088 rows. Most will be null.
>>> 
>>>     I want to select the row_id for the last tran_date for each of
>>>     those potential groups. This is my select -
>>> 
>>>           SELECT (
>>>               SELECT a.row_id FROM my_table a
>>>               WHERE a.fld_1 = b.row_id
>>>               AND a.fld_2 = c.row_id
>>>               AND a.fld_3 = d.row_id
>>>               AND a.fld_4 = e.row_id
>>>               AND a.deleted_id = 0
>>>               ORDER BY a.tran_date DESC LIMIT 1
>>>           )
>>>           FROM table_1 b, table_2 c, table_3 d, table_4 e
>>> 
>>>     Out of 11088 rows selected, 103 are not null.
>>> 
>>>     On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>>>     sqlite3, and 0.31 sec on PostgreSQL.
>>> 
>>> 
>>> SQL Server does a good job at caching data in memory. PostgreSQL does 
>>> too on consecutive calls to the same table. What execution time do 
>>> you get if you issue the query a second time?
>>> 
>>> My first guess would be to add an index on my_table.tran_date and 
>>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
>>> 
>>>     I have looked at the EXPLAIN, but I don't really know what to
>>>     look for. I can supply it if that would help.
>>> 
>>>     Thanks for any advice.
>>> 
>> 
>> Thanks Olivier. Unfortunately that did not help.
>> 
>> I was already running the query twice and only timing the second one.
>> 
>> I added the index on tran_date. The timing is the same, and EXPLAIN
>> shows that it is using a SEQUENTIAL SCAN.
>> 
>> Here is the EXPLAIN -
>> 
>> 
>>  Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
>>    ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
>>          ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
>>                ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 width=4)
>>                ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
>>                      ->  Seq Scan on table_3 d  (cost=0.00..1.03 rows=3 width=4)
>>          ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
>>                ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 width=4)
>>    ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
>>          ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
>>    SubPlan 1
>>      ->  Limit  (cost=5.77..5.77 rows=1 width=8)
>>            ->  Sort  (cost=5.77..5.77 rows=1 width=8)
>>                  Sort Key: a.tran_date DESC
>>                  ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 width=8)
>>                        Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0))
>> 
>> 
>> Frank
> 
> If I read this correctly, the query runs sequential scans on all
> tables, i.e. table_1 to table_4.
> Do you have an index on the referenced keys (row_id) in table_1 to
> table_4?
> 
> It happens often that referenced keys are not indexed, leading to poor
> execution plans.
> 
> Bye
> Charles

I noticed later that you have very small tables. This will probably lead
to a sequential scan although there is an index in place.
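To see what the planner actually does at runtime, rather than only its cost estimates, you could run the query under EXPLAIN (ANALYZE, BUFFERS). A minimal sketch, wrapping the original query as posted:

```sql
-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual
-- row counts, timings, and buffer usage, not just planner estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT (
    SELECT a.row_id FROM my_table a
    WHERE a.fld_1 = b.row_id
      AND a.fld_2 = c.row_id
      AND a.fld_3 = d.row_id
      AND a.fld_4 = e.row_id
      AND a.deleted_id = 0
    ORDER BY a.tran_date DESC LIMIT 1
)
FROM table_1 b, table_2 c, table_3 d, table_4 e;
```

Comparing the "actual time" of the SubPlan node against its estimate should show whether the 11088 repeated scans of my_table are where the time goes.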

I am not sure if it makes a difference, but what about using explicit 
joins?

SELECT a.row_id FROM my_table a
JOIN table_1 b ON (b.row_id = a.fld_1)
JOIN table_2 c ON (c.row_id = a.fld_2)
JOIN table_3 d ON (d.row_id = a.fld_3)
JOIN table_4 e ON (e.row_id = a.fld_4)
WHERE a.deleted_id = 0
ORDER BY a.tran_date DESC LIMIT 1;
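Another thought of mine, not tested against your data: if the goal is the latest row_id per (fld_1, fld_2, fld_3, fld_4) combination, DISTINCT ON would scan my_table only once instead of running the subquery 11088 times:

```sql
-- Sketch: one pass over my_table, keeping the row with the latest
-- tran_date for each (fld_1, fld_2, fld_3, fld_4) combination.
SELECT DISTINCT ON (a.fld_1, a.fld_2, a.fld_3, a.fld_4)
       a.row_id
FROM my_table a
WHERE a.deleted_id = 0
ORDER BY a.fld_1, a.fld_2, a.fld_3, a.fld_4, a.tran_date DESC;
```

Note this returns only the combinations that actually exist in my_table (the 103 non-null rows), not the full 11088-row cross product, so it is only equivalent if the null rows are not needed.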

Regards
Charles


