Home > mailing lists

Re: Slow SELECT - Mailing list pgsql-general

From	Frank Millman
Subject	Re: Slow SELECT
Date	May 26, 2020 11:38:51
Msg-id	8d91512c-5bdd-893e-af97-9336357e39fd@chagford.com Whole thread Raw
In response to	Re: Slow SELECT (Olivier Gautherot <ogautherot@gautherot.net>)
Responses	Re: Slow SELECT (Charles Clavadetscher <clavadetscher@swisspug.org>)
List	pgsql-general

Tree view

On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
> Hi Frank,
> 
> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com 
> <mailto:frank@chagford.com>> wrote:
> 
>     Hi all
> 
>     I have a SELECT that runs over 5 times slower on PostgreSQL compared
>     with Sql Server and sqlite3. I am trying to understand why.
> 
>     I have a table that looks like this (simplified) -
> 
>     CREATE TABLE my_table (
>           row_id SERIAL PRIMARY KEY,
>           deleted_id INT DEFAULT 0,
>           fld_1 INT REFERENCES table_1(row_id),
>           fld_2 INT REFERENCES table_2(row_id),
>           fld_3 INT REFERENCES table_3(row_id),
>           fld_4 INT REFERENCES table_4(row_id),
>           tran_date DATE,
>           tran_total DEC(21,2)
>           );
> 
>     CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
>     fld_4, tran_date) WHERE deleted_id = 0;
> 
>     The table sizes are -
>           my_table : 167 rows
>           table_1 : 21 rows
>           table_2 : 11 rows
>           table_3 : 3 rows
>           table_4 : 16 rows
> 
>     Therefore for each tran_date in my_table there are potentially
>     21x11x3x16 = 11088 rows. Most will be null.
> 
>     I want to select the row_id for the last tran_date for each of those
>     potential groups. This is my select -
> 
>           SELECT (
>               SELECT a.row_id FROM my_table a
>               WHERE a.fld_1 = b.row_id
>               AND a.fld_2 = c.row_id
>               AND a.fld_3 = d.row_id
>               AND a.fld_4 = e.row_id
>               AND a.deleted_id = 0
>               ORDER BY a.tran_date DESC LIMIT 1
>           )
>           FROM table_1 b, table_2 c, table_3 d, table_4 e
> 
>     Out of 11088 rows selected, 103 are not null.
> 
>     On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>     sqlite3, and 0.31 sec on PostgreSQL.
> 
> 
> SQL Server does a good job at caching data in memory. PostgreSQL does 
> too on consecutive calls to the same table. What execution time do you 
> get if you issue the query a second time?
> 
> My first guess would be to add an index on my_table.tran_date and check 
> in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
> 
>     I have looked at the EXPLAIN, but I don't really know what to look for.
>     I can supply it if that would help.
> 
>     Thanks for any advice.
> 

Thanks Olivier. Unfortunately that did not help.

I was already running the query twice and only timing the second one.

I added the index on tran_date. The timing is the same, and EXPLAIN 
shows that it is using a SEQUENTIAL SCAN.

Here is the EXPLAIN -


  Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
    ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
          ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
                ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 width=4)
                ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
                      ->  Seq Scan on table_3 d  (cost=0.00..1.03 rows=3 
width=4)
          ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
                ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 width=4)
    ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
          ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
    SubPlan 1
      ->  Limit  (cost=5.77..5.77 rows=1 width=8)
            ->  Sort  (cost=5.77..5.77 rows=1 width=8)
                  Sort Key: a.tran_date DESC
                  ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 
width=8)
                        Filter: ((fld_1 = b.row_id) AND (fld_2 = 
c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id 
= 0))


Frank

pgsql-general by date:

From: Olivier Gautherot
Date: 26 May 2020, 10:32:14
Subject: Re: Slow SELECT

From: Charles Clavadetscher
Date: 26 May 2020, 12:10:45
Subject: Re: Slow SELECT

Re: Slow SELECT - Mailing list pgsql-general

Previous

Next