Re: Slow SELECT - Mailing list pgsql-general
From | Charles Clavadetscher |
---|---|
Subject | Re: Slow SELECT |
Date | |
Msg-id | 43655b03c50ff22bb076b5946c7cec76@swisspug.org Whole thread Raw |
In response to | Re: Slow SELECT (Charles Clavadetscher <clavadetscher@swisspug.org>) |
Responses |
Re: Slow SELECT
|
List | pgsql-general |
On 2020-05-26 11:10, Charles Clavadetscher wrote: > Hello > > On 2020-05-26 10:38, Frank Millman wrote: >> On 2020-05-26 9:32 AM, Olivier Gautherot wrote: >>> Hi Frank, >>> >>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com >>> <mailto:frank@chagford.com>> wrote: >>> >>> Hi all >>> >>> I have a SELECT that runs over 5 times slower on PostgreSQL >>> compared >>> with Sql Server and sqlite3. I am trying to understand why. >>> >>> I have a table that looks like this (simplified) - >>> >>> CREATE TABLE my_table ( >>> row_id SERIAL PRIMARY KEY, >>> deleted_id INT DEFAULT 0, >>> fld_1 INT REFERENCES table_1(row_id), >>> fld_2 INT REFERENCES table_2(row_id), >>> fld_3 INT REFERENCES table_3(row_id), >>> fld_4 INT REFERENCES table_4(row_id), >>> tran_date DATE, >>> tran_total DEC(21,2) >>> ); >>> >>> CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, >>> fld_3, >>> fld_4, tran_date) WHERE deleted_id = 0; >>> >>> The table sizes are - >>> my_table : 167 rows >>> table_1 : 21 rows >>> table_2 : 11 rows >>> table_3 : 3 rows >>> table_4 : 16 rows >>> >>> Therefore for each tran_date in my_table there are potentially >>> 21x11x3x16 = 11088 rows. Most will be null. >>> >>> I want to select the row_id for the last tran_date for each of >>> those >>> potential groups. This is my select - >>> >>> SELECT ( >>> SELECT a.row_id FROM my_table a >>> WHERE a.fld_1 = b.row_id >>> AND a.fld_2 = c.row_id >>> AND a.fld_3 = d.row_id >>> AND a.fld_4 = e.row_id >>> AND a.deleted_id = 0 >>> ORDER BY a.tran_date DESC LIMIT 1 >>> ) >>> FROM table_1 b, table_2 c, table_3 d, table_4 e >>> >>> Out of 11088 rows selected, 103 are not null. >>> >>> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on >>> sqlite3, and 0.31 sec on PostgreSQL. >>> >>> >>> SQL Server does a good job at caching data in memory. PostgreSQL does >>> too on consecutive calls to the same table. What execution time do >>> you get if you issue the query a second time? >>> >>> My first guess would be to add an index on my_table.tran_date and >>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. >>> >>> I have looked at the EXPLAIN, but I don't really know what to >>> look for. >>> I can supply it if that would help. >>> >>> Thanks for any advice. >>> >> >> Thanks Olivier. Unfortunately that did not help. >> >> I was already running the query twice and only timing the second one. >> >> I added the index on tran_date. The timing is the same, and EXPLAIN >> shows that it is using a SEQUENTIAL SCAN. >> >> Here is the EXPLAIN - >> >> >> Nested Loop (cost=0.00..64155.70 rows=11088 width=4) >> -> Nested Loop (cost=0.00..10.36 rows=528 width=12) >> -> Nested Loop (cost=0.00..2.56 rows=33 width=8) >> -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 >> width=4) >> -> Materialize (cost=0.00..1.04 rows=3 width=4) >> -> Seq Scan on table_3 d (cost=0.00..1.03 >> rows=3 width=4) >> -> Materialize (cost=0.00..1.24 rows=16 width=4) >> -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 >> width=4) >> -> Materialize (cost=0.00..1.31 rows=21 width=4) >> -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) >> SubPlan 1 >> -> Limit (cost=5.77..5.77 rows=1 width=8) >> -> Sort (cost=5.77..5.77 rows=1 width=8) >> Sort Key: a.tran_date DESC >> -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 >> width=8) >> Filter: ((fld_1 = b.row_id) AND (fld_2 = >> c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND >> (deleted_id = 0)) >> >> >> Frank > > If I see it correct, the query runs sequential scans on all tables, > i.e. table_1 to table_4. > Do you have an index on the referenced keys (row_id) in table_1 to > table_4? > > It happens often that referenced keys are not indexed, leading to poor > execution plans. > > Bye > Charles I noticed later that you have very small tables. This will probably lead to a sequential scan althought there is an index in place. I am not sure if it makes a difference, but what about using explicit joins? SELECT a.row_id FROM my_table a JOIN b table_1 ON (b.row_id = a.fld_1) JOIN c table_2 ON (c.row_id = a.fld_2) JOIN d table_3 ON (d.row_id = a.fld_3) JOIN e table_4 ON (e.row_id = a.fld_4) WHERE a.deleted_id = 0 ORDER BY a.tran_date DESC LIMIT 1; Regards Charles
pgsql-general by date: