Re: Slow SELECT - Mailing list pgsql-general
From | Frank Millman |
---|---|
Subject | Re: Slow SELECT |
Date | |
Msg-id | 8d91512c-5bdd-893e-af97-9336357e39fd@chagford.com Whole thread Raw |
In response to | Re: Slow SELECT (Olivier Gautherot <ogautherot@gautherot.net>) |
Responses |
Re: Slow SELECT
(Charles Clavadetscher <clavadetscher@swisspug.org>)
|
List | pgsql-general |
On 2020-05-26 9:32 AM, Olivier Gautherot wrote: > Hi Frank, > > On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com > <mailto:frank@chagford.com>> wrote: > > Hi all > > I have a SELECT that runs over 5 times slower on PostgreSQL compared > with Sql Server and sqlite3. I am trying to understand why. > > I have a table that looks like this (simplified) - > > CREATE TABLE my_table ( > row_id SERIAL PRIMARY KEY, > deleted_id INT DEFAULT 0, > fld_1 INT REFERENCES table_1(row_id), > fld_2 INT REFERENCES table_2(row_id), > fld_3 INT REFERENCES table_3(row_id), > fld_4 INT REFERENCES table_4(row_id), > tran_date DATE, > tran_total DEC(21,2) > ); > > CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, > fld_4, tran_date) WHERE deleted_id = 0; > > The table sizes are - > my_table : 167 rows > table_1 : 21 rows > table_2 : 11 rows > table_3 : 3 rows > table_4 : 16 rows > > Therefore for each tran_date in my_table there are potentially > 21x11x3x16 = 11088 rows. Most will be null. > > I want to select the row_id for the last tran_date for each of those > potential groups. This is my select - > > SELECT ( > SELECT a.row_id FROM my_table a > WHERE a.fld_1 = b.row_id > AND a.fld_2 = c.row_id > AND a.fld_3 = d.row_id > AND a.fld_4 = e.row_id > AND a.deleted_id = 0 > ORDER BY a.tran_date DESC LIMIT 1 > ) > FROM table_1 b, table_2 c, table_3 d, table_4 e > > Out of 11088 rows selected, 103 are not null. > > On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on > sqlite3, and 0.31 sec on PostgreSQL. > > > SQL Server does a good job at caching data in memory. PostgreSQL does > too on consecutive calls to the same table. What execution time do you > get if you issue the query a second time? > > My first guess would be to add an index on my_table.tran_date and check > in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. > > I have looked at the EXPLAIN, but I don't really know what to look for. > I can supply it if that would help. > > Thanks for any advice. > Thanks Olivier. Unfortunately that did not help. I was already running the query twice and only timing the second one. I added the index on tran_date. The timing is the same, and EXPLAIN shows that it is using a SEQUENTIAL SCAN. Here is the EXPLAIN - Nested Loop (cost=0.00..64155.70 rows=11088 width=4) -> Nested Loop (cost=0.00..10.36 rows=528 width=12) -> Nested Loop (cost=0.00..2.56 rows=33 width=8) -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 width=4) -> Materialize (cost=0.00..1.04 rows=3 width=4) -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3 width=4) -> Materialize (cost=0.00..1.24 rows=16 width=4) -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 width=4) -> Materialize (cost=0.00..1.31 rows=21 width=4) -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) SubPlan 1 -> Limit (cost=5.77..5.77 rows=1 width=8) -> Sort (cost=5.77..5.77 rows=1 width=8) Sort Key: a.tran_date DESC -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 width=8) Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0)) Frank
pgsql-general by date: