Re: Slow SELECT - Mailing list pgsql-general
From | Charles Clavadetscher |
---|---|
Subject | Re: Slow SELECT |
Date | |
Msg-id | 40598c3956f865c69d04e8032305fb03@swisspug.org Whole thread Raw |
In response to | Re: Slow SELECT (Frank Millman <frank@chagford.com>) |
Responses |
Re: Slow SELECT
Re: Slow SELECT |
List | pgsql-general |
Hello On 2020-05-26 10:38, Frank Millman wrote: > On 2020-05-26 9:32 AM, Olivier Gautherot wrote: >> Hi Frank, >> >> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com >> <mailto:frank@chagford.com>> wrote: >> >> Hi all >> >> I have a SELECT that runs over 5 times slower on PostgreSQL >> compared >> with Sql Server and sqlite3. I am trying to understand why. >> >> I have a table that looks like this (simplified) - >> >> CREATE TABLE my_table ( >> row_id SERIAL PRIMARY KEY, >> deleted_id INT DEFAULT 0, >> fld_1 INT REFERENCES table_1(row_id), >> fld_2 INT REFERENCES table_2(row_id), >> fld_3 INT REFERENCES table_3(row_id), >> fld_4 INT REFERENCES table_4(row_id), >> tran_date DATE, >> tran_total DEC(21,2) >> ); >> >> CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, >> fld_4, tran_date) WHERE deleted_id = 0; >> >> The table sizes are - >> my_table : 167 rows >> table_1 : 21 rows >> table_2 : 11 rows >> table_3 : 3 rows >> table_4 : 16 rows >> >> Therefore for each tran_date in my_table there are potentially >> 21x11x3x16 = 11088 rows. Most will be null. >> >> I want to select the row_id for the last tran_date for each of >> those >> potential groups. This is my select - >> >> SELECT ( >> SELECT a.row_id FROM my_table a >> WHERE a.fld_1 = b.row_id >> AND a.fld_2 = c.row_id >> AND a.fld_3 = d.row_id >> AND a.fld_4 = e.row_id >> AND a.deleted_id = 0 >> ORDER BY a.tran_date DESC LIMIT 1 >> ) >> FROM table_1 b, table_2 c, table_3 d, table_4 e >> >> Out of 11088 rows selected, 103 are not null. >> >> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on >> sqlite3, and 0.31 sec on PostgreSQL. >> >> >> SQL Server does a good job at caching data in memory. PostgreSQL does >> too on consecutive calls to the same table. What execution time do you >> get if you issue the query a second time? >> >> My first guess would be to add an index on my_table.tran_date and >> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. >> >> I have looked at the EXPLAIN, but I don't really know what to look >> for. >> I can supply it if that would help. >> >> Thanks for any advice. >> > > Thanks Olivier. Unfortunately that did not help. > > I was already running the query twice and only timing the second one. > > I added the index on tran_date. The timing is the same, and EXPLAIN > shows that it is using a SEQUENTIAL SCAN. > > Here is the EXPLAIN - > > > Nested Loop (cost=0.00..64155.70 rows=11088 width=4) > -> Nested Loop (cost=0.00..10.36 rows=528 width=12) > -> Nested Loop (cost=0.00..2.56 rows=33 width=8) > -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 > width=4) > -> Materialize (cost=0.00..1.04 rows=3 width=4) > -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3 > width=4) > -> Materialize (cost=0.00..1.24 rows=16 width=4) > -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 > width=4) > -> Materialize (cost=0.00..1.31 rows=21 width=4) > -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) > SubPlan 1 > -> Limit (cost=5.77..5.77 rows=1 width=8) > -> Sort (cost=5.77..5.77 rows=1 width=8) > Sort Key: a.tran_date DESC > -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 > width=8) > Filter: ((fld_1 = b.row_id) AND (fld_2 = > c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND > (deleted_id = 0)) > > > Frank If I see it correct, the query runs sequential scans on all tables, i.e. table_1 to table_4. Do you have an index on the referenced keys (row_id) in table_1 to table_4? It happens often that referenced keys are not indexed, leading to poor execution plans. Bye Charles
pgsql-general by date: