Re: Slow SELECT - Mailing list pgsql-general
From | Frank Millman |
---|---|
Subject | Re: Slow SELECT |
Date | |
Msg-id | d4676b1c-a829-741b-1112-1be4e588beb4@chagford.com Whole thread Raw |
In response to | Re: Slow SELECT (Charles Clavadetscher <clavadetscher@swisspug.org>) |
List | pgsql-general |
On 2020-05-26 11:27 AM, Charles Clavadetscher wrote: > On 2020-05-26 11:10, Charles Clavadetscher wrote: >> Hello >> >> On 2020-05-26 10:38, Frank Millman wrote: >>> On 2020-05-26 9:32 AM, Olivier Gautherot wrote: >>>> Hi Frank, >>>> >>>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank@chagford.com >>>> <mailto:frank@chagford.com>> wrote: >>>> >>>> Hi all >>>> >>>> I have a SELECT that runs over 5 times slower on PostgreSQL >>>> compared >>>> with Sql Server and sqlite3. I am trying to understand why. >>>> >>>> I have a table that looks like this (simplified) - >>>> >>>> CREATE TABLE my_table ( >>>> row_id SERIAL PRIMARY KEY, >>>> deleted_id INT DEFAULT 0, >>>> fld_1 INT REFERENCES table_1(row_id), >>>> fld_2 INT REFERENCES table_2(row_id), >>>> fld_3 INT REFERENCES table_3(row_id), >>>> fld_4 INT REFERENCES table_4(row_id), >>>> tran_date DATE, >>>> tran_total DEC(21,2) >>>> ); >>>> >>>> CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, >>>> fld_4, tran_date) WHERE deleted_id = 0; >>>> >>>> The table sizes are - >>>> my_table : 167 rows >>>> table_1 : 21 rows >>>> table_2 : 11 rows >>>> table_3 : 3 rows >>>> table_4 : 16 rows >>>> >>>> Therefore for each tran_date in my_table there are potentially >>>> 21x11x3x16 = 11088 rows. Most will be null. >>>> >>>> I want to select the row_id for the last tran_date for each of >>>> those >>>> potential groups. This is my select - >>>> >>>> SELECT ( >>>> SELECT a.row_id FROM my_table a >>>> WHERE a.fld_1 = b.row_id >>>> AND a.fld_2 = c.row_id >>>> AND a.fld_3 = d.row_id >>>> AND a.fld_4 = e.row_id >>>> AND a.deleted_id = 0 >>>> ORDER BY a.tran_date DESC LIMIT 1 >>>> ) >>>> FROM table_1 b, table_2 c, table_3 d, table_4 e >>>> >>>> Out of 11088 rows selected, 103 are not null. >>>> >>>> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on >>>> sqlite3, and 0.31 sec on PostgreSQL. >>>> >>>> >>>> SQL Server does a good job at caching data in memory. PostgreSQL >>>> does too on consecutive calls to the same table. What execution time >>>> do you get if you issue the query a second time? >>>> >>>> My first guess would be to add an index on my_table.tran_date and >>>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. >>>> >>>> I have looked at the EXPLAIN, but I don't really know what to >>>> look for. >>>> I can supply it if that would help. >>>> >>>> Thanks for any advice. >>>> >>> >>> Thanks Olivier. Unfortunately that did not help. >>> >>> I was already running the query twice and only timing the second one. >>> >>> I added the index on tran_date. The timing is the same, and EXPLAIN >>> shows that it is using a SEQUENTIAL SCAN. >>> >>> Here is the EXPLAIN - >>> >>> >>> Nested Loop (cost=0.00..64155.70 rows=11088 width=4) >>> -> Nested Loop (cost=0.00..10.36 rows=528 width=12) >>> -> Nested Loop (cost=0.00..2.56 rows=33 width=8) >>> -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 >>> width=4) >>> -> Materialize (cost=0.00..1.04 rows=3 width=4) >>> -> Seq Scan on table_3 d (cost=0.00..1.03 >>> rows=3 width=4) >>> -> Materialize (cost=0.00..1.24 rows=16 width=4) >>> -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 >>> width=4) >>> -> Materialize (cost=0.00..1.31 rows=21 width=4) >>> -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) >>> SubPlan 1 >>> -> Limit (cost=5.77..5.77 rows=1 width=8) >>> -> Sort (cost=5.77..5.77 rows=1 width=8) >>> Sort Key: a.tran_date DESC >>> -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 >>> width=8) >>> Filter: ((fld_1 = b.row_id) AND (fld_2 = >>> c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND >>> (deleted_id = 0)) >>> >>> >>> Frank >> >> If I see it correct, the query runs sequential scans on all tables, >> i.e. table_1 to table_4. >> Do you have an index on the referenced keys (row_id) in table_1 to >> table_4? >> >> It happens often that referenced keys are not indexed, leading to poor >> execution plans. >> >> Bye >> Charles > > I noticed later that you have very small tables. This will probably lead > to a sequential scan althought there is an index in place. > > I am not sure if it makes a difference, but what about using explicit > joins? > > SELECT a.row_id FROM my_table a > JOIN b table_1 ON (b.row_id = a.fld_1) > JOIN c table_2 ON (c.row_id = a.fld_2) > JOIN d table_3 ON (d.row_id = a.fld_3) > JOIN e table_4 ON (e.row_id = a.fld_4) > WHERE a.deleted_id = 0 > ORDER BY a.tran_date DESC LIMIT 1; > Thanks, Charles. I tried that, but unfortunately it produces a different result. I need to test for every possible combination of fld1-4, and get the highest date for each one. Using joins only tests existing combinations, and gets the highest date for all of them combined. Seel my reply to David Rowley. I do not fully understand his solution yet, but it seems to be what I am looking for. Thanks again Frank
pgsql-general by date: