Re: [GENERAL] a JOIN to a VIEW seems slow - Mailing list pgsql-general

From Frank Millman
Subject Re: [GENERAL] a JOIN to a VIEW seems slow
Date
Msg-id 1925252F16BF4B0BA1F2AB53F5CD8817@FrankLaptop
Whole thread Raw
In response to [GENERAL] a JOIN to a VIEW seems slow  ("Frank Millman" <frank@chagford.com>)
Responses Re: [GENERAL] a JOIN to a VIEW seems slow
List pgsql-general
On 2017-09-18 Frank Millman wrote:
>
> Here are the timings for running the query on identical data sets using Postgresql, Sql Server, and Sqlite3 -
>
> PostgreSQL -
>     Method 1 - 0.28 sec
>     Method 2 – 1607 sec, or 26 minutes
>
> Sql Server -
>     Method 1 – 0.33 sec
>     Method 2 – 1.8 sec
>
> Sqlite3 -
>     Method 1 – 0.15 sec
>     Method 2 – 1.0 sec
>
> It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, and execute an indexed read against the underlying physical tables.
>
 
I did not get any response to this, but I am still persevering, and feel that I am getting closer. Instead of waiting 26 minutes for a result, I realise that I can learn a lot by using EXPLAIN. This is what I have found out.
 
To recap, I have the following tables -
 
1. ‘ar_tran_inv’, to store invoices
2. ‘ar_tran_crn’, to store credit notes
3. ‘ar_tran_rec’ to store receipts
 
This is a subset of their common columns -
    row_id INT SERIAL PRIMARY KEY,
    tran_number VARCHAR,
    posted BOOL,
 
I have created a VIEW called ‘ar_trans’ to combine them -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, where a row is inserted whenever a row is inserted into any of the three transaction tables. To identify the source transaction, I have the following columns -
    tran_type VARCHAR – can be ‘ar_inv’, ‘ar_crn’, or ‘ar_rec’
    tran_row_id INT – the primary key of the originating transaction
 
Now here are my tests -
 
1. =======================================================
 
SELECT tran_type, tran_row_id FROM ar_trans_due WHERE row_id = 1;
 
tran_type | tran_row_id
-----------+-------------
ar_inv    |           1
(1 row)
 
Just to give me some sample data to work with.
 
2. =======================================================
 
EXPLAIN SELECT * FROM ar_trans WHERE tran_type = ‘ar_inv’ AND tran_row_id = 1;
 
                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
Append  (cost=0.29..8.32 rows=1 width=117)
   ->  Index Scan using ar_tran_inv_pkey on ar_tran_inv  (cost=0.29..8.31 rows=1 width=46)
         Index Cond: (row_id = 1)
         Filter: posted
(4 rows)
 
This is a select against the view. It has worked out that the underlying table to use is ‘ar_tran_inv’, and performed an indexed read.
 
3. =======================================================
 
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_tran_inv b ON b.row_id = a.tran_row_id where a.row_id = 1;
 
                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------
Nested Loop Left Join  (cost=0.58..16.62 rows=1 width=142)
   ->  Index Scan using ar_trans_due_pkey on ar_trans_due a  (cost=0.29..8.31 rows=1 width=52)
         Index Cond: (row_id = 1)
   ->  Index Scan using ar_tran_inv_pkey on ar_tran_inv b  (cost=0.29..8.30 rows=1 width=90)
         Index Cond: (row_id = a.tran_row_id)
(5 rows)
 
Here I have selected a row from ar_trans_due, and joined the underlying physical table directly. It uses an indexed read to perform the join.
 
4. =======================================================
 
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id where a.row_id = 1;
 
                                             QUERY PLAN                                             
-----------------------------------------------------------------------------------------------------
Hash Right Join  (cost=8.32..2072.99 rows=1 width=169)
   Hash Cond: ((('ar_inv'::text) = (a.tran_type)::text) AND (ar_tran_inv.row_id = a.tran_row_id))
   ->  Append  (cost=0.00..1739.47 rows=43357 width=117)
         ->  Seq Scan on ar_tran_inv  (cost=0.00..676.01 rows=21601 width=46)
               Filter: posted
         ->  Seq Scan on ar_tran_crn  (cost=0.00..13.88 rows=155 width=124)
               Filter: posted
         ->  Seq Scan on ar_tran_rec  (cost=0.00..616.01 rows=21601 width=40)
               Filter: posted
   ->  Hash  (cost=8.31..8.31 rows=1 width=52)
         ->  Index Scan using ar_trans_due_pkey on ar_trans_due a  (cost=0.29..8.31 rows=1 width=52)
               Index Cond: (row_id = 1)
(12 rows)
 
Here I have selected the same row, and joined the view ‘ar_trans’. It seems to have all the information necessary to perform an indexed read, but instead it performs a sequential scan of all three of the underlying tables. I don’t know why it shows 155 rows for ar_tran_crn – the table is actually empty.
 
I am using version 9.4.4 on Fedora 22. I don’t want to upgrade just for the sake of it, but if any work has been done in this area for 9.5 or 9.6, that would be the obvious first thing to try.
 
Any suggestions welcomed.
 
Frank Millman
 

pgsql-general by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [GENERAL] 10 beta 4 foreign table partition check constraint broken?
Next
From: Paul Jones
Date:
Subject: Re: [GENERAL] 10 beta 4 foreign table partition check constraintbroken?