Re: Query with large number of joins - Mailing list pgsql-performance

From Marco Di Cesare
Subject Re: Query with large number of joins
Date
Msg-id 0a72826fe7d04b71b9ab47d24992d898@CO2PR0701MB759.namprd07.prod.outlook.com
Whole thread Raw
In response to Re: Query with large number of joins  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Query with large number of joins  (Andrew Dunstan <andrew@dunslane.net>)
Re: Query with large number of joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I did not mean to imply this works any better on SQL Server. We never tried. I just meant to say this is the first time
weare using Postgres so we don't have much experience with it.
 

We tried with work_mem set to 1GB (even as high as 3GB) but it didn't change the GroupAggregate and Sort or query run
time.

Sorry, I had to sanitize the query and a few of the relevant tables so hopefully I got it all right. 

SELECT
                "foxtrot_india"."juliet_alpha", 
                "foxtrot_india"."foxtrot_yankee", 
                "foxtrot_india"."hotel_sierra", 
                "foxtrot_india"."juliet_alpha", 
                "foxtrot_india"."bravo_romeo", 
                "oscar_bravo"."golf_foxtrot", 
                "seven_kilo"."november_lima", 
                "foxtrot_india"."echo_six", 
                "uniform_six"."seven_six", 
                "oscar_charlie"."foxtrot_charlie", 
                COUNT(DISTINCT "foxtrot_india"."bravo_romeo")  
FROM
                "public"."seven_kilo" "seven_kilo"
                INNER JOIN "public"."papa_sierra" "papa_sierra" ON ("seven_kilo"."golf_bravo" =
"papa_sierra"."golf_bravo")
                LEFT JOIN "public"."golf_two" "golf_two" ON ("seven_kilo"."lima" = "golf_two"."lima")
                LEFT JOIN "public"."bravo_xray" "bravo_xray" ON ("seven_kilo"."lima" = "bravo_xray"."lima")
                LEFT JOIN "public"."foo1" "foo1" ON (("seven_kilo"."bar1" = "foo1"."bar1") AND
("seven_kilo"."golf_bravo"= "foo1"."golf_bravo"))
 
                INNER JOIN "public"."oscar_charlie" "oscar_charlie" ON ("seven_kilo"."lima" = "oscar_charlie"."lima")
                INNER JOIN "public"."oscar_bravo" "oscar_bravo" ON ("oscar_charlie"."foxtrot_four" =
"oscar_bravo"."foxtrot_four")
                INNER JOIN "public"."foxtrot_india" "foxtrot_india" ON ("oscar_bravo"."sierra" =
"foxtrot_india"."sierra")
                INNER JOIN "public"."hotel_romeo" "hotel_romeo" ON ("oscar_charlie"."foxtrot_charlie" =
"hotel_romeo"."foxtrot_charlie")
                INNER JOIN "public"."uniform_six" "uniform_six" ON ("hotel_romeo"."hotel_lima" =
"uniform_six"."hotel_lima")
                LEFT JOIN "public"."lookup" "foo2" ON ("foxtrot_india"."bar2" = "foo2"."lookup_id")
                LEFT JOIN "public"."uniform_two" "uniform_two" ON ("foxtrot_india"."sierra" = "uniform_two"."sierra")
                INNER JOIN "public"."lookup" "four_xray" ON ("uniform_two"."quebec" = "four_xray"."quebec")
                LEFT JOIN "public"."papa_four" "papa_four" ON ("foxtrot_india"."sierra" = "papa_four"."sierra")
                INNER JOIN "public"."lookup" "romeo_bravo" ON ("papa_four"."quebec" = "romeo_bravo"."quebec")
                LEFT JOIN "public"."juliet_two" "juliet_two" ON ("foxtrot_india"."sierra" = "juliet_two"."sierra")
                INNER JOIN "public"."lookup" "four_delta" ON ("juliet_two"."quebec" = "four_delta"."quebec")
                LEFT JOIN "public"."foo3" "foo3" ON ("foxtrot_india"."bar3" = "foo3"."bar3")
                INNER JOIN "public"."xray" "xray" ON ("seven_kilo"."lima" = "xray"."lima")
                INNER JOIN "public"."romeo_echo" "romeo_echo" ON ("xray"."echo_sierra" = "romeo_echo"."echo_sierra") 
WHERE
                ((("xray"."echo_sierra" = 'november_foxtrot')
                AND ("romeo_echo"."hotel_oscar" = 'zulu')
                AND ("oscar_charlie"."five" = 6)
                AND ("oscar_charlie"."whiskey" = 'four_romeo')
                AND ("oscar_charlie"."charlie_romeo" = 2014)))                
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10


                                    Table "public.oscar_bravo"
         Column          |         Type          | Modifiers | Storage  | Stats target | Description

-------------------------+-----------------------+-----------+----------+--------------+------------
-
 foxtrot_four            | character varying(60) | not null  | extended |              |
 sierra                     | character varying(40) | not null  | extended |              |
 foo                     | boolean               | not null  | plain    |              |
 bar                 | numeric(3,2)          |           | main     |              |
 baz                     | integer               | not null  | plain    |              |
 
Indexes:
    "foxtrot_four_sierra_PK_IX" PRIMARY KEY, btree (foxtrot_four, sierra)
    "foxtrot_four_idx" btree (foxtrot_four)
    "sierra_idx" btree (sierra) CLUSTER

Foreign-key constraints:
    "sierra_FK" FOREIGN KEY (sierra) REFERENCES foxtrot_india(sierra)
    "foxtrot_four_FK" FOREIGN KEY (foxtrot_four) REFERENCES oscar_charlie(foxtrot_four )
Has OIDs: no


                                   Table "public.oscar_charlie"
      Column       |         Type          | Modifiers | Storage  | Stats target | Description
-------------------+-----------------------+-----------+----------+--------------+-------------
 foxtrot_four      | character varying(60) | not null  | extended |              |
 foxtrot_charlie   | character varying(10) | not null  | extended |              |
 lima               | character varying(30) | not null  | extended |              |
 whiskey            | character varying(3)  | not null  | extended |              |
 charlie_romeo     | numeric(4,0)          | not null  | main     |              |
 five                 | numeric(2,0)          | not null  | main     |              |
 period_end_date   | date                  | not null  | plain    |              |
 qm_score          | numeric(5,2)          | not null  | main     |              |
 revision_date     | date                  | not null  | plain    |              |
Indexes:
    "foxtrot_four_PK_IX" PRIMARY KEY, btree (foxtrot_four)
    "foxtrot_charlie_UQ_IX" UNIQUE CONSTRAINT, btree (foxtrot_charlie, lima, whiskey, charlie_romeo, five)
    "target_period_idx" btree (five, whiskey, charlie_romeo) CLUSTER
    "foxtrot_charlie_idx" btree (foxtrot_charlie)
Foreign-key constraints:
Referenced by:
    TABLE "oscar_bravo" CONSTRAINT "foxtrot_four_FK" FOREIGN KEY (foxtrot_four) REFERENCES oscar_charlie(foxtrot_four)
Has OIDs: no


                                   Table "public.foxtrot_india"
        Column        |          Type          | Modifiers | Storage  | Stats target | Description
----------------------+------------------------+-----------+----------+--------------+-------------
 sierra                  | character varying(40)  | not null  | extended |              |
 lima                  | character varying(30)  | not null  | extended |              |
 global_client_id     | character varying(40)  | not null  | extended |              |
 org_assess_id        | integer                |           | plain    |              |
 org_client_id        | integer                |           | plain    |              |
 assess_ref_date      | date                   | not null  | plain    |              |
 assess_type          | character varying(10)  | not null  | extended |              |
 client_name          | character varying(100) | not null  | extended |              |
 gender               | character(1)           | not null  | extended |              |
 date_of_birth        | date                   | not null  | plain    |              |
 room                 | character varying(10)  | not null  | extended |              |
 bed                  | character varying(10)  |           | extended |              |
 org_floor_id         | integer                |           | plain    |              |
 org_unit_id          | integer                |           | plain    |              |
 global_physician_id  | character varying(20)  |           | extended |              |
 payer_type_lookup_id | character varying(10)  |           | extended |              |
 mds_version          | character varying(10)  | not null  | extended |              |
Indexes:
    "sierra_PK_IX" PRIMARY KEY, btree (sierra)
    "lima_FK_IX" hash (lima)    

Referenced by:
    TABLE "oscar_bravo" CONSTRAINT "oscar_sierra_FK" FOREIGN KEY (sierra) REFERENCES foxtrot_india(sierra)
Has OIDs: no


We did also attempt to change the pk's from large varchar to ints but it didn't make any noticeable difference. If you
arewondering why we went with that data type it's a long story. :-)
 


-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com] 
Sent: Tuesday, October 21, 2014 9:39 AM
To: Marco Di Cesare
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query with large number of joins

On Mon, Oct 20, 2014 at 3:32 PM, Marco Di Cesare <Marco.DiCesare@pointclickcare.com> wrote:
> We are using Postgres for the first time after being SQLServer users 
> for a long time so forgive for being noobs.
>
>
>
> We are using a BI tool that generates a query with an unusually large 
> number of joins. My understanding is that with this many joins 
> Postgres query planner can't possibly use an exhaustive search so it 
> drops into a heuristics algorithm. Unfortunately, the query runs quite 
> slow (~35 seconds) and seems to ignore using primary keys and indexes where available.
>
>
>
> Query plan here (sorry had to anonymize):
>
> http://explain.depesz.com/s/Uml
>
>
>
> Line 30 is one of the pain points where a full table scan is running 
> on 4.2 million rows even though there are indexes on 
> oscar_bravo.foxtrot_four and oscar_charlie.foxtrot_four
>
>
>
> We've tried to play around with the join_collapse_limit value by 
> upping it from the default of 8 to 10 or 12 but it doesn't seem to 
> help much. Cranking the value up to an unreasonable value of 20 does 
> shave some seconds off the query time but not substantially (explain plan with the value set to 20:
> http://explain.depesz.com/s/sW6).

You always have the option of disabling geqo completely.

However, in this case, can you fetch out the relevant fields for "oscar_bravo" that are participating in the join?  I'd
liketo see the field name/type in the source table and the destination table.  Also.
 
I'd like to see the index definition and the snippit of the query that presents the join condition.

You can encourage the server to favor index scans vs seq scans by lowering 'random_page_cost'.  The nuclear option is
todisable sequential scans completely (which is generally a bad idea but can be useful to try and fetch out queries
thatare inadvertently forced into a seqscan for some reason).
 

merlin

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Query with large number of joins
Next
From: Andrew Dunstan
Date:
Subject: Re: Query with large number of joins