Bad Estimate for complex query with JOINS on subselects and OR in where conditions - Mailing list pgsql-general

From Peter Grman
Subject Bad Estimate for complex query with JOINS on subselects and OR in where conditions
Date
Msg-id CACF7Wx0twq1scHAuc3Q7itp727nfYZsj6EWYvFf1zYYjG1nJNQ@mail.gmail.com
Whole thread Raw
Responses Re: Bad Estimate for complex query with JOINS on subselects and OR in where conditions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello,

our ORM with tenant separation enabled is creating the following query:

    SELECT *
        FROM "Reservation"."Reservation" AS q
                 LEFT JOIN (
            SELECT q0.*
            FROM "Reservation"."PaymentAccount" AS q0
            WHERE q0."AccountCode" = 'A1'
        ) AS t ON q."PaymentAccountId" = t."Id"
                 INNER JOIN (
            SELECT q1.*
            FROM "Reservation"."NoShowFee" AS q1
            WHERE q1."AccountCode" = 'A1'
        ) AS t0 ON q."NoShowFeeId" = t0."Id"
                 INNER JOIN (
            SELECT q2.*
            FROM "Policy_Fees"."Configuration" AS q2
            WHERE q2."AccountCode" = 'A1'
        ) AS t1 ON t0."OriginalFeeConfigurationId" = t1."Id"
                 INNER JOIN (
            SELECT q3.*
            FROM "Reservation"."CancellationFee" AS q3
            WHERE q3."AccountCode" = 'A1'
        ) AS t2 ON q."CancellationFeeId" = t2."Id"
                 INNER JOIN (
            SELECT q4.*
            FROM "Policy"."CancellationPolicy" AS q4
            WHERE q4."AccountCode" = 'A1'
        ) AS t3 ON t2."OriginalCancellationPolicyId" = t3."Id"
                 INNER JOIN (
            SELECT q5.*
            FROM "Reservation"."Booking" AS q5
            WHERE q5."AccountCode" = 'A1'
        ) AS t4 ON q."BookingId" = t4."Id"
                 INNER JOIN (
            SELECT q6.*
            FROM "Reservation"."Person" AS q6
            WHERE q6."AccountCode" = 'A1'
        ) AS t5 ON t4."BookerId" = t5."Id"
                 INNER JOIN (
            SELECT q7.*
            FROM "Reservation"."Person" AS q7
            WHERE q7."AccountCode" = 'A1'
        ) AS t6 ON q."PrimaryGuestId" = t6."Id"
        WHERE (q."AccountCode" = 'A1') AND (((((t6."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE) OR ((t5."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE)) OR ((q."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE)) OR EXISTS (
                SELECT 1
                FROM "Reservation"."TimeSlice" AS q8
                         LEFT JOIN (
                    SELECT q9.*
                    FROM "Inventory"."Unit" AS q9
                    WHERE q9."AccountCode" = 'A1'
                ) AS t7 ON q8."UnitId" = t7."Id"
                WHERE ((q8."AccountCode" = 'A1') AND ((t7."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE)) AND (q."Id" = q8."ReservationId")))

This is already optimized to really only include the tables that we need (I left out the projection as we fetch almost every column from those tables).

When I tried to analyze the execution plan in http://tatiyants.com/pev/ I saw a label "bad estimate" in one path of the execution plan:

Screenshot_20190816_131944.png

I looked around to see what it might mean, and found https://www.postgresql.org/message-id/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com but there the issues seemed to have been a long running transaction, which we don't have (https://www.postgresql.org/message-id/CAFh58O9ttZpyCM%2BnDb7DMQsi1ca%2BZ1c%3DB50eP2Vf3AQiNZsF9w%40mail.gmail.com)

The is the execution plan created:

Nested Loop  (cost=7.37..6041.66 rows=1 width=2792) (actual time=1.930..528.372 rows=252 loops=1)
  Join Filter: ((q7."SearchVector" @@ to_tsquery('foobar:*'::text)) OR (q6."SearchVector" @@ to_tsquery('foobar:*'::text)) OR (q."SearchVector" @@ to_tsquery('foobar:*'::text)) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
  Rows Removed by Join Filter: 18800
  Buffers: shared hit=622893
  ->  Nested Loop  (cost=6.95..5995.95 rows=1 width=2463) (actual time=0.104..297.051 rows=19052 loops=1)
        Buffers: shared hit=383698
        ->  Nested Loop  (cost=6.53..5992.42 rows=1 width=2134) (actual time=0.091..239.030 rows=19052 loops=1)
              Buffers: shared hit=307469
              ->  Nested Loop Left Join  (cost=6.23..5990.48 rows=1 width=1759) (actual time=0.081..192.608 rows=19052 loops=1)
                    Buffers: shared hit=250274
                    ->  Nested Loop  (cost=5.94..5989.22 rows=1 width=1582) (actual time=0.072..161.734 rows=19052 loops=1)
                          Buffers: shared hit=223445
                          ->  Nested Loop  (cost=5.66..5988.56 rows=2 width=1281) (actual time=0.062..130.338 rows=19052 loops=1)
                                Buffers: shared hit=166289
                                ->  Nested Loop  (cost=5.24..5960.88 rows=15 width=1137) (actual time=0.051..81.762 rows=19052 loops=1)
                                      Buffers: shared hit=90036
                                      ->  Nested Loop  (cost=4.82..5651.40 rows=110 width=620) (actual time=0.035..23.878 rows=19052 loops=1)
                                            Buffers: shared hit=13824
                                            ->  Bitmap Heap Scan on "CancellationPolicy" q4  (cost=4.40..53.38 rows=16 width=245) (actual time=0.018..0.052 rows=16 loops=1)
                                                  Recheck Cond: ("AccountCode" = 'A1'::text)
                                                  Heap Blocks: exact=10
                                                  Buffers: shared hit=12
                                                  ->  Bitmap Index Scan on "IX_CancellationPolicy_AccountCode_EntityId"  (cost=0.00..4.40 rows=16 width=0) (actual time=0.014..0.014 rows=16 loops=1)
                                                        Index Cond: ("AccountCode" = 'A1'::text)
                                                        Buffers: shared hit=2
                                            ->  Index Scan using "IX_CancellationFee_OriginalCancellationPolicyId" on "CancellationFee" q3  (cost=0.42..349.69 rows=19 width=375) (actual time=0.008..1.236 rows=1191 loops=16)
                                                  Index Cond: ("OriginalCancellationPolicyId" = q4."Id")
                                                  Filter: ("AccountCode" = 'A1'::text)
                                                  Buffers: shared hit=13812
                                      ->  Index Scan using "IX_Reservation_CancellationFeeId" on "Reservation" q  (cost=0.42..2.81 rows=1 width=517) (actual time=0.002..0.002 rows=1 loops=19052)
                                            Index Cond: ("CancellationFeeId" = q3."Id")
                                            Filter: ("AccountCode" = 'A1'::text)
                                            Buffers: shared hit=76212
                                ->  Index Scan using "PK_NoShowFee" on "NoShowFee" q1  (cost=0.42..1.85 rows=1 width=144) (actual time=0.002..0.002 rows=1 loops=19052)
                                      Index Cond: ("Id" = q."NoShowFeeId")
                                      Filter: ("AccountCode" = 'A1'::text)
                                      Buffers: shared hit=76253
                          ->  Index Scan using "PK_Configuration" on "Configuration" q2  (cost=0.28..0.33 rows=1 width=301) (actual time=0.001..0.001 rows=1 loops=19052)
                                Index Cond: ("Id" = q1."OriginalFeeConfigurationId")
                                Filter: ("AccountCode" = 'A1'::text)
                                Buffers: shared hit=57156
                    ->  Index Scan using "PK_PaymentAccount" on "PaymentAccount" q0  (cost=0.29..1.26 rows=1 width=177) (actual time=0.001..0.001 rows=0 loops=19052)
                          Index Cond: (q."PaymentAccountId" = "Id")
                          Filter: ("AccountCode" = 'A1'::text)
                          Buffers: shared hit=26829
              ->  Index Scan using "PK_Booking" on "Booking" q5  (cost=0.29..1.94 rows=1 width=375) (actual time=0.002..0.002 rows=1 loops=19052)
                    Index Cond: ("Id" = q."BookingId")
                    Filter: ("AccountCode" = 'A1'::text)
                    Buffers: shared hit=57195
        ->  Index Scan using "PK_Person" on "Person" q6  (cost=0.42..3.53 rows=1 width=329) (actual time=0.002..0.002 rows=1 loops=19052)
              Index Cond: ("Id" = q5."BookerId")
              Filter: ("AccountCode" = 'A1'::text)
              Buffers: shared hit=76229
  ->  Index Scan using "PK_Person" on "Person" q7  (cost=0.42..3.33 rows=1 width=329) (actual time=0.002..0.002 rows=1 loops=19052)
        Index Cond: ("Id" = q."PrimaryGuestId")
        Filter: ("AccountCode" = 'A1'::text)
        Buffers: shared hit=76340
  SubPlan 1
    ->  Nested Loop  (cost=0.71..41.61 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=18800)
          Buffers: shared hit=162855
          ->  Index Scan using "IX_TimeSlice_ReservationId_RatePlanId_UnitGroupId_FromUtc_T~" on "TimeSlice" q8  (cost=0.42..24.78 rows=1 width=8) (actual time=0.003..0.003 rows=2 loops=18800)
                Index Cond: (q."Id" = "ReservationId")
                Filter: ("AccountCode" = 'A1'::text)
                Buffers: shared hit=81574
          ->  Index Scan using "PK_Unit" on "Unit" q9  (cost=0.29..8.56 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=33918)
                Index Cond: ("Id" = q8."UnitId")
                Filter: (("AccountCode" = 'A1'::text) AND ("SearchVector" @@ to_tsquery('foobar:*'::text)))
                Rows Removed by Filter: 1
                Buffers: shared hit=81281
  SubPlan 2
    ->  Nested Loop  (cost=32.69..187.36 rows=1 width=8) (never executed)
          ->  Bitmap Heap Scan on "Unit" q9_1  (cost=32.26..40.63 rows=1 width=8) (never executed)
                Recheck Cond: ("SearchVector" @@ to_tsquery('foobar:*'::text))
                Filter: ("AccountCode" = 'A1'::text)
                ->  Bitmap Index Scan on "IX_Unit_SearchVector"  (cost=0.00..32.26 rows=2 width=0) (never executed)
                      Index Cond: ("SearchVector" @@ to_tsquery('foobar:*'::text))
          ->  Index Scan using "IX_TimeSlice_UnitId" on "TimeSlice" q8_1  (cost=0.42..146.70 rows=4 width=16) (never executed)
                Index Cond: ("UnitId" = q9_1."Id")
                Filter: ("AccountCode" = 'A1'::text)
Planning Time: 19.838 ms
Execution Time: 528.758 ms

I'm aware that those OR conditions make it worse than it has to be, using AND instead for the where condition like this:

        WHERE (q."AccountCode" = 'A1') AND (((((t6."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE) AND ((t5."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE)) AND ((q."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE)) AND EXISTS (
                SELECT 1
                FROM "Reservation"."TimeSlice" AS q8
                         LEFT JOIN (
                    SELECT q9.*
                    FROM "Inventory"."Unit" AS q9
                    WHERE q9."AccountCode" = 'A1'
                ) AS t7 ON q8."UnitId" = t7."Id"
                WHERE ((q8."AccountCode" = 'A1') AND ((t7."SearchVector" @@ to_tsquery(('foobar' || ':*'))) = TRUE)) AND (q."Id" = q8."ReservationId")))

I get a seemingly similar plan, with all those nested loops, but without the "bad estimate" label and it's supposed to be much faster in execution:

Nested Loop  (cost=263.89..395.88 rows=1 width=2792) (actual time=0.835..0.835 rows=0 loops=1)
  Buffers: shared hit=137
  ->  Nested Loop Semi Join  (cost=263.47..387.09 rows=1 width=2463) (actual time=0.834..0.834 rows=0 loops=1)
        Buffers: shared hit=137
        ->  Nested Loop  (cost=262.75..360.82 rows=1 width=2463) (actual time=0.834..0.834 rows=0 loops=1)
              Buffers: shared hit=137
              ->  Nested Loop  (cost=262.33..357.00 rows=1 width=2134) (actual time=0.763..0.808 rows=3 loops=1)
                    Buffers: shared hit=125
                    ->  Nested Loop  (cost=262.05..356.66 rows=1 width=1889) (actual time=0.751..0.791 rows=3 loops=1)
                          Buffers: shared hit=116
                          ->  Nested Loop  (cost=261.63..348.22 rows=1 width=1514) (actual time=0.739..0.771 rows=3 loops=1)
                                Buffers: shared hit=104
                                ->  Nested Loop  (cost=261.35..347.89 rows=1 width=1213) (actual time=0.721..0.749 rows=3 loops=1)
                                      Buffers: shared hit=95
                                      ->  Nested Loop Left Join  (cost=260.93..339.45 rows=1 width=1069) (actual time=0.709..0.729 rows=3 loops=1)
                                            Buffers: shared hit=83
                                            ->  Nested Loop  (cost=260.64..331.13 rows=1 width=892) (actual time=0.702..0.717 rows=3 loops=1)
                                                  Buffers: shared hit=80
                                                  ->  Bitmap Heap Scan on "Reservation" q  (cost=260.35..314.51 rows=2 width=517) (actual time=0.689..0.695 rows=3 loops=1)
                                                        Recheck Cond: ("SearchVector" @@ to_tsquery('foobar:*'::text))
                                                        Filter: ("AccountCode" = 'A1'::text)
                                                        Rows Removed by Filter: 6
                                                        Heap Blocks: exact=8
                                                        Buffers: shared hit=71
                                                        ->  Bitmap Index Scan on "IX_Reservation_SearchVector"  (cost=0.00..260.35 rows=13 width=0) (actual time=0.677..0.677 rows=13 loops=1)
                                                              Index Cond: ("SearchVector" @@ to_tsquery('foobar:*'::text))
                                                              Buffers: shared hit=63
                                                  ->  Index Scan using "PK_Booking" on "Booking" q5  (cost=0.29..8.31 rows=1 width=375) (actual time=0.006..0.006 rows=1 loops=3)
                                                        Index Cond: ("Id" = q."BookingId")
                                                        Filter: ("AccountCode" = 'A1'::text)
                                                        Buffers: shared hit=9
                                            ->  Index Scan using "PK_PaymentAccount" on "PaymentAccount" q0  (cost=0.29..8.31 rows=1 width=177) (actual time=0.003..0.003 rows=0 loops=3)
                                                  Index Cond: (q."PaymentAccountId" = "Id")
                                                  Filter: ("AccountCode" = 'A1'::text)
                                                  Buffers: shared hit=3
                                      ->  Index Scan using "PK_NoShowFee" on "NoShowFee" q1  (cost=0.42..8.44 rows=1 width=144) (actual time=0.005..0.005 rows=1 loops=3)
                                            Index Cond: ("Id" = q."NoShowFeeId")
                                            Filter: ("AccountCode" = 'A1'::text)
                                            Buffers: shared hit=12
                                ->  Index Scan using "PK_Configuration" on "Configuration" q2  (cost=0.28..0.33 rows=1 width=301) (actual time=0.006..0.006 rows=1 loops=3)
                                      Index Cond: ("Id" = q1."OriginalFeeConfigurationId")
                                      Filter: ("AccountCode" = 'A1'::text)
                                      Buffers: shared hit=9
                          ->  Index Scan using "PK_CancellationFee" on "CancellationFee" q3  (cost=0.42..8.44 rows=1 width=375) (actual time=0.005..0.005 rows=1 loops=3)
                                Index Cond: ("Id" = q."CancellationFeeId")
                                Filter: ("AccountCode" = 'A1'::text)
                                Buffers: shared hit=12
                    ->  Index Scan using "PK_CancellationPolicy" on "CancellationPolicy" q4  (cost=0.28..0.33 rows=1 width=245) (actual time=0.004..0.004 rows=1 loops=3)
                          Index Cond: ("Id" = q3."OriginalCancellationPolicyId")
                          Filter: ("AccountCode" = 'A1'::text)
                          Buffers: shared hit=9
              ->  Index Scan using "PK_Person" on "Person" q6  (cost=0.42..3.78 rows=1 width=329) (actual time=0.008..0.008 rows=0 loops=3)
                    Index Cond: ("Id" = q5."BookerId")
                    Filter: (("AccountCode" = 'A1'::text) AND ("SearchVector" @@ to_tsquery('foobar:*'::text)))
                    Rows Removed by Filter: 1
                    Buffers: shared hit=12
        ->  Nested Loop  (cost=0.71..26.26 rows=1 width=8) (never executed)
              ->  Index Scan using "IX_TimeSlice_ReservationId_RatePlanId_UnitGroupId_FromUtc_T~" on "TimeSlice" q8  (cost=0.42..24.78 rows=1 width=16) (never executed)
                    Index Cond: ("ReservationId" = q."Id")
                    Filter: ("AccountCode" = 'A1'::text)
              ->  Index Scan using "PK_Unit" on "Unit" q9  (cost=0.29..0.89 rows=1 width=8) (never executed)
                    Index Cond: ("Id" = q8."UnitId")
                    Filter: (("AccountCode" = 'A1'::text) AND ("SearchVector" @@ to_tsquery('foobar:*'::text)))
  ->  Index Scan using "PK_Person" on "Person" q7  (cost=0.42..8.69 rows=1 width=329) (never executed)
        Index Cond: ("Id" = q."PrimaryGuestId")
        Filter: (("AccountCode" = 'A1'::text) AND ("SearchVector" @@ to_tsquery('foobar:*'::text)))
Planning Time: 31.910 ms
Execution Time: 1.184 ms

We're working on being able to have only one "SearchVector" column, but until then, I'd like to know how I could help postgres estimating the execution better.

We're using Postgres 11 on RDS with the default settings

Thank you very much!
Peter
Attachment

pgsql-general by date:

Previous
From: Charles Clavadetscher
Date:
Subject: Re: Variable constants ?
Next
From: stan
Date:
Subject: A 3 table join question