Multi-column join + aggregate subquery resulting in infinite run time - Mailing list pgsql-novice

From Steve Estes
Subject Multi-column join + aggregate subquery resulting in infinite run time
Msg-id CAJjrZPCQcL1u0GaUXhexwwAt8NWUxnKsr+vjKYuY2H45jv_y4g@mail.gmail.com
Responses Re: Multi-column join + aggregate subquery resulting in infinite run time  (Steve Estes <denzera@gmail.com>)
List pgsql-novice
Hey gang,

I thought I knew what I was doing with queries, but this has more-or-less ruined my 4th of July holiday so I think it's high time to ask for help.  I imagine it's something trivial and obvious to you.

I've got 3 tables in question here:

prod_account_details => 538k records
line_number PK,
acct_id integer, -- nearly unique, but not quite; acct_id PLUS address is unique
address varchar(100),
entity_name varchar(100),
(... a few other fields ...)
btree index on acct_id, hash index on address, also btree on (acct_id, address)

prod_customer_profiles => 532k records, nearly 1:1 with account details
acct_id integer, -- corresponds to account_details.acct_id
match_input_address varchar(100), -- corresponds to account_details.address
(... lots of other fields ...)
btree index on acct_id, hash index on match_input_address, also btree on (acct_id, match_input_address)

prod_dunning_letters => 518k records, 1:M with account_details (covering ~181k acct_ids)
line_number PK,
acct_id integer, -- FK to account_details, not formally via constraint but semantically
(... a few other fields ...)
btree index on acct_id

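For reference, the layout above corresponds roughly to this DDL. This is a sketch: the `line_number` column types, the omitted "other fields", and the exact index definitions are my reconstruction from the descriptions above (the default index names it produces do match the ones that appear in the plans below).

```sql
-- Sketch of the schema described above; "(... other fields ...)" omitted.
CREATE TABLE prod_account_details (
    line_number serial PRIMARY KEY,   -- type assumed
    acct_id     integer,              -- nearly unique; (acct_id, address) is unique
    address     varchar(100),
    entity_name varchar(100)
);
CREATE INDEX ON prod_account_details USING btree (acct_id);
CREATE INDEX ON prod_account_details USING hash  (address);
CREATE INDEX ON prod_account_details USING btree (acct_id, address);

CREATE TABLE prod_customer_profiles (
    acct_id             integer,       -- corresponds to account_details.acct_id
    match_input_address varchar(100)   -- corresponds to account_details.address
);
CREATE INDEX ON prod_customer_profiles USING btree (acct_id);
CREATE INDEX ON prod_customer_profiles USING hash  (match_input_address);
CREATE INDEX ON prod_customer_profiles USING btree (acct_id, match_input_address);

CREATE TABLE prod_dunning_letters (
    line_number serial PRIMARY KEY,    -- type assumed
    acct_id     integer                -- semantically an FK to account_details
);
CREATE INDEX ON prod_dunning_letters USING btree (acct_id);
```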

--- Queries ---

So firstly, what works:

(1)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
                                     AND ad.address = acx.match_input_address;


...returns the 532k matched records in 3.0 seconds.


Now if we INNER join to the dunning_letters table via a grouped subquery:

(2)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match, dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
                                     AND ad.address = acx.match_input_address
INNER JOIN (SELECT count(*) AS num_records, acct_id
            FROM prod_dunning_letters
            GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;


... it returns fine too, 161k records in 1.3 seconds.  

But now if I do something as simple as changing that second join (to the subquery) to a LEFT JOIN, it blows up. That is, I want all valid accounts, with the count of rows in the dunning_letters table appended wherever any exist:

(3)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match, dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
                                     AND ad.address = acx.match_input_address
LEFT JOIN (SELECT count(*) AS num_records, acct_id
           FROM prod_dunning_letters
           GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;


What happens is, this runs for 20+ seconds, and then starts returning ~100 records every 10 seconds, which at the volume I'm talking about (>500k) would take forever to return.  And next up I need to bolt on the counts for some other, bigger tables too, so we're quickly approaching heat-death-of-the-universe amounts of runtime here.

Now what's interesting is, if I relax the join on the first two tables, i.e. drop the address part of the join and only do the acct_id one,

(4)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match, dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
                                  -- AND ad.address = acx.match_input_address
LEFT JOIN (SELECT count(*) AS num_records, acct_id
           FROM prod_dunning_letters
           GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;


...it starts returning immediately and spools 695k records in 4.7 seconds.  The problem is that we need the address part of the join, because there are duplicate acct_ids on both sides, and without it we get a bunch of cartesian-join junk.
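For what it's worth, one reformulation I haven't benchmarked yet is pushing the count into a LEFT JOIN LATERAL, which computes the aggregate per matched account instead of outer-joining against the whole grouped set. (Note one behavioral difference from query (3): since count(*) over zero rows is 0, accounts with no letters would get dl_count = 0 rather than NULL.)

```sql
-- Alternative shape (untested): aggregate per account via LATERAL
SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
                                     AND ad.address = acx.match_input_address
LEFT JOIN LATERAL (
    SELECT count(*) AS num_records
    FROM prod_dunning_letters pdl
    WHERE pdl.acct_id = ad.acct_id
) dl ON true;
```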


--- EXPLAIN query plans ---

So here's the Explain for query (1), with just the two tables:

Gather  (cost=102995.20..123944.32 rows=1 width=41)
  Workers Planned: 2
  ->  Parallel Hash Join  (cost=101995.20..122944.22 rows=1 width=41)
        Hash Cond: ((ad.acct_id = acx.acct_id) AND (ad.address = (acx.match_input_address)::text))
        ->  Parallel Seq Scan on prod_account_details ad  (cost=0.00..14226.43 rows=224343 width=39)
        ->  Parallel Hash  (cost=97096.88..97096.88 rows=224288 width=26)
              ->  Parallel Seq Scan on prod_customer_profiles acx  (cost=0.00..97096.88 rows=224288 width=26)



Now, when we add on the third table, its strategy for that join changes.  Here's the Explain for (2), with the inner join that works:

Gather  (cost=22500.40..83636.43 rows=1 width=49)
  Workers Planned: 2
  ->  Nested Loop  (cost=21500.40..82636.33 rows=1 width=49)
        ->  Hash Join  (cost=21500.40..40427.74 rows=56285 width=54)
              Hash Cond: (ad.acct_id = prod_dunning_letters.acct_id)
              ->  Parallel Seq Scan on prod_account_details ad  (cost=0.00..14226.43 rows=224343 width=39)
              ->  Hash  (cost=19345.40..19345.40 rows=123920 width=15)
                    ->  GroupAggregate  (cost=0.42..18106.20 rows=123920 width=15)
                          Group Key: prod_dunning_letters.acct_id
                          ->  Index Only Scan using prod_dunning_letters_acct_id_idx on prod_dunning_letters  (cost=0.42..14269.36 rows=519529 width=7)
        ->  Index Scan using prod_customer_profiles_match_input_address_idx on prod_customer_profiles acx  (cost=0.00..0.74 rows=1 width=26)
              Index Cond: ((match_input_address)::text = ad.address)
              Filter: (ad.acct_id = acct_id)



But change it to the LEFT JOIN of query (3), and the plan becomes:

Nested Loop Left Join  (cost=102995.63..144838.72 rows=1 width=49)
  Join Filter: (ad.acct_id = prod_dunning_letters.acct_id)
  ->  Gather  (cost=102995.20..123944.32 rows=1 width=41)
        Workers Planned: 2
        ->  Parallel Hash Join  (cost=101995.20..122944.22 rows=1 width=41)
              Hash Cond: ((ad.acct_id = acx.acct_id) AND (ad.address = (acx.match_input_address)::text))
              ->  Parallel Seq Scan on prod_account_details ad  (cost=0.00..14226.43 rows=224343 width=39)
              ->  Parallel Hash  (cost=97096.88..97096.88 rows=224288 width=26)
                    ->  Parallel Seq Scan on prod_customer_profiles acx  (cost=0.00..97096.88 rows=224288 width=26)
  ->  GroupAggregate  (cost=0.42..18106.20 rows=123920 width=15)
        Group Key: prod_dunning_letters.acct_id
        ->  Index Only Scan using prod_dunning_letters_acct_id_idx on prod_dunning_letters  (cost=0.42..14269.36 rows=519529 width=7)



Just reading those, I can't readily tell why #2 finishes fast while #3 runs without end. If the difference were just the use of a left join, then why does #4, which only drops the address column from the join between the first two tables, also finish easily? One thing I do notice is that the planner estimates rows=1 for the two-column join in every plan, even though query (1) actually returns 532k rows.
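In case actual-versus-estimated row counts would help diagnose this, I can capture them like so; bear in mind EXPLAIN ANALYZE actually executes the query, so for (3) it may have to be cancelled before it completes.

```sql
-- Capture actual vs. estimated row counts (this runs the query to completion)
EXPLAIN (ANALYZE, BUFFERS)
SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
                                     AND ad.address = acx.match_input_address
LEFT JOIN (SELECT count(*) AS num_records, acct_id
           FROM prod_dunning_letters
           GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;
```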

This has befuddled the rest of my experimentation with it, so I'd really appreciate some insight.

-Steve
