Home > mailing lists

Hash join seq scan slow - Mailing list pgsql-performance

From	Aldo Sarmiento
Subject	Hash join seq scan slow
Date	April 19, 2016 08:07:56
Msg-id	CAHX=r6yJzUgkLbG7ZBKdPRt2kZEHiePL7rpLDwg5bTrOZUH6eg@mail.gmail.com Whole thread
Responses	Re: Hash join seq scan slow
List	pgsql-performance

Tree view

Hello,

I'm assuming this topic has probably been bludgeoned to a pulp, but my google-fu can't seem to find a solution.

I have two relatively largish tables that I'm trying to join that result in a slow query.

Hardware:

2014 iMac w/ SSD & i5 processor

Tables:

contacts: 1.14 million rows

permissions: 2.49 million rows

contacts have many permissions

Goal: get first page of contacts (limit 40) a user has access to, sorted by creation date.

For simplicity's sake, I've taken out some of the complexity of how I retrieve the permissions & just got all permissions with id less than 2,100,000 to simulate access of ~151k contacts.

stage4=# EXPLAIN ANALYZE WITH perms AS (

stage4(# SELECT DISTINCT(contact_id) from permissions where id < 2100000

stage4(# ) SELECT

stage4-# contacts.id,

stage4-# contacts.first_name

stage4-# FROM contacts

stage4-# INNER JOIN perms ON perms.contact_id = contacts.id

stage4-# ORDER BY contacts.updated_at desc NULLS LAST LIMIT 40 OFFSET 0;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=226777.45..226777.55 rows=40 width=20) (actual time=1556.133..1556.143 rows=40 loops=1)

CTE perms

-> HashAggregate (cost=34891.61..35404.25 rows=51264 width=4) (actual time=107.107..151.455 rows=151920 loops=1)

Group Key: permissions.contact_id

-> Bitmap Heap Scan on permissions (cost=3561.50..34416.43 rows=190074 width=4) (actual time=14.839..50.445 rows=192372 loops=1)

Recheck Cond: (id < 2100000)

Heap Blocks: exact=2435

-> Bitmap Index Scan on permissions_pkey (cost=0.00..3513.98 rows=190074 width=0) (actual time=14.496..14.496 rows=192372 loops=1)

Index Cond: (id < 2100000)

-> Sort (cost=191373.19..191501.35 rows=51264 width=20) (actual time=1556.132..1556.137 rows=40 loops=1)

Sort Key: contacts.updated_at DESC NULLS LAST

Sort Method: top-N heapsort Memory: 27kB

-> Hash Join (cost=180911.60..189752.76 rows=51264 width=20) (actual time=1124.969..1532.269 rows=124152 loops=1)

Hash Cond: (perms.contact_id = contacts.id)

-> CTE Scan on perms (cost=0.00..1025.28 rows=51264 width=4) (actual time=107.110..203.330 rows=151920 loops=1)

-> Hash (cost=159891.71..159891.71 rows=1144871 width=20) (actual time=1017.354..1017.354 rows=1145174 loops=1)

Buckets: 65536 Batches: 32 Memory Usage: 2521kB

-> Seq Scan on contacts (cost=0.00..159891.71 rows=1144871 width=20) (actual time=0.035..684.361 rows=1145174 loops=1)

Planning time: 0.222 ms

Execution time: 1561.693 ms

(20 rows)

It is to my understanding that the query requires the entire 150k matched contacts to be joined in order for the Sort to run its operation. I don't see a way around this, however, I can't also see how this can be a unique case where it's acceptable to have a 1.5 second query time? There has to be lots of other companies out there that manage much more data that needs to be sorted for the presentation layer.

Thanks in advance,

-Aldo

pgsql-performance by date:

From: Evgeniy Shishkin
Date: 15 April 2016, 09:32:09
Subject: Re: Slow update on column that is part of exclusion constraint

From: Jeff Janes
Date: 20 April 2016, 00:04:46
Subject: Re: Hash join seq scan slow

Hash join seq scan slow - Mailing list pgsql-performance

Previous

Next