Re: Slow join - Mailing list pgsql-performance

From Roman Kushnir
Subject Re: Slow join
Date
Msg-id 9A5A439B-2C8F-4027-9EFA-297754EC1941@ad2games.com
Whole thread Raw
In response to Re: Slow join  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
Hi Justin,

Thank you for your comments.

As you mentioned the size of shared buffers, my first thought was to just switch to a larger machine as this one only has 2 gigs of RAM. But then it occurred to me that the whole videos table is getting loaded into memory while only 2 small columns are actually used! So I created a covering index on videos (channel_id, id) and the query now completes in 190ms!

Thanks, you helped me a lot.


On Jun 25, 2018, at 6:45 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:

Hi,

Thanks for providing all this info :)

On Mon, Jun 25, 2018 at 05:55:49PM +0200, Roman Kushnir wrote:
Hello,

The following basic inner join is taking too much time for me. (I’m using count(videos.id <http://videos.id/>) instead of count(*) because my actual query looks different, but I simplified it here to the essence).
I’ve tried following random people's suggestions and adjusting the random_page_cost(decreasing it from 4 to 1.1) without a stable improvement. Any hints on what is wrong here? Thank you.

Running on Amazon RDS, with default 10.1 parameters

All default ?
https://wiki.postgresql.org/wiki/Server_Configuration

It looks like nearly the entire time is spent reading this table:

Parallel Seq Scan on videos ... (ACTUAL TIME=0.687..55,555.774...)
Buffers: shared hit=7138 read=464357

Perhaps shared_buffers should be at least several times larger, and perhaps up
to 4gb to keep the entire table in RAM.  You could maybe also benefit from
better device readahead (blockdev --setra or lvchange -r or
/sys/block/sd?/queue/read_ahead_kb)

Also, it looks like there's a row count misestimate, which probably doesn't
matter for the query you sent, but maybe affects your larger query:
Hash Join (... ROWS=365,328 ... ) (... ROWS=45,307 ... )

If that matters, maybe it'd help to increase statistics on channel_id.
Actually, I see both tables have FK into channels.id:

   "fk_rails_11d6d9bea2" FOREIGN KEY (channel_id) REFERENCES channels(id)
   "fk_rails_257f68ae55" FOREIGN KEY (channel_id) REFERENCES channels(id)

I don't see the definition of "channels" (and it looks like the query I put on
the wiki doesn't show null_frac), but I think that postgres since 9.6 should be
able to infer good join statistics from the existence of the FKs.  Maybe that
only works if you actually JOIN to the channels table (?).  But if anything
that's only a 2ndary problem, if at all.

Justin

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Slow join
Next
From: Elvir Kurić
Date:
Subject: "set primary keys..." is missing when using hight values fortransactions / scaling factor with pgbench