Home > mailing lists

Re: [PERFORM] Inappropriate inner table for nested loop join - Mailing list pgsql-performance

From	Albe Laurenz
Subject	Re: [PERFORM] Inappropriate inner table for nested loop join
Date	June 23, 2017 14:20:13
Msg-id	A737B7A37273E048B164557ADEF4A58B53A5D88F@ntex2010i.host.magwien.gv.at Whole thread
In response to	[PERFORM] Inappropriate inner table for nested loop join (Akihiko Odaki <akihiko.odaki.4i@stu.hosei.ac.jp>)
Responses	Re: [PERFORM] Inappropriate inner table for nested loop join
List	pgsql-performance

Tree view

Akihiko Odaki wrote:
> I am having a problem with nested loop join.
> 
> A database has 2 tables: "posts" and "follows".
> Table "posts" have two columns: "timestamp" and "account".
> Table "follows" have two columns: "target_account" and "owner_account".
> The database also has an index on "posts" ("account", "timestamp"), one
> on "posts"("timestamp") and on "follows" ("owner_account",
> "target_account").
> 
> Table "posts" is so big and have 10 million records.
> The number of Records with the same value for "owner_accounts" in table
> "follows" is about 100 by average.
> 
> I issue the following query:
> 
> SELECT "posts".*
>    FROM "posts"
>    JOIN "follows" ON "follows"."target_account" = "posts"."account"
>    WHERE "follows"."owner_account" = $1
>    ORDER BY "posts"."timestamp"
>    LIMIT 100
> 
> That results in a nested loop join with table "posts" as the inner and
> "follows" as the outer, which queried for each loop. EXPlAIN ANALYZE
> says the actual number of rows queried from table "posts" is 500,000.
> This behavior is problematic.
> 
> For performance, it may be better to retrieve 100 records joined with a
> record in table "follows", and then to retrieve those whose
> "posts"."timestamp" is greater than the one of last record we already
> have, or 100 records, joined with another record in table "follows", and
> so on. It would end up querying 10,000 records from table "posts" at
> most. The number could be even smaller in some cases.
> 
> Now I have these tough questions:
> * Is the "ideal" operation I suggested possible for PostgreSQL?
> * If so, I think that could be achieved by letting PostgreSQL use
> "follows" as the inner in the loops. How could I achieve that?
> * Is there any other way to improve the performance of the query?

PostgreSQL`s plan is to use the index on "posts"."timestamp" to find the
rows with the lowest "timestamp", match with rows from "posts" in
a nested loop and stop as soon as it has found 100 matches.

Now it must be that the rows in "posts" that match with rows in "follows"
have high values of "timestamp".

PostgreSQL doesn't know that, because it has no estimates how
values correlate across tables, so it has to scan much more of the index
than it had expected to, and the query performs poorly.

You could either try to do something like

SELECT *
FROM (SELECT "posts".*
      FROM "posts"
         JOIN "follows" ON "follows"."target_account" = "posts"."account"
      WHERE "follows"."owner_account" = $1
      OFFSET 0) q
ORDER BY "posts"."timestamp"  
LIMIT 100;

Or you could frop the index on "posts"."timestamp" and see if that helps.

Yours,
Laurenz Albe

pgsql-performance by date:

From: Akihiko Odaki
Date: 23 June 2017, 13:31:40
Subject: [PERFORM] Inappropriate inner table for nested loop join

From: Akihiko Odaki
Date: 23 June 2017, 14:36:41
Subject: Re: [PERFORM] Inappropriate inner table for nested loop join

Re: [PERFORM] Inappropriate inner table for nested loop join - Mailing list pgsql-performance

Previous

Next