Re: Index scan prefetch? - Mailing list pgsql-hackers

From konstantin knizhnik
Subject Re: Index scan prefetch?
Date
Msg-id 49BFF604-37DF-4F84-9AA6-BCE30293175E@postgrespro.ru
Whole thread Raw
In response to Re: Index scan prefetch?  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers

On Mar 26, 2018, at 10:59 PM, Justin Pryzby wrote:

On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
Hi, hackers.

I was faced with the following bad performance use case with Postgres: there
is a huge append-only table with serial key (ID)
which is permanently appended using multithreaded pgloader.

I think this could be similar to the problem I reported here:
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA11880@telsasoft.com

The analysis at the time was that, due to "repeated keys", a btree index on the
timestamp column had non-consecutive heap TIDs (btree insertion uses random
early insertion point to avoid superlinear lookup cost during such insertions).

But, our case also involved multiple child processes simultaneously inserting
into same table, and I wonder if "repeated keys" were more or less unrelated to
the problem.  The behavior is maybe caused easily by simultaneous insertions
"clustered" around the same target: for us: now(), for you: nextval().

But now effective_io_concurrency parameter is applicable only for bitmap
...
Will it be useful to support it also for index scan?
Or there are some other ways to address this problem?

Does your case perform well with bitmap heap scan (I mean bitmap scan of the
single index)?  It seems to me that prefetch wouldn't help, as it would just
incur the same random cost you're already seeing; the solution may be to choose
another plan(bitmap) with sequential access to enable read-ahead,

Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
its own readahead, negatively affecting some queries:
https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#20180207054227.GE17521@telsasoft.com

What's the pg_stats "correlation" for the table column with index being
scanned?  How many tuples?  Would you send explain(analyze,buffers) for the
problem query, and with SET enable_bitmapscan=off; ?

Thank you very much for your feedback.

Setting random_page_cost to 10 forces Postgres to use bitmap scan:

paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3;
QUERY PLAN
—---------------------------------------------------------------------------------------------------—
Sort (cost=269989441.71..270965997.99 rows=390622510 width=351)
Sort Key: "LastName", "FirstName", "MiddleName"
-> Index Scan using "People_pkey" on people_raw (cost=0.58..88742525.10 rows=390622510 width=351)
Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(4 rows)

paradoxdb=# set random_page_cost to 10;
SET
paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3;
QUERY PLAN
—---------------------------------------------------------------------------------------------—
Sort (cost=397271527.57..398248083.84 rows=390622510 width=351)
Sort Key: "LastName", "FirstName", "MiddleName"
-> Bitmap Heap Scan on people_raw (cost=14905651.30..108450832.96 rows=390622510 width=351)
Recheck Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
-> Bitmap Index Scan on "People_pkey" (cost=0.00..14807995.68 rows=390622510 width=0)
Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(6 rows)

Correlation of "ID" column is 0.978884.

Sorry, it is customer's system and I have limited access to it.
Next load of data is expected during this week, so we will check how bitmap scan can improve performance.
According to IO traffic, I expect up to 100 times improvement.

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Next
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index