Home > mailing lists

Index scan prefetch? - Mailing list pgsql-hackers

From	Konstantin Knizhnik
Subject	Index scan prefetch?
Date	March 26, 2018 15:43:02
Msg-id	e6067677-d762-4a31-7d5e-060f4eb70a8d@postgrespro.ru Whole thread Raw
Responses	Re: Index scan prefetch? (Justin Pryzby <pryzby@telsasoft.com>)
List	pgsql-hackers

Tree view

Hi, hackers.

I was faced with the following bad performance use case with Postgres: 
there is a huge append-only table with serial key (ID)
which is permanently appended using multithreaded pgloader.

Ideally (if there is no parallel load), this table should be clustered 
by "ID" key: records are stored in the heap in ID increasing order.
So index scan using ID index should be very efficient, because records 
with subsequent IDs are expected to belong to the same heap page.

In practice I see a lot of random reads and bad performance (read speed 
is about 3Mb/sec, instead of 300Mb/sec for sequential read).

This is the typical page layout:

select ctid,"ID" from T limit 100;
ctid  |    ID
--------+-----------
  (0,1)  | 230827730
  (0,2)  | 230830833
  (0,3)  | 230840164
  (0,4)  | 230853499
  (0,5)  | 230862073
  (0,6)  | 230864443
  (0,7)  | 230870332
  (0,8)  | 230872992
  (0,9)  | 230873052
  (0,10) | 230875245
  (0,11) | 230877107
  (0,12) | 230878144
  (0,13) | 230878612
  (0,14) | 230885569
  (0,15) | 230885726
  (0,16) | 230886305
  (0,17) | 230886606
  (0,18) | 230892813
  (0,19) | 230895012
  (0,20) | 230895041
  (0,21) | 230895175
  (0,22) | 230896732
  (0,23) | 230897719
  (0,24) | 230897841
  (0,25) | 230898386
  (0,26) | 230899992
  (0,27) | 230900646
  (0,28) | 230901053
  (0,29) | 230901717
  (0,30) | 230904115
  (0,31) | 230908189

And this is the result of index scan:

select ctid,"ID" from T where "ID"  > 1669762074  limit 100;
    ctid    |     ID
-----------+------------
  (1890,22) | 1669762075
  (2035,20) | 1669762076
  (2104,19) | 1669762077
  (2518,27) | 1669762078
  (2530,15) | 1669762079
  (2552,32) | 1669762080
  (2578,30) | 1669762081
  (2587,19) | 1669762082
  (2676,17) | 1669762083
  (2701,15) | 1669762084
  (2707,20) | 1669762085
  (2804,22) | 1669762086
  (2837,28) | 1669762087
  (2902,17) | 1669762088
  (2972,28) | 1669762089
  (2992,36) | 1669762090
  (3014,27) | 1669762091
  (3038,24) | 1669762092
  (3063,22) | 1669762093
  (3116,33) | 1669762094
  (3131,28) | 1669762095
  (3139,19) | 1669762096
  (3167,17) | 1669762097
  (3170,28) | 1669762098
  (3202,21) | 1669762099
  (3213,35) | 1669762100
  (3220,21) | 1669762101
  (3236,21) | 1669762102
  (3282,21) | 1669762103
  (3314,22) | 1669762104
  (3366,26) | 1669762105

So records with subsequent IDs are not located sequentially but still 
locality is good enough.
I think that prefetch can solve this problem.
But now  effective_io_concurrency parameter is  applicable only for 
bitmap scan.

Will it be useful to support it also for index scan?
Or there are some other ways to address this problem?

Thanks in advance,

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

From: David Rowley
Date: 26 March 2018, 15:37:37
Subject: Re: Parallel Aggregates for string_agg and array_agg

From: Simon Riggs
Date: 26 March 2018, 15:53:42
Subject: Re: [HACKERS] MERGE SQL Statement for PG11

Index scan prefetch? - Mailing list pgsql-hackers

Previous

Next