Hi guys,
There seems to be some very interesting stuff here; I still have to catch up with your analysis, Andres.
In the meantime, I am sharing the results I got on a well-behaved Linux system.
No sophisticated algorithm here, but evicting the OS cache makes it possible to verify the benefit of prefetching at a much smaller scale, and I think this is useful:
% gcc drop_cache.c -o drop_cache
% sudo chown root:root drop_cache
% sudo chmod 4755 drop_cache
I executed the test like this:
python3 .../run_regression_test.py --port 5433 --iterations 10 \
--columns sequential,random --workers 0 --evict os,off \
--payload-size 50 \
--rows 10000 \
--reset \
--ntables 5
1 table: significant benefit for cold access on HDD, and for random cold access on SSD.
5 tables: significant benefit for random cold access; somewhat detrimental for sequential cold access and for random hot access.
10 tables: significant benefit for random cold access; slightly better than 5 tables for cold sequential access, and somewhat detrimental for random hot access.
These results are hard to explain, but maybe Andres has the answer:
> I think this specific issue is a bit different, because today you get
> drastically different behaviour if you have
>
> a) [miss, (miss, hit)+]
> vs
> b) [(miss, hit)+]
Tomas said:
> I think a "proper" solution would require some sort of cost model for
> the I/O part, so that we can schedule the I/Os just so that the I/O
> completes right before we actually need the page.
I dare to ask:
Why not use this in a feedback loop?
while (!current_buffer.ready && reasonable_to_prefetch())
{
    fetch_next_index_tuple();
    if (necessary)
        prefetch_one_more_buffer();
}
I also dare to ask:
Is it possible to skip an unavailable buffer and gain time by processing the rows that will be needed afterwards?
This could also help by releasing buffers more quickly if they need to be recycled.
Regards,
Alexandre