Hi guys,
There seems to be some very interesting stuff here; I still have to catch up with your analysis, Andres.
In the meantime, I am sharing the results I got on a well-behaved Linux system.
No sophisticated algorithm here, but evicting the OS cache makes it possible to verify the benefit of prefetching at a much smaller scale, and I think this is useful:
% gcc drop_cache.c -o drop_cache
% sudo chown root:root drop_cache
% sudo chmod 4755 drop_cache
I executed the test like this:
python3 .../run_regression_test.py --port 5433 --iterations 10 \
--columns sequential,random --workers 0 --evict os,off \
--payload-size 50 \
--rows 10000 \
--reset \
--ntables 5
1 table: significant benefit for cold access on HDD, and for random cold access on SSD.
5 tables: significant benefit for random cold access; somewhat detrimental for sequential cold access and for random hot access.
10 tables: significant benefit for random cold access; slightly better than 5 tables for cold sequential access, and somewhat detrimental for random hot access.
These results are hard to explain, but maybe Andres has the answer:
> I think this specific issue is a bit different, because today you get
> drastically different behaviour if you have
>
> a) [miss, (miss, hit)+]
> vs
> b) [(miss, hit)+]
Tomas said:
> I think a "proper" solution would require some sort of cost model for
> the I/O part, so that we can schedule the I/Os just so that the I/O
> completes right before we actually need the page.
I dare to ask:
Why not use this in a feedback loop?
while (!current_buffer.ready && reasonable_to_prefetch())
{
    fetch_next_index_tuple();
    if (necessary)
        prefetch_one_more_buffer();
}
I also dare to ask:
Is it possible to skip an unavailable buffer and gain time by processing the rows that will be needed afterwards?
This could also help by releasing buffers more quickly if they need to be recycled.
Regards,
Alexandre