Re: Bitmap table scan cost per page formula - Mailing list pgsql-hackers

From: Jeff Janes
Subject: Re: Bitmap table scan cost per page formula
Msg-id: CAMkU=1xaLs8RnWrY_jLAc_ON--5sT0-QCEbBcQCjR5oxMHc4tg@mail.gmail.com
In response to: Re: Bitmap table scan cost per page formula (Robert Haas <robertmhaas@gmail.com>)
On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
It is not obvious to me that the parabola is wrong.  I've certainly seen cases where reading every 2nd or 3rd block (either stochastically, or by modulus) actually does take longer than reading every block, because it defeats read-ahead.  But it depends a lot on your kernel version and your kernel settings and your file system and probably other things as well.

Well, that's an interesting point, too.  Maybe we need another graph that also shows the actual runtime of a bitmap scan and a sequential scan.

I did some low-level IO benchmarking, and reading every 3rd block actually comes out about 13 times slower than reading every block, using CentOS 6.9 with ext4 and the setting:
blockdev --setra 8192 /dev/sdb1
This is on some virtualized storage whose details I don't know, but it behaves as if it were RAID/JBOD with around 6 independent spindles.

If I pick the 1/3 of the blocks to read stochastically rather than by modulus, it is only about 2 times slower than reading all of them. I assume that is because the occasional reads that happen to be adjacent to each other are enough to make read-ahead kick in, while evenly spaced, never-adjacent reads never trigger it. This is probably a better model of how bitmap table scans actually work, as there is no general reason to think the pages they fetch would be evenly spaced and never adjacent. So this result is in reasonable agreement with how the current cost estimation works, where the parabola peaks at about twice the cost of the sequential scan.
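
For anyone following along without the source handy, the interpolation I mean is roughly this (my paraphrase of cost_bitmap_heap_scan(), not the exact code; random_page_cost=4, seq_page_cost=1, and T = total pages in the table are assumed):

## sketch: current cost-per-page interpolation, evaluated at a few fractions of the table read
perl -le '($r,$s,$T)=(4,1,1300000); for $f (0.1, 0.3333, 1.0) { $p=$f*$T; $cpp = $p>=2 ? $r-($r-$s)*sqrt($p/$T) : $r; printf "fraction %.4f: cost_per_page %.2f\n", $f, $cpp }'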

I used a file of about 10GB, because I happened to have one lying around.
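
If you don't have a suitable file handy, something like this should generate one of about the right size (a sketch only; the name "rand" is just what the commands below use, and the contents don't matter for a read benchmark):

## write 1300000 8kB blocks (~10.6GB) to a file named "rand"
perl -e 'open my $fh, ">", "rand" or die $!; print {$fh} "x" x 8192 for 1..1300000'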

## read every block ($_%3>5 is never true)
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c

1295683 8192
   4317 0
real    0m38.329s

## read every 3rd block
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
 431894 8192
   1439 0
real    8m54.698s

## read a random 1/3 of the blocks
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -wle 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
 431710 8192
   1434 0
real    1m23.130s

Dropping the caches is a reasonable way to do this type of benchmark, because it simulates what would happen if your data set was much larger than RAM, without needing to actually use a data set much larger than RAM.

It would be interesting to see what other people get for similar low-level tests, as well as for actual bitmap scans.
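
For the actual bitmap scan side, a rough recipe would be something like the following (hypothetical names: "t" and the index on its column "a" are just placeholders, and the restart command varies by platform). Restarting the server empties shared_buffers, dropping the OS cache handles the rest, and disabling seq and plain index scans nudges the planner toward the bitmap plan:

## hypothetical: time a cold-cache bitmap heap scan on table "t" with an index on column "a"
sudo service postgresql restart
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
psql -c "set enable_seqscan=off; set enable_indexscan=off; explain (analyze, buffers) select count(*) from t where a < 1000000;"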

Cheers,

Jeff
