Re: Bitmap table scan cost per page formula - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: Bitmap table scan cost per page formula
Date:
Msg-id: CAMkU=1xaLs8RnWrY_jLAc_ON--5sT0-QCEbBcQCjR5oxMHc4tg@mail.gmail.com
In response to: Re: Bitmap table scan cost per page formula (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Bitmap table scan cost per page formula
List: pgsql-hackers
On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> It is not obvious to me that the parabola is wrong. I've certainly seen
>> cases where reading every 2nd or 3rd block (either stochastically or by
>> modulus) actually does take longer than reading every block, because it
>> defeats read-ahead. But it depends a lot on your kernel version and your
>> kernel settings and your file system and probably other things as well.
>
> Well, that's an interesting point, too. Maybe we need another graph that
> also shows the actual runtime of a bitmap scan and a sequential scan.
I did some low-level IO benchmarking, and reading every 3rd block actually comes out 13 times slower than reading every block, using CentOS 6.9 with ext4 and the setting:
blockdev --setra 8192 /dev/sdb1
This is on some virtualized storage whose details I don't know, but it behaves as if it were RAID/JBOD with around 6 independent spindles.
If I pick the 1/3 of the blocks to read stochastically rather than by modulus, it is only 2 times slower than reading all of them, I assume because having occasional reads which are adjacent to each other does make read-ahead kick in, while evenly spaced, never-adjacent reads do not. This is probably a better model of how bitmap table scans actually work, as there is no general reason to think the pages they fetch would be evenly spaced and non-adjacent. So this result is in reasonable agreement with how the current cost estimation works, where the parabola peaks at about twice the cost of the sequential scan.
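For reference, here is the per-page interpolation as I remember cost_bitmap_heap_scan() in costsize.c doing it; the little wrapper function and its name are just mine for illustration, and the variable names are from memory:

    #include <math.h>

    /* Per-page cost interpolation, roughly as cost_bitmap_heap_scan() does it:
     * charge random_page_cost per page when only a few pages are fetched,
     * sliding toward seq_page_cost as pages_fetched approaches T, the total
     * number of pages in the table. */
    static double
    bitmap_cost_per_page(double pages_fetched, double T,
                         double spc_random_page_cost,
                         double spc_seq_page_cost)
    {
        if (pages_fetched >= 2.0)
            return spc_random_page_cost -
                (spc_random_page_cost - spc_seq_page_cost) *
                sqrt(pages_fetched / T);
        return spc_random_page_cost;
    }

The heap-page charge is then pages_fetched times that per-page number, which is what produces the parabola-like hump in the plotted cost curve.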
I used a file of about 10GB, because I happened to have one laying around.
## read every block ($_%3>5 is never true)
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
1295683 8192
4317 0
real 0m38.329s
## read every 3rd block
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
431894 8192
1439 0
real 8m54.698s
## read a random 1/3 of the blocks
time perl -wle 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
431710 8192
1434 0
real 1m23.130s
Dropping the caches is a reasonable way to do this type of benchmark, because it simulates what would happen if your data set was much larger than RAM, without needing to actually use a data set much larger than RAM.
It would be interesting to see what other people get for similar low-level tests, as well as for actual bitmap scans.
Cheers,
Jeff