Re: Bitmap table scan cost per page formula - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: Bitmap table scan cost per page formula
Date:
Msg-id: CAMkU=1xaLs8RnWrY_jLAc_ON--5sT0-QCEbBcQCjR5oxMHc4tg@mail.gmail.com
In response to: Re: Bitmap table scan cost per page formula (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Bitmap table scan cost per page formula
List: pgsql-hackers
On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> It is not obvious to me that the parabola is wrong. I've certainly seen
>> cases where reading every 2nd or 3rd block (either stochastically or by
>> modulus) actually does take longer than reading every block, because it
>> defeats read-ahead. But it depends a lot on your kernel version and your
>> kernel settings and your file system and probably other things as well.
>
> Well, that's an interesting point, too. Maybe we need another graph that
> also shows the actual runtime of a bitmap scan and a sequential scan.
I did some low-level IO benchmarking, and reading every 3rd block actually comes out 13 times slower than reading every block, using CentOS 6.9 with ext4 and the setting:
blockdev --setra 8192 /dev/sdb1
This is on some virtualized storage whose details I don't know, but it behaves as if it were RAID/JBOD with around 6 independent spindles.
If I pick the 1/3 of the blocks to read stochastically rather than by modulus, it is only 2 times slower than reading all of them, I assume because having occasional reads which are adjacent to each other does make read-ahead kick in, while evenly spaced, never-adjacent reads do not. This is probably a better model of how bitmap table scans actually work, as there is no general reason to think the pages they fetch would be evenly spaced and non-adjacent. So this result is in reasonable agreement with how the current cost estimation works, where the parabola peaks at about twice the cost of the sequential scan.
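For reference, here is the per-page interpolation as I remember cost_bitmap_heap_scan() in costsize.c doing it; the little wrapper function and its name are just mine for illustration, and the variable names are from memory:

    #include <math.h>

    /* Per-page cost interpolation, roughly as cost_bitmap_heap_scan() does it:
     * charge random_page_cost per page when only a few pages are fetched,
     * sliding toward seq_page_cost as pages_fetched approaches T, the total
     * number of pages in the table. */
    static double
    bitmap_cost_per_page(double pages_fetched, double T,
                         double spc_random_page_cost,
                         double spc_seq_page_cost)
    {
        if (pages_fetched >= 2.0)
            return spc_random_page_cost -
                (spc_random_page_cost - spc_seq_page_cost) *
                sqrt(pages_fetched / T);
        return spc_random_page_cost;
    }

The heap-page charge is then pages_fetched times that per-page number, which is what produces the parabola-like hump in the plotted cost curve.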
I used a file of about 10GB, because I happened to have one laying around.
## read every block ($_%3>5 is never true)
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
1295683 8192
4317 0
real 0m38.329s
## read every 3rd block
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
431894 8192
1439 0
real 8m54.698s
## read a random 1/3 of the blocks
time perl -wle 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
431710 8192
1434 0
real 1m23.130s
Dropping the caches is a reasonable way to do this type of benchmark, because it simulates what would happen if your data set was much larger than RAM, without needing to actually use a data set much larger than RAM.
It would be interesting to see what other people get for similar low-level tests, as well as for actual bitmap scans.
Cheers,
Jeff