Thread: Instead of using the bloom index, a parallel sequencial scan is used with this example

Instead of using the bloom index, a parallel sequencial scan is used with this example

From

PG Doc comments form

Date:

24 October 2019, 15:17:06

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/bloom.html
Description:

postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5,
i6);
ERROR:  access method "bloom" does not exist
postgres=# create extension bloom;
CREATE EXTENSION
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5,
i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 =
123451;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..127220.00 rows=250 width=24) (actual
time=974.467..974.513 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on tbloom  (cost=0.00..126195.00 rows=104 width=24)
(actual time=895.448..895.448 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 16.006 ms
Execution Time: 974.635 ms
(8 rows)

postgres=# analyze tbloom;
ANALYZE
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 =
123451;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..127195.82 rows=1 width=24) (actual
time=803.314..803.436 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on tbloom  (cost=0.00..126195.72 rows=1 width=24)
(actual time=775.911..775.911 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 0.416 ms
Execution Time: 803.471 ms
(8 rows)

Re: Instead of using the bloom index, a parallel sequencial scan isused with this example

From

Bruce Momjian

Date:

05 November 2019, 20:02:51

On Thu, Oct 24, 2019 at 03:17:06PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/12/bloom.html
> Description:

I cleaned up your script and created an SQL file that can be piped into
psql, attached.  I see the bloomidx index being used without and with
the ANALYZE, output attached.  I tested this on git master, and back
through PG 10.  Would you please run these queries and post the output:

    SELECT version();

    SELECT name, current_setting(name), source
    FROM pg_settings
    WHERE source NOT IN ('default', 'override');

My guess is that you have some non-default setting that is causing
bloomidx not to be used.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachment

Re: Instead of using the bloom index, a parallel sequencial scan isused with this example

From

"Daniel Westermann (DWE)"

Date:

05 November 2019, 20:28:57

>>On Thu, Oct 24, 2019 at 03:17:06PM +0000, PG Doc comments form wrote:

>> The following documentation comment has been logged on the website:
>>
>>Page: https://www.postgresql.org/docs/12/bloom.html
>> Description:

>I cleaned up your script and created an SQL file that can be piped into
>psql, attached. I see the bloomidx index being used without and with
>the ANALYZE, output attached. I tested this on git master, and back
>through PG 10. Would you please run these queries and post the output:

Thanks for having a look, Bruce. You test case is not exactly the same as in the documentation. For you "temporary table" test case I indeed see the bloom index getting used. Doing the same with a normal table results in a parallel seq scan.

postgres=# SELECT version();

version

----------------------------------------------------------------------------------------------------------------------------

PostgreSQL 12.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

(1 row)

postgres=# SELECT name, current_setting(name), source

postgres-# FROM pg_settings

postgres-# WHERE source NOT IN ('default', 'override');

name | current_setting |

----------------------------+--------------------+----------------------

application_name | psql | client

client_encoding | UTF8 | client

DateStyle | ISO, MDY | configuration file

default_text_search_config | pg_catalog.english | configuration file

dynamic_shared_memory_type | posix | configuration file

lc_messages | en_US.utf8 | configuration file

lc_monetary | de_CH.UTF-8 | configuration file

lc_numeric | de_CH.UTF-8 | configuration file

lc_time | en_US.UTF-8 | configuration file

listen_addresses | * | configuration file

log_timezone | Europe/Zurich | configuration file

max_connections | 100 | configuration file

max_stack_depth | 2MB | environment variable

max_wal_size | 1GB | configuration file

min_wal_size | 80MB | configuration file

port | 5432 | environment variable

shared_buffers | 128MB | configuration file

TimeZone | Europe/Zurich | configuration file

Regards

Daniel

Attachment

bloom2.sql

Re: Instead of using the bloom index, a parallel sequencial scan isused with this example

From

Bruce Momjian

Date:

05 November 2019, 23:18:54

On Tue, Nov  5, 2019 at 08:28:57PM +0000, Daniel Westermann (DWE) wrote:
> 
> >>On Thu, Oct 24, 2019 at 03:17:06PM +0000, PG Doc comments form wrote:
> >> The following documentation comment has been logged on the website:
> >>
> >>Page: https://www.postgresql.org/docs/12/bloom.html
> >> Description:
> 
> >I cleaned up your script and created an SQL file that can be piped into
> >psql, attached.  I see the bloomidx index being used without and with
> >the ANALYZE, output attached.  I tested this on git master, and back
> >through PG 10.  Would you please run these queries and post the output:
> 
> Thanks for having a look, Bruce. You test case is not exactly the same as in
> the documentation. For you "temporary table" test case I indeed see the bloom
> index getting used. Doing the same with a normal table results in a parallel
> seq scan.

Ah, I see your point about the temporary tables.  I have retested with
permanent tables, and see the same output you do.  By reducing the
generate_series by 100x, I am able to get the desired EXPLAIN plans by
turning on/off various optimizer settings.  I am attaching my test
script and its output.

I suggestion is what I reduce generate_series by 100x and update the
EXPLAIN plans in the docs.  The docs are unclear on how these different
plans are generated, though I am not excited about adding the exact
optimizer settings to generate each plan.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +