Thread: pgsql: Bloom index contrib module

pgsql: Bloom index contrib module

From
Teodor Sigaev
Date:
Bloom index contrib module

Module provides new access method. It is actually a simple Bloom filter
implemented as pgsql's index. It could give some benefits on search
with large number of columns.

Module is a single way to test generic WAL interface committed earlier.

Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/9ee014fc899a28a198492b074e32b60ed8915ea9

Modified Files
--------------
contrib/Makefile                 |   1 +
contrib/bloom/.gitignore         |   4 +
contrib/bloom/Makefile           |  24 ++
contrib/bloom/blcost.c           |  48 ++++
contrib/bloom/blinsert.c         | 313 ++++++++++++++++++++++++++
contrib/bloom/bloom--1.0.sql     |  19 ++
contrib/bloom/bloom.control      |   5 +
contrib/bloom/bloom.h            | 178 +++++++++++++++
contrib/bloom/blscan.c           | 175 +++++++++++++++
contrib/bloom/blutils.c          | 463 +++++++++++++++++++++++++++++++++++++++
contrib/bloom/blvacuum.c         | 212 ++++++++++++++++++
contrib/bloom/blvalidate.c       | 220 +++++++++++++++++++
contrib/bloom/expected/bloom.out | 122 +++++++++++
contrib/bloom/sql/bloom.sql      |  47 ++++
contrib/bloom/t/001_wal.pl       |  75 +++++++
doc/src/sgml/bloom.sgml          | 218 ++++++++++++++++++
doc/src/sgml/contrib.sgml        |   1 +
doc/src/sgml/filelist.sgml       |   1 +
18 files changed, 2126 insertions(+)
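
For readers unfamiliar with the technique named in the commit message, here is a minimal sketch of a per-row Bloom signature, to show why one such filter can serve searches over many columns. It is not taken from contrib/bloom: the names `mix64`, `sig_add`, `row_signature`, `sig_may_match` and the constants `SIG_BITS` and `BITS_PER_COLUMN` are hypothetical, and the hash is a generic integer mixer rather than anything PostgreSQL uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIG_BITS        64  /* hypothetical signature length in bits */
#define BITS_PER_COLUMN 2   /* hypothetical number of bits set per column value */

/* Generic 64-bit mixer (SplitMix64-style finalizer); an assumption, not PostgreSQL's hash. */
uint64_t mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
    return x ^ (x >> 31);
}

/* Set the bits that represent one (column, value) pair in a signature. */
void sig_add(uint64_t *sig, int column, int32_t value)
{
    for (int probe = 0; probe < BITS_PER_COLUMN; probe++)
    {
        /* Different seed per column and per probe, so columns do not collide trivially. */
        uint64_t seed = (uint64_t) column * BITS_PER_COLUMN + (uint64_t) probe + 1;
        uint64_t h = mix64((uint64_t) (uint32_t) value + seed * UINT64_C(0x9e3779b97f4a7c15));

        *sig |= UINT64_C(1) << (h % SIG_BITS);
    }
}

/* Build the signature of a whole row by folding in every column value. */
uint64_t row_signature(const int32_t *values, int ncols)
{
    uint64_t sig = 0;

    assert(ncols >= 0);
    for (int c = 0; c < ncols; c++)
        sig_add(&sig, c, values[c]);
    return sig;
}

/*
 * A row can match only if every bit of the query signature is set in the
 * row signature. False positives are possible, false negatives are not,
 * so candidate rows still have to be rechecked against the real values.
 */
bool sig_may_match(uint64_t row_sig, uint64_t query_sig)
{
    return (row_sig & query_sig) == query_sig;
}
```

To probe for an equality condition on any subset of columns, one would call `sig_add` for just those columns to build a query signature and then test each stored row signature with `sig_may_match`. The same stored signature answers every combination of columns, which is presumably where the benefit for wide searches comes from.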


Re: pgsql: Bloom index contrib module

From
Erik Rijkers
Date:
On 2016-04-01 15:49, Teodor Sigaev wrote:
> Bloom index contrib module
>
> Module provides new access method. It is actually a simple Bloom filter
> implemented as pgsql's index. It could give some benefits on search
> with large number of columns.
>
> doc/src/sgml/bloom.sgml          | 218 ++++++++++++++++++

I edited the bloom.sgml text a bit.

Great stuff, thanks!

Erik Rijkers


Re: pgsql: Bloom index contrib module

From
Teodor Sigaev
Date:
Several non-x86 buildfarm members aren't happy with it; we are
investigating the problem.

Teodor Sigaev wrote:
> Bloom index contrib module
>
> Module provides new access method. It is actually a simple Bloom filter
> implemented as pgsql's index. It could give some benefits on search
> with large number of columns.
>
> Module is a single way to test generic WAL interface committed earlier.
>
> Author: Teodor Sigaev, Alexander Korotkov
> Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby
>
> Branch
> ------
> master
>
> Details
> -------
> http://git.postgresql.org/pg/commitdiff/9ee014fc899a28a198492b074e32b60ed8915ea9
>
> Modified Files
> --------------
> contrib/Makefile                 |   1 +
> contrib/bloom/.gitignore         |   4 +
> contrib/bloom/Makefile           |  24 ++
> contrib/bloom/blcost.c           |  48 ++++
> contrib/bloom/blinsert.c         | 313 ++++++++++++++++++++++++++
> contrib/bloom/bloom--1.0.sql     |  19 ++
> contrib/bloom/bloom.control      |   5 +
> contrib/bloom/bloom.h            | 178 +++++++++++++++
> contrib/bloom/blscan.c           | 175 +++++++++++++++
> contrib/bloom/blutils.c          | 463 +++++++++++++++++++++++++++++++++++++++
> contrib/bloom/blvacuum.c         | 212 ++++++++++++++++++
> contrib/bloom/blvalidate.c       | 220 +++++++++++++++++++
> contrib/bloom/expected/bloom.out | 122 +++++++++++
> contrib/bloom/sql/bloom.sql      |  47 ++++
> contrib/bloom/t/001_wal.pl       |  75 +++++++
> doc/src/sgml/bloom.sgml          | 218 ++++++++++++++++++
> doc/src/sgml/contrib.sgml        |   1 +
> doc/src/sgml/filelist.sgml       |   1 +
> 18 files changed, 2126 insertions(+)
>
>

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: pgsql: Bloom index contrib module

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Bloom index contrib module

skink provided some pretty suggestive evidence about why this
is unstable:

==32446== VALGRINDERROR-BEGIN
==32446== Conditional jump or move depends on uninitialised value(s)
==32446==    at 0x4E2E71: writeDelta (generic_xlog.c:137)
==32446==    by 0x4E341E: GenericXLogFinish (generic_xlog.c:313)
==32446==    by 0x14E83324: blbulkdelete (blvacuum.c:149)
==32446==    by 0x4BCEE7: index_bulk_delete (indexam.c:627)
==32446==    by 0x5DE577: lazy_vacuum_index (vacuumlazy.c:1581)
==32446==    by 0x5DFB52: lazy_scan_heap (vacuumlazy.c:1273)
==32446==    by 0x5E03AA: lazy_vacuum_rel (vacuumlazy.c:249)
==32446==    by 0x5DC7B7: vacuum_rel (vacuum.c:1375)
==32446==    by 0x5DD5F7: vacuum (vacuum.c:296)
==32446==    by 0x693B71: autovacuum_do_vac_analyze (autovacuum.c:2807)
==32446==    by 0x695B2A: do_autovacuum (autovacuum.c:2328)
==32446==    by 0x696055: AutoVacWorkerMain (autovacuum.c:1647)
==32446==  Uninitialised value was created by a stack allocation
==32446==    at 0x14E82CAB: blbulkdelete (blvacuum.c:36)
==32446==
==32446== VALGRINDERROR-END

            regards, tom lane
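
The trace above says that a byte-wise comparison in writeDelta branched on memory that came from a stack variable in blbulkdelete. As a minimal sketch of that class of problem, and not the actual code or fix from blvacuum.c, the fragment below shows a partially filled stack structure that is only safe to compare byte by byte because it is zeroed first. `FreeList`, `FREE_SLOTS`, `build_free_list` and `count_changed_bytes` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FREE_SLOTS 8    /* hypothetical capacity of the list */

/* Hypothetical fixed-size list that gets copied into a page image. */
typedef struct
{
    uint32_t slots[FREE_SLOTS];
} FreeList;

/*
 * Stand-in for a delta writer: it inspects every byte of both images, so
 * any byte that was never written makes the branch below depend on an
 * uninitialised value, which is what Valgrind reports.
 */
size_t count_changed_bytes(const unsigned char *old_image,
                           const unsigned char *new_image,
                           size_t len)
{
    size_t changed = 0;

    for (size_t i = 0; i < len; i++)
        if (old_image[i] != new_image[i])
            changed++;
    return changed;
}

/*
 * Fill only the first nblocks slots. Without the memset, the remaining
 * slots of a stack-allocated FreeList would hold whatever was on the stack.
 */
void build_free_list(FreeList *list, const uint32_t *blocks, size_t nblocks)
{
    assert(nblocks <= FREE_SLOTS);
    memset(list, 0, sizeof(*list));
    memcpy(list->slots, blocks, nblocks * sizeof(uint32_t));
}
```

With the `memset` in place, two lists built from the same input compare as identical; without it, the comparison result (and any delta derived from it) would vary with leftover stack contents, which could plausibly differ between platforms.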


Re: pgsql: Bloom index contrib module

From
Erik Rijkers
Date:
On 2016-04-01 14:36, Erik Rijkers wrote:
> On 2016-04-01 15:49, Teodor Sigaev wrote:
>> Bloom index contrib module
>>
>> doc/src/sgml/bloom.sgml          | 218 ++++++++++++++++++
>


The size of the example table (in bloom.sgml):

CREATE TABLE tbloom AS
SELECT
     random()::int as i1,
     random()::int as i2,
[...]
     random()::int as i12,
     random()::int as i13
FROM
     generate_series(1,1000);

seems too small to demonstrate index use.

For me, both on $BigServer at work and on $ModestDesktop at home, the
1000 rows are not enough.

I suggest making the row count in that example larger, for instance
10000, so: generate_series(1,10000).

Does that make sense?  I realize the behavior is probably somewhat
dependent on hardware and settings...


thanks,


Erik Rijkers