Thread: Updated bitmap index patch

Updated bitmap index patch

From
Gavin Sherry
Date:
Hi all,

Attached is an updated bitmap index patch. It contains bug fixes, API
changes, binary changes (page identifier to distinguish it from other
indexes) and has been brought up to HEAD.

I worked on a few approaches to VACUUM, none very satisfactory. The
problem is, breaking a compressed word representing matches can have
serious consequences -- at the least, creation of new words, at the worst,
creation of a new page. If a lot of this were to happen, REINDEX would be
much more efficient (this is what earlier patches did).

One approach I looked at was modifying the existing read API to be able to
do something like "kill prior tuple". This, I think, made the API quite
complex and it was hard to implement, since the existing mechanism
decompresses words on the fly and it would be hard to identify which TID
is no longer a match. So, I dropped this idea pretty quickly.

The second approach is to just manually traverse each vector and change
matches to non-matches where necessary. The complexity then is in managing
the consequences of breaking compressed words, doing WAL (efficiently) and
calculating free space. I've only partially implemented this approach. At
this stage, I don't have time to finish it due to other commitments.

Thanks,

Gavin

Attachment

Re: Updated bitmap index patch

From
Mark Kirkwood
Date:
Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>

I have applied this to todays HEAD performed some quick tests - looks
good! I have to re-create a TPC-H dataset to test one of the previous
bugs, so I'll probably look at that tomorrow or so.

> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>

The second approach seems like better the way to go (as far as I
understand the issues...). How much work is remaining on this? - not
sure that I'll have time to look at it either ... but may as well know
the size to the job :-) !

Cheers

Mark

Re: Updated bitmap index patch

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
> Thanks,
>
> Gavin
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Updated bitmap index patch

From
Mark Kirkwood
Date:
Mark Kirkwood wrote:

>
> I have applied this to todays HEAD performed some quick tests - looks
> good! I have to re-create a TPC-H dataset to test one of the previous
> bugs, so I'll probably look at that tomorrow or so.
>

The TPC-H query query that previously produced a SIGSEGV now runs and
gives the correct answer.

Cheers

Mark

Re: Updated bitmap index patch

From
Finlay Thompson
Date:
Hi Gavin

Thanks for the new patch!

I ran some address matching on the patched code and have generated
another "ERROR:  out of memory" problem.

The strange thing is that it runs over 150 queries without problem and
then crashes.

I have attached the logfile (well some of it).

If you want I can send you the schema as well....


Finlay


On Thu, 2007-05-03 at 13:51 +1000, Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
> Thanks,
>
> Gavin
> ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster

Attachment

Re: Updated bitmap index patch

From
"Jie Zhang"
Date:
Finlay,

Thanks for testing. If you can send me the schema, that would be great.

Thanks,
Jie


On 5/4/07 5:41 AM, "Finlay Thompson" <finlay@catalyst.net.nz> wrote:

> Hi Gavin
>
> Thanks for the new patch!
>
> I ran some address matching on the patched code and have generated
> another "ERROR:  out of memory" problem.
>
> The strange thing is that it runs over 150 queries without problem and
> then crashes.
>
> I have attached the logfile (well some of it).
>
> If you want I can send you the schema as well....
>
>
> Finlay
>
>
> On Thu, 2007-05-03 at 13:51 +1000, Gavin Sherry wrote:
>> Hi all,
>>
>> Attached is an updated bitmap index patch. It contains bug fixes, API
>> changes, binary changes (page identifier to distinguish it from other
>> indexes) and has been brought up to HEAD.
>>
>> I worked on a few approaches to VACUUM, none very satisfactory. The
>> problem is, breaking a compressed word representing matches can have
>> serious consequences -- at the least, creation of new words, at the worst,
>> creation of a new page. If a lot of this were to happen, REINDEX would be
>> much more efficient (this is what earlier patches did).
>>
>> One approach I looked at was modifying the existing read API to be able to
>> do something like "kill prior tuple". This, I think, made the API quite
>> complex and it was hard to implement, since the existing mechanism
>> decompresses words on the fly and it would be hard to identify which TID
>> is no longer a match. So, I dropped this idea pretty quickly.
>>
>> The second approach is to just manually traverse each vector and change
>> matches to non-matches where necessary. The complexity then is in managing
>> the consequences of breaking compressed words, doing WAL (efficiently) and
>> calculating free space. I've only partially implemented this approach. At
>> this stage, I don't have time to finish it due to other commitments.
>>
>> Thanks,
>>
>> Gavin
>> ---------------------------(end of broadcast)--------------------------- TIP
>> 2: Don't 'kill -9' the postmaster
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match



Re: Updated bitmap index patch

From
Alvaro Herrera
Date:
Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.

Very minor nitpick: I accidentally noticed that the Makefile in this
patch uses the "depend" target.  We don't use that anymore.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Updated bitmap index patch

From
Bruce Momjian
Date:
Due to unfinished VACUUM:

This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
> Thanks,
>
> Gavin
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Updated bitmap index patch

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
>
> Due to unfinished VACUUM:
>
> This has been saved for the 8.4 release:
>
>     http://momjian.postgresql.org/cgi-bin/pgpatches_hold

While we're at this, let's consider Heikki's patch for the streaming
indexscan API stuff.  That patch was supposed to come from the bitmap
index patch, but it was also supposed to help the Grouped Index Tuples
(GIT) patch.

In fact, as far as I understood the discussion, GIT would be helped by
the streaming indexscan API patch, because it would help de-uglify
certain parts of that patch.  It is my impression that we would not want
to commit the ugly GIT patch; so it would mean that either we commit the
streaming indexscan patch first, and a beautiful GIT patch shortly later,
or we don't commit any of them.

So, the question is: do we want the GIT patch in 8.3?  If we do, then we
need the streaming indexscan API patch.  And thus we would need an
updated (beautiful) GIT patch.

Is there a resolution on this?  Alexey Kluykin and I would be interested
in helping review this set of patches, if it is decided that they are
wanted for 8.3.

I want to point out that the GIT patch is one of the things sitting in
the reviewing queue from very early.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Updated bitmap index patch

From
Bruce Momjian
Date:
I though the stream bitmaps was a cleanup of on-disk bitmaps, but I
think you are right that it was for GII too:

    http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php

Heikki, can you clarify this and send us an updated version?

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > Due to unfinished VACUUM:
> >
> > This has been saved for the 8.4 release:
> >
> >     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> While we're at this, let's consider Heikki's patch for the streaming
> indexscan API stuff.  That patch was supposed to come from the bitmap
> index patch, but it was also supposed to help the Grouped Index Tuples
> (GIT) patch.
>
> In fact, as far as I understood the discussion, GIT would be helped by
> the streaming indexscan API patch, because it would help de-uglify
> certain parts of that patch.  It is my impression that we would not want
> to commit the ugly GIT patch; so it would mean that either we commit the
> streaming indexscan patch first, and a beautiful GIT patch shortly later,
> or we don't commit any of them.
>
> So, the question is: do we want the GIT patch in 8.3?  If we do, then we
> need the streaming indexscan API patch.  And thus we would need an
> updated (beautiful) GIT patch.
>
> Is there a resolution on this?  Alexey Kluykin and I would be interested
> in helping review this set of patches, if it is decided that they are
> wanted for 8.3.
>
> I want to point out that the GIT patch is one of the things sitting in
> the reviewing queue from very early.
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Updated bitmap index patch

From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> While we're at this, let's consider Heikki's patch for the streaming
> indexscan API stuff.  That patch was supposed to come from the bitmap
> index patch, but it was also supposed to help the Grouped Index Tuples
> (GIT) patch.
>
> In fact, as far as I understood the discussion, GIT would be helped by
> the streaming indexscan API patch, because it would help de-uglify
> certain parts of that patch.  It is my impression that we would not want
> to commit the ugly GIT patch; so it would mean that either we commit the
> streaming indexscan patch first, and a beautiful GIT patch shortly later,
> or we don't commit any of them.

Right, for GIT the relevant part of the patch is the support for
returning candidate matches.

However, that's just one source of ugliness in the GIT patch. The
amgettuple interface needs to be changed as well, here's the proposal
for that:

http://archives.postgresql.org/pgsql-hackers/2007-03/msg01079.php

The internals aren't that beautiful either, but I wanted to start from
the API.

> So, the question is: do we want the GIT patch in 8.3?  If we do, then we
> need the streaming indexscan API patch.  And thus we would need an
> updated (beautiful) GIT patch.
>
> Is there a resolution on this?  Alexey Kluykin and I would be interested
> in helping review this set of patches, if it is decided that they are
> wanted for 8.3.

Thanks!

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Updated bitmap index patch

From
Andrew Dunstan
Date:

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
>
>

What exactly is the state of this patch? Reading this email it looks to
me like something that should wait for 8.4. Or is there some part of it
that is ready for 8.3? If so, which part?

cheers

andrew

Re: Updated bitmap index patch

From
"Simon Riggs"
Date:
On Thu, 2007-05-17 at 18:20 -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > Due to unfinished VACUUM:
> >
> > This has been saved for the 8.4 release:
> >
> >     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> While we're at this, let's consider Heikki's patch for the streaming
> indexscan API stuff.  That patch was supposed to come from the bitmap
> index patch, but it was also supposed to help the Grouped Index Tuples
> (GIT) patch.
>
> In fact, as far as I understood the discussion, GIT would be helped by
> the streaming indexscan API patch, because it would help de-uglify
> certain parts of that patch.  It is my impression that we would not want
> to commit the ugly GIT patch; so it would mean that either we commit the
> streaming indexscan patch first, and a beautiful GIT patch shortly later,
> or we don't commit any of them.
>
> So, the question is: do we want the GIT patch in 8.3?  If we do, then we
> need the streaming indexscan API patch.  And thus we would need an
> updated (beautiful) GIT patch.
>
> Is there a resolution on this?  Alexey Kluykin and I would be interested
> in helping review this set of patches, if it is decided that they are
> wanted for 8.3.
>
> I want to point out that the GIT patch is one of the things sitting in
> the reviewing queue from very early.

Alvaro,

As you note above there is some linkage between bit map indexes and
clustered indexes, so it seems like we'll either get both or neither.

I notice the GIT patch is being listed as under review by Alexey and
yourself. Are you actively working on this, or is anyone else planning
on working on this review?

I'd like to help where I can if nobody else is currently doing this. I
would focus initially on some analysis of the various use cases to give
a better view on what we would need B-tree, clustered indexes and bitmap
indexes to do for us.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Updated bitmap index patch

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Alvaro,
>
> As you note above there is some linkage between bit map indexes and
> clustered indexes, so it seems like we'll either get both or neither.
>
> I notice the GIT patch is being listed as under review by Alexey and
> yourself. Are you actively working on this, or is anyone else planning
> on working on this review?

Sorry, no, I'm currently stuck elsewhere.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support