Thread: Re: [PATCHES] Bitmapscan changes
Tom Lane wrote:
> At this point I'm feeling unconvinced that we want it at all. It's
> sounding like a large increase in complexity (both implementation-wise
> and in terms of API ugliness) for a fairly narrow use-case --- just how
> much territory is going to be left for this between HOT and bitmap indexes?

I'm in an awkward situation right now. I've done my best to describe the use cases for clustered indexes. I know the patch needs refactoring; I've refrained from making API changes and tried to keep all the ugliness inside the b-tree, knowing that there are changes to the indexam API coming from the bitmap index patch as well. I've been seeking comments on the design since November, knowing that this is a non-trivial change. I have not wanted to spend too much time polishing the patch, in case I need to rewrite it from scratch because of some major design flaw or because someone comes up with a much better idea.

It's frustrating to have the patch dismissed at this late stage on the grounds of "it's not worth it". As I said in February, I have the time to work on this, but if major changes are required to the current design, I need to know.

Just to recap the general idea: reduce index size by taking advantage of clustering in the heap.

Clustered indexes have roughly the same performance effect and use cases as clustered indexes on MS SQL Server and Index-Organized Tables on Oracle, but the way I've implemented them is significantly different. On other DBMSs, the index and heap are combined into a single b-tree structure. The way I've implemented them is less invasive: there are no changes to the heap, for example, and it doesn't require moving live tuples.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Wed, 2007-03-14 at 10:22, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > At this point I'm feeling unconvinced that we want it at all. It's
> > sounding like a large increase in complexity (both implementation-wise
> > and in terms of API ugliness) for a fairly narrow use-case --- just how
> > much territory is going to be left for this between HOT and bitmap indexes?
>
> I'm in an awkward situation right now. I've done my best to describe the
> use cases for clustered indexes.

...

> Just to recap the general idea: reduce index size by taking advantage of
> clustering in the heap.
>
> Clustered indexes have roughly the same performance effect and use cases
> as clustered indexes on MS SQL Server and Index-Organized Tables on
> Oracle, but the way I've implemented them is significantly different. On
> other DBMSs, the index and heap are combined into a single b-tree
> structure. The way I've implemented them is less invasive: there are no
> changes to the heap, for example, and it doesn't require moving live tuples.

Do you keep visibility info in the index? How does this info get updated when visibility data changes in the heap?

If there is no visibility data in the index, then I can't see how it gets the same performance effect as Index-Organized Tables, as a lot of random heap access is still needed.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Hannu Krosing wrote:
> On Wed, 2007-03-14 at 10:22, Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> At this point I'm feeling unconvinced that we want it at all. It's
>>> sounding like a large increase in complexity (both implementation-wise
>>> and in terms of API ugliness) for a fairly narrow use-case --- just how
>>> much territory is going to be left for this between HOT and bitmap indexes?
>> I'm in an awkward situation right now. I've done my best to describe the
>> use cases for clustered indexes.
>
> ...
>
>> Just to recap the general idea: reduce index size by taking advantage of
>> clustering in the heap.

This is what I suggest.

Provide a tarball of -head with the patch applied.

Provide a couple of use cases that can be run with explanation of how to verify the use cases.

Allow the community to drive the inclusion by making it as easy as possible to allow a proactive argument to take place by the people actually using the product. Proving that a user could and would use the feature is a very powerful argument.

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Allow the community to drive the inclusion by making it as easy as
> possible to allow a proactive argument to take place by the people
> actually using the product.

This seems to be a rather poor decision-making process: "Are the users happy with the new feature? If so, then apply the patch." It leads to unmanageable code.

Which is why we don't do things that way. The code must fit within the general architecture before application -- particularly if it's an internal API change. That's what the review process is for.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Allow the community to drive the inclusion by making it as easy as
>> possible to allow a proactive argument to take place by the people
>> actually using the product.
>
> This seems to be a rather poor decision-making process: "Are the users
> happy with the new feature? If so, then apply the patch." It leads to
> unmanageable code.

Perhaps reading my message again is in order. I think it is pretty obvious that a user shouldn't determine if a patch should be applied. My whole point was that if people are clamoring for the feature, it could drive that feature to be more aggressively reviewed.

I can't even count how many times I see: "This seems like a corner case feature, I don't think we should add it." So I am suggesting a way to ensure that the feature is not considered a corner case (if it is indeed not one).

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Hannu Krosing wrote:
> On Wed, 2007-03-14 at 10:22, Heikki Linnakangas wrote:
>> Clustered indexes have roughly the same performance effect and use cases
>> as clustered indexes on MS SQL Server and Index-Organized Tables on
>> Oracle, but the way I've implemented them is significantly different. On
>> other DBMSs, the index and heap are combined into a single b-tree
>> structure. The way I've implemented them is less invasive: there are no
>> changes to the heap, for example, and it doesn't require moving live tuples.
>
> Do you keep visibility info in the index?

No.

> If there is no visibility data in the index, then I can't see how it gets
> the same performance effect as Index-Organized Tables, as a lot of random
> heap access is still needed.

Let me illustrate the effect in the best case, with a table that consists of just the key:

Normal b-tree:

Root -> leaf -> heap
aaa  -> aaa  -> aaa
        bbb  -> bbb
        ccc  -> ccc
ddd  -> ddd  -> ddd
        eee  -> eee
        fff  -> fff
ggg  -> ggg  -> ggg
        hhh  -> hhh
        iii  -> iii

Clustered b-tree:

Root -> heap
aaa  -> aaa
        bbb
        ccc
ddd  -> ddd
        eee
        fff
ggg  -> ggg
        hhh
        iii

The index is much smaller, one level shallower in the best case. A smaller index means that more of it fits in cache. If you're doing random access through the index, you need to do less I/O because you don't need to fetch so many index pages. You need to access the heap anyway for the visibility information, as you pointed out, but the savings come from having to do less index I/O.

How close to the best case do you get in practice? It depends on your schema (narrow tables, or tables with wide keys, gain the most) and on the clusteredness of the table.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
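To put rough numbers on this for a table of your own, the relative sizes of the heap and its indexes can be checked with ordinary catalog queries. Nothing below is specific to the patch, and the table and index names are placeholders:

  -- Placeholder names; relpages is only refreshed by VACUUM/ANALYZE.
  SELECT relname, relpages,
         pg_size_pretty(pg_relation_size(oid)) AS on_disk
  FROM pg_class
  WHERE relname IN ('mytable', 'mytable_normal_idx', 'mytable_clustered_idx');

The smaller the index pages count relative to the heap, the more of the index stays in cache for random lookups.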
Alvaro Herrera wrote:
> Which is why we don't do things that way. The code must fit within the
> general architecture before application -- particularly if it's an
> internal API change. That's what the review process is for.

Yes, of course. As I've said, I have the time to work on this, but I need to get the review process *started*. Otherwise I'll just tweak and polish the patch for weeks, and end up with something that gets rejected in the end anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Joshua D. Drake wrote: > This is what I suggest. > > Provide a tarball of -head with the patch applied. Here you are: http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz > Provide a couple of use cases that can be run with explanation of how to > verify the use cases. There's a number of simple test cases on the web page that I've used (perfunittests). I can try to simplify them and add explanations. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Joshua D. Drake wrote: >> This is what I suggest. >> >> Provide a tarball of -head with the patch applied. > > Here you are: > > http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz > >> Provide a couple of use cases that can be run with explanation of how to >> verify the use cases. > > There's a number of simple test cases on the web page that I've used > (perfunittests). I can try to simplify them and add explanations. I am downloading now. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Heikki Linnakangas wrote:
> Joshua D. Drake wrote:
>> This is what I suggest.
>>
>> Provide a tarball of -head with the patch applied.
>
> Here you are:
>
> http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz
>
>> Provide a couple of use cases that can be run with explanation of how to
>> verify the use cases.
>
> There's a number of simple test cases on the web page that I've used
> (perfunittests). I can try to simplify them and add explanations.

O.k. maybe I am the only one, but I actually dug through the archives for what website you were talking about and then said, "Aha! He means: http://community.enterprisedb.com/git/". So I will accept my own paper bag, and hopefully save some from the same fate by posting the above link.

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Heikki Linnakangas wrote: > Joshua D. Drake wrote: >> This is what I suggest. >> >> Provide a tarball of -head with the patch applied. > > Here you are: > > http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz > >> Provide a couple of use cases that can be run with explanation of how to >> verify the use cases. > > There's a number of simple test cases on the web page that I've used > (perfunittests). I can try to simplify them and add explanations. > This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz File not found. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote: > This URL is not working: > > > http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz Sorry about that, typo in the filename. Fixed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Joshua D. Drake wrote: >> This URL is not working: >> >> >> http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz > > Sorry about that, typo in the filename. Fixed. > > Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA http://pgsql.privatepaste.com/170yD8c0gr Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Heikki Linnakangas wrote:
>> Joshua D. Drake wrote:
>>> This URL is not working:
>>>
>>> http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz
>> Sorry about that, typo in the filename. Fixed.
>
> Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA

Thanks for looking into this, though that test alone doesn't really tell us anything. You'd have to run the same tests with and without clustered indexes enabled, and compare. With the default settings the test data fits in memory anyway, so you're not seeing the I/O benefit but only the CPU overhead.

Attached is a larger test case with a data set of > 2 GB. Run git_demo_init.sql first to create the tables and indexes, and git_demo_run.sql to perform selects on them. The test runs for quite a long time, depending on your hardware, and prints the time spent on the selects, with and without a clustered index.

You'll obviously need to run it with the patch applied. I'd suggest to enable stats_block_level to see the effect on buffer cache hit/miss ratio.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[Attachment: the git_demo_init.sql / git_demo_run.sql test case described above]
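To spell out the procedure, here is a minimal sketch of a test session; the database name, paths, and the exact way block-level stats get enabled depend on your setup, and the file names are simply the attached scripts:

  -- in psql, against a scratch database built from the patched tree
  \timing on
  \i git_demo_init.sql   -- creates the tables and indexes, loads the > 2 GB data set
  \i git_demo_run.sql    -- runs the timed selects, with and without a clustered index

  -- afterwards, with block-level stats enabled, compare index I/O:
  SELECT relname, indexrelname, idx_blks_read, idx_blks_hit
  FROM pg_statio_user_indexes
  WHERE schemaname = 'public';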
Heikki Linnakangas wrote:
> Joshua D. Drake wrote:
>> Heikki Linnakangas wrote:
>>> Joshua D. Drake wrote:
>>>> This URL is not working:
>>>>
>>>> http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz
>>> Sorry about that, typo in the filename. Fixed.
>>
>> Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA
>

 heap_pages | normal_index_pages | clustered_index_pages
------------+--------------------+-----------------------
     216217 |             109679 |                   1316

 select_with_normal_index
--------------------------
                   100000
(1 row)
Time: 1356524.743 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)
Time: 1144832.597 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)
Time: 1111445.236 ms

And now run the same tests with clustered index

Timing is on.
 select_with_clustered_index
-----------------------------
                      100000
(1 row)
Time: 815622.768 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)
Time: 535749.457 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)

select relname,indexrelname,idx_blks_read,idx_blks_hit from pg_statio_all_indexes where schemaname = 'public';
   relname    |         indexrelname         | idx_blks_read | idx_blks_hit
--------------+------------------------------+---------------+--------------
 narrowtable  | narrowtable_index            |        296973 |       904654
 narrowtable2 | narrowtable2_clustered_index |         44556 |       857269
(2 rows)

select relname,heap_blks_read,heap_blks_hit,idx_blks_read,idx_blks_hit from pg_statio_user_tables;
   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
--------------+----------------+---------------+---------------+--------------
 narrowtable2 |         734312 |      40304136 |         44556 |       857269
 narrowtable  |         952044 |      40002609 |        296973 |       904654
(2 rows)

Seems like a clear win to me. Anyone else want to try?

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
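One convenient way to read those statistics is as a cache hit ratio per index; this is plain arithmetic over the same view, nothing patch-specific:

  SELECT indexrelname,
         round(idx_blks_hit::numeric / nullif(idx_blks_hit + idx_blks_read, 0), 3) AS idx_hit_ratio
  FROM pg_statio_all_indexes
  WHERE schemaname = 'public';

With the figures above, that works out to roughly 0.75 for the normal index versus roughly 0.95 for the clustered index, which is consistent with the timing difference.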
On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote: > > You'll obviously need to run it with the patch applied. I'd suggest > to enable stats_block_level to see the effect on buffer cache hit/ > miss ratio. groupeditems-42-pghead.patch.gz is enough, or it needs maintain_cluster_order_v5.patch ?? -- Grzegorz Jaskiewicz C/C++ freelance for hire
Grzegorz Jaskiewicz wrote:
> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>> You'll obviously need to run it with the patch applied. I'd suggest to
>> enable stats_block_level to see the effect on buffer cache hit/miss
>> ratio.
>
> groupeditems-42-pghead.patch.gz is enough, or it needs
> maintain_cluster_order_v5.patch ??

He has a patched source tarball of the whole thing here, which is what I used:

http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz

Then you just need to run the tests.

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
This is on a dual Ultra 2 SPARC with UltraWide 320 SCSI drives and 512 MB of RAM. I had to drop the size of the DB, because the DB drive is 4 GB (I do welcome bigger drives as a donation, if someone asks: UW SCSI 320).

Here are my results, with only the 4.2 patch (no maintain_cluster_order_v5 patch). If the v5 patch was needed, please tell me and I will rerun with it. Hope it is useful.

Repeat 3 times to ensure repeatable results.

Timing is on.
 select_with_normal_index
--------------------------
                   100000
(1 row)

Time: 1727891.334 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)

Time: 1325561.252 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)

Time: 1348530.100 ms
Timing is off.

And now run the same tests with clustered index

Timing is on.
 select_with_clustered_index
-----------------------------
                      100000
(1 row)

Time: 870246.856 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)

Time: 477089.456 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)

Time: 381880.965 ms
Timing is off.
Grzegorz Jaskiewicz wrote: > > On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote: >> You'll obviously need to run it with the patch applied. I'd suggest to >> enable stats_block_level to see the effect on buffer cache hit/miss >> ratio. > > groupeditems-42-pghead.patch.gz is enough, or it needs > maintain_cluster_order_v5.patch ?? No, it won't make a difference unless you're inserting to the table, and the inserts are not in cluster order. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hackers et al... I was wondering if there are any outstanding issues that need to be resolved in terms of the clustered index/bitmap changes? From the testing that I have done, plus a couple of others it is a net win (at least from DBA space). Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Tue, 20 Mar 2007, Joshua D. Drake wrote:
> Hackers et al... I was wondering if there are any outstanding issues
> that need to be resolved in terms of the clustered index/bitmap changes?
>
> From the testing that I have done, plus a couple of others it is a net
> win (at least from DBA space).

Not sure if you're talking about bitmap indexes here. If so, I'm working on VACUUM support.

Gavin
Gavin Sherry wrote:
> On Tue, 20 Mar 2007, Joshua D. Drake wrote:
>> Hackers et al... I was wondering if there are any outstanding issues
>> that need to be resolved in terms of the clustered index/bitmap changes?
>>
>> From the testing that I have done, plus a couple of others it is a net
>> win (at least from DBA space).
>
> Not sure if you're talking about bitmap indexes here. If so, I'm working
> on VACUUM support.

I was talking about the patch for clustered indexes, and I realize now I might have used the wrong thread. ;)

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Hackers et al... I was wondering if there are any outstanding issues
> that need to be resolved in terms of the clustered index/bitmap changes?

I have a todo list of smaller items for clustered indexes, but the main design issues at the moment are:

1. How to handle sorting tuples in a scan, or should we choose a design that doesn't require it? Should we add support for sorting tuples in scans on the fly, which gives more space savings when there are updates, and would also be useful in the future to support binned bitmap indexes? Or should we only form groups from tuples that are completely in order at the page level? That makes a clustered index lose its space savings more quickly when tuples are updated; HOT reduces that effect, though. This approach would also reduce the CPU overhead of scans, because we could do binary searches within groups. At the moment, I'm leaning towards the latter approach. What do others think?

2. Clustered indexes need support for candidate matches. That needs to be added to the amgetmulti and amgettuple interfaces. I've sent a patch for amgetmulti, and a proposal for amgettuple.

3. A clustered index needs to reach out to the heap for some operations, as uniqueness checks do today, blurring the modularity between heap and index. Are we willing to live with that? Is there something we can do to make it less ugly?

I'd like to get some kind of confirmation first that 1 and 3 are not showstoppers, to avoid wasting time on a patch that'll just get rejected in the end, and then submit a patch for 2 and have that committed before the main patch.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
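For anyone not familiar with the term in point 2: a "candidate match" means the index is allowed to return tuples that merely might satisfy the scan key, so the executor has to recheck the condition against the heap tuple. Stock bitmap scans already expose the same idea when a bitmap goes lossy, so the executor-visible behaviour would be roughly analogous to the recheck step below; the table and index names are made up, and the plan is only the general shape you get when the planner picks a bitmap scan:

  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

  --  Bitmap Heap Scan on orders
  --    Recheck Cond: (customer_id = 42)      <- condition re-evaluated against heap tuples
  --    ->  Bitmap Index Scan on orders_customer_id_idx
  --          Index Cond: (customer_id = 42)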
On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
> Grzegorz Jaskiewicz wrote:
>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>> ratio.
>> groupeditems-42-pghead.patch.gz is enough, or it needs
>> maintain_cluster_order_v5.patch ??
>
> No, it won't make a difference unless you're inserting to the table, and
> the inserts are not in cluster order.

Well, that's okay then. I see really good improvement in terms of speed and db size (which reflects obviously in i/o performance). Let me know if further testing can be done. I would happily see it in mainline.

--
Grzegorz Jaskiewicz
C/C++ freelance for hire
Grzegorz Jaskiewicz wrote:
> On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
>> Grzegorz Jaskiewicz wrote:
>>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>>> ratio.
>>> groupeditems-42-pghead.patch.gz is enough, or it needs
>>> maintain_cluster_order_v5.patch ??
>>
>> No, it won't make a difference unless you're inserting to the table,
>> and the inserts are not in cluster order.
> Well, that's okay then. I see really good improvement in terms of speed
> and db size (which reflects obviously in i/o performance).
> Let me know if further testing can be done. I would happily see it in
> mainline.

Right. My understanding is that the clustered index will gradually degrade to a normal btree, is that correct heikki? We could of course resolve this by doing a reindex.

The other item I think this would be great for is fairly static tables. Think about tables that are children of partitions that haven't been touched in 6 months. Why are we wasting space with them?

Anyway, from a "feature" perspective I can't see any negative. I cannot speak from a code-injection (into core) perspective.

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Grzegorz Jaskiewicz wrote:
> On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
>> Grzegorz Jaskiewicz wrote:
>>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>>> ratio.
>>> groupeditems-42-pghead.patch.gz is enough, or it needs
>>> maintain_cluster_order_v5.patch ??
>>
>> No, it won't make a difference unless you're inserting to the table,
>> and the inserts are not in cluster order.
> Well, that's okay then. I see really good improvement in terms of speed
> and db size (which reflects obviously in i/o performance).
> Let me know if further testing can be done. I would happily see it in
> mainline.

If you have a real-world database you could try it with, that would be nice. The test I sent you is pretty much a best-case scenario; it'd be interesting to get anecdotal evidence of improvements in real applications.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Joshua D. Drake wrote: > Right. My understanding is that the clustered index will gradually > degrade to a normal btree, is that correct heikki? That's right. > We could of course resolve this by doing a reindex. Not reindex, but cluster. How clustered the index can be depends on the clusteredness of the heap. > The other item I think this would be great for is fairly static tables. > Think about tables that are children of partitions that haven't been > touched in 6 months. Why are we wasting space with them? By touched, you mean updated, right? Yes, it's particularly suitable for static tables, since once you cluster them, they stay clustered. Log-tables that are only inserted to, in monotonically increasing key order, also stay clustered naturally. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
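To make the log-table case concrete, here is a sketch with made-up names; nothing in it is specific to the patch. It is simply an append-only table whose key increases monotonically, so new rows always land at the end of the heap in key order and the clustering never degrades:

  -- Hypothetical append-only log; id grows monotonically, so heap order
  -- and key order stay aligned as rows are inserted.
  CREATE TABLE app_log (
      id      bigserial PRIMARY KEY,
      logged  timestamptz NOT NULL DEFAULT now(),
      message text
  );

  INSERT INTO app_log (message) VALUES ('service started');
  INSERT INTO app_log (message) VALUES ('first request handled');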
On Mar 21, 2007, at 5:22 PM, Heikki Linnakangas wrote:
> Grzegorz Jaskiewicz wrote:
>> On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
>>> Grzegorz Jaskiewicz wrote:
>>>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>>>> ratio.
>>>> groupeditems-42-pghead.patch.gz is enough, or it needs
>>>> maintain_cluster_order_v5.patch ??
>>>
>>> No, it won't make a difference unless you're inserting to the table,
>>> and the inserts are not in cluster order.
>> Well, that's okay then. I see really good improvement in terms of speed
>> and db size (which reflects obviously in i/o performance).
>> Let me know if further testing can be done. I would happily see it in
>> mainline.
>
> If you have a real-world database you could try it with, that would
> be nice. The test I sent you is pretty much a best-case scenario;
> it'd be interesting to get anecdotal evidence of improvements in
> real applications.

Sure, I'll check it with my network statistics thingie. 30 GB db atm, with millions of rows (traffic analysis for a wide network, Ethernet level, from/to/protocol/size kind of thing). Loads of updates on 2 tables (that's where I also see HOT would benefit me).

--
Grzegorz Jaskiewicz
C/C++ freelance for hire
any idea how this patch is going to play with hot ? or should I just give it a spin, and see if my world collapses :D -- Grzegorz Jaskiewicz C/C++ freelance for hire
Grzegorz Jaskiewicz wrote: > any idea how this patch is going to play with hot ? or should I just > give it a spin, and see if my world collapses :D I've run tests with both patches applied. I haven't tried with the latest HOT-versions, but they should in theory work fine together. You'll get a conflict on the pg_stats-views, both patches add statistics, but IIRC you can just ignore that and it works. I think there's a conflict in regression tests as well. Give it a shot and let me know if there's problems :). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 3/22/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Grzegorz Jaskiewicz wrote:
>> any idea how this patch is going to play with hot ? or should I just
>> give it a spin, and see if my world collapses :D
>
> I've run tests with both patches applied. I haven't tried with the
> latest HOT-versions, but they should in theory work fine together.
>
> You'll get a conflict on the pg_stats-views, both patches add
> statistics, but IIRC you can just ignore that and it works. I think
> there's a conflict in regression tests as well.
>
> Give it a shot and let me know if there's problems :).
Heikki, the signature of heap_fetch is changed slightly with HOT (we pass
a boolean to guide HOT-chain following). That might cause a conflict;
I haven't tested it, though.
Grzegorz, if you can try HOT as well, that will be great.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
On Mar 22, 2007, at 7:25 AM, Pavan Deolasee wrote:
>
> Grzegorz, if you can try HOT as well, that will be great.
>

I tried, and it worked very well with v4.2 of the patch, as I remember. My point was that since 'the day' is getting closer, and you guys are working on closely related areas inside pg, I would like to be able to safely run both patches. I will give both a go once I get some free time here.

--
Grzegorz Jaskiewicz
starving C/C++ freelance for hire