Thread: Re: [PATCHES] Bitmapscan changes
Tom Lane wrote:
> At this point I'm feeling unconvinced that we want it at all. It's
> sounding like a large increase in complexity (both implementation-wise
> and in terms of API ugliness) for a fairly narrow use-case --- just how
> much territory is going to be left for this between HOT and bitmap indexes?

I'm in an awkward situation right now. I've done my best to describe the use cases for clustered indexes. I know the patch needs refactoring; I've refrained from making API changes and tried to keep all the ugliness inside the b-tree, knowing that there are changes to the indexam API coming from the bitmap index patch as well. I've been seeking comments on the design since November, knowing that this is a non-trivial change. I have not wanted to spend too much time polishing the patch, in case I need to rewrite it from scratch because of some major design flaw or because someone comes up with a much better idea.

It's frustrating to have the patch dismissed at this late stage on the grounds of "it's not worth it". As I said in February, I have the time to work on this, but if major changes are required to the current design, I need to know.

Just to recap the general idea: reduce index size by taking advantage of clustering in the heap.

Clustered indexes have roughly the same performance effect and use cases as clustered indexes on MS SQL Server and Index-Organized Tables on Oracle, but the way I've implemented them is significantly different. On other DBMSs, the index and heap are combined into a single b-tree structure. The way I've implemented them is less invasive: there are no changes to the heap, for example, and it doesn't require moving live tuples.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Wed, 2007-03-14 at 10:22, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > At this point I'm feeling unconvinced that we want it at all. It's
> > sounding like a large increase in complexity (both implementation-wise
> > and in terms of API ugliness) for a fairly narrow use-case --- just how
> > much territory is going to be left for this between HOT and bitmap indexes?
>
> I'm in an awkward situation right now. I've done my best to describe the
> use cases for clustered indexes.

...

> Just to recap the general idea: reduce index size by taking advantage of
> clustering in the heap.
>
> Clustered indexes have roughly the same performance effect and use cases
> as clustered indexes on MS SQL Server and Index-Organized Tables on
> Oracle, but the way I've implemented them is significantly different. On
> other DBMSs, the index and heap are combined into a single b-tree
> structure. The way I've implemented them is less invasive: there are no
> changes to the heap, for example, and it doesn't require moving live tuples.

Do you keep visibility info in the index? How does this info get updated when visibility data changes in the heap?

If there is no visibility data in the index, then I can't see how it gets the same performance effect as Index-Organized Tables, as a lot of random heap access is still needed.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Hannu Krosing wrote:
> On Wed, 2007-03-14 at 10:22, Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> At this point I'm feeling unconvinced that we want it at all. It's
>>> sounding like a large increase in complexity (both implementation-wise
>>> and in terms of API ugliness) for a fairly narrow use-case --- just how
>>> much territory is going to be left for this between HOT and bitmap indexes?
>> I'm in an awkward situation right now. I've done my best to describe the
>> use cases for clustered indexes.
>
> ...
>
>> Just to recap the general idea: reduce index size by taking advantage of
>> clustering in the heap.

This is what I suggest.

Provide a tarball of -head with the patch applied.

Provide a couple of use cases that can be run with explanation of how to verify the use cases.

Allow the community to drive the inclusion by making it as easy as possible to allow a proactive argument to take place by the people actually using the product. Proving that a user could and would use the feature is a very powerful argument.

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Allow the community to drive the inclusion by making it as easy as
> possible to allow a proactive argument to take place by the people
> actually using the product.

This seems to be a rather poor decision-making process: "Are the users happy with the new feature? If so, then apply the patch." It leads to unmanageable code.

Which is why we don't do things that way. The code must fit within the general architecture before application -- particularly if it's an internal API change. That's what the review process is for.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Allow the community to drive the inclusion by making it as easy as
>> possible to allow a proactive argument to take place by the people
>> actually using the product.
>
> This seems to be a rather poor decision-making process: "Are the users
> happy with the new feature? If so, then apply the patch." It leads to
> unmanageable code.

Perhaps reading my message again is in order. I think it is pretty obvious that a user shouldn't determine if a patch should be applied. My whole point was that if people are clamoring for the feature, it could drive that feature to be more aggressively reviewed.

I can't even count how many times I see: "This seems like a corner case feature, I don't think we should add it." So I am suggesting a way to ensure that the feature is not considered a corner case (if it is indeed not one).

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Hannu Krosing wrote:
> On Wed, 2007-03-14 at 10:22, Heikki Linnakangas wrote:
>> Clustered indexes have roughly the same performance effect and use cases
>> as clustered indexes on MS SQL Server and Index-Organized Tables on
>> Oracle, but the way I've implemented them is significantly different. On
>> other DBMSs, the index and heap are combined into a single b-tree
>> structure. The way I've implemented them is less invasive: there are no
>> changes to the heap, for example, and it doesn't require moving live tuples.
>
> Do you keep visibility info in the index?

No.

> If there is no visibility data in the index, then I can't see how it gets
> the same performance effect as Index-Organized Tables, as a lot of random
> heap access is still needed.

Let me illustrate the effect in the best case, with a table that consists of just the key:

Normal b-tree:

Root -> leaf -> heap
aaa  -> aaa  -> aaa
        bbb  -> bbb
        ccc  -> ccc
ddd  -> ddd  -> ddd
        eee  -> eee
        fff  -> fff
ggg  -> ggg  -> ggg
        hhh  -> hhh
        iii  -> iii

Clustered b-tree:

Root -> heap
aaa  -> aaa
        bbb
        ccc
ddd  -> ddd
        eee
        fff
ggg  -> ggg
        hhh
        iii

The index is much smaller, one level shallower in the best case. A smaller index means that more of it fits in cache. If you're doing random access through the index, you need to do less I/O because you don't need to fetch so many index pages. You need to access the heap anyway for the visibility information, as you pointed out, but the savings come from having to do less index I/O.

How close to the best case do you get in practice? It depends on your schema (narrow tables, or tables with wide keys, gain the most) and on the clusteredness of the table.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
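To put rough numbers on this for a table of your own, the relative sizes of the heap and its indexes can be checked with ordinary catalog queries. Nothing below is specific to the patch, and the table and index names are placeholders:

  -- Placeholder names; relpages is only refreshed by VACUUM/ANALYZE.
  SELECT relname, relpages,
         pg_size_pretty(pg_relation_size(oid)) AS on_disk
  FROM pg_class
  WHERE relname IN ('mytable', 'mytable_normal_idx', 'mytable_clustered_idx');

The smaller the index pages count relative to the heap, the more of the index stays in cache for random lookups.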
Alvaro Herrera wrote:
> Which is why we don't do things that way. The code must fit within the
> general architecture before application -- particularly if it's an
> internal API change. That's what the review process is for.

Yes, of course. As I've said, I have the time to work on this, but I need to get the review process *started*. Otherwise I'll just tweak and polish the patch for weeks, and end up with something that gets rejected in the end anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Joshua D. Drake wrote: > This is what I suggest. > > Provide a tarball of -head with the patch applied. Here you are: http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz > Provide a couple of use cases that can be run with explanation of how to > verify the use cases. There's a number of simple test cases on the web page that I've used (perfunittests). I can try to simplify them and add explanations. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Joshua D. Drake wrote: >> This is what I suggest. >> >> Provide a tarball of -head with the patch applied. > > Here you are: > > http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz > >> Provide a couple of use cases that can be run with explanation of how to >> verify the use cases. > > There's a number of simple test cases on the web page that I've used > (perfunittests). I can try to simplify them and add explanations. I am downloading now. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Heikki Linnakangas wrote:
> Joshua D. Drake wrote:
>> This is what I suggest.
>>
>> Provide a tarball of -head with the patch applied.
>
> Here you are:
>
> http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz
>
>> Provide a couple of use cases that can be run with explanation of how to
>> verify the use cases.
>
> There's a number of simple test cases on the web page that I've used
> (perfunittests). I can try to simplify them and add explanations.

O.k. maybe I am the only one, but I actually dug through the archives for what website you were talking about and then said, "Aha! He means: http://community.enterprisedb.com/git/". So I will accept my own paper bag, and hopefully save some from the same fate by posting the above link.

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Heikki Linnakangas wrote: > Joshua D. Drake wrote: >> This is what I suggest. >> >> Provide a tarball of -head with the patch applied. > > Here you are: > > http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz > >> Provide a couple of use cases that can be run with explanation of how to >> verify the use cases. > > There's a number of simple test cases on the web page that I've used > (perfunittests). I can try to simplify them and add explanations. > This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz File not found. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote: > This URL is not working: > > > http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz Sorry about that, typo in the filename. Fixed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Joshua D. Drake wrote: >> This URL is not working: >> >> >> http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz > > Sorry about that, typo in the filename. Fixed. > > Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA http://pgsql.privatepaste.com/170yD8c0gr Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Heikki Linnakangas wrote:
>> Joshua D. Drake wrote:
>>> This URL is not working:
>>>
>>> http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz
>> Sorry about that, typo in the filename. Fixed.
>
> Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA

Thanks for looking into this, though that test alone doesn't really tell us anything. You'd have to run the same tests with and without clustered indexes enabled, and compare. With the default settings the test data fits in memory anyway, so you're not seeing the I/O benefit but only the CPU overhead.

Attached is a larger test case with a data set of > 2 GB. Run git_demo_init.sql first to create the tables and indexes, and git_demo_run.sql to perform selects on them. The test runs for quite a long time, depending on your hardware, and prints the time spent on the selects, with and without a clustered index.

You'll obviously need to run it with the patch applied. I'd suggest to enable stats_block_level to see the effect on buffer cache hit/miss ratio.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[Attachment: the git_demo_init.sql / git_demo_run.sql test case described above]
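To spell out the procedure, here is a minimal sketch of a test session; the database name, paths, and the exact way block-level stats get enabled depend on your setup, and the file names are simply the attached scripts:

  -- in psql, against a scratch database built from the patched tree
  \timing on
  \i git_demo_init.sql   -- creates the tables and indexes, loads the > 2 GB data set
  \i git_demo_run.sql    -- runs the timed selects, with and without a clustered index

  -- afterwards, with block-level stats enabled, compare index I/O:
  SELECT relname, indexrelname, idx_blks_read, idx_blks_hit
  FROM pg_statio_user_indexes
  WHERE schemaname = 'public';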
Heikki Linnakangas wrote:
> Joshua D. Drake wrote:
>> Heikki Linnakangas wrote:
>>> Joshua D. Drake wrote:
>>>> This URL is not working:
>>>>
>>>> http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz
>>> Sorry about that, typo in the filename. Fixed.
>>
>> Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA
>

 heap_pages | normal_index_pages | clustered_index_pages
------------+--------------------+-----------------------
     216217 |             109679 |                   1316

 select_with_normal_index
--------------------------
                   100000
(1 row)
Time: 1356524.743 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)
Time: 1144832.597 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)
Time: 1111445.236 ms

And now run the same tests with clustered index

Timing is on.
 select_with_clustered_index
-----------------------------
                      100000
(1 row)
Time: 815622.768 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)
Time: 535749.457 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)

select relname,indexrelname,idx_blks_read,idx_blks_hit from pg_statio_all_indexes where schemaname = 'public';
   relname    |         indexrelname         | idx_blks_read | idx_blks_hit
--------------+------------------------------+---------------+--------------
 narrowtable  | narrowtable_index            |        296973 |       904654
 narrowtable2 | narrowtable2_clustered_index |         44556 |       857269
(2 rows)

select relname,heap_blks_read,heap_blks_hit,idx_blks_read,idx_blks_hit from pg_statio_user_tables;
   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
--------------+----------------+---------------+---------------+--------------
 narrowtable2 |         734312 |      40304136 |         44556 |       857269
 narrowtable  |         952044 |      40002609 |        296973 |       904654
(2 rows)

Seems like a clear win to me. Anyone else want to try?

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
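One convenient way to read those statistics is as a cache hit ratio per index; this is plain arithmetic over the same view, nothing patch-specific:

  SELECT indexrelname,
         round(idx_blks_hit::numeric / nullif(idx_blks_hit + idx_blks_read, 0), 3) AS idx_hit_ratio
  FROM pg_statio_all_indexes
  WHERE schemaname = 'public';

With the figures above, that works out to roughly 0.75 for the normal index versus roughly 0.95 for the clustered index, which is consistent with the timing difference.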
On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote: > > You'll obviously need to run it with the patch applied. I'd suggest > to enable stats_block_level to see the effect on buffer cache hit/ > miss ratio. groupeditems-42-pghead.patch.gz is enough, or it needs maintain_cluster_order_v5.patch ?? -- Grzegorz Jaskiewicz C/C++ freelance for hire
Grzegorz Jaskiewicz wrote:
> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>> You'll obviously need to run it with the patch applied. I'd suggest to
>> enable stats_block_level to see the effect on buffer cache hit/miss
>> ratio.
>
> groupeditems-42-pghead.patch.gz is enough, or it needs
> maintain_cluster_order_v5.patch ??

He has a patched source tarball of the whole thing here, which is what I used:

http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz

Then you just need to run the tests.

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
This is on a dual Ultra 2 SPARC with UltraWide 320 SCSI drives and 512 MB of RAM. I had to drop the size of the DB, because the DB drive is 4 GB (I do welcome bigger drives as a donation, if someone asks: UW SCSI 320).

Here are my results, with only the 4.2 patch (no maintain_cluster_order_v5 patch). If the v5 patch was needed, please tell me and I will rerun with it. Hope it is useful.

Repeat 3 times to ensure repeatable results.

Timing is on.
 select_with_normal_index
--------------------------
                   100000
(1 row)

Time: 1727891.334 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)

Time: 1325561.252 ms

 select_with_normal_index
--------------------------
                   100000
(1 row)

Time: 1348530.100 ms
Timing is off.

And now run the same tests with clustered index

Timing is on.
 select_with_clustered_index
-----------------------------
                      100000
(1 row)

Time: 870246.856 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)

Time: 477089.456 ms

 select_with_clustered_index
-----------------------------
                      100000
(1 row)

Time: 381880.965 ms
Timing is off.
Grzegorz Jaskiewicz wrote: > > On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote: >> You'll obviously need to run it with the patch applied. I'd suggest to >> enable stats_block_level to see the effect on buffer cache hit/miss >> ratio. > > groupeditems-42-pghead.patch.gz is enough, or it needs > maintain_cluster_order_v5.patch ?? No, it won't make a difference unless you're inserting to the table, and the inserts are not in cluster order. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hackers et al... I was wondering if there are any outstanding issues that need to be resolved in terms of the clustered index/bitmap changes? From the testing that I have done, plus a couple of others it is a net win (at least from DBA space). Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Tue, 20 Mar 2007, Joshua D. Drake wrote:
> Hackers et al... I was wondering if there are any outstanding issues
> that need to be resolved in terms of the clustered index/bitmap changes?
>
> From the testing that I have done, plus a couple of others it is a net
> win (at least from DBA space).

Not sure if you're talking about bitmap indexes here. If so, I'm working on VACUUM support.

Gavin
Gavin Sherry wrote:
> On Tue, 20 Mar 2007, Joshua D. Drake wrote:
>> Hackers et al... I was wondering if there are any outstanding issues
>> that need to be resolved in terms of the clustered index/bitmap changes?
>>
>> From the testing that I have done, plus a couple of others it is a net
>> win (at least from DBA space).
>
> Not sure if you're talking about bitmap indexes here. If so, I'm working
> on VACUUM support.

I was talking about the patch for clustered indexes, and I realize now I might have used the wrong thread. ;)

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Hackers et al... I was wondering if there are any outstanding issues
> that need to be resolved in terms of the clustered index/bitmap changes?

I have a todo list of smaller items for clustered indexes, but the main design issues at the moment are:

1. How to handle sorting tuples in a scan, or should we choose a design that doesn't require it? Should we add support for sorting tuples in scans on the fly, which gives more space savings when there are updates, and would also be useful in the future to support binned bitmap indexes? Or should we only form groups from tuples that are completely in order at the page level? That makes a clustered index lose its space savings more quickly when tuples are updated; HOT reduces that effect, though. This approach would also reduce the CPU overhead of scans, because we could do binary searches within groups. At the moment, I'm leaning towards the latter approach. What do others think?

2. Clustered indexes need support for candidate matches. That needs to be added to the amgetmulti and amgettuple interfaces. I've sent a patch for amgetmulti, and a proposal for amgettuple.

3. A clustered index needs to reach out to the heap for some operations, as uniqueness checks do today, blurring the modularity between heap and index. Are we willing to live with that? Is there something we can do to make it less ugly?

I'd like to get some kind of confirmation first that 1 and 3 are not showstoppers, to avoid wasting time on a patch that'll just get rejected in the end, and then submit a patch for 2 and have that committed before the main patch.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
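For anyone not familiar with the term in point 2: a "candidate match" means the index is allowed to return tuples that merely might satisfy the scan key, so the executor has to recheck the condition against the heap tuple. Stock bitmap scans already expose the same idea when a bitmap goes lossy, so the executor-visible behaviour would be roughly analogous to the recheck step below; the table and index names are made up, and the plan is only the general shape you get when the planner picks a bitmap scan:

  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

  --  Bitmap Heap Scan on orders
  --    Recheck Cond: (customer_id = 42)      <- condition re-evaluated against heap tuples
  --    ->  Bitmap Index Scan on orders_customer_id_idx
  --          Index Cond: (customer_id = 42)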
On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
> Grzegorz Jaskiewicz wrote:
>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>> ratio.
>> groupeditems-42-pghead.patch.gz is enough, or it needs
>> maintain_cluster_order_v5.patch ??
>
> No, it won't make a difference unless you're inserting to the table, and
> the inserts are not in cluster order.

Well, that's okay then. I see really good improvement in terms of speed and db size (which reflects obviously in i/o performance). Let me know if further testing can be done. I would happily see it in mainline.

--
Grzegorz Jaskiewicz
C/C++ freelance for hire
Grzegorz Jaskiewicz wrote:
> On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
>> Grzegorz Jaskiewicz wrote:
>>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>>> ratio.
>>> groupeditems-42-pghead.patch.gz is enough, or it needs
>>> maintain_cluster_order_v5.patch ??
>>
>> No, it won't make a difference unless you're inserting to the table,
>> and the inserts are not in cluster order.
> Well, that's okay then. I see really good improvement in terms of speed
> and db size (which reflects obviously in i/o performance).
> Let me know if further testing can be done. I would happily see it in
> mainline.

Right. My understanding is that the clustered index will gradually degrade to a normal btree, is that correct heikki? We could of course resolve this by doing a reindex.

The other item I think this would be great for is fairly static tables. Think about tables that are children of partitions that haven't been touched in 6 months. Why are we wasting space with them?

Anyway, from a "feature" perspective I can't see any negative. I cannot speak from a code-injection (into core) perspective.

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Grzegorz Jaskiewicz wrote:
> On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
>> Grzegorz Jaskiewicz wrote:
>>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>>> ratio.
>>> groupeditems-42-pghead.patch.gz is enough, or it needs
>>> maintain_cluster_order_v5.patch ??
>>
>> No, it won't make a difference unless you're inserting to the table,
>> and the inserts are not in cluster order.
> Well, that's okay then. I see really good improvement in terms of speed
> and db size (which reflects obviously in i/o performance).
> Let me know if further testing can be done. I would happily see it in
> mainline.

If you have a real-world database you could try it with, that would be nice. The test I sent you is pretty much a best-case scenario; it'd be interesting to get anecdotal evidence of improvements in real applications.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Joshua D. Drake wrote: > Right. My understanding is that the clustered index will gradually > degrade to a normal btree, is that correct heikki? That's right. > We could of course resolve this by doing a reindex. Not reindex, but cluster. How clustered the index can be depends on the clusteredness of the heap. > The other item I think this would be great for is fairly static tables. > Think about tables that are children of partitions that haven't been > touched in 6 months. Why are we wasting space with them? By touched, you mean updated, right? Yes, it's particularly suitable for static tables, since once you cluster them, they stay clustered. Log-tables that are only inserted to, in monotonically increasing key order, also stay clustered naturally. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
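To make the log-table case concrete, here is a sketch with made-up names; nothing in it is specific to the patch. It is simply an append-only table whose key increases monotonically, so new rows always land at the end of the heap in key order and the clustering never degrades:

  -- Hypothetical append-only log; id grows monotonically, so heap order
  -- and key order stay aligned as rows are inserted.
  CREATE TABLE app_log (
      id      bigserial PRIMARY KEY,
      logged  timestamptz NOT NULL DEFAULT now(),
      message text
  );

  INSERT INTO app_log (message) VALUES ('service started');
  INSERT INTO app_log (message) VALUES ('first request handled');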
On Mar 21, 2007, at 5:22 PM, Heikki Linnakangas wrote:
> Grzegorz Jaskiewicz wrote:
>> On Mar 19, 2007, at 11:16 AM, Heikki Linnakangas wrote:
>>> Grzegorz Jaskiewicz wrote:
>>>> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
>>>>> You'll obviously need to run it with the patch applied. I'd suggest to
>>>>> enable stats_block_level to see the effect on buffer cache hit/miss
>>>>> ratio.
>>>> groupeditems-42-pghead.patch.gz is enough, or it needs
>>>> maintain_cluster_order_v5.patch ??
>>>
>>> No, it won't make a difference unless you're inserting to the table,
>>> and the inserts are not in cluster order.
>> Well, that's okay then. I see really good improvement in terms of speed
>> and db size (which reflects obviously in i/o performance).
>> Let me know if further testing can be done. I would happily see it in
>> mainline.
>
> If you have a real-world database you could try it with, that would
> be nice. The test I sent you is pretty much a best-case scenario;
> it'd be interesting to get anecdotal evidence of improvements in
> real applications.

Sure, I'll check it with my network statistics thingie. 30 GB db atm, with millions of rows (traffic analysis for a wide network, Ethernet level, from/to/protocol/size kind of thing). Loads of updates on 2 tables (that's where I also see HOT would benefit me).

--
Grzegorz Jaskiewicz
C/C++ freelance for hire
any idea how this patch is going to play with hot ? or should I just give it a spin, and see if my world collapses :D -- Grzegorz Jaskiewicz C/C++ freelance for hire
Grzegorz Jaskiewicz wrote: > any idea how this patch is going to play with hot ? or should I just > give it a spin, and see if my world collapses :D I've run tests with both patches applied. I haven't tried with the latest HOT-versions, but they should in theory work fine together. You'll get a conflict on the pg_stats-views, both patches add statistics, but IIRC you can just ignore that and it works. I think there's a conflict in regression tests as well. Give it a shot and let me know if there's problems :). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 3/22/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Grzegorz Jaskiewicz wrote:
>> any idea how this patch is going to play with hot ? or should I just
>> give it a spin, and see if my world collapses :D
>
> I've run tests with both patches applied. I haven't tried with the
> latest HOT-versions, but they should in theory work fine together.
>
> You'll get a conflict on the pg_stats-views, both patches add
> statistics, but IIRC you can just ignore that and it works. I think
> there's a conflict in regression tests as well.
>
> Give it a shot and let me know if there's problems :).
Heikki, the signature of heap_fetch is changed slightly with HOT (we pass
a boolean to guide HOT-chain following). That might cause a conflict;
I haven't tested it, though.
Grzegorz, if you can try HOT as well, that will be great.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
On Mar 22, 2007, at 7:25 AM, Pavan Deolasee wrote:
>
> Grzegorz, if you can try HOT as well, that will be great.
>

I tried, and it worked very well with v4.2 of the patch, as I remember. My point was that since 'the day' is getting closer, and you guys are working on closely related areas inside pg, I would like to be able to safely run both patches. I will give both a go once I get some free time here.

--
Grzegorz Jaskiewicz
starving C/C++ freelance for hire