Thread: 9.5 alpha: some small comments on BRIN and btree_gin
Hello, First: KUDO !!! The release notes are extremely promising in regard to performance improvements :-) I've made some (dirty) tests with BRIN and btree_gin. (on a smalll Windows laptop ...) just a few remarks: - btree_gin deserve a better description than that: "However, they are useful for GIN testing and as a base for developing other GIN operator classes." I came to similartimes between btree and gin for indexes on "category" columns (ca 20 to 200 distinct values) For me, gin clearlywins here thanks to the index size difference. You should really consider moving btree_gin to core ... - btree_gin on integers doesn't cope well with BETWEEN. Seems to always lead to a full index scan I think I understand why,but maybe this is worth a comment in the doc to underline the difference to btree. SELECT * from tgin_1 WHERE cat_id between 1 and 2: http://explain.depesz.com/s/fmqn SELECT * from tgin_1 WHERE cat_id IN(1,2): http://explain.depesz.com/s/bYg - BRIN cost: I've made a silly test, where all distinct values exist in all BRIN page ranges: INSERT into tbrin_1 (cat_id,....) SELECT s%20, ... FROM generate_series(1,3000000 )s;CREATE INDEX cat_brin_1 on tbrin_1 using BRIN (cat_id)with(pages_per_range=64);SELECT * from tbrin_1 WHERE cat_id=10; http://explain.depesz.com/s/9YQR There seems to be no "fence" against useless BRIN indexes that would allow a fallback on a table scan. But the time overheadremind small :) best regards, Marc Mamin
On Mon, Jul 6, 2015 at 12:20 AM, Marc Mamin <M.Mamin@intershop.de> wrote:
Hello,
First: KUDO !!!
The release notes are extremely promising in regard to performance improvements :-)
I've made some (dirty) tests with BRIN and btree_gin.
(on a smalll Windows laptop ...)
just a few remarks:
- btree_gin deserve a better description than that:
"However, they are useful for GIN testing and as a base for developing other GIN operator classes."
I came to similar times between btree and gin for indexes on "category" columns (ca 20 to 200 distinct values)
For me, gin clearly wins here thanks to the index size difference.
I reached the same conclusion for things with higher distinct values, but still several copies of each distinct value. except I don't think we should change the description until the BETWEEN issue is resolved. That is a pretty serious limitation, I think.
Or at least, if you we invite people to use it for this purpose, we would have to warn them that it is not suitable for range queries. It wouldn't be so bad if it merely didn't support them well, but as things are now it positively pulls the planner away from better options, because it looks falsely attractive.
I've looked into doing myself, but I'm afraid it is beyond me.
Cheers,
Jeff
On 07/06/2015 12:20 AM, Marc Mamin wrote: > There seems to be no "fence" against useless BRIN indexes that would allow a fallback on a table scan. > But the time overhead remind small :) When have we ever stopped users from creating useless indexes? For one thing, just because the index isn't useful *now* doesn't mean it won't be in the future. Now, it would be useful to have a brin_index_effectiveness() function so that DBAs could check for themselves whether they should dump indexes. However, I don't see needing that for 9.5. Are there usage stats in pg_stat_user_indexes for BRIN? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Jul 7, 2015 at 9:04 AM, Josh Berkus <josh@agliodbs.com> wrote: > Are there usage stats in pg_stat_user_indexes for BRIN? Yes, they are here. -- Michael
Marc Mamin wrote: > - BRIN cost: I've made a silly test, where all distinct values exist in all BRIN page ranges: > > INSERT into tbrin_1 (cat_id, ....) SELECT s%20, ... FROM generate_series(1,3000000 )s; > CREATE INDEX cat_brin_1 on tbrin_1 using BRIN (cat_id)with (pages_per_range=64); > SELECT * from tbrin_1 WHERE cat_id=10; > http://explain.depesz.com/s/9YQR > > There seems to be no "fence" against useless BRIN indexes that would allow a fallback on a table scan. > But the time overhead remind small :) Hmm, I guess the costing function for brin could stand some improvement. Clearly we're not covering all bases. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> -----Original Message----- > From: Josh Berkus [mailto:josh@agliodbs.com] > Sent: Dienstag, 7. Juli 2015 02:04 > > On 07/06/2015 12:20 AM, Marc Mamin wrote: > > There seems to be no "fence" against useless BRIN indexes that > would allow a fallback on a table scan. > > But the time overhead remind small :) > > When have we ever stopped users from creating useless indexes? Sure, but on the other hand, they are so small and quick to build that they seem to be a good alternative when other index types are too costly, even if theses indexes can't deal well with all data ranges passed as query condition. Hence it would be fine if the planner could reject these indexes in the bad cases. I don't mean this is something I'm counting on, but it could be a good idea to mention this limitation in the doc. regards, Marc Mamin > For one > thing, just because the index isn't useful *now* doesn't mean it won't > be in the future. > > Now, it would be useful to have a brin_index_effectiveness() function > so that DBAs could check for themselves whether they should dump > indexes. > However, I don't see needing that for 9.5. > > Are there usage stats in pg_stat_user_indexes for BRIN? > > -- > Josh Berkus > PostgreSQL Experts Inc. > http://pgexperts.com
On 07/07/2015 06:28 AM, Marc Mamin wrote: > Sure, but on the other hand, they are so small and quick to build > that they seem to be a good alternative when other index types are too costly, > even if theses indexes can't deal well with all data ranges passed as query condition. > > Hence it would be fine if the planner could reject these indexes in the bad cases. Oh, sorry! I didn't realize that the planner was using the BRIN index even when it was useless; your email wasn't clear. The problem here is that the usefulness of BRIN indexes as a cost calculation should take correlation into account, heavily. Can we do that? Is correlation even part of the index costing method now? How accurate are our correlation estimates? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com