Re: BRIN indexes - TRAP: BadArgument - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: BRIN indexes - TRAP: BadArgument |
Date | |
Msg-id | 20141006223359.GL7043@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: BRIN indexes - TRAP: BadArgument (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: BRIN indexes - TRAP: BadArgument
BRIN range operator class Re: BRIN indexes - TRAP: BadArgument |
List | pgsql-hackers |
Heikki Linnakangas wrote: > On 09/23/2014 10:04 PM, Alvaro Herrera wrote: > >+ <para> > >+ The <acronym>BRIN</acronym> implementation in <productname>PostgreSQL</productname> > >+ is primarily maintained by Álvaro Herrera. > >+ </para> > > We don't usually have such verbiage in the docs. The GIN and GiST > pages do, but I think those are a historic exceptions, not something > we want to do going forward. Removed. > >+ <variablelist> > >+ <varlistentry> > >+ <term><function>BrinOpcInfo *opcInfo(void)</></term> > >+ <listitem> > >+ <para> > >+ Returns internal information about the indexed columns' summary data. > >+ </para> > >+ </listitem> > >+ </varlistentry> > > I think you should explain what that internal information is. The > minmax-19a.patch adds the type OID argument to this; remember to > update the docs. Updated. > In SP-GiST, the similar function is called "config". It might be > good to use the same name here, for consistency across indexams > (although I actually like the "opcInfo" name better than "config") Well, I'm not sure that there's any value in being consistent if the new name is better than the old one. Most likely, a person trying to implement an spgist opclass wouldn't try to do a brin opclass at the same time, so it's not like there's a lot of value in being consistent there, anyway. > The docs for the other support functions need to be updated, now > that you changed the arguments from DeformedBrTuple to BrinValues. Updated. > >+ <!-- this needs improvement ... --> > >+ To implement these methods in a generic ways, normally the opclass > >+ defines its own internal support functions. For instance, minmax > >+ opclasses add the support functions for the four inequality operators > >+ for the datatype. > >+ Additionally, the operator class must supply appropriate > >+ operator entries, > >+ to enable the optimizer to use the index when those operators are > >+ used in queries. > > The above needs improvement ;-) I rechecked and while I tweaked it here and there, I wasn't able to add much more to it. > >+ BRIN indexes (a shorthand for Block Range indexes) > >+ store summaries about the values stored in consecutive table physical block ranges. > > "consecutive table physical block ranges" is quite a mouthful. I reworded this introduction. I hope it makes more sense now. > >+ For datatypes that have a linear sort order, the indexed data > >+ corresponds to the minimum and maximum values of the > >+ values in the column for each block range, > >+ which support indexed queries using these operators: > >+ > >+ <simplelist> > >+ <member><literal><</literal></member> > >+ <member><literal><=</literal></member> > >+ <member><literal>=</literal></member> > >+ <member><literal>>=</literal></member> > >+ <member><literal>></literal></member> > >+ </simplelist> > > That's the built-in minmax indexing strategy, yes, but you could > have others, even for datatypes with a linear sort order. I "fixed" this by removing this list. It's not possible to be comprehensive here, I think, and anyway I don't think there's much point. > >+ To find out the index tuple for a particular page range, we have an internal > > s/find out/find/ > > >+ new heap tuple contains null values but the index tuple indicate there are no > > s/indicate/indicates/ Both fixed. > >+ Open questions > >+ -------------- > >+ > >+ * Same-size page ranges? > >+ Current related literature seems to consider that each "index entry" in a > >+ BRIN index must cover the same number of pages. There doesn't seem to be a > > What is the related literature? Is there an academic paper or > something that should be cited as a reference for BRIN? I the original "minmax-proposal" file, I had these four URLs: : Other database systems already have similar features. Some examples: : : * Oracle Exadata calls this "storage indexes" : http://richardfoote.wordpress.com/category/storage-indexes/ : : * Netezza has "zone maps" : http://nztips.com/2010/11/netezza-integer-join-keys/ : : * Infobright has this automatically within their "data packs" according to a : May 3rd, 2009 blog post : http://www.infobright.org/index.php/organizing_data_and_more_about_rough_data_contest/ : : * MonetDB also uses this technique, according to a published paper : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.2662 : "Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS" I gave them all a quick look and none of them touches the approach in detail; in fact other than the Oracle Exadata one, they are all talking about something else and mention the "minmax" stuff only in passing. I don't think any of them is worth citing. > >+ * TODO > >+ * * ScalarArrayOpExpr (amsearcharray -> SK_SEARCHARRAY) > >+ * * add support for unlogged indexes > >+ * * ditto expressional indexes > > We don't have unlogged indexes in general, so no need to list that > here. What would be needed to implement ScalarArrayOpExprs? Well, it requires a different way to handle ScanKeys. Anyway the queries that it is supposed to serve can already be served in some other ways for AMs that don't have amsearcharray, so I don't think it's a huge loss if we don't implement it. We can add that later. > I didn't realize that expression indexes are still not supported. > And I see that partial indexes are not supported either. Why not? I > wouldn't expect BRIN to need to care about those things in > particular; the expressions for an expressional or partial index are > handled in the executor, no? Yeah; those restrictions were leftovers from back when I didn't really know how they were supposed to be implemented. I took out the restrictions and there wasn't anything else required to support both these features. > >+ /* > >+ * A tuple in the heap is being inserted. To keep a brin index up to date, > >+ * we need to obtain the relevant index tuple, compare its stored values with > >+ * those of the new tuple; if the tuple values are consistent with the summary > >+ * tuple, there's nothing to do; otherwise we need to update the index. > > s/compare/and compare/. Perhaps replace one of the semicolons with a > full stop. Fixed. > >+ * If the range is not currently summarized (i.e. the revmap returns InvalidTid > >+ * for it), there's nothing to do either. > >+ */ > >+ Datum > >+ brininsert(PG_FUNCTION_ARGS) > > There is no InvalidTid, as a constant or a #define. Perhaps replace > with "invalid item pointer". Fixed -- actually it doesn't return invalid TID anymore, only NULL. > >+ /* > >+ * XXX We need to know the size of the table so that we know how long to > >+ * iterate on the revmap. There's room for improvement here, in that we > >+ * could have the revmap tell us when to stop iterating. > >+ */ > > The revmap doesn't know how large the table is. Remember that you > have to return all blocks that are not in the revmap, so you can't > just stop when you reach the end of the revmap. I think the current > design is fine. Yeah, I was leaning towards the same conclusion myself. I have removed the comment. (We could think about having brininsert update the metapage so that the index keeps track of what's the last heap page, which could help us support this, but I'm not sure there's much point. Anyway we can tweak this later.) > I have to stop now to do some other stuff. Overall, this is in > pretty good shape. In addition to little cleanup of things I listed > above, and similar stuff elsewhere that I didn't read through right > now, there are a few medium-sized items I'd still like to see > addressed before you commit this: > > * expressional/partial index support > * the difficulty of testing the union support function that we > discussed earlier I added an USE_ASSERTION-only block in brininsert that runs the union support proc and compares the output with the one from regular addValue. I haven't tested this too much yet. > * clarify the memory context stuff of support functions that we also > discussed earlier I re-checked this stuff. Turns out that the support functions don't palloc/pfree memory too much, except to update the stuff stored in BrinValues, by using datumCopy(). This memory is only freed when we need to update a previous Datum. There's no way for the brin.c code to know when the Datum is going to be released by the support proc, and thus no way for a temp context to be used. The memory context experiments I alluded to earlier are related to pallocs done in brininsert / bringetbitmap themselves, not in the opclass-provided support procs. All in all, I don't think there's much room for improvement, other than perhaps doing so in brininsert/ bringetbitmap. Don't really care too much about this either way. Once again, many thanks for the review. Here's a new version. I have added operator classes for int8, text, and actually everything that btree supports except: bool record oidvector anyarray tsvector tsquery jsonb range since I'm not sure that it makes sense to have opclasses for any of these -- at least not regular minmax opclasses. There are some interesting possibilities, for example for range types, whereby we store in the index tuple the union of all the range in the block range. (I had an opclass for anyenum too, but on further thought I removed it because it is going to be pointless in nearly all cases.) contrib/pageinspect/Makefile | 2 +- contrib/pageinspect/brinfuncs.c | 410 +++++++++++ contrib/pageinspect/pageinspect--1.2.sql | 37 + contrib/pg_xlogdump/rmgrdesc.c | 1 + doc/src/sgml/brin.sgml | 498 +++++++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/indices.sgml | 36 +- doc/src/sgml/postgres.sgml | 1 + src/backend/access/Makefile | 2 +- src/backend/access/brin/Makefile | 18 + src/backend/access/brin/README | 179 +++++ src/backend/access/brin/brin.c | 1116 ++++++++++++++++++++++++++++++ src/backend/access/brin/brin_minmax.c | 320 +++++++++ src/backend/access/brin/brin_pageops.c | 712 +++++++++++++++++++ src/backend/access/brin/brin_revmap.c | 473 +++++++++++++ src/backend/access/brin/brin_tuple.c | 553 +++++++++++++++ src/backend/access/brin/brin_xlog.c | 319 +++++++++ src/backend/access/common/reloptions.c | 7 + src/backend/access/heap/heapam.c | 22 +- src/backend/access/rmgrdesc/Makefile | 3 +- src/backend/access/rmgrdesc/brindesc.c | 112 +++ src/backend/access/transam/rmgr.c | 1 + src/backend/catalog/index.c | 24 + src/backend/replication/logical/decode.c | 1 + src/backend/storage/page/bufpage.c | 179 ++++- src/backend/utils/adt/selfuncs.c | 74 +- src/include/access/brin.h | 52 ++ src/include/access/brin_internal.h | 87 +++ src/include/access/brin_page.h | 70 ++ src/include/access/brin_pageops.h | 36 + src/include/access/brin_revmap.h | 39 ++ src/include/access/brin_tuple.h | 97 +++ src/include/access/brin_xlog.h | 107 +++ src/include/access/heapam.h | 2 + src/include/access/reloptions.h | 3 +- src/include/access/relscan.h | 4 +- src/include/access/rmgrlist.h | 1 + src/include/catalog/index.h | 8 + src/include/catalog/pg_am.h | 2 + src/include/catalog/pg_amop.h | 164 +++++ src/include/catalog/pg_amproc.h | 245 +++++++ src/include/catalog/pg_opclass.h | 32 + src/include/catalog/pg_opfamily.h | 28 + src/include/catalog/pg_proc.h | 38 + src/include/storage/bufpage.h | 2 + src/include/utils/selfuncs.h | 1 + src/test/regress/expected/opr_sanity.out | 14 +- src/test/regress/sql/opr_sanity.sql | 7 +- 48 files changed, 6122 insertions(+), 18 deletions(-) (I keep naming the patch file "minmax", but nothing in the code is actually called that way anymore, except the opclasses). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: