Re: Index Unqiueness - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Index Unqiueness
Date:
Msg-id: CA+TgmoZOyaTanfEvNUdiHBCuu9Zh0JVP1e_UTVbx6Rvj9vFC9Q@mail.gmail.com
In response to: Re: Index Unqiueness (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: Index Unqiueness
List: pgsql-hackers
On Fri, Mar 8, 2013 at 8:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I'd be interested in something slightly related that is doing HOT on a
> per-index basis. Currently we don't do hot if any index is updated even
> though quite possibly most of the indexes don't change.
> I think that might result in some quite nice speedups...

Yeah, I agree. The thing that seems tricky is that, at present, once you have the CTID of a tuple in a HOT chain, you can follow the chain all the way to the end without inspecting the tuple contents - just xmin/xmax - and you know that all the tuples in the chain agree on all indexed attributes. (A simplified sketch of that chain-following loop is at the end of this mail.) If we relax that, consider a chain A -> B -> C, where A and B agree on all indexed attributes but B and C agree on only some of them. At present, only A will have any index pointers. If we break that assumption, then we lose the ability to HOT-prune away B and eventually C without index vac, which is a bummer. If we keep that assumption, then we can have an index pointer whose attributes don't match those of the tuple it references, which threatens to break index-only scans (at least). I haven't been able to think of a way to make this work out cleanly.

I think it's important to understand whether we're trying to extend the benefits of HOT update (skipping index insertions) or HOT cleanup (page-at-a-time vacuuming). They're both significant.

If the goal is to optimize away index maintenance for the columns that aren't updated, we could change the way vacuum works: when vacuum finds a dead tuple, instead of scanning for the index pointers and removing them, it could scan for the index pointers and update them to point to the new CTID (which is available from looking at the original tuple). Then, potentially, updates don't need to insert into indexes on unchanged columns, because scans can follow the CTID pointers forward where needed. For sanity's sake, it might be best to limit this to cases where the old and new tuples are on the same page, both to minimize the overhead during scans, and also because there's not enough space available in the line pointer to store a full CTID (and we definitely don't want to postpone truncating tuples to line pointers to get this optimization). I'm waving my hands wildly here; I might be totally off-base in thinking that any of this can work. (A deliberately hypothetical sketch of the repointing step is also at the end of this mail.)

More broadly, I'm not sure whether it's the right thing to optimize. It essentially aims to speed up updates, but the cost would be slower scans; indeed, it's hard to think of anything we could do in this area that wouldn't make scans more complex and therefore potentially slower, and it's not clear that's the right way to go. HOT cleanups - or hint bit setting - during scans are already a frequent cause of performance complaints. We could skip them during read-only operations, but we do them for a good reason: otherwise, repeated scans can get very painful. So I think there's a decent argument that we're already over-optimized for writes at the expense of reads.

I'm increasingly tempted to think that our heap representation needs a much broader rethink. We hear endless complaints in this forum to the effect that hint bits cause problems, both directly with performance and indirectly by greatly complicating things for other features, such as page checksums. We all love HOT, but it doesn't cover enough cases, and the HOT cleanups can occasionally cause pain. Freezing sucks. The often-expressed desire to cluster a table and have it stay clustered is unimplementable.
Returning space to the OS requires painfully expensive maintenance. The 24-byte-per-tuple overhead seems badly overpriced for insert-only tables. (The struct declarations behind this limit and the line-pointer limit above are sketched at the end of this mail.) These are not fundamental truths of the universe, or even of PostgreSQL; they are specific consequences of the representation we've chosen for heaps. Many of them are things that we've grown into, rather than designed with malice aforethought: for example, freezing is a consequence of the after-the-fact desire to be able to support more than 4bn transactions over the lifetime of the database. So it's way better than what we had before, and yet, if we all sat down and designed a new on-disk storage format for a new product today, I'm sure none of us would pick one that expires after 2bn transactions.

We can continue to whittle away at these problems incrementally, and I hope we do, but I've started to feel like we're bumping up against the limits of what is feasible given the design decisions to which we've already committed.

To take one concrete example, suppose we sat down and designed a new heap format, and suppose we made it our principal goal to make it write-once and zero-maintenance. In other words, if a user bulk-loaded data into a table and never modified it, there would be no further writes to that table after the initial flush to disk (no hint bits, no visibility map bits, no freezing) and no scans of the data except as a result of user-initiated activity (i.e. no scans to see whether there is data needing to be frozen). Could we do it? I bet we could, or darn close. Will we ever get there through incremental improvements to what we have now? I doubt it.

Instead, we end up with things like COPY FREEZE. Now that's a good tool, and, importantly, it's real, whereas everything I'm saying here is pie in the sky. But it's clearly also a special-case hack, and more generally I don't know how much more we can do that's going to be anything BUT a special-case hack. Some, certainly. But I'm not sure how much.

Curious to hear your thoughts. Is anyone else feeling this same itch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
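The chain-following sketch promised above. This is simplified from memory, not the real code - the real logic lives in heap_hot_search_buffer() - and it elides buffer locking, visibility checks, and redirect line pointers, so the macro spellings may not match the tree exactly:

#include "postgres.h"
#include "access/htup_details.h"    /* tuple header macros */
#include "access/transam.h"         /* TransactionIdEquals etc. */
#include "storage/bufpage.h"        /* Page, PageGetItemId */

/*
 * Sketch only: a HOT chain can be followed using just xmin/xmax,
 * never looking at any indexed attribute value.
 */
static OffsetNumber
follow_hot_chain(Page page, OffsetNumber offnum)
{
    TransactionId priorXmax = InvalidTransactionId;

    for (;;)
    {
        ItemId          lp = PageGetItemId(page, offnum);
        HeapTupleHeader htup;

        if (!ItemIdIsNormal(lp))
            break;
        htup = (HeapTupleHeader) PageGetItem(page, lp);

        /* each link is valid only if this tuple was created by the
         * transaction that updated the previous one */
        if (TransactionIdIsValid(priorXmax) &&
            !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
            break;

        if (!HeapTupleHeaderIsHotUpdated(htup))
            break;              /* reached the end of the chain */

        priorXmax = HeapTupleHeaderGetUpdateXid(htup);
        offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
    }
    return offnum;
}

The point is just that each hop compares the next tuple's xmin against the previous tuple's xmax; no indexed column is ever examined.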
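And the repointing idea, equally hand-wavy in code form (same headers as above, plus utils/rel.h for Relation). index_repoint_entries() does NOT exist; it's invented here purely to show the shape of the idea, and the only real ingredient is that a dead tuple's own t_ctid records where its successor version lives:

/* hypothetical AM entry point: find index entries bearing old_tid and
 * overwrite the TID in place with new_tid */
extern void index_repoint_entries(Relation indrel,
                                  ItemPointer old_tid,
                                  ItemPointer new_tid);

static void
vacuum_repoint_dead_tuple(Relation indrel, ItemPointer dead_tid,
                          HeapTupleHeader dead_tup)
{
    /* assumes the tuple died by update, so its t_ctid names the
     * successor version rather than pointing at itself */
    ItemPointerData succ_tid = dead_tup->t_ctid;

    /* per the mail: restrict to same-page successors, to keep
     * forward-following cheap during scans */
    if (ItemPointerGetBlockNumber(&succ_tid) !=
        ItemPointerGetBlockNumber(dead_tid))
        return;                 /* fall back to ordinary index deletion */

    index_repoint_entries(indrel, dead_tid, &succ_tid);
}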
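Finally, the struct sizes behind the space arguments, paraphrased from memory from storage/itemid.h and storage/itemptr.h:

/* A line pointer is 4 bytes. */
typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset of tuple within the page */
                lp_flags:2,     /* unused / normal / redirect / dead */
                lp_len:15;      /* length of the tuple, in bytes */
} ItemIdData;

/* A CTID is 6 bytes, so it cannot be stored in a 4-byte line pointer. */
typedef struct ItemPointerData
{
    BlockIdData     ip_blkid;   /* 4-byte block number */
    OffsetNumber    ip_posid;   /* 2-byte line pointer index */
} ItemPointerData;

And the fixed part of HeapTupleHeaderData (xmin, xmax, cid, ctid, the two infomasks, hoff) adds up to 23 bytes, which MAXALIGN pads out to the 24 I complained about above.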