Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

From Greg Stark
Subject Re: 9.3 Pre-proposal: Range Merge Join
Date
Msg-id CAM-w4HPM5ScSy3wGfQLXS8t6mjzfH6rj80i_haWykBMUiZFYUw@mail.gmail.com
Whole thread Raw
In response to Re: 9.3 Pre-proposal: Range Merge Join  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Mon, Apr 16, 2012 at 10:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The thing I like most about temp indexes is that they needn't be temporary.
>
> I'd like to see something along the lines of demand-created optional
> indexes, that we reclaim space/maintenance overhead on according to
> some cache management scheme. More space you have, the more of the
> important ones hang around. The rough same idea applies to
> materialised views.

I find this a) really scary, b) a huge increase in scope, and c) kind
of pointless.

a) DDL and shared caches require all kinds of additional
synchronization that would be really annoying to be incurring in the
middle of DML queries unexpectedly. If these indexes are to be useful
for other transactions they would also impose a hidden and
uncontrollable cost on all updates and inserts on the table.

b) I think it would be a lot harder to get working cleanly. You would
need to invent a UI to control the lifetime and resources used by
these objects and deal with duplication between manually created and
dynamically created objects. It's fundamentally a huge shift in the
design of the database changing what things the user maintains as part
of their schema and what things the database maintains transparently.
This is a big project on its own aside from the technical side of
things.

c) If these indexes are useful for many queries then the user can
choose to create them themself. These indexes you're proposing would
be just as expensive as any those indexes and offer few advantages
aside from their automaticness. The same could be accomplished with
some external demon that just looked at the query logs and determined
which indexes to create or not.

The main advantage of creating dynamic indexes as part of the query
would be lost. Namely that these would be local data structures that
don't need to be synchronized with other transactions and don't need
to be updated by other transactions. They're just part of the query
execution the way spilled hash tables and tuplesort tapes are. You
could build indexes on materialized data resulting from earlier joins
or aggregates and so on.

The point is that if you make them equivalent to normal indexes just
dynamically maintained then all you've done is change the way normal
indexes work but haven't really changed the set of queries they're
useful for. That might be a neat UI Feature but If you want to change
the set of queries postgres can handle efficiently at all then you
need something that's fundamentally different from a table index.

As an aside I think some of what you're looking for could be better
handled with some kind of query result cache that could keep around
the materialized data from plan nodes until an event happens that
changes the data. This might be related to the underlying
infrastructure needed for materialized views but I haven't thought too
deeply about it. It seems like a lot of work and a big change from the
current very isolated per-process execution model.

-- 
greg


pgsql-hackers by date:

Previous
From: Jay Levitt
Date:
Subject: Re: Bug tracker tool we need
Next
From: Greg Stark
Date:
Subject: Re: Gsoc2012 idea, tablesample