Thread: [PATCH] Partial indicies again

[PATCH] Partial indicies again

From
Martijn van Oosterhout
Date:
Well, getting closer. Maybe I should start version numbering the patches?

Things changed in this version:

* As Tom Lane pointed out, what to do with EXTEND INDEX. I've decided to
  remove it since there is no defined syntax for it and you cannot make an
  index more restrictive with it. Since the names of indicies don't matter,
  you may as well drop the index and make a new one if you want to change
  the predicate.

  At this stage I have only removed it from the grammer because complete
  removal would require many line-by-line changes and would end up with a
  patch much larger than this one. I'd prefer to submit that as a separate
  patch after this one is completed.

* Updates to the documentation as well as comments within the code. If you
  see a spot I missed, let me know.

* Changes to the cost estimation. This is much better than before but still
  needs some refining since index_selectivity != rel_selectivity. Feedback
  welcomed.

Issues still needing to be fixed:

* Make it pg_dump-able. I've tried to extract the expression out the of
  system tables by using stringToNode and deparse_expression but it doesn't
  seem to work. I keep getting the error: "get_names_for_var: bogus
  varlevelsup 0". Anyone know what's going on? See attachment <<expr.c>>.

* VACUUM still complains about tuple count mismatch. Anyone have a good idea
  on how to deal with this?

* The regression tests need to be changed to deal with the newly added
  regression tests for these indicies.

Oh, and it appears I accidently changed the allowed syntax a bit. You used
to have to qualify each field in the partial index predicate with the name
of the relation. That's no longer required.

Everyone still alive out there?
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Attachment

Re: [PATCH] Partial indicies again

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> * As Tom Lane pointed out, what to do with EXTEND INDEX. I've decided to
>   remove it since there is no defined syntax for it and you cannot make an
>   index more restrictive with it. Since the names of indicies don't matter,
>   you may as well drop the index and make a new one if you want to change
>   the predicate.

I can imagine ways to deal with the make-it-more-restrictive problem
(one creative way is to mark the index lossy if we can't prove that the
old predicate implies the new one).  But I'd favor just removing the
statement, instead.  It doesn't seem useful enough to be worth the
substantial baggage.  (In addition to the statement itself, there's
cruft in each instance of index build to deal with EXTEND.  All that
stuff should come out IMHO.)

> * Changes to the cost estimation. This is much better than before but still
>   needs some refining since index_selectivity != rel_selectivity. Feedback
>   welcomed.

I suspect you need to run cnfify after and-ing together the predicate
and index quals.  clauselist_selectivity assumes it's working on
cnf'ified expressions.

> * Make it pg_dump-able. I've tried to extract the expression out the of
>   system tables by using stringToNode and deparse_expression but it doesn't
>   seem to work. I keep getting the error: "get_names_for_var: bogus
>   varlevelsup 0". Anyone know what's going on? See attachment <<expr.c>>.

You didn't provide context in which to decode Var references.  See
deparse_context_for().  The SQL function will need to take a relation
name (or OID? not sure which is preferable) as well as the compiled
predicate.  Btw, don't set forceprefix true; we don't need that if
there's only going to be one relation involved.

> * VACUUM still complains about tuple count mismatch. Anyone have a good idea
>   on how to deal with this?

I'd be inclined to just weaken VACUUM's test from "=" to "<=" whenever
the index has a predicate.

> Oh, and it appears I accidently changed the allowed syntax a bit. You used
> to have to qualify each field in the partial index predicate with the name
> of the relation. That's no longer required.

This is definitely an improvement; we should consider the unqualified
way to be the default, recommended style.  You should change the
regression test to exercise both forms.

            regards, tom lane

Re: [PATCH] Partial indicies again

From
Martijn van Oosterhout
Date:
On Mon, Jul 09, 2001 at 07:25:33PM -0400, Tom Lane wrote:
> substantial baggage.  (In addition to the statement itself, there's
> cruft in each instance of index build to deal with EXTEND.  All that
> stuff should come out IMHO.)

Actually, there is far, far more code to come out. There are complete
functions which need to be removed. That's why I suggested leaving that to a
separate patch.

> I suspect you need to run cnfify after and-ing together the predicate
> and index quals.  clauselist_selectivity assumes it's working on
> cnf'ified expressions.

OK. Do I need to worry about that function destroying either of the input
lists?

> You didn't provide context in which to decode Var references.  See
> deparse_context_for().  The SQL function will need to take a relation
> name (or OID? not sure which is preferable) as well as the compiled
> predicate.  Btw, don't set forceprefix true; we don't need that if
> there's only going to be one relation involved.

Oh. So the Var references in the tree are not linked to any particular
table? I understand then. deparse_context_for() takes both an oid and a
relation name so I'd need a name_to_oid or oid_to_name function.

> > * VACUUM still complains about tuple count mismatch. Anyone have a good idea
> >   on how to deal with this?
>
> I'd be inclined to just weaken VACUUM's test from "=" to "<=" whenever
> the index has a predicate.

I'll have to check again, but I thought that the VACUUM code only had the
OID of the index, so how it is supposed to work out if it's a partial index.
I'll look at this again tonight.

Thanks for your feedback,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: [PATCH] Partial indicies again

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
>> I suspect you need to run cnfify after and-ing together the predicate
>> and index quals.  clauselist_selectivity assumes it's working on
>> cnf'ified expressions.

> OK. Do I need to worry about that function destroying either of the input
> lists?

AFAIR, cnfify doesn't modify its inputs.  But watch out for the
difference between explicit and implicit ANDing.

> I'll have to check again, but I thought that the VACUUM code only had the
> OID of the index, so how it is supposed to work out if it's a partial index.

You'll need to look it up.

In practice, I seem to recall that VACUUM is broken for partial indexes
anyway, specifically because it does not pay attention to partial-ness:
when it moves a tuple it shouldn't make an index entry for the new copy
if the index is partial and the tuple fails the predicate check.  The
correct fix for this is not to add code, but to remove it.  VACUUM
should never have had its own index-entry-making code in the first
place; it should be using ExecOpenIndices and friends from the main
executor.  Once you do that, the info returned by ExecOpenIndices will
include the predicate, and you can just look there.

            regards, tom lane

Re: [PATCH] Partial indicies again

From
Martijn van Oosterhout
Date:
On Mon, Jul 09, 2001 at 08:22:43PM -0400, Tom Lane wrote:
> AFAIR, cnfify doesn't modify its inputs.  But watch out for the
> difference between explicit and implicit ANDing.

OK, I'm pretty sure I got it right now.

> In practice, I seem to recall that VACUUM is broken for partial indexes
> anyway, specifically because it does not pay attention to partial-ness:

OK, I don't feel too confident playing with the vacuum code, seems to be a
real quick way to destroy your database.

To actually be able to use ExecInsertIndexTuples, you need to create an
EState, a ResultRelInfo and put each tuple in a TupleSlot for a while. Does
it sound like I'm on the right track? That's quite a few changes.

Isn't someone else playing with the vacuum code for 7.2 anyway (for
background vacuums)? We'd better make sure we don't clash.

Back to other issues. pg_dump now works for partial indicies, as long as the
pg_get_expr function is defined. To make that an internal function I have
add it to pg_proc.h and initdb again, right?

http://svana.org/kleptog/pgsql/partial-indicies.patch
http://svana.org/kleptog/pgsql/expr.c
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: [PATCH] Partial indicies again

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> To actually be able to use ExecInsertIndexTuples, you need to create an
> EState, a ResultRelInfo and put each tuple in a TupleSlot for a while. Does
> it sound like I'm on the right track? That's quite a few changes.

Right.  I'd recommend copy.c as a model.

> Isn't someone else playing with the vacuum code for 7.2 anyway (for
> background vacuums)? We'd better make sure we don't clash.

That would be me.

> Back to other issues. pg_dump now works for partial indicies, as long as the
> pg_get_expr function is defined. To make that an internal function I have
> add it to pg_proc.h and initdb again, right?

Right.  I'd suggest sticking the source in ruleutils.c.

            regards, tom lane

Re: [PATCH] Partial indicies again

From
Peter Eisentraut
Date:
Martijn van Oosterhout writes:

> Well, getting closer. Maybe I should start version numbering the patches?

For one thing, you might want to post them to pgsql-patches instead.  And
you should start generating the patches against the 7.2devel sources
because I already suspect yours generating conflicts.

> * Updates to the documentation as well as comments within the code. If you
>   see a spot I missed, let me know.

FYI, it's called partial "indexes".

> * Make it pg_dump-able. I've tried to extract the expression out the of
>   system tables by using stringToNode and deparse_expression but it doesn't
>   seem to work. I keep getting the error: "get_names_for_var: bogus
>   varlevelsup 0". Anyone know what's going on? See attachment <<expr.c>>.

It might be an option to store the unparsed condition in the system
catalogs, similar to what is done with the default values (see
pg_attrdef).

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


Re: [PATCH] Partial indicies again

From
Martijn van Oosterhout
Date:
On Tue, Jul 10, 2001 at 04:47:49PM +0200, Peter Eisentraut wrote:
> Martijn van Oosterhout writes:
>
> > Well, getting closer. Maybe I should start version numbering the patches?
>
> For one thing, you might want to post them to pgsql-patches instead.  And
> you should start generating the patches against the 7.2devel sources
> because I already suspect yours generating conflicts.

Well, Tom Lane certainly keeps pointing out problems with it. I'm pulling
out the latest CVS to see if there are going to be any problems there.

> > * Updates to the documentation as well as comments within the code. If you
> >   see a spot I missed, let me know.
>
> FYI, it's called partial "indexes".

Actually, it seems to depend on who you ask. Both spellings are scattered
throughout the documentation and the source. According to my dictionary,
both spellings are allowed.

> > * Make it pg_dump-able. I've tried to extract the expression out the of
> >   system tables by using stringToNode and deparse_expression but it doesn't
> >   seem to work. I keep getting the error: "get_names_for_var: bogus
> >   varlevelsup 0". Anyone know what's going on? See attachment <<expr.c>>.
>
> It might be an option to store the unparsed condition in the system
> catalogs, similar to what is done with the default values (see
> pg_attrdef).

See my later email. This has been fixed already.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: [PATCH] Partial indicies again

From
Peter Eisentraut
Date:
Martijn van Oosterhout writes:

> > FYI, it's called partial "indexes".
>
> Actually, it seems to depend on who you ask. Both spellings are scattered
> throughout the documentation and the source. According to my dictionary,
> both spellings are allowed.

We consulted some dictionaries a while ago and decided that "indexes" is
the correct form for this usage.  Not everything has been changed to that
effect yet, but it will soon.  In any case, the alternative
would be "indices".

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


Re: [PATCH] Partial indicies again

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> It might be an option to store the unparsed condition in the system
> catalogs, similar to what is done with the default values (see
> pg_attrdef).

In the long run, I have been thinking that we should eliminate storage
of preparsed expressions in all the system catalogs, and just store the
original source text of rules, default expressions, etc.  This would
eliminate one major source of forced initdbs, and it'd make it a lot
easier to behave in a sensible fashion when tables underlying views
change.  The additional cost of reparsing when we read the value should
be small (and seldom paid, since we cache all these things anyway).

The major stumbling block is how to extract the source text from the
initial query.  What I'd like to do is extend the lexer/parser so that
every parse node includes starting and ending indexes into the original
query string.  That would solve this problem and also make it possible
to generate much nicer messages for syntax errors.  Isn't likely to get
done for 7.2 though.

In any case, I think Martijn is on the right track for index predicates
for the moment --- the code is designed around storing just the parsed
version, and there's not much point in changing only this part of the
system.

            regards, tom lane

Re: [PATCH] Partial indicies again

From
Martijn van Oosterhout
Date:
On Tue, Jul 10, 2001 at 05:36:35PM +0200, Peter Eisentraut wrote:
> Martijn van Oosterhout writes:
>
> > > FYI, it's called partial "indexes".
> >
> > Actually, it seems to depend on who you ask. Both spellings are scattered
> > throughout the documentation and the source. According to my dictionary,
> > both spellings are allowed.
>
> We consulted some dictionaries a while ago and decided that "indexes" is
> the correct form for this usage.  Not everything has been changed to that
> effect yet, but it will soon.  In any case, the alternative
> would be "indices".

*blink*

You're right, put one too many I's in there. I'll correct that.

So everything is going to be changed to the other spelling soon anyway?

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: [PATCH] Partial indicies again

From
Martijn van Oosterhout
Date:
On Tue, Jul 10, 2001 at 09:56:38AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Back to other issues. pg_dump now works for partial indicies, as long as the
> > pg_get_expr function is defined. To make that an internal function I have
> > add it to pg_proc.h and initdb again, right?
>
> Right.  I'd suggest sticking the source in ruleutils.c.

I notice that in pg_proc.h each line is assigned an OID. Do I just pick the
next available one or is there a method there?

I've rediffed against the latest CVS and everything went except one file
disappeared (sql_help.h). I guess that's done differently now. When I get
the cvs version running I'll see what happend.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: [PATCH] Partial indicies again

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I notice that in pg_proc.h each line is assigned an OID. Do I just pick the
> next available one or is there a method there?

Pick anything that's not shown as used by the unused_oids script (but be
sure to run that against up-to-date sources from CVS).

> I've rediffed against the latest CVS and everything went except one file
> disappeared (sql_help.h).

That file is generated during build, it's not in CVS.

            regards, tom lane

Re: [PATCH] Partial indicies again

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> So everything is going to be changed to the other spelling soon anyway?

All the user documentation, at least.  I'm not sure if Peter is planning
to attack the source code itself or not ...

            regards, tom lane

Re: [PATCH] Partial indicies again

From
Bruce Momjian
Date:
> On Tue, Jul 10, 2001 at 09:56:38AM -0400, Tom Lane wrote:
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > Back to other issues. pg_dump now works for partial indicies, as long as the
> > > pg_get_expr function is defined. To make that an internal function I have
> > > add it to pg_proc.h and initdb again, right?
> >
> > Right.  I'd suggest sticking the source in ruleutils.c.
>
> I notice that in pg_proc.h each line is assigned an OID. Do I just pick the
> next available one or is there a method there?
>
> I've rediffed against the latest CVS and everything went except one file
> disappeared (sql_help.h). I guess that's done differently now. When I get
> the cvs version running I'll see what happend.

sql_help.h is autogenerated from the SGML docs.  It is generated by
create_help.pl.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026