Re: index(fct(primary key)) kills INSERTs - Mailing list pgsql-bugs

From Frank Miles
Subject Re: index(fct(primary key)) kills INSERTs
Date
Msg-id Pine.A41.4.21.0011100844290.117708-100000@mead2.u.washington.edu
Whole thread Raw
In response to Re: index(fct(primary key)) kills INSERTs  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: index(fct(primary key)) kills INSERTs
List pgsql-bugs
On Fri, 10 Nov 2000, Tom Lane wrote:

> Frank Miles <fpm@u.washington.edu> writes:
> > If an index is created based on a function of the primary key,
> > you cannot insert new entries into the database.
>
> I think the critical point here is that your "function of the primary
> key" is actually doing a SELECT from the table:
>
>                 SELECT INTO prec * FROM test_table WHERE tt_id = dum_int;
>                 IF NOT FOUND THEN
>                         RAISE EXCEPTION ''project % not found'', dum_int;
>                 END IF;
>
> When I try your example, I get
>
> play=> INSERT INTO test_table (tt_descr) VALUES ('third - will fail');
> ERROR:  project 3 not found
>
> which surprises me not at all, because at the point where this function
> is invoked, the new record with tt_id 3 hasn't been entered into the
> table yet.

This makes sense if the index is updated before the new record is completely
added.  (showing my ignorance here:) Is that necessary?  Can't the new
index be updated after entry?  How can it be that the table has the
item (it's attempting to update the index based on a table that includes
the record '3'), yet the table doesn't have it (project 3 not found).
This smells like a race condition.

> I'm not sure what you are really trying to accomplish here --- as you
> say, it's a stripped-down example and not too intelligible.  As far
> as the example goes, you could skip the SELECT and just use the
> passed-in parameter value.  What was the original goal that made you
> feel you needed to SELECT the about-to-be-inserted row?

I don't need the function to work for the insertion.  That's only
used for queries.  But having the index clearly makes insertion impossible.

In the real tables, I have several indexes which depend on a variety
of values in the table.  A part of the database (eventually) will be
used to track projects done by my group for researchers at this University
and elsewhere.  Unfortunately different people at different times have
used different enumeration schemes for tracking these projects.  The
indexes allow me to have a simple SERIAL-based enumeration scheme,
while allowing the records to be accessed using older-format enumerations.

It also keeps response time to queries reasonable for these old formats.
The conversion is roughly a two-step process.  I have a set of python
routines that convert the old data formats into a form suitable for
postgres.  In this part of the process, the indicies are critical in
avoiding sequential scans, as the python routines query the database
in order to do their part of the conversion.  This has been especially
important when converting 'large numbers' (hundreds) of records from
various legacy data sources.  Without these indicies, processing can take
on the order of a second per record.  Once a set has been prepared,
the whole array is inserted as a single transaction.  The original data
have inconsistencies that are trapped by constraints set up in the
database.  While this is good -- we will end up with a cleaner database
-- it means that we have to go back to the data source, diagnose the
problem, make corrections, re-run the python scripts (need the indicies),
then try the insert transaction again (can't have the indicies).  The
whole process is ugly.

Clearly when I'm doing the inserts, these extra indicies aren't necessary.
In fact, my (hopefully temporary) workaround is to drop the indicies
when I need to insert new records; then restore the indicies.

In the long term, it's not critical for me, since eventually we won't be
importing 'large numbers' of records -- the functional index can be
killed, and people can simply wait the second or so for a query to be
processed.

I regard this bug as a 'wishlist'-level item : it seems that you should
be able to do this without the database choking.  It's not critical
for me -- I have a workaround.  I clearly know little about databases,
you are undoubtedly more knowledgable regarding whether my use of
this index might be something someone else might want.

Thank you for your attention to this matter.

    -frank

pgsql-bugs by date:

Previous
From: Robert Monro
Date:
Subject: Bug with return characters when using pg_dump and a re-import
Next
From: Tom Lane
Date:
Subject: Re: index(fct(primary key)) kills INSERTs