Re: index(fct(primary key)) kills INSERTs - Mailing list pgsql-bugs
From | Frank Miles |
---|---|
Subject | Re: index(fct(primary key)) kills INSERTs |
Date | |
Msg-id | Pine.A41.4.21.0011100844290.117708-100000@mead2.u.washington.edu Whole thread Raw |
In response to | Re: index(fct(primary key)) kills INSERTs (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: index(fct(primary key)) kills INSERTs
|
List | pgsql-bugs |
On Fri, 10 Nov 2000, Tom Lane wrote: > Frank Miles <fpm@u.washington.edu> writes: > > If an index is created based on a function of the primary key, > > you cannot insert new entries into the database. > > I think the critical point here is that your "function of the primary > key" is actually doing a SELECT from the table: > > SELECT INTO prec * FROM test_table WHERE tt_id = dum_int; > IF NOT FOUND THEN > RAISE EXCEPTION ''project % not found'', dum_int; > END IF; > > When I try your example, I get > > play=> INSERT INTO test_table (tt_descr) VALUES ('third - will fail'); > ERROR: project 3 not found > > which surprises me not at all, because at the point where this function > is invoked, the new record with tt_id 3 hasn't been entered into the > table yet. This makes sense if the index is updated before the new record is completely added. (showing my ignorance here:) Is that necessary? Can't the new index be updated after entry? How can it be that the table has the item (it's attempting to update the index based on a table that includes the record '3'), yet the table doesn't have it (project 3 not found). This smells like a race condition. > I'm not sure what you are really trying to accomplish here --- as you > say, it's a stripped-down example and not too intelligible. As far > as the example goes, you could skip the SELECT and just use the > passed-in parameter value. What was the original goal that made you > feel you needed to SELECT the about-to-be-inserted row? I don't need the function to work for the insertion. That's only used for queries. But having the index clearly makes insertion impossible. In the real tables, I have several indexes which depend on a variety of values in the table. A part of the database (eventually) will be used to track projects done by my group for researchers at this University and elsewhere. Unfortunately different people at different times have used different enumeration schemes for tracking these projects. The indexes allow me to have a simple SERIAL-based enumeration scheme, while allowing the records to be accessed using older-format enumerations. It also keeps response time to queries reasonable for these old formats. The conversion is roughly a two-step process. I have a set of python routines that convert the old data formats into a form suitable for postgres. In this part of the process, the indicies are critical in avoiding sequential scans, as the python routines query the database in order to do their part of the conversion. This has been especially important when converting 'large numbers' (hundreds) of records from various legacy data sources. Without these indicies, processing can take on the order of a second per record. Once a set has been prepared, the whole array is inserted as a single transaction. The original data have inconsistencies that are trapped by constraints set up in the database. While this is good -- we will end up with a cleaner database -- it means that we have to go back to the data source, diagnose the problem, make corrections, re-run the python scripts (need the indicies), then try the insert transaction again (can't have the indicies). The whole process is ugly. Clearly when I'm doing the inserts, these extra indicies aren't necessary. In fact, my (hopefully temporary) workaround is to drop the indicies when I need to insert new records; then restore the indicies. In the long term, it's not critical for me, since eventually we won't be importing 'large numbers' of records -- the functional index can be killed, and people can simply wait the second or so for a query to be processed. I regard this bug as a 'wishlist'-level item : it seems that you should be able to do this without the database choking. It's not critical for me -- I have a workaround. I clearly know little about databases, you are undoubtedly more knowledgable regarding whether my use of this index might be something someone else might want. Thank you for your attention to this matter. -frank
pgsql-bugs by date: