Thread: A couple of errors encountered in 7.1.3=>7.2.1-2 data migration
Hi-

The background info:
  Debian 2.4.14 on both source & destination boxes
  PostgreSQL 7.1.3 on source
  PostgreSQL 7.2.1-2 on destination

We are testing 7.2 with our application by installing it on a separate box
& accessing it via the network. The 7.2 install is just a plain vanilla
install using the Debian package, which went well. (Well done, as usual,
Oliver!)

The export of data from the old system was simply a pg_dumpall, and the
import was a load of the dump file using psql into an empty, newly created
database. The load returned two errors that I don't understand:

(1) ERROR:  DefineIndex: index function must be marked iscachable

Unfortunately, I don't know which line of the dump file we were on when
this occurred, but it only happened once, and the only indexes I can think
of that might be different from the rest are functional indexes. One of
them is an index on upper(field), and the other is based on a function
defined in psql that uses the translate function.

(2) ERROR:  cannot find attribute 1 of relation schedule_notification_log
    ERROR:  cannot find attribute 1 of relation schedule_notification_log

We got this one twice, just as shown above. When I do a
\d schedule_notification_log, I get this result:

develop=# \d schedule_notification_log
Table "schedule_notification_log"
 Column | Type | Modifiers
--------+------+-----------

Other than these errors, the data migration went very smoothly, and our
applications have functioned at least partially without modification. We
haven't been able to test thoroughly yet, because a few queries are very
slow, and the second error above seems to keep VACUUM ANALYZE from running
as well.

Any thoughts on these errors are appreciated.

Regards,
-Nick

--------------------------------------------------------------------------
Nick Fankhauser  nickf@ontko.com  Phone 1.765.935.4283  Fax 1.765.962.9788
Ray Ontko & Co.  Software Consulting Services  http://www.ontko.com/
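[For anyone retracing this migration, the dump/reload was roughly the
following (file and database names here are hypothetical). pg_dumpall
output begins with \connect commands, so it is loaded via template1;
psql's -e flag echoes each statement, which helps pin down which line of
the dump an error comes from:]

    # on the 7.1.3 box
    $ pg_dumpall > db.out

    # on the 7.2.1-2 box, into a freshly initdb'd cluster
    $ psql -e template1 < db.out 2>&1 | tee load.log
    $ grep ERROR load.log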
"Nick Fankhauser" <nickf@ontko.com> writes: > The load returned two errors that I don't understand: > ERROR: DefineIndex: index function must be marked iscachable This means what it says: add "with (iscachable)" to your function declaration. This is intended to remind you that it is not sensible to build an index on a function that has time-varying results... > ERROR: cannot find attribute 1 of relation schedule_notification_log This is more disturbing. Can you provide the sequence of commands that led up to it? regards, tom lane
Ray Ontko <rayo@ontko.com> writes:
> develop=# select count(*) from temp_event ;
> ERROR:  cannot find attribute 1 of relation temp_event

I'm beginning to think there is something seriously messed up about your
installation. The simplest theory is that the indexes on pg_attribute
are corrupted. You could try a query that won't use the indexes, eg

	select (select relname from pg_class where oid = attrelid)
	from pg_attribute where attnum = 1 order by 1;

and look through the output to see if temp_event and the other broken
table are mentioned. If so, then the rows are there but the index is
wrong. You should be able to recover using REINDEX --- read the
reference page for it carefully before executing the recovery procedure!

If that doesn't work, re-initdb and re-importation of your dump would be
a good next step. Something must have dropped a bit somewhere...

			regards, tom lane
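[As a rough sketch of the recovery procedure Tom refers to: the 7.2
REINDEX reference page has system indexes rebuilt from a standalone
backend started with -O and -P, so that the damaged indexes are ignored
while reading the catalogs. Database name is hypothetical, and the
reference page should be read first, as Tom says:]

    $ pg_ctl stop
    $ postgres -O -P develop     -- standalone backend; -P ignores system indexes
    backend> REINDEX TABLE pg_attribute FORCE;
    backend> -- exit with ctrl-D, then restart the postmaster
    $ pg_ctl start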
Tom-

Thanks! Your diagnosis was correct & the repair worked.

-Nick & Ray

> I'm beginning to think there is something seriously messed up about your
> installation. The simplest theory is that the indexes on pg_attribute
> are corrupted.
...
> You should be able to recover using REINDEX
Okay, I guess I misunderstood something about "VACUUM FULL". My
understanding was that a VACUUM (without FULL) marked unused records for
reuse, while VACUUM FULL moved records from the end of a file into the
holes where unused records were stored and truncated the file to free
disk space. So far so good, but....

I have had a continued loss of performance on one of my largest tables
(600,000,000 records). There are 4 fields in a record, like so:

CREATE TABLE foo (
    a int not null references bar(a) on delete cascade on update no action,
    b int not null references baz(b) on delete cascade on update no action,
    c int,
    d smallint,
    primary key (a, b)) ;
CREATE INDEX foo_ac ON foo (a, c) ;
CREATE INDEX foo_ad ON foo (a, d) ;

And there are 3 triggers which fire before insert/delete/update.

I INSERT/UPDATE/DELETE approximately 300,000 records per day, but this
number is increasing on a daily basis as I make changes which improve the
performance of my data gathering tools (spiders ;). Two days ago, it had
reached the point where a search for a 3-word term (i.e. "free news
servers") took about 60 seconds. I have just spent 48 hours running a
VACUUM FULL on my table, and now the same search takes < 10 seconds. I
assume that the increase in performance is due to the decrease in
table/index size, which added up to approximately 1GB of freed space on
the machine, or approximately 4% of the original size of the table and
all its indices. But a 4% decrease in size should not add up to an 84%
increase in performance (is that right? I always get the ratio
confused :).

If all that VACUUM FULL did was move records from file 12345678.6 to file
12345678, the database would still be doing a large number of random
accesses on the table. And if VACUUM FULL clusters the data according to
the primary key, it would still be doing a large number of random
accesses on the table, because the primary key has almost nothing to do
with how I actually access the data in real life. So, is VACUUM FULL
looking somewhere in pg_statistics (or pg_stat_user_indexes) to determine
which index I actually use most (foo_ad), and then clustering the data
that way, or is there some other agent at work here?

- brian

Wm. Brian McCane                    | Life is full of doors that won't open
Search http://recall.maxbaud.net/   | when you knock, equally spaced amid those
Usenet http://freenews.maxbaud.net/ | that open when you don't want them to.
Auction http://www.sellit-here.com/ |   - Roger Zelazny "Blood of Amber"
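[For what it's worth: if the goal were to physically order the table by
foo_ad, the explicit way to do that in 7.2 is CLUSTER, not VACUUM FULL.
A sketch, with the caveat that the 7.2 CLUSTER reference page lists
serious side effects (the table is rewritten, and things like grants and
other indexes are not preserved), so read it before trying this:]

    CLUSTER foo_ad ON foo;    -- rewrite foo in foo_ad index order
    ANALYZE foo;              -- refresh planner statistics after the rewrite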
On 10 May 2002 at 19:18, Brian McCane wrote:

> But a 4% decrease in size should not add up to an 84% increase in
> performance (is that right? I always get the ratio confused :).

Have you tried an ANALYZE as well? It wasn't mentioned.

See http://www.postgresql.org/idocs/index.php?routine-vacuuming.html

  PostgreSQL's VACUUM command must be run on a regular basis for several
  reasons:

  1. To recover disk space occupied by updated or deleted rows.
  2. To update data statistics used by the PostgreSQL query planner.
  3. To protect against loss of very old data due to transaction ID
     wraparound.

Read 8.2.2. Updating planner statistics.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples
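[In other words, the routine maintenance for a table like this would be
something along these lines; as of 7.2, plain VACUUM and ANALYZE run
without the exclusive lock that VACUUM FULL takes:]

    VACUUM ANALYZE foo;   -- mark dead rows reusable and refresh planner stats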
Brian McCane <bmccane@mccons.net> writes:
> I INSERT/UPDATE/DELETE approximately 300,000 records per day, but this
> number is increasing on a daily basis as I make changes which improve
> the performance of my data gathering tools (spiders ;). Two days ago,
> it had reached the point where a search for a 3-word term (i.e. "free
> news servers") took about 60 seconds. I have just spent 48 hours
> running a VACUUM FULL on my table, and now the same search takes < 10
> seconds.

I would have to assume that the change is due to a change in plan, and
not directly due to VACUUM having compacted out a small part of the
space. Did you do VACUUM ANALYZE, or just plain VACUUM? When was the
last time you did VACUUM (or ANALYZE) on this table?

What plan is being used now? Do you have any idea what plan was used
before?

			regards, tom lane
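[The plan can be inspected with EXPLAIN; 7.2 also added EXPLAIN ANALYZE,
which executes the query and reports actual row counts and timings. The
query below is only a stand-in for Brian's real search:]

    EXPLAIN ANALYZE
    SELECT *
    FROM foo
    WHERE a = 12345 AND d = 42;   -- a stand-in query; foo_ad could serve this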
Hi Brian,

Are you performing full text search in any case?

Apart from optimizing the table/indexes (through VACUUM measures), I feel
the text indexes provided by contrib/tsearch can also lead to a
significant improvement in search performance.

regds
mallah.

On Saturday 11 May 2002 05:48 am, Brian McCane wrote:
> I have had a continued loss of performance on one of my largest tables
> (600,000,000 records). [...] Two days ago, it had reached the point
> where a search for a 3-word term (i.e. "free news servers") took about
> 60 seconds.

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L), 9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.
I have tried contrib/tsearch, but I do fairly complex score computation
based on word placement (headers, body, graphics, metatags, etc.), and I
also do additional computations on the scores when the user does a
search. It is very quick to use something like tsearch to figure out
which pages match my criteria, but very slow to then compute a valid
score for each page. Instead, in an attempt to speed things up, a score
is computed for each distinct word on a page, and that score is stored in
this table. Then when a search is requested, I use previously existing
search information and data from this table to compute the new results.

The upshot of all this computing is that I get fairly relevant results,
but I have to do a lot of math to get there. I have been considering
combining the two methods: doing something like a full text search to
find pages that meet my criteria, and then using this table to actually
compute a page's score.

- brian

On Sat, 11 May 2002, Rajesh Kumar Mallah wrote:
> Apart from optimizing the table/indexes (through VACUUM measures), I
> feel the text indexes provided by contrib/tsearch can also lead to a
> significant improvement in search performance.

Wm. Brian McCane                    | Life is full of doors that won't open
Search http://recall.maxbaud.net/   | when you knock, equally spaced amid those
Usenet http://freenews.maxbaud.net/ | that open when you don't want them to.
Auction http://www.sellit-here.com/ |   - Roger Zelazny "Blood of Amber"
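[A sketch of the per-word score lookup Brian describes, against an
entirely hypothetical schema: words(word_id, word) and
word_score(page_id, word_id, score) are made-up names, since the thread
only identifies foo's columns as a, b, c, d. The idea is to aggregate
the precomputed scores at query time:]

    SELECT ws.page_id, sum(ws.score) AS total_score
    FROM word_score ws, words w
    WHERE w.word IN ('free', 'news', 'servers')
      AND ws.word_id = w.word_id
    GROUP BY ws.page_id
    ORDER BY total_score DESC
    LIMIT 20;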
Hi Brian,

Yes, contrib/tsearch does not provide relevance ranking at the moment,
but OpenFTS-perl/tcl (openfts.sf.net), from which tsearch is derived,
does provide relevance ranking, and it is also very fast. It works in a
similar fashion to what you mentioned in your reply, and OpenFTS also
does stemming on your search phrases.

But I understand that the applicability of those solutions is also
subject to your current requirements.

regds
mallah.

On Saturday 11 May 2002 10:23 pm, Brian McCane wrote:
> I have tried contrib/tsearch, but I do fairly complex score computation
> based on word placement (headers, body, graphics, metatags, etc.) [...]
> I have been considering combining the two methods: doing something like
> a full text search to find pages that meet my criteria, and then using
> this table to actually compute a page's score.

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L), 9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.
Tom, et al,

> > ERROR: cannot find attribute 1 of relation schedule_notification_log
>
> This is more disturbing. Can you provide the sequence of commands that
> led up to it?

Here are a few related datapoints.

1) We think that schedule_notification_log is the only table that we
   loaded from the dump that had _no_ rows in it.

2) We encountered the same "cannot find attribute" error when doing
   something like this:

develop=# create table temp_event as select * from event ;
...
develop=# \d temp_event
Table "temp_event"
 Attribute | Type | Modifier
-----------+------+----------

develop=# select count(*) from temp_event ;
ERROR:  cannot find attribute 1 of relation temp_event
develop=# drop table temp_event ;
ERROR:  cannot find attribute 1 of relation temp_event

Note that "event" has plenty of rows. It may be our only table that
includes the INTERVAL datatype.

BTW, how do I get rid of temp_event?

So: Is there something about a table that might cause it to NOT load
correctly? Is there something about a table that might cause it to not
copy correctly (using CREATE TABLE ... AS SELECT * FROM ...)?

Ray

----------------------------------------------------------------------
Ray Ontko  rayo@ontko.com  Phone 1.765.935.4283  Fax 1.765.962.9788
Ray Ontko & Co.  Software Consulting Services  http://www.ontko.com/
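[One belt-and-braces variant of Tom's diagnostic, assuming the suspect
index is on pg_attribute: disabling index scans for the session should
steer the planner toward a sequential scan of the catalog, so the rows
are read directly from the heap:]

    SET enable_indexscan TO off;   -- avoid the possibly corrupt index
    SELECT attrelid, attnum, attname
    FROM pg_attribute
    WHERE attnum = 1
    ORDER BY attrelid;
    SET enable_indexscan TO on;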