Thread: Slow Inserts on 1 table?
I have one particular insert query that is running orders of magnitude slower than other insert queries, and I cannot understand why. For example, inserts into "conceptProperty" (detailed below) are at least 5 times faster than inserts into "conceptPropertyMultiAttributes".

When I am running the inserts, postmaster shows as pegging one CPU on the Fedora Core 3 server it is running on at nearly 100%.

Any advice is appreciated. Here is a lot of info that may shed light on the issue to someone with more experience than me:

Example insert query with data:

INSERT INTO conceptPropertyMultiAttributes
    (codingSchemeName, conceptCode, propertyId, attributeName, attributeValue)
VALUES
    ('NCI MetaThesaurus', 'C0000005', 'T-2', 'Source', 'MSH2005_2004_10_12')

EXPLAIN ANALYZE output:

                        QUERY PLAN
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.008 rows=1 loops=1)
 Total runtime: 4.032 ms

Table structure:

CREATE TABLE conceptpropertymultiattributes (
    codingschemename character varying(70) NOT NULL,
    conceptcode      character varying(100) NOT NULL,
    propertyid       character varying(50) NOT NULL,
    attributename    character varying(50) NOT NULL,
    attributevalue   character varying(250) NOT NULL
);

Primary key:

ALTER TABLE ONLY conceptpropertymultiattributes
    ADD CONSTRAINT conceptpropertymultiattributes_pkey PRIMARY KEY
    (codingschemename, conceptcode, propertyid, attributename, attributevalue);

Foreign key:

ALTER TABLE ONLY conceptpropertymultiattributes
    ADD CONSTRAINT f FOREIGN KEY (codingschemename, conceptcode, propertyid)
    REFERENCES conceptproperty (codingschemename, conceptcode, propertyid);

Structure of the table referenced by the foreign key:

CREATE TABLE conceptproperty (
    codingschemename     character varying(70) NOT NULL,
    conceptcode          character varying(100) NOT NULL,
    propertyid           character varying(50) NOT NULL,
    property             character varying(250) NOT NULL,
    "language"           character varying(32),
    presentationformat   character varying(50),
    datatype             character varying(50),
    ispreferred          boolean,
    degreeoffidelity     character varying(50),
    matchifnocontext     boolean,
    representationalform character varying(50),
    propertyvalue        text NOT NULL
);

Primary key:

ALTER TABLE ONLY conceptproperty
    ADD CONSTRAINT conceptproperty_pkey PRIMARY KEY
    (codingschemename, conceptcode, propertyid);

Thanks,

Dan
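[Editor's note: each insert into conceptpropertymultiattributes must verify that the referenced row exists in conceptproperty. A rough way to see what that check costs is to run the equivalent keyed lookup by hand, using the values from the example above. This only approximates the internal RI-trigger query, which also takes a row lock:]

```sql
-- Approximates the per-row FK check; if this plans as a Seq Scan on a
-- large conceptproperty table, every insert is paying that scan.
EXPLAIN ANALYZE
SELECT 1
FROM conceptproperty
WHERE codingschemename = 'NCI MetaThesaurus'
  AND conceptcode      = 'C0000005'
  AND propertyid       = 'T-2';
```

If this shows an Index Scan using conceptproperty_pkey, the FK check itself is cheap and the slowdown lies elsewhere.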
What indexes are defined on both tables? Are there any triggers or rules?

On Wed, Jul 20, 2005 at 09:50:54AM -0500, Dan Armbrust wrote:
> I have one particular insert query that is running orders of magnitude
> slower than other insert queries, and I cannot understand why.
> For example, Inserts into "conceptProperty" (detailed below) are at
> least 5 times faster than inserts into "conceptPropertyMultiAttributes".
>
> When I am running the inserts, postmaster shows as pegging one CPU on
> the Fedora Core 3 server it is running on at nearly 100%.
> [snip]

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Dan Armbrust wrote:
> I have one particular insert query that is running orders of magnitude
> slower than other insert queries, and I cannot understand why.
> For example, Inserts into "conceptProperty" (detailed below) are at
> least 5 times faster than inserts into "conceptPropertyMultiAttributes".
> [snip]

Well, I now have a further hunch on why the inserts are so slow on 1 table. Most of the time, when I am doing bulk inserts, I am starting with an empty database. My insertion program creates the tables, indexes, and foreign keys.

The problem seems to be the foreign key - PostgreSQL is apparently being too stupid to use the indexes while loading and checking the foreign key between two large tables - my guess is because analyze has not been run yet, so it thinks all of the tables are size 0. If I let it run for a while, then kill the load process, run Analyze, empty the tables, and then restart, things perform fine. But that is kind of a ridiculous sequence to have to use to load a database.

Why can't postgres compile some rough statistics on tables without running analyze? It seems that it would be pretty easy to keep track of the number of inserts/deletions that have taken place since the last Analyze execution. It may not be the exact right number, but it would certainly be smarter than continuing to assume that the tables are size 0, even though it has been doing constant inserts on the tables in question.

I have already had to disable sequential scans, since the planner is almost _always_ wrong in deciding whether or not to use an index. I put the indexes on the columns I chose for a reason - it is because I KNOW the index read will ALWAYS be faster, since I designed the indexes for the queries I am running. But it still must be doing a sequential scan on these inserts...
--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Dan Armbrust wrote:
>> I have one particular insert query that is running orders of magnitude
>> slower than other insert queries, and I cannot understand why.
>
> Well, I now have a further hunch on why the inserts are so slow on 1
> table. Most of the time, when I am doing bulk inserts, I am starting
> with an empty database. My insertion program creates the tables,
> indexes and foreign keys.
> The problem seems to be the foreign key - PostgreSQL is apparently being
> too stupid to use the indexes while loading and checking the foreign key
> between two large tables - my guess is because analyze has not been run
> yet, so it thinks all of the tables are size 0.

If you haven't analysed them since creation, it should think size=1000, which is a safety measure to reduce this sort of problem.

> If I let it run for a while, then kill the load process, run Analyze,
> empty the tables, and then restart, things perform fine. But that is
> kind of a ridiculous sequence to have to use to load a database.
> Why can't postgres compile some rough statistics on tables without
> running analyze? Seems that it would be pretty easy to keep track of
> the number of inserts/deletions that have taken place since the last
> Analyze execution...

Yep, but it would have to do it all the time. That's overhead on every query.

> I have already had to disable sequential scans, since the planner is
> almost _always_ wrong in deciding whether or not to use an index.

Then either your stats are badly out, or your other configuration settings are.

> I put the indexes on the columns I choose for a reason - it is because
> I KNOW the index read will ALWAYS be faster since I designed the indexes
> for the queries I am running. But it still must be doing a sequential
> scan on these inserts...

What, ALWAYS faster, even for the first FK check when there's only one row in the target table and that's cached?

If you're really in a hurry doing your bulk loads:
1. Use COPY.
2. Drop/restore the foreign-key constraints before/after.

That will be hugely faster than INSERTs, although it's not always an applicable solution.

--
Richard Huxton
Archonet Ltd
> my guess is because analyze has not been run yet, so it thinks all of
> the tables are size 0. If I let it run for a while, then kill the
> load process, run Analyze, empty the tables, and then restart, things
> perform fine. But that is kind of a ridiculous sequence to have to
> use to load a database.

So automate it. After discovering exactly this behavior, I've developed an idiom for load scripts where I (optionally) commit at some linear interval, and (optionally) analyze at some exponential interval. I presume this has been invented countless times, but here's my basic idea in pseudo-code:

    commitInterval = 1000
    analyzeFactor = 2
    whenToCommit = whenToAnalyze = commitInterval
    nInserts = 0

    loop over input data
        if we decide to insert
            insert
            nInserts++
        if whenToCommit < nInserts
            commit
            whenToCommit += commitInterval
        if whenToAnalyze < nInserts
            analyze
            whenToAnalyze *= analyzeFactor

So (with these constants) we commit after 1000 total inserts, then after 2000, 3000, etc. And we analyze after 1000 inserts, then after 2000, 4000, etc. This is perhaps way too conservative - in particular, I suspect that it's only the first one or two analyzes that matter - but it works for me.

The only annoyance is that the interface I use most often, Python's pgdb, runs everything in a transaction, and you can't analyze in a transaction. I've gotten around this in a variety of ways, some less principled than others.

- John D. Burger
  MITRE
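[Editor's note: John's pseudo-code can be simulated without a database to see where the maintenance commands land. `plan_maintenance` is a hypothetical helper name; a real load script would issue COMMIT/ANALYZE through its DB driver at the insert counts it returns.]

```python
def plan_maintenance(total_inserts, commit_interval=1000, analyze_factor=2):
    """Return (commit_points, analyze_points): the insert counts at which
    the idiom above would issue COMMIT and ANALYZE, respectively."""
    commits, analyzes = [], []
    when_to_commit = when_to_analyze = commit_interval
    for n in range(1, total_inserts + 1):
        # COMMIT at a linear interval: thresholds 1000, 2000, 3000, ...
        if when_to_commit < n:
            commits.append(n)
            when_to_commit += commit_interval
        # ANALYZE at an exponential interval: thresholds 1000, 2000, 4000, ...
        if when_to_analyze < n:
            analyzes.append(n)
            when_to_analyze *= analyze_factor
    return commits, analyzes

commits, analyzes = plan_maintenance(5000)
print(commits)   # COMMIT fires on the first insert past each linear threshold
print(analyzes)  # ANALYZE fires less and less often as the load grows
```

For a 5000-row load this commits four times but analyzes only three, and the gap between analyzes keeps doubling, which matches John's observation that only the early analyzes matter much.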
> Why can't postgres compile some rough statistics on tables without
> running analyze?

Why can't you just run analyze? You don't have to empty the tables to do so, and you can alter the statistics on the fly. Heck, you can even run analyze while doing the inserts. Perhaps the fine manual would be of assistance?

http://www.postgresql.org/docs/8.0/interactive/sql-analyze.html
http://www.postgresql.org/docs/8.0/interactive/sql-altertable.html

> I have already had to disable sequential scans, since the planner is
> almost _always_ wrong in deciding whether or not to use an index.

Sounds again like you are not setting up your statistics correctly, or not running analyze as and when it should be run.

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org
>> What, ALWAYS faster, even for the first FK check when there's only one
>> row in the target table and that's cached?
>>
>> If you're really in a hurry doing your bulk loads:
>> 1. Use COPY.
>> 2. Drop/restore the foreign-key constraints before/after.
>> That will be hugely faster than INSERTs, although it's not always an
>> applicable solution.

It seems like the query planner goes to great lengths to avoid using indexes because it might take 5 ms longer to execute an index lookup on a table with one row. But then, when the table has 1 million rows, a full scan takes 3 minutes, and the index scan takes 3 seconds, it has no problem picking the 3 minute route. I'll gladly give up the 5 ms in turn for not having to wait 3 minutes, which is why I disabled the sequential scans. If I have a small table, where indexes won't speed things up, I won't build an index on it.

The other factor is that most of my tables have at least thousands, and usually millions, of rows. Sequential scans will never be faster for the queries that I am doing - like I said, that is why I created the indexes.

My loading is done programmatically, from another format, so COPY is not an option. Neither is removing foreign keys, as they are required to guarantee valid data. I don't really have a problem with the insert speed when it is working properly - it is on par with other DBs that I have on the same hardware. The problem is when it stops using the indexes, for no good reason.

Example: last night, I kicked off a load process - this morning, it had only managed to make it through about 600,000 rows (split across several tables). After restarting it this morning, it made it through the same data in 30 minutes. If that's not bad and buggy behavior, I don't know what is...

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Joshua D. Drake wrote:
>> Why can't postgres compile some rough statistics on tables without
>> running analyze?
>
> Why can't you just run analyze? You don't have to empty the tables to
> do so and you can alter the statistics on the fly. Heck you can even
> run analyze while doing the inserts.

I shouldn't have to manually run Analyze to make the DB capable of handling inserts involving tables with foreign keys correctly. My code that is doing the inserts is a Java application that works across multiple DBs - MySQL, PostgreSQL, DB2, MS Access, Sybase, etc. I shouldn't have to put custom code into it just to make postgres deal with inserts properly. No other database that I insert data into has problems like this.

This will look really nice in the instructions for my data loader:

* If you are using PostgreSQL for your database server, it has a bug that causes its performance to become abysmal unless you manually run this "Analyze" command a little while after you start the load process.

I will have users that don't even know what a database is, much less be able to go out and run manual sysadmin-level commands on it to make the thing work.

> Sounds again like you are not setting up your statistics correctly or
> running analyze as and when it should.

It doesn't matter if the statistics are perfectly up to date. It still doesn't use the indexes. If the default statistics are so poorly set up that the planner thinks a 5 million row table scan will be quicker for a query that is doing exact matches on indexed columns, I would say it is poorly implemented. So I just disabled that "feature". And it works fine with sequential scans disabled - I have no problem with it in this respect, since I can turn it off.

Is there any way that I can disable sequential scans for foreign key checks?

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Dan Armbrust wrote:
> It seems like the query planner goes to great lengths to avoid using
> indexes because it might take 5 ms longer to execute an index lookup on
> a table with one row.
> But then, when the table has 1 million rows, and a full scan takes 3
> minutes, and the index scan takes 3 seconds, it has no problem picking
> the 3 minute route.
> [snip]
> The other factor, is that most of my tables have at least thousands, and
> usually millions of rows. Sequential scans will never be faster for the
> queries that I am doing - like I said, that is why I created the indexes.

The issue is nothing to do with special "small table" handling code. It's all to do with not having up-to-date stats. Of course, once you've analysed your table the system knows your index is good.

> My loading is done programmatically, from another format, so COPY is not
> an option.

Why not? A lot of my bulk-loads are generated from other systems, and I go through a temporary-file/pipe via COPY when I can. When I don't, I block inserts into groups of e.g. 1000 and stick in an analyse/etc as required.

> Neither is removing foreign keys, as they are required to
> guarantee valid data.

Ah, but you can still guarantee your data. You can wrap the whole drop-FK, bulk-load, recreate-FK sequence in a single transaction, and it can still be faster. Obviously doing this on a high-activity table won't win though; you'll have to block everyone else doing updates.

> I don't really have a problem with the insert
> speed when it is working properly - it is on par with other DBs that I
> have on the same hardware. The problem is when it stops using the
> indexes, for no good reason.
>
> Example, last night, I kicked off a load process - this morning, it had
> only managed to make it through about 600,000 rows (split across several
> tables). After restarting it this morning, it made it through the same
> data in 30 minutes.
> If that's not bad and buggy behavior, I don't know what is....

So run ANALYSE in parallel with your load, or break the bulk-load into blocks and analyse in-line. I'm not sure ripping out PG's cost-based query analyser will be a popular solution just to address bulk-loads.

--
Richard Huxton
Archonet Ltd
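[Editor's note: a sketch of the drop-FK / bulk-load / recreate-FK sequence Richard describes, using the constraint names from the original post. The COPY file path is hypothetical; re-adding the constraint still validates every row, but as one pass over the table rather than one indexed (or worse, sequential) lookup per insert.]

```sql
BEGIN;

ALTER TABLE conceptpropertymultiattributes DROP CONSTRAINT f;

-- Bulk-load; COPY is far cheaper per row than individual INSERTs.
COPY conceptpropertymultiattributes FROM '/tmp/multiattributes.copy';

-- Recreate the FK; referential integrity is checked here, in one pass.
ALTER TABLE conceptpropertymultiattributes
    ADD CONSTRAINT f FOREIGN KEY (codingschemename, conceptcode, propertyid)
    REFERENCES conceptproperty (codingschemename, conceptcode, propertyid);

COMMIT;
```

Because the whole sequence runs in one transaction, no other session ever sees the table without its constraint, so the data guarantee is preserved.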
On Tue, Aug 02, 2005 at 10:01:50AM -0500, Dan Armbrust wrote:
> I shouldn't have to manually run Analyze to make the DB be capable of
> handling inserts involving tables with foreign keys correctly. My code
> that is doing the inserts is a java application that works across
> multiple DBS - MySQL, PostgreSQL, DB2, MS Access, Sybase, etc.
[etc, rant removed]

You don't _have_ to be rude. This is a known limitation, and people have suggested the usual workarounds. This is an open source project -- if you think you can make it better, please by all means post a patch.

HAND.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"The meaning of things does not come from the things themselves, but from
the intelligences that apply them to their daily problems in pursuit of
progress." (Ernesto Hernández-Novich)
>> My loading is done programmatically, from another format, so COPY is
>> not an option.
>
> Why not? A lot of my bulk-loads are generated from other systems and I
> go through a temporary-file/pipe via COPY when I can. When I don't I
> block inserts into groups of e.g. 1000 and stick in an analyse/etc as
> required.

I guess I should clarify - my inserts are done by a Java application running on a client machine. This isn't a bulk load in the normal definition. I don't have any problem with the speed of the inserts when they are working correctly. The only problem is that the query analyzer is making a really poor decision when it is executing insert statements on tables that have foreign keys.

> So run ANALYSE in parallel with your load, or break the bulk-load into
> blocks and analyse in-line. I'm not sure ripping out PG's cost-based
> query analyser will be a popular solution just to address bulk-loads.

I never suggested that it needed to be ripped out. It just seems that when it is looking to check foreign keys, and the statistics are not up to date (or have not yet been created), it should default to using the indexes rather than not using them. The time savings of using indexes when things are big is FAR bigger than the time savings of not using indexes when things are small.

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Alvaro Herrera wrote:
> On Tue, Aug 02, 2005 at 10:01:50AM -0500, Dan Armbrust wrote:
>> I shouldn't have to manually run Analyze to make the DB be capable of
>> handling inserts involving tables with foreign keys correctly.
>> [snip]
>
> You don't _have_ to be rude. This is a known limitation, and people
> have suggested the usual workarounds. This is an open source project --
> if you think you can make it better, please by all means post a patch.

My apologies, I don't intend to be rude. But it is rather easy to get into rant mode when the prevailing opinion is that not being able to insert rows into a table with a foreign key without running Analyze after X rows is a misuse of the DB, rather than a bug.

I did not know that this is a known limitation; I have not been able to find any documentation that talks about how foreign keys, indexes, and the query planner relate. My first assumption was that since foreign key creation implicitly creates the necessary indexes, those indexes would always be used for foreign key checks. I wouldn't have even guessed that the query planner was involved in this portion. But these are all (apparently wrong) guesses - I don't know the internals.

The performance probably is better on small tables when it skips these indexes. But it seems to me that if you know the statistics are out of date (which you should be able to know, if analyze hasn't been run since the tables were created), the safer choice would be to use the indexes rather than not use them.

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Force PostgreSQL to use indexes on foreign key lookups - Was: Slow Inserts on 1 table?
From: Dan Armbrust
Date:
An aha moment:

http://archives.postgresql.org/pgsql-bugs/2005-03/msg00183.php

Some of the ensuing conversation seemed to indicate that a change was made in the 8.0 branch in March that was intended to fix this issue. Any idea whether that fix would have made it into the 8.0.3 release? Or maybe the fix didn't fix the issue.

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
"John D. Burger" <john@mitre.org> writes:
> The only annoyance is that the interface I use most often, Python's
> pgdb, runs everything in a transaction, and you can't analyze in a
> transaction.

Hm? We've allowed ANALYZE inside a transaction for a long time.

The real solution to Dan's problem, of course, is to throw away the cached plan for the FK check and re-plan it once the table sizes have changed enough to invalidate the plan. Neil Conway was working on infrastructure for this, but it didn't get done in time for 8.1 ... maybe it will be there in 8.2.

In the meantime, though, I don't see any mention in the thread of exactly which PG version Dan is using. If it's 8.0.0 or 8.0.1, an update would probably help --- we tweaked the rules for never-yet-vacuumed tables in 8.0.2.

regards, tom lane
On Tue, Aug 02, 2005 at 10:41:01AM -0500, Dan Armbrust wrote:
> But it is rather easy to get into rant mode when the prevailing
> opinion is that not being able to insert rows into a table with a
> foreign key without running Analyze after X rows is a misuse of the
> DB, rather than a bug.

I haven't seen it in this thread, but it reminds me of someone who had a similar problem a long time ago. Basically, he was doing:

    TRUNCATE
    ANALYZE
    <load database>

This screwed everything up, because the ANALYZE set the statistics to zero-size tables. The solution was: *don't* analyze the table when it's empty. If he left out the ANALYZE altogether, it worked.

It also works because just after a CREATE TABLE it defaults to using indexes too. The *only* time it starts worrying about seq scans is if you run ANALYZE on an empty table. So don't do that.

Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
>> The only annoyance is that the interface I use most often, Python's
>> pgdb, runs everything in a transaction, and you can't analyze in a
>> transaction.
>
> Hm? We've allowed ANALYZE inside a transaction for a long time.

I'm stuck with using 7.2, for now, and I get this:

    ERROR: ANALYZE cannot run inside a BEGIN/END block

I'll be happy to find out that this restriction's been removed in later versions.

- John D. Burger
  MITRE
John D. Burger wrote:
> I'm stuck with using 7.2, for now, and I get this:
>
> ERROR: ANALYZE cannot run inside a BEGIN/END block
>
> I'll be happy to find out that this restriction's been removed in later
> versions.

You're running 7.2? That is all kinds of levels of... huh? Why?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
> You're running 7.2? That is all kinds of levels of... huh? Why?

I'm not running it, my organization is. Not sure how to interpret "all kinds of levels of..." Are there any huge suckages that I can use to leverage an update? I'm familiar with some of the smaller ones.

- John D. Burger
"John D. Burger" <john@mitre.org> writes:
>> You're running 7.2? That is all kinds of levels of... huh? Why?
>
> I'm not running it, my organization is. Not sure how to interpret "all
> kinds of levels of..." Are there any huge suckages that I can use to
> leverage an update? I'm familiar with some of the smaller ones.

Lots; what does it take to get their attention?

Feature-wise, there are such small matters as schemas. Performance-wise, there are quite a lot of improvements since 7.2. Security-wise, there are unfixable holes in 7.2 (try "select cash_out(2)").

I hope you're at least on 7.2.8, else you are also vulnerable to a number of data-loss-grade bugs. I don't recall at the moment whether there were any data-loss issues that we deemed unfixable in 7.2.*, but it wouldn't surprise me.

Try perusing the release notes at http://developer.postgresql.org/docs/postgres/release.html for ammunition.

regards, tom lane
> I hope you're at least on 7.2.8, else you are also vulnerable to a
> number of data-loss-grade bugs.

Sadly, it seems to be exactly 7.2.

> Try perusing the release notes at
> http://developer.postgresql.org/docs/postgres/release.html
> for ammunition.

Excellent - the various security holes might get their attention. Thanks!

- John D. Burger
  MITRE
On Tue, 2005-08-02 at 15:52, John D. Burger wrote:
>> I hope you're at least on 7.2.8, else you are also vulnerable to a
>> number of data-loss-grade bugs.
>
> Sadly, it seems to be exactly 7.2.

Also, the index bloat issues in 7.2 can be quite a performance killer. Just sayin'...
On Aug 2, 2005, at 8:16 AM, Alvaro Herrera wrote:
> On Tue, Aug 02, 2005 at 10:01:50AM -0500, Dan Armbrust wrote:
>> I shouldn't have to manually run Analyze to make the DB be capable of
>> handling inserts involving tables with foreign keys correctly. My code
>> that is doing the inserts is a java application that works across
>> multiple DBS - MySQL, PostgreSQL, DB2, MS Access, Sybase, etc.
>> [etc, rant removed]
>
> You don't _have_ to be rude. This is a known limitation, and people
> have suggested the usual workarounds. This is an open source project --
> if you think you can make it better, please by all means post a patch.
Not to fan the flames, so to speak, but I do have one question. If this is a known limitation, I'd expect the behavior to be consistent. Instead, he is saying that the problem is intermittent. Sometimes it runs fine; other times it slows down to a snail's pace.
So, does that mean the known problem is indeed intermittent, and not something that happens every time? Or, is this an issue that can be mostly eliminated with appropriate tuning?
I ask, because I may soon be facing similar problems, and forewarned is forearmed. I'd like to anticipate this and tune things accordingly before it becomes an issue.
Thanks,
Greg
On Tue, Aug 02, 2005 at 10:55:22AM -0700, Gregory Youngblood wrote:
> Not to fan the flames, so to speak, but I do have one question. If
> this is a known limitation, I'd expect the behavior to be consistent.
> Instead, he is saying that the problem is intermittent. Sometimes it
> runs fine, other times it slows down to a snail's pace.
>
> So, does that mean the known problem is indeed intermittent, and not
> something that happens every time? Or, is this an issue that can be
> mostly eliminated with appropriate tuning?

The problem is basically that PostgreSQL chooses a seq scan if it thinks the table is empty, but a normal CREATE TABLE fudges the statistics so that it chooses an index scan. The problem mostly arises if you run an ANALYZE or VACUUM ANALYZE *after* you've deleted all the rows but *before* you load the data. Don't do that.

If you need to run VACUUM to clear out the table, do so; just don't analyze at the same time. Or use TRUNCATE. At least that's my experience.

Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
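[Editor's note: Martijn's advice as a load-script skeleton, using the table name from earlier in the thread. The key point is simply the ordering: no ANALYZE runs while the table is empty.]

```sql
-- Clear out old rows; do NOT analyze here, while the table is empty.
TRUNCATE conceptpropertymultiattributes;

-- ... bulk-load the data here, via INSERTs or COPY ...

-- Only now, with the table populated, refresh the statistics.
ANALYZE conceptpropertymultiattributes;
```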