Thread: Delete query takes an exorbitant amount of time
v8.0.1 on a Sun v20Z running Gentoo Linux, 1 CPU, 1 GB RAM, one 10k SCSI disk.

I have a (fairly) newly rebuilt database. In the last month it has undergone
extensive testing, hence thousands of inserts and deletes in the table in
question. After each mass unload/load cycle, I vacuum full analyze verbose.

I tried to build a test case to isolate the issue, but the problem does not
manifest itself, so I think I have somehow made postgresql angry. I could
drop the whole db and start over, but I am interested in not reproducing
this issue.

Here is the statement:

orfs=# explain analyze DELETE FROM int_sensor_meas_type WHERE
id_meas_type IN (SELECT * FROM meas_type_ids);
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=11.53..42.06 rows=200 width=6) (actual time=1.564..2.840 rows=552 loops=1)
   Hash Cond: ("outer".id_meas_type = "inner".id_meas_type)
   ->  Seq Scan on int_sensor_meas_type  (cost=0.00..25.36 rows=636 width=10) (actual time=0.005..0.828 rows=748 loops=1)
   ->  Hash  (cost=11.03..11.03 rows=200 width=4) (actual time=1.131..1.131 rows=0 loops=1)
         ->  HashAggregate  (cost=11.03..11.03 rows=200 width=4) (actual time=0.584..0.826 rows=552 loops=1)
               ->  Seq Scan on meas_type_ids  (cost=0.00..9.42 rows=642 width=4) (actual time=0.002..0.231 rows=552 loops=1)
 Total runtime: 2499616.216 ms
(7 rows)

Yes, that's *40 minutes*. It drives CPU (as viewed in top) to 99%+ for the
entire duration of the query, but %mem hangs at 1% or lower. meas_type_ids
is a temp table with the ids I want to nuke.

Here is a similar query behaving as expected:

orfs=# explain analyze DELETE FROM int_station_sensor WHERE
id_sensor IN (SELECT * FROM sensor_ids);
                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.18..21.13 rows=272 width=6) (actual time=0.479..0.847 rows=169 loops=1)
   Hash Cond: ("outer".id_sensor = "inner".id_sensor)
   ->  Seq Scan on int_station_sensor  (cost=0.00..11.49 rows=549 width=10) (actual time=0.007..0.265 rows=267 loops=1)
   ->  Hash  (cost=3.68..3.68 rows=200 width=4) (actual time=0.325..0.325 rows=0 loops=1)
         ->  HashAggregate  (cost=3.68..3.68 rows=200 width=4) (actual time=0.177..0.256 rows=169 loops=1)
               ->  Seq Scan on sensor_ids  (cost=0.00..3.14 rows=214 width=4) (actual time=0.003..0.057 rows=169 loops=1)
 Total runtime: 1.340 ms
(7 rows)

I have posted my tables, data and test cases here:
http://ccl.cens.nau.edu/~kan4/testing/long-delete

Where do I go from here?

Thanks in advance,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
Karim Nassar <Karim.Nassar@acm.org> writes:
> Here is the statement:

> orfs=# explain analyze DELETE FROM int_sensor_meas_type WHERE
> id_meas_type IN (SELECT * FROM meas_type_ids);
>                           QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=11.53..42.06 rows=200 width=6) (actual
> time=1.564..2.840 rows=552 loops=1)
> ...
>  Total runtime: 2499616.216 ms
> (7 rows)

Notice that the actual join is taking 2.8 ms. The other ~40 minutes is
in operations that we cannot see in this plan, but we can surmise are ON
DELETE triggers.

> Where do I go from here?

Look at what your triggers are doing. My private bet is that you have
unindexed foreign keys referencing this table, and so each deletion
forces a seqscan of some other, evidently very large, table(s).

			regards, tom lane
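One rough way to hunt for such keys is a catalog query along these lines (a
sketch, not from the thread: it only handles single-column foreign keys, and
only checks whether some index on the referencing table leads with the
referencing column):

-- List single-column FKs whose referencing column has no index on it.
SELECT c.conrelid::regclass AS referencing_table,
       c.conname            AS constraint_name
  FROM pg_constraint c
 WHERE c.contype = 'f'
   AND array_upper(c.conkey, 1) = 1        -- single-column keys only
   AND NOT EXISTS (SELECT 1
                     FROM pg_index i
                    WHERE i.indrelid = c.conrelid
                      AND i.indkey[0] = c.conkey[1]);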
Tom,

I've got a similar problem with deletes taking a very long time. I know
that there are lots of foreign keys referencing this table, and other
foreign keys referencing those tables, etc. I've been curious, is there
a way to find out how long the foreign key checks take for each
dependent table?

-- Mark Lewis

On Thu, 2005-03-24 at 16:52, Tom Lane wrote:
> Karim Nassar <Karim.Nassar@acm.org> writes:
> > Here is the statement:
>
> > orfs=# explain analyze DELETE FROM int_sensor_meas_type WHERE
> > id_meas_type IN (SELECT * FROM meas_type_ids);
> >                           QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------
> >  Hash Join  (cost=11.53..42.06 rows=200 width=6) (actual
> > time=1.564..2.840 rows=552 loops=1)
> > ...
> >  Total runtime: 2499616.216 ms
> > (7 rows)
>
> Notice that the actual join is taking 2.8 ms. The other ~40 minutes is
> in operations that we cannot see in this plan, but we can surmise are ON
> DELETE triggers.
>
> > Where do I go from here?
>
> Look at what your triggers are doing. My private bet is that you have
> unindexed foreign keys referencing this table, and so each deletion
> forces a seqscan of some other, evidently very large, table(s).
>
> 			regards, tom lane
On Thu, 2005-03-24 at 19:52 -0500, Tom Lane wrote:
> Notice that the actual join is taking 2.8 ms. The other ~40 minutes is
> in operations that we cannot see in this plan, but we can surmise are ON
> DELETE triggers.

There are no DELETE triggers (that I have created).

> > Where do I go from here?
>
> Look at what your triggers are doing. My private bet is that you have
> unindexed foreign keys referencing this table, and so each deletion
> forces a seqscan of some other, evidently very large, table(s).

Almost. I have a large table (6.3 million rows) with a foreign key
reference to this one (which has 749 rows); however, it is indexed. I
deleted the FK, ran the delete, then recreated the foreign key, all in
about 15 seconds. Thanks!

Problem now is: I expect this referencing table to grow to about 110
million rows in the next 2 months, then by 4 million rows per month
thereafter. I expect that the time for recreating the foreign key will
grow linearly with size.

Is this just the kind of thing I need to watch out for? Any other
suggestions for dealing with tables of this size? What can I do to my
indexes to make them mo' betta?
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
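The whole cycle can be wrapped in a single transaction so that no committed
state ever lacks the constraint - a sketch using the DELETE from the
original post and the constraint name that appears later in the thread:

BEGIN;
-- Drop the FK so the delete doesn't trigger per-row checks:
ALTER TABLE measurement
    DROP CONSTRAINT measurement_id_int_sensor_meas_type_fkey;
DELETE FROM int_sensor_meas_type
 WHERE id_meas_type IN (SELECT * FROM meas_type_ids);
-- Re-adding revalidates every referencing row in one pass; this is the
-- step whose cost grows with the size of the referencing table:
ALTER TABLE measurement
    ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey
    FOREIGN KEY (id_int_sensor_meas_type)
    REFERENCES int_sensor_meas_type (id_int_sensor_meas_type);
COMMIT;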
Karim Nassar <karim.nassar@NAU.EDU> writes:
>> Look at what your triggers are doing. My private bet is that you have
>> unindexed foreign keys referencing this table, and so each deletion
>> forces a seqscan of some other, evidently very large, table(s).

> Almost. I have a large table (6.3 million rows) with a foreign key
> reference to this one (which has 749 rows); however, it is indexed.

In that case there's a datatype mismatch between the referencing and
referenced columns, which prevents the index from being used for the
FK check.

			regards, tom lane
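For anyone tripping over this, a sketch of how such a mismatch arises and
one way to repair it (the tables here are hypothetical; the
numeric-versus-integer pairing is the "comparable but not indexable" case
that comes up later in the thread, and ALTER COLUMN ... TYPE is new in 8.0):

CREATE TABLE parent (id integer PRIMARY KEY);
-- The types are comparable, so the FK is accepted, but an int4 index
-- can't be used to probe with a numeric value:
CREATE TABLE child  (parent_id numeric REFERENCES parent (id));

-- Make the two sides match so indexes are usable for the RI checks:
ALTER TABLE child ALTER COLUMN parent_id TYPE integer;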
Mark Lewis <mark.lewis@mir3.com> writes:
> I've got a similar problem with deletes taking a very long time. I know
> that there are lots of foreign keys referencing this table, and other
> foreign keys referencing those tables, etc. I've been curious, is there
> a way to find out how long the foreign key checks take for each
> dependent table?

There is not any easy way at the moment.

Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show
the runtime expended in each trigger when the statement is of a kind
that has triggers. We couldn't break down the time *within* the
triggers, but even this info would help a lot in terms of finger
pointing ...

	Seq Scan on ... (nn.nnn ms)
	Trigger foo: nn.mmm ms
	Trigger bar: nn.mmm ms
	Total time: nn.mmm ms

			regards, tom lane
Watch your pg_stat_* views before and after the delete and check which
related tables have had lots of seqscans.

Chris

Mark Lewis wrote:
> Tom,
>
> I've got a similar problem with deletes taking a very long time. I know
> that there are lots of foreign keys referencing this table, and other
> foreign keys referencing those tables, etc. I've been curious, is there
> a way to find out how long the foreign key checks take for each
> dependent table?
>
> -- Mark Lewis
>
> On Thu, 2005-03-24 at 16:52, Tom Lane wrote:
>> Look at what your triggers are doing. My private bet is that you have
>> unindexed foreign keys referencing this table, and so each deletion
>> forces a seqscan of some other, evidently very large, table(s).
>>
>> 			regards, tom lane
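Concretely, something along these lines works (a sketch; it assumes the
statistics collector and row-level stats are enabled so the per-table
counters are populated):

-- Snapshot the counters, run the DELETE, then re-run and compare: a table
-- whose seq_scan count jumps by one per deleted row is the culprit.
SELECT relname, seq_scan, seq_tup_read, idx_scan
  FROM pg_stat_user_tables
 ORDER BY seq_scan DESC;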
> In that case there's a datatype mismatch between the referencing and
> referenced columns, which prevents the index from being used for the
> FK check.

Is creating such a foreign key a WARNING yet?

Chris
> Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show
> the runtime expended in each trigger when the statement is of a kind
> that has triggers. We couldn't break down the time *within* the
> triggers, but even this info would help a lot in terms of finger
> pointing ...
>
> 	Seq Scan on ... (nn.nnn ms)
> 	Trigger foo: nn.mmm ms
> 	Trigger bar: nn.mmm ms
> 	Total time: nn.mmm ms

That would be really cool...
On Mar 24, 2005, at 10:38 PM, Christopher Kings-Lynne wrote:
>> In that case there's a datatype mismatch between the referencing and
>> referenced columns, which prevents the index from being used for the
>> FK check.
>
> Is creating such a foreign key a WARNING yet?

I recall getting such a warning when importing my schema from a 7.4 to
an 8.0 server. I had one table with char and the other with varchar.
Karim,

> Problem now is: I expect this referencing table to grow to about 110
> million rows in the next 2 months, then by 4 million rows per month
> thereafter. I expect that the time for recreating the foreign key will
> grow linearly with size.
>
> Is this just the kind of thing I need to watch out for? Any other
> suggestions for dealing with tables of this size? What can I do to my
> indexes to make them mo' betta?

How about getting some decent disk support? A single 10K SCSI disk is a
bit sub-par for a database with 100's of millions of records. Too bad you
didn't get a v40z ...

Beyond that, you'll want to do the same thing whenever you purge the
referencing table; drop keys, delete, re-create keys. Or think about why
it is you need to delete batches of records from this FKed table at all.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 24 Mar 2005, Tom Lane wrote:
> There is not any easy way at the moment.
>
> Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show
> the runtime expended in each trigger when the statement is of a kind
> that has triggers. We couldn't break down the time *within* the
> triggers, but even this info would help a lot in terms of finger
> pointing ...
>
> 	Seq Scan on ... (nn.nnn ms)
> 	Trigger foo: nn.mmm ms
> 	Trigger bar: nn.mmm ms
> 	Total time: nn.mmm ms

and if you add

	Index foo_idx: nn.mm ss
	Heap foo_tbl: nn.mm ss

Regards,
	Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show
>> the runtime expended in each trigger when the statement is of a kind
>> that has triggers.

> Could SPI "know" that an explain analyze is being run and add their
> output and timings to the output?

If it did, we'd be double-counting the time.

			regards, tom lane
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> In that case there's a datatype mismatch between the referencing and
>> referenced columns, which prevents the index from being used for the
>> FK check.

> Is creating such a foreign key a WARNING yet?

I believe so as of 8.0. It's a bit tricky since 8.0 does allow some
cross-type cases to be indexed, but IIRC we have a test that understands
about that...

			regards, tom lane
On Fri, 2005-03-25 at 01:58 -0500, Tom Lane wrote:
> > Is creating such a foreign key a WARNING yet?
>
> I believe so as of 8.0. It's a bit tricky since 8.0 does allow some
> cross-type cases to be indexed, but IIRC we have a test that understands
> about that...

src/backend/commands/tablecmds.c, line 3966 in CVS tip:

	/*
	 * Check that the found operator is compatible with the PK index,
	 * and generate a warning if not, since otherwise costly seqscans
	 * will be incurred to check FK validity.
	 */
	if (!op_in_opclass(oprid(o), opclasses[i]))
		ereport(WARNING,
				(errmsg("foreign key constraint \"%s\" "
						"will require costly sequential scans",
						fkconstraint->constr_name),
				 errdetail("Key columns \"%s\" and \"%s\" "
						   "are of different types: %s and %s.",
						   strVal(list_nth(fkconstraint->fk_attrs, i)),
						   strVal(list_nth(fkconstraint->pk_attrs, i)),
						   format_type_be(fktypoid[i]),
						   format_type_be(pktypoid[i]))));

So, yes to the WARNING. Not sure about the cross-type cases...

Karim: Did this happen? If not, can you drop and re-create the constraint
and confirm that you get the WARNING? If not, we have problems.

I vote to make this an ERROR in 8.1 - I see little benefit in allowing
this situation to continue. If users do create a FK like this, it just
becomes another performance problem on list...

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> I vote to make this an ERROR in 8.1 - I see little benefit in allowing
> this situation to continue.

Other than spec compliance, you mean? SQL99 says

	... The declared type of each referencing column shall be
	comparable to the declared type of the corresponding referenced
	column.

It doesn't say that it has to be indexable, and most definitely not that
there has to be an index.

			regards, tom lane
On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I vote to make this an ERROR in 8.1 - I see little benefit in allowing
> > this situation to continue.
>
> Other than spec compliance, you mean? SQL99 says
>
> 	... The declared type of each referencing column shall be
> 	comparable to the declared type of the corresponding referenced
> 	column.
>
> It doesn't say that it has to be indexable, and most definitely not that
> there has to be an index.

Specs at dawn, eh?

Well, SQL:2003 Foundation, p.550 clause 3a) states that the
<reference columns> in the referencing table must match a unique
constraint on the referenced table, or the PRIMARY KEY if the columns are
not specified. Either way, the referenced columns are a unique constraint
(which makes perfect sense from a logical data perspective).

We implement unique constraints via an index, so for PostgreSQL the
clause implies that it must refer to an index.

Touche, Monsieur Lane, and Happy Easter :-)

But even without that, there is little benefit in allowing it...
WARNING -> ERROR, please.

Best Regards, Simon Riggs
On Fri, 25 Mar 2005, Simon Riggs wrote:
> On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
> > Other than spec compliance, you mean? SQL99 says
> >
> > 	... The declared type of each referencing column shall be
> > 	comparable to the declared type of the corresponding referenced
> > 	column.
> >
> > It doesn't say that it has to be indexable, and most definitely not
> > that there has to be an index.
>
> Well, SQL:2003 Foundation, p.550 clause 3a) states that the
> <reference columns> in the referencing table must match a unique
> constraint on the referenced table, or the PRIMARY KEY if the columns
> are not specified. Either way, the referenced columns are a unique
> constraint (which makes perfect sense from a logical data perspective).
>
> We implement unique constraints via an index, so for PostgreSQL the
> clause implies that it must refer to an index.

IMHO, that reference is irrelevant. Yes, there must be an index due to
our implementation; however, that doesn't imply that the types must be
the same, nor even that the index must be usable for the cross-table
comparison.
Karim,

> I guess I was wondering if there is other general tuning advice for such
> large table indexes, such as increasing statistics, etc.

Well, your index use problem is being explained by Tom, Stephan and
Simon; basically your FKed data types are incompatible for index use
purposes, so the system *can't* use an index while loading.

If you're going with the drop/load/recreate option, then I'd suggest
increasing work_mem for the duration. Hmmm ... or maintenance_work_mem?
What gets used for FK checks? Simon?

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Fri, 2005-03-25 at 08:23 -0800, Stephan Szabo wrote:
> On Fri, 25 Mar 2005, Simon Riggs wrote:
> > Well, SQL:2003 Foundation, p.550 clause 3a) states that the
> > <reference columns> in the referencing table must match a unique
> > constraint on the referenced table, or the PRIMARY KEY if the columns
> > are not specified. Either way, the referenced columns are a unique
> > constraint (which makes perfect sense from a logical data
> > perspective).
> >
> > We implement unique constraints via an index, so for PostgreSQL the
> > clause implies that it must refer to an index.
>
> IMHO, that reference is irrelevant.

Tom had said SQL99 required this; I have pointed out that SQL:2003, which
supersedes the SQL99 standard, does not require this.

Leading us back to my original point - what is the benefit of continuing
with having a WARNING when that leads people into trouble later?

> Yes, there must be an index due to our implementation; however, that
> doesn't imply that the types must be the same

No, it doesn't imply it, but what benefit do you see from the
interpretation that they are allowed to differ? That interpretation
currently leads to many mistakes and to poor performance.

There is clear benefit from forcing them to be the same. In logical data
terms, they *should* be the same. I don't check fruit.apple_grade against
fruit_type.orange_grade. When would I want to make a check of that
nature? If there is a reason, that's great - let's keep the status quo
then.

I respect the effort and thought that has already gone into the
implementation; I seek only to offer a very minor improvement based upon
recent list issues.

> nor even that the index must be usable for the cross-table comparison.

That's a separate discussion, possibly the next one.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
>>> Other than spec compliance, you mean? SQL99 says
>>>
>>> ... The declared type of each referencing column shall be
>>> comparable to the declared type of the corresponding referenced
>>> column.

> Tom had said SQL99 required this; I have pointed out that SQL:2003,
> which supersedes the SQL99 standard, does not require this.

You're reading the wrong part of SQL:2003. 11.8 <referential constraint
definition> syntax rule 9 still has the text I quoted.

> Leading us back to my original point - what is the benefit of continuing
> with having a WARNING when that leads people into trouble later?

Accepting spec-compliant schemas.

			regards, tom lane
On Fri, 2005-03-25 at 13:47 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Tom had said SQL99 required this; I have pointed out that SQL:2003,
> > which supersedes the SQL99 standard, does not require this.
>
> You're reading the wrong part of SQL:2003. 11.8 <referential constraint
> definition> syntax rule 9 still has the text I quoted.

So, we have this from SQL:2003 section 11.8 p.550:
- 3a) requires us to have an index
- 9) requires the data types to be "comparable"

In the name of spec-compliance we wish to accept an interpretation of the
word "comparable" that means we will accept two datatypes that are not
actually the same.

So we are happy to enforce having the index, but not happy to ensure the
index is actually usable for the task?

> > Leading us back to my original point - what is the benefit of
> > continuing with having a WARNING when that leads people into trouble
> > later?
>
> Accepting spec-compliant schemas.

I definitely want this too - as you know, I have worked on documenting
compliance previously.

Is the word "comparable" defined elsewhere in the standard? Currently,
datatypes with similar type categories are comparable and yet (in 8.0)
will not use the index. So, we are taking comparable to include fairly
radically different datatypes?

Could it be that, because PostgreSQL has a very highly developed sense of
datatype comparison, we might be taking this to extremes? Would any other
RDBMS consider two different datatypes to be comparable?

Please consider this.

Best Regards, Simon Riggs
On Fri, 25 Mar 2005, Simon Riggs wrote:
> So, we have this from SQL:2003 section 11.8 p.550:
> - 3a) requires us to have an index
> - 9) requires the data types to be "comparable"
>
> In the name of spec-compliance we wish to accept an interpretation of
> the word "comparable" that means we will accept two datatypes that are
> not actually the same.
>
> So we are happy to enforce having the index, but not happy to ensure the
> index is actually usable for the task?

The index's "usability" only applies to the purpose of guaranteeing
uniqueness, which doesn't depend on the referencing type AFAICS.

> > > Leading us back to my original point - what is the benefit of
> > > continuing with having a WARNING when that leads people into
> > > trouble later?
> >
> > Accepting spec-compliant schemas.
>
> Is the word "comparable" defined elsewhere in the standard?

Yes. And at least in SQL99, there's a bunch of statements in 4.* about
what is comparable.

> Currently, datatypes with similar type categories are comparable and
> yet (in 8.0) will not use the index. So, we are taking comparable to
> include fairly radically different datatypes?

Not entirely. I believe a referenced column of int and a referencing
column of numeric currently displays that warning, but appears to be
allowed by the spec (as the numeric types are considered mutually
comparable).

> Could it be that, because PostgreSQL has a very highly developed sense
> of datatype comparison, we might be taking this to extremes? Would any
> other RDBMS consider two different datatypes to be comparable?

We do have a broader "comparable" than the spec. However, if we were to
limit it to the spec, then many of the implicit casts and cross-type
comparison operators we have would be invalid as well, since the
comparison between those types would have to fail, unless we treat the
"comparable" used by <comparison predicate> differently.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 25 Mar 2005, Simon Riggs wrote:
>> Could it be that, because PostgreSQL has a very highly developed sense
>> of datatype comparison, we might be taking this to extremes? Would any
>> other RDBMS consider two different datatypes to be comparable?

> We do have a broader "comparable" than the spec.

However, the set of comparisons that we can presently support *with
indexes* is narrower than the spec, so rejecting nonindexable cases would
be a problem.

It's worth noting also that the test being discussed checks whether the
PK index is usable for testing the RI constraint. In the problem that
started this thread, the difficulty is lack of a usable index on the FK
column, not the PK (because that's the table that has to be searched to
do a delete in the PK table). We cannot enforce that there be a usable
index on the FK column (since indexes on the FK table may not have been
built yet when the constraint is declared), and shouldn't anyway because
there are reasonable usage patterns where you don't need one.

			regards, tom lane
On Fri, 2005-03-25 at 15:10 +0000, Simon Riggs wrote:
> Karim: Did this happen? If not, can you drop and re-create the
> constraint and confirm that you get the WARNING? If not, we have
> problems.

No. Nor do I think that I should. SERIAL is a shortcut for INTEGER, no? I
think there is some other (TBD) problem causing my big seq scan.

orfs=# ALTER TABLE measurement DROP CONSTRAINT measurement_id_int_sensor_meas_type_fkey;
ALTER TABLE
orfs=# ALTER TABLE ONLY measurement ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey
orfs-#     FOREIGN KEY (id_int_sensor_meas_type) REFERENCES int_sensor_meas_type(id_int_sensor_meas_type);
ALTER TABLE
orfs=#

The ADD CONSTRAINT statement comes directly from a pg_dump. For clarity,
the tables/indexes were created as follows:

CREATE TABLE int_sensor_meas_type(
    id_int_sensor_meas_type SERIAL PRIMARY KEY,
    id_sensor integer NOT NULL REFERENCES sensor,
    id_meas_type integer NOT NULL REFERENCES meas_type UNIQUE);

CREATE TABLE measurement (
    id_measurement SERIAL PRIMARY KEY,
    id_int_sensor_meas_type integer NOT NULL REFERENCES int_sensor_meas_type,
    datetime timestamp WITH TIME ZONE NOT NULL,
    value numeric(15,5) NOT NULL,
    created timestamp with time zone NOT NULL DEFAULT now(),
    created_by TEXT NOT NULL REFERENCES public.person(id_person));

CREATE INDEX measurement__id_int_sensor_meas_type_idx
    ON measurement(id_int_sensor_meas_type);

Regards,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
On Fri, 2005-03-25 at 16:25 -0500, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > We do have a broader "comparable" than the spec.
>
> However, the set of comparisons that we can presently support *with
> indexes* is narrower than the spec, so rejecting nonindexable cases
> would be a problem.

OK. Can we have a TODO item then?

* Ensure that all SQL:2003 comparable datatypes are also indexable when
  compared

...or something like that.

> It's worth noting also that the test being discussed checks whether the
> PK index is usable for testing the RI constraint. In the problem that
> started this thread, the difficulty is lack of a usable index on the FK
> column, not the PK (because that's the table that has to be searched to
> do a delete in the PK table). We cannot enforce that there be a usable
> index on the FK column (since indexes on the FK table may not have been
> built yet when the constraint is declared), and shouldn't anyway
> because there are reasonable usage patterns where you don't need one.

Yes, I agree that for CASCADE we wouldn't always want an index. Alright
then, time to leave it there.

I want to write up some additional comments for performance tips (a
sketch of both follows below):
- Consider defining RI constraints after tables have been loaded
- Remember to add an index on the referencing table if the constraint is
  defined as CASCADEing

Have a good Easter, all, wherever you are and whatever you believe in.

Best Regards, Simon Riggs
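Putting those two tips together (a sketch, not from the thread itself: the
COPY file path is hypothetical, with table names as used earlier):

-- 1. Load the bulk data with no FK constraint in the way:
COPY measurement FROM '/tmp/measurement.dat';

-- 2. Index the referencing column, since deletes on the referenced table
--    (and any CASCADE actions) must search it:
CREATE INDEX measurement__id_int_sensor_meas_type_idx
    ON measurement (id_int_sensor_meas_type);

-- 3. Attach the constraint last; its one-time validation is a single pass,
--    governed in 8.0 by maintenance_work_mem (see Simon's note later on):
ALTER TABLE measurement
    ADD FOREIGN KEY (id_int_sensor_meas_type)
    REFERENCES int_sensor_meas_type (id_int_sensor_meas_type);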
> There is clear benefit from forcing them to be the same. In logical data
> terms, they *should* be the same. I don't check fruit.apple_grade
> against fruit_type.orange_grade. When would I want to make a check of
> that nature? If there is a reason, that's great - let's keep the status
> quo then.
>
> I respect the effort and thought that has already gone into the
> implementation; I seek only to offer a very minor improvement based upon
> recent list issues.

The main problem would be people getting errors when upgrading their
databases, or restoring from a backup, say.

Chris
On Fri, 25 Mar 2005, Karim Nassar wrote:
> No. Nor do I think that I should. SERIAL is a shortcut for INTEGER, no?
> I think there is some other (TBD) problem causing my big seq scan.
>
> orfs=# ALTER TABLE measurement DROP CONSTRAINT measurement_id_int_sensor_meas_type_fkey;
> ALTER TABLE
> orfs=# ALTER TABLE ONLY measurement ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey
> orfs-#     FOREIGN KEY (id_int_sensor_meas_type) REFERENCES int_sensor_meas_type(id_int_sensor_meas_type);
> ALTER TABLE
> orfs=#
>
> The ADD CONSTRAINT statement comes directly from a pg_dump. For clarity,
> the tables/indexes were created as follows:
>
> CREATE TABLE int_sensor_meas_type(
>     id_int_sensor_meas_type SERIAL PRIMARY KEY,
>     id_sensor integer NOT NULL REFERENCES sensor,
>     id_meas_type integer NOT NULL REFERENCES meas_type UNIQUE);
>
> CREATE TABLE measurement (
>     id_measurement SERIAL PRIMARY KEY,
>     id_int_sensor_meas_type integer NOT NULL REFERENCES int_sensor_meas_type,
>     datetime timestamp WITH TIME ZONE NOT NULL,
>     value numeric(15,5) NOT NULL,
>     created timestamp with time zone NOT NULL DEFAULT now(),
>     created_by TEXT NOT NULL REFERENCES public.person(id_person));
>
> CREATE INDEX measurement__id_int_sensor_meas_type_idx
>     ON measurement(id_int_sensor_meas_type);

That seems like it should be okay. Hmm, what does something like:

PREPARE test(int) AS SELECT 1 from measurement where
    id_int_sensor_meas_type = $1 FOR UPDATE;
EXPLAIN ANALYZE EXECUTE test(1);

give you as the plan?
On Sat, 2005-03-26 at 07:55 -0800, Stephan Szabo wrote:
> That seems like it should be okay. Hmm, what does something like:
>
> PREPARE test(int) AS SELECT 1 from measurement where
>     id_int_sensor_meas_type = $1 FOR UPDATE;
> EXPLAIN ANALYZE EXECUTE test(1);
>
> give you as the plan?

                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual time=11608.402..11608.402 rows=0 loops=1)
   Filter: (id_int_sensor_meas_type = $1)
 Total runtime: 11608.441 ms
(3 rows)

--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
On Sat, 26 Mar 2005, Karim Nassar wrote:
>                                                        QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual time=11608.402..11608.402 rows=0 loops=1)
>    Filter: (id_int_sensor_meas_type = $1)
>  Total runtime: 11608.441 ms
> (3 rows)

Hmm, has measurement been analyzed recently? You might want to see if
raising the statistics target on measurement.id_int_sensor_meas_type and
reanalyzing changes the estimated rows down from 500k.
On Sat, 2005-03-26 at 15:18 -0800, Stephan Szabo wrote:
> Hmm, has measurement been analyzed recently? You might want to see if
> raising the statistics target on measurement.id_int_sensor_meas_type
> and reanalyzing changes the estimated rows down from 500k.

orfs=# ALTER TABLE measurement ALTER COLUMN id_int_sensor_meas_type SET STATISTICS 1000;
ALTER TABLE
orfs=# VACUUM FULL ANALYZE VERBOSE;
<snip>
INFO:  free space map: 52 relations, 13501 pages stored; 9760 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 300000 pages = 1864 kB shared memory.
VACUUM
orfs=# PREPARE test(int) AS SELECT 1 from measurement where
orfs-#     id_int_sensor_meas_type = $1 FOR UPDATE;
PREPARE
orfs=# EXPLAIN ANALYZE EXECUTE test(1);
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual time=8948.452..8948.452 rows=0 loops=1)
   Filter: (id_int_sensor_meas_type = $1)
 Total runtime: 8948.494 ms
(3 rows)

orfs=# EXPLAIN ANALYZE EXECUTE test(1);
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual time=3956.616..3956.616 rows=0 loops=1)
   Filter: (id_int_sensor_meas_type = $1)
 Total runtime: 3956.662 ms
(3 rows)

Some improvement, and even better once it's cached, but the row estimate
didn't change. Is this the best I can expect? Are there any other
optimizations I am missing?

TIA,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
On Sat, 26 Mar 2005, Karim Nassar wrote:
> orfs=# ALTER TABLE measurement ALTER COLUMN id_int_sensor_meas_type SET STATISTICS 1000;
> ALTER TABLE
> orfs=# VACUUM FULL ANALYZE VERBOSE;
> <snip>
> orfs=# EXPLAIN ANALYZE EXECUTE test(1);
>  Seq Scan on measurement  (cost=0.00..164559.16 rows=509478 width=6) (actual time=8948.452..8948.452 rows=0 loops=1)
> ...
>  Total runtime: 3956.662 ms
>
> Some improvement, and even better once it's cached, but the row estimate
> didn't change. Is this the best I can expect? Are there any other
> optimizations I am missing?

I'm not sure, really. Running a seq scan for each removed row in the
referenced table doesn't seem like a particularly good plan in general
though, especially if the average number of rows being referenced isn't
on the order of 500k per value. I don't know what to look at next though.
On Thu, 2005-03-24 at 20:48 -0500, Tom Lane wrote:
> In that case there's a datatype mismatch between the referencing and
> referenced columns, which prevents the index from being used for the
> FK check.

Can I have more words on this? Here is how I created the tables:

CREATE TABLE int_sensor_meas_type(
    id_int_sensor_meas_type SERIAL PRIMARY KEY,
    id_sensor integer NOT NULL REFERENCES sensor,
    id_meas_type integer NOT NULL REFERENCES meas_type UNIQUE);

CREATE TABLE measurement (
    id_measurement SERIAL PRIMARY KEY,
    id_int_sensor_meas_type integer NOT NULL REFERENCES int_sensor_meas_type,
    datetime timestamp WITH TIME ZONE NOT NULL,
    value numeric(15,5) NOT NULL,
    created timestamp with time zone NOT NULL DEFAULT now(),
    created_by TEXT NOT NULL REFERENCES public.person(id_person));

CREATE INDEX measurement__id_int_sensor_meas_type_idx
    ON measurement(id_int_sensor_meas_type);

Do I need to cast the id_int_sensor_meas_type column when creating the
index? Both referrer and referenced look like INTEGER to me...

http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-SERIAL
says: "The type names serial and serial4 are equivalent: both create
integer columns."

TIA,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
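One way to check from the catalogs rather than the DDL (a sketch; the
regtype cast just prints the type name):

SELECT a.attrelid::regclass AS rel, a.attname, a.atttypid::regtype AS coltype
  FROM pg_attribute a
 WHERE (a.attrelid = 'measurement'::regclass
        AND a.attname = 'id_int_sensor_meas_type')
    OR (a.attrelid = 'int_sensor_meas_type'::regclass
        AND a.attname = 'id_int_sensor_meas_type');

If both rows come back as "integer", the declared types really do match and
the mismatch theory can be ruled out.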
On Thu, 2005-03-24 at 21:24 -0800, Josh Berkus wrote:
> Karim,
> How about getting some decent disk support? A single 10K SCSI disk is a
> bit sub-par for a database with 100's of millions of records. Too bad
> you didn't get a v40z ...

Hehe. I have one I am setting up that will be dedicated to postgresql,
hence my question about a week ago about disk partitioning/striping :-)

> Beyond that, you'll want to do the same thing whenever you purge the
> referencing table; drop keys, delete, re-create keys. Or think about why
> it is you need to delete batches of records from this FKed table at all.

The database is for weather data from multiple sources. When adding a new
dataset, I have to create/test/delete/recreate the config in the FKed
table. Users don't have this power, but I need it. Drop/delete/recreate
is a totally acceptable solution for this scenario.

I guess I was wondering if there is other general tuning advice for such
large table indexes, such as increasing statistics, etc.

Thanks,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
On Sun, 2005-03-27 at 07:05 -0800, Stephan Szabo wrote:
> On Sat, 26 Mar 2005, Karim Nassar wrote:
> > Some improvement, and even better once it's cached, but the row
> > estimate didn't change. Is this the best I can expect? Are there any
> > other optimizations I am missing?
>
> I'm not sure, really. Running a seq scan for each removed row in the
> referenced table doesn't seem like a particularly good plan in general
> though, especially if the average number of rows being referenced isn't
> on the order of 500k per value. I don't know what to look at next
> though.

Karim, please... run the EXPLAIN after doing

	SET enable_seqscan = off

Thanks,

Best Regards, Simon Riggs
On Mon, 28 Mar 2005, Karim A Nassar wrote:
> On Mon, 28 Mar 2005, Simon Riggs wrote:
> > run the EXPLAIN after doing
> > SET enable_seqscan = off
>
> The results I previously supplied were searching for a non-existent
> value, so I have provided output for both cases.
>
> ***
> *** Searching for non-existent value
> ***
>
> orfs=# PREPARE test2(int) AS SELECT 1 from measurement where
> orfs-#     id_int_sensor_meas_type = $1 FOR UPDATE;
> PREPARE
> orfs=# EXPLAIN ANALYZE EXECUTE test2(1);
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Seq Scan on measurement
>    (cost=0.00..164559.16 rows=509478 width=6)
>    (actual time=6421.849..6421.849 rows=0 loops=1)
>    Filter: (id_int_sensor_meas_type = $1)
>  Total runtime: 6421.917 ms
> (3 rows)
>
> orfs=# SET enable_seqscan = off;

I think you have to prepare with enable_seqscan = off, because it affects
how the query is planned and prepared.
Tom Lane wrote:
> Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show
> the runtime expended in each trigger when the statement is of a kind
> that has triggers. We couldn't break down the time *within* the
> triggers, but even this info would help a lot in terms of finger
> pointing ...
>
> 	Seq Scan on ... (nn.nnn ms)
> 	Trigger foo: nn.mmm ms
> 	Trigger bar: nn.mmm ms
> 	Total time: nn.mmm ms

So I got the latest from CVS on Friday night to see how hard it would be
to implement this, but it turns out that Tom has already committed the
improvement, so I'm in Tom's fan club today. I imported my test dataset
and was almost immediately able to track down the cause of my performance
problem.

Thanks!
Mark Lewis
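With that commit (the feature shipped in the 8.1 release), the
finger-pointing becomes a one-liner; the trigger lines below are an
approximation of the output shape, not verbatim output:

EXPLAIN ANALYZE DELETE FROM int_sensor_meas_type
 WHERE id_meas_type IN (SELECT * FROM meas_type_ids);

--  ...plan nodes as usual, followed by something like:
--  Trigger for constraint measurement_id_int_sensor_meas_type_fkey:
--    time=..., calls=...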
On Mon, 2005-03-28 at 09:37 -0700, Karim A Nassar wrote:
> On Mon, 28 Mar 2005, Stephan Szabo wrote:
> > I think you have to prepare with enable_seqscan = off, because it
> > affects how the query is planned and prepared.
>
> orfs=# SET enable_seqscan = off;
> SET
> orfs=# PREPARE test2(int) AS SELECT 1 from measurement where
> orfs-#     id_int_sensor_meas_type = $1 FOR UPDATE;
> PREPARE
> orfs=# EXPLAIN ANALYZE EXECUTE test2(1); -- non-existent
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Index Scan using measurement__id_int_sensor_meas_type_idx on measurement
>    (cost=0.00..883881.49 rows=509478 width=6)
>    (actual time=29.207..29.207 rows=0 loops=1)
>    Index Cond: (id_int_sensor_meas_type = $1)
>  Total runtime: 29.277 ms
> (3 rows)
>
> orfs=# EXPLAIN ANALYZE EXECUTE test2(197); -- existing value
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Index Scan using measurement__id_int_sensor_meas_type_idx on measurement
>    (cost=0.00..883881.49 rows=509478 width=6)
>    (actual time=12.903..37478.167 rows=509478 loops=1)
>    Index Cond: (id_int_sensor_meas_type = $1)
>  Total runtime: 38113.338 ms
> (3 rows)

"That process starts upon the supposition that when you have eliminated
all which is impossible, then whatever remains, however improbable, must
be the truth." - Sherlock Holmes

Well, based upon the evidence so far, the Optimizer got it right:

Normal:
	SeqScan,   value=1	elapsed=  6.4s	cost=164559
	SeqScan,   value=197	elapsed= 28.1s	cost=164559

enable_seqscan = off:
	IndexScan, value=1	elapsed=  29ms	cost=883881
	IndexScan, value=197	elapsed= 38.1s	cost=883881

With enable_seqscan = off the index is used, proving that it has been
correctly defined for use in queries.

The FK CASCADE delete onto measurement will only be triggered by the
deletion of a real row, so the timing for an existing value is the one
that matters. That is longer than a SeqScan, so the Optimizer is correct.

My guess is that measurement has a greatly non-uniform distribution of
values and that 197 is one of the main values. Other values exist in the
lookup table, but occur very infrequently in the larger table.

Karim, please do:

select id_int_sensor_meas_type, count(*)
from measurement
group by id_int_sensor_meas_type
order by count(*) desc;

Best Regards, Simon Riggs
On Mon, 2005-03-28 at 13:03 -0700, Karim A Nassar wrote:
> > Well, based upon the evidence so far, the Optimizer got it right:
>
> Agreed. So, this means that the answer to my original question is "that
> delete gonna take a long time"?
>
> Seems that there is still something wrong. From what I can tell from
> everyone's questions, the FK constraint on measurement is causing
> multiple seq scans for each value deleted from int_sensor_meas_type.
> However, when deleting a single value, the FK check should use the
> index, so my ~190 deletes *should* be fast, no?

No.

> > IndexScan, value=1 elapsed= 29ms cost=883881
>
> 190 * 29ms is much less than 40 minutes. What am I missing here?

It all depends upon your data. There are *no* values in *your* table that
take 29ms to delete...

> > Karim, please do:
> >
> > select id_int_sensor_meas_type, count(*)
> > from measurement
> > group by id_int_sensor_meas_type
> > order by count(*) desc;
>
>  id_int_sensor_meas_type | count
> -------------------------+--------
>                       31 | 509478
>                       30 | 509478
>                      206 | 509478
>                      205 | 509478
>                      204 | 509478
>                       40 | 509478
>                       39 | 509478
>                      197 | 509478
>                       35 | 509478
>                       34 | 509478
>                       33 | 509478
>                       32 | 509478
>                       41 | 509477
>
> This sample dataset has 13 measurements from a weather station over 3
> years, hence the even distribution.

Each value has 1/13th of the table, which is too many rows per value to
make an IndexScan an efficient way of deleting rows from the table.
That's it.

If you have more values when measurement is bigger, the delete will
eventually switch plans (if you reconnect) and use the index. But not
yet.

There are a few ways to (re)design around it, but the distribution of
your data is not *currently* conducive to using an index.

Best Regards, Simon Riggs
On Fri, 2005-03-25 at 09:38 -0800, Josh Berkus wrote:
> > I guess I was wondering if there is other general tuning advice for
> > such large table indexes, such as increasing statistics, etc.
>
> If you're going with the drop/load/recreate option, then I'd suggest
> increasing work_mem for the duration. Hmmm ... or maintenance_work_mem?
> What gets used for FK checks? Simon?

In 8.0, maintenance_work_mem is used for index creation, vacuum, and the
initial validation of FK constraints at the time they are created.
Everything else uses work_mem as the limit.

Best Regards, Simon Riggs
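So for the drop/load/recreate cycle, the knob to raise for the session is
maintenance_work_mem. A sketch (note that 8.0 takes the value in kilobytes;
unit suffixes like '256MB' arrived in a later release):

SET maintenance_work_mem = 262144;  -- 256 MB for index build and FK check
ALTER TABLE measurement
    ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey
    FOREIGN KEY (id_int_sensor_meas_type)
    REFERENCES int_sensor_meas_type (id_int_sensor_meas_type);
RESET maintenance_work_mem;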
On Tue, 2005-03-29 at 01:48 -0700, Karim A Nassar wrote:
> > Each value has 1/13th of the table, which is too many rows per value
> > to make an IndexScan an efficient way of deleting rows from the table.
>
> But, the original question was that the delete that was taking a long
> time was on a different table. I tried to delete 150 rows from a table
> with 750 rows, which is FK referenced from this large table. If I
> understand correctly, Tom suggested that the length of time was due to a
> sequential scan being done on the large table for each value being
> deleted from the small one.
>
> For this FK check, there only need be one referring id to invalidate the
> delete. ISTM that for any delete with a FK reference, the index could
> always be used to search for a single value in the referring table
> (excepting very small tables). Why then must a sequential scan be
> performed in this case, and/or in general?

My understanding was that you were doing a DELETE on the smaller table
and that this was doing a DELETE on the measurement table because you had
the FK defined as ON DELETE CASCADE. You are right - only a single row is
sufficient to RESTRICT the DELETE. But for an ON UPDATE/DELETE action of
CASCADE you will want to touch all rows referenced, so a SeqScan is a
perfectly valid consequence of such actions. I think now that you are
using the default action, rather than specifically requesting CASCADE?

Stephan, Tom:
The SQL generated for RI checking by the RI triggers currently applies a
limit at execution time, not at prepare time, i.e. there is no LIMIT
clause in the SQL.

We know whether the check will be limit 1 or limit 0 at prepare time, so
why not add a LIMIT clause to the SQL so it changes the plan, not just
the number of rows returned when the check query executes? (I note that
PREPARE does allow you to supply a LIMIT 1 clause.)

That *ought* to have some effect on the plan used by the RI check
queries. In costsize.c:cost_index we would have tuples_fetched == 1, and
it would be hard (but not impossible) for the index cost to ever be more
than the cost of a SeqScan.

...but, I see no way for OidFunctionCall8 to ever return an answer of
"always just 1 row, no matter how big the relation"... so tuples_fetched
is always proportional to the size of the relation. Are unique indexes
treated just as very-low-selectivity indexes? They're a very similar
situation in terms of forcing an absolute, not relative, number of rows
returned.

Best Regards, Simon Riggs
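For concreteness, the no-action check could then be prepared along these
lines (a sketch of the proposal using this thread's tables; the real RI
trigger SQL lives in src/backend/utils/adt/ri_triggers.c):

PREPARE ri_check(int) AS
    SELECT 1 FROM ONLY measurement x
     WHERE id_int_sensor_meas_type = $1
     LIMIT 1 FOR UPDATE OF x;
EXPLAIN ANALYZE EXECUTE ri_check(197);

With the LIMIT visible at plan time, an index probe that fetches a single
row has a chance of winning the costing against a full seqscan.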
On Tue, 29 Mar 2005, Simon Riggs wrote:
> You are right - only a single row is sufficient to RESTRICT the DELETE.
> But for an ON UPDATE/DELETE action of CASCADE you will want to touch
> all rows referenced, so a SeqScan is a perfectly valid consequence of
> such actions. I think now that you are using the default action, rather
> than specifically requesting CASCADE?
>
> Stephan, Tom:
> The SQL generated for RI checking by the RI triggers currently applies
> a limit at execution time, not at prepare time, i.e. there is no LIMIT
> clause in the SQL.
>
> We know whether the check will be limit 1 or limit 0 at prepare time,
> so why not add a LIMIT clause to the SQL so it changes the plan, not
> just the number of rows returned when the check query executes?

Because IIRC, FOR UPDATE and LIMIT at least historically didn't play
nicely together: you could sometimes get a result where, if the first row
was locked, the FOR UPDATE would wait on it, but if it was deleted by the
other transaction you could get 0 rows back in the trigger.
On Tue, 29 Mar 2005, Stephan Szabo wrote:
> On Tue, 29 Mar 2005, Simon Riggs wrote:
> > We know whether the check will be limit 1 or limit 0 at prepare time,
> > so why not add a LIMIT clause to the SQL so it changes the plan, not
> > just the number of rows returned when the check query executes?
>
> Because IIRC, FOR UPDATE and LIMIT at least historically didn't play
> nicely together: you could sometimes get a result where, if the first
> row was locked, the FOR UPDATE would wait on it, but if it was deleted
> by the other transaction you could get 0 rows back in the trigger.

If there were some way to pass a "limit" into SPI_prepare that was
treated similarly to a LIMIT clause for planning purposes, but didn't
actually change the output plan to only return that number of rows, we
could use that.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Tue, 29 Mar 2005, Simon Riggs wrote: >> The SQL generated for RI checking by the RI triggers currently applies a >> limit at execution time, not at prepare time. i.e. there is no LIMIT >> clause in the SQL. >> >> We know whether the check will be limit 1 or limit 0 at prepare time, so >> why not add a LIMIT clause to the SQL so it changes the plan, not just >> the number of rows returned when the check query executes? > Because IIRC, FOR UPDATE and LIMIT at least historically didn't play > nicely together, so you could sometimes get a result where if the first > row was locked, the FOR UPDATE would wait on it, but if it was deleted by > the other transaction you could get 0 rows back in the trigger. Yeah, this is still true. It would probably be a good idea to change it but I haven't looked into exactly what would be involved. The basic problem is that the FOR UPDATE filter needs to execute before LIMIT instead of after, so presumably the FOR UPDATE shenanigans in execMain.c would need to be pushed into a separate plan node that could go underneath the LIMIT node. Originally this would have led to even more broken behavior --- locks taken on rows that weren't returned --- because the original coding of the LIMIT node tended to pull one more row from the lower plan than it would actually return. But we fixed that. I think having such a node might allow us to support FOR UPDATE in subqueries, as well, but I haven't looked at the details. (Whether that is a good idea is another question --- the problem of pulling rows that aren't nominally necessary, and thereby locking them, would apply in spades.) regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes: > ...but, I see no way for OidFunctionCall8 to ever return an answer of > "always just 1 row, no matter how big the relation"...so tuples_fetched > is always proportional to the size of the relation. Are unique indexes > treated just as very-low-selectivity indexes? Yeah. It is not the job of amcostestimate to estimate the number of rows, only the index access cost. (IIRC there is someplace in the planner that explicitly considers unique indexes as a part of developing selectivity estimates ... but it's not that part.) regards, tom lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > If there were some way to pass a "limit" into SPI_prepare that was treated > similarly to a LIMIT clause for planning purposes but didn't actually > change the output plan to only return that number of rows, we could use > that. Hmm ... the planner does have the ability to do that sort of thing (we use it for cursors). SPI_prepare doesn't expose the capability. Perhaps adding a SPI_prepare variant that does expose it would be the quickest route to a solution. I get a headache every time I look at the RI triggers ;-). Do they always know at the time of preparing a plan which way it will be used? regards, tom lane
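As a user-visible illustration of the planner capability Tom mentions: a cursor is planned on the assumption that only part of the result will be fetched, which can change the chosen plan. A sketch against Karim's tables (the cursor name is made up, and whether the plan actually flips depends on the statistics):

    BEGIN;
    -- Planned expecting a partial fetch, so a fast-start plan
    -- (e.g. an index scan) becomes more attractive:
    DECLARE fk_check CURSOR FOR
        SELECT 1 FROM measurement WHERE id_int_sensor_meas_type = 197;
    FETCH 1 FROM fk_check;  -- only the first row is wanted
    CLOSE fk_check;
    COMMIT;

SPI_prepare exposes no equivalent knob, which is the gap being pointed at here.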
On Tue, 2005-03-29 at 05:50 -0800, Stephan Szabo wrote:
> On Tue, 29 Mar 2005, Simon Riggs wrote:
> > We know whether the check will be limit 1 or limit 0 at prepare time, so
> > why not add a LIMIT clause to the SQL so it changes the plan, not just
> > the number of rows returned when the check query executes?
>
> Because IIRC, FOR UPDATE and LIMIT at least historically didn't play
> nicely together, so you could sometimes get a result where if the first
> row was locked, the FOR UPDATE would wait on it, but if it was deleted by
> the other transaction you could get 0 rows back in the trigger.

Well, sorry to ask more...

...but surely we only need the FOR UPDATE clause if we are performing a
CASCADE action, whereas we only want the LIMIT 1 clause if we are NOT
performing a CASCADE action? That way the two clauses are mutually
exclusive and the problem you outline should never (need to) occur.

The current code doesn't seem to vary the check query according to the
requested FK action...

Best Regards, Simon Riggs
On Tue, 2005-03-29 at 09:56 -0500, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > If there were some way to pass a "limit" into SPI_prepare that was treated
> > similarly to a LIMIT clause for planning purposes but didn't actually
> > change the output plan to only return that number of rows, we could use
> > that.
>
> Hmm ... the planner does have the ability to do that sort of thing (we
> use it for cursors). SPI_prepare doesn't expose the capability.
> Perhaps adding a SPI_prepare variant that does expose it would be the
> quickest route to a solution.
>
> I get a headache every time I look at the RI triggers ;-). Do they
> always know at the time of preparing a plan which way it will be used?

If action is NO ACTION or RESTRICT then
we need to SELECT at most 1 row that matches the criteria
which means we can use LIMIT 1

If action is CASCADE, SET NULL, SET DEFAULT then
we need to UPDATE or DELETE all rows that match the criteria
which means we mustn't use LIMIT and need to use FOR UPDATE

We know that at CONSTRAINT creation time, which always occurs before
plan preparation time.

Best Regards, Simon Riggs
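In SQL terms, the two shapes being distinguished would be roughly the following (again using Karim's tables; the LIMIT variant is Simon's hypothetical, not what the triggers currently emit):

    -- NO ACTION / RESTRICT: an existence check -- one matching row is
    -- enough to answer it, so a LIMIT 1 could in principle apply:
    SELECT 1 FROM ONLY measurement
        WHERE id_int_sensor_meas_type = $1 LIMIT 1;

    -- ON DELETE CASCADE: every matching row must be removed,
    -- so no LIMIT can apply:
    DELETE FROM ONLY measurement
        WHERE id_int_sensor_meas_type = $1;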
Simon Riggs <simon@2ndquadrant.com> writes:
> If action is NO ACTION or RESTRICT then
> we need to SELECT at most 1 row that matches the criteria
> which means we can use LIMIT 1

> If action is CASCADE, SET NULL, SET DEFAULT then
> we need to UPDATE or DELETE all rows that match the criteria
> which means we mustn't use LIMIT and need to use FOR UPDATE

Huh? UPDATE/DELETE don't use FOR UPDATE. I think you have failed
to break down the cases sufficiently. In particular it matters which
side of the RI constraint you are working from ...

regards, tom lane
On Tue, 2005-03-29 at 10:31 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > If action is NO ACTION or RESTRICT then
> > we need to SELECT at most 1 row that matches the criteria
> > which means we can use LIMIT 1
>
> > If action is CASCADE, SET NULL, SET DEFAULT then
> > we need to UPDATE or DELETE all rows that match the criteria
> > which means we mustn't use LIMIT and need to use FOR UPDATE
>
> Huh? UPDATE/DELETE don't use FOR UPDATE. I think you have failed
> to break down the cases sufficiently. In particular it matters which
> side of the RI constraint you are working from ...

OK... too quick, sorry. I'll hand over to Stephan for a better and more
exhaustive explanation/analysis... but AFAICS we *can* always know the
correct formulation of the query at prepare time, whether or not we do
currently.

Best Regards, Simon Riggs
On Tue, 29 Mar 2005, Simon Riggs wrote:
> On Tue, 2005-03-29 at 10:31 -0500, Tom Lane wrote:
> > Huh? UPDATE/DELETE don't use FOR UPDATE. I think you have failed
> > to break down the cases sufficiently. In particular it matters which
> > side of the RI constraint you are working from ...
>
> OK... too quick, sorry. I'll hand over to Stephan for a better and more
> exhaustive explanation/analysis... but AFAICS we *can* always know the
> correct formulation of the query at prepare time, whether or not we do
> currently.

We currently use FOR UPDATE on the NO ACTION check, because otherwise we
might get back a row that's already marked for deletion by a concurrent
transaction. I think that's intended to wait and succeed, not fail.
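A two-session sketch of the case Stephan describes (hypothetical key value; the wait-then-succeed behavior is as he states it):

    -- Session 1: delete a referencing row, but don't commit yet
    BEGIN;
    DELETE FROM measurement WHERE id_int_sensor_meas_type = 42;

    -- Session 2: a DELETE on the referenced table fires the NO ACTION
    -- check, which blocks on session 1's locked row:
    SELECT 1 FROM ONLY measurement
        WHERE id_int_sensor_meas_type = 42 FOR UPDATE;
    -- If session 1's commit removes the only matching row, the check
    -- correctly finds no remaining reference and the DELETE succeeds.
    -- With a LIMIT 1 baked into the plan, the vanished row could yield
    -- 0 rows even when other references remain -- a wrong answer.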
On Tue, 29 Mar 2005, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > If there were some way to pass a "limit" into SPI_prepare that was treated > > similarly to a LIMIT clause for planning purposes but didn't actually > > change the output plan to only return that number of rows, we could use > > that. > > Hmm ... the planner does have the ability to do that sort of thing (we > use it for cursors). SPI_prepare doesn't expose the capability. > Perhaps adding a SPI_prepare variant that does expose it would be the > quickest route to a solution. > > I get a headache every time I look at the RI triggers ;-). Do they Me too, honestly. > always know at the time of preparing a plan which way it will be used? I believe so. I think each saved plan pretty much lives for a single trigger type/argument set and is basically used in only one place.
On Tue, 2005-03-29 at 09:40 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > ...but, I see no way for OidFunctionCall8 to ever return an answer of
> > "always just 1 row, no matter how big the relation"...so tuples_fetched
> > is always proportional to the size of the relation. Are unique indexes
> > treated just as very-low-selectivity indexes?
>
> Yeah. It is not the job of amcostestimate to estimate the number of
> rows, only the index access cost. (IIRC there is someplace in the
> planner that explicitly considers unique indexes as a part of developing
> selectivity estimates ... but it's not that part.)

Well, I mention this because costsize.c:cost_index *does* calculate the
number of rows returned. If unique indexes are handled elsewhere then
this would not cause problems for them... but what about LIMIT queries?

cost_index gets the selectivity, then multiplies that by the number of
tuples in the relation to calculate tuples_fetched, so it can use that in
the Mackert & Lohman formula. There is no consideration of the query's
LIMIT.

That implies to me that LIMIT queries are not considered correctly in
the M&L formula, and thus we are more likely to calculate a too-high cost
for using an index in those circumstances... and thus more likely to
SeqScan for medium-sized relations?

Best Regards, Simon Riggs
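For reference, the Mackert & Lohman approximation as described in the costsize.c comments is roughly the following (quoted from memory, so treat the exact form as an assumption); T is the number of pages in the relation, N the number of tuples fetched, and b the number of buffer pages available:

    pages_fetched = min( 2*T*N / (2*T + N), T )    when T <= b

Since N (tuples_fetched) is derived purely from selectivity times the relation's row count, a LIMIT never shrinks it.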
Simon Riggs <simon@2ndquadrant.com> writes:
> That implies to me that LIMIT queries are not considered correctly in
> the M&L formula, and thus we are more likely to calculate a too-high cost
> for using an index in those circumstances... and thus more likely to
> SeqScan for medium-sized relations?

You misunderstand how LIMIT is handled. The plan structure is

	LIMIT ...
		regular plan ...

and so the strategy is to plan and cost the regular plan as though it
would be carried out in full, and then take an appropriate fraction
of that at the LIMIT stage.

regards, tom lane
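That structure is visible in EXPLAIN output; a sketch of the shape (costs and row counts here are illustrative, not actual output from Karim's database):

    EXPLAIN SELECT 1 FROM measurement
        WHERE id_int_sensor_meas_type = 197 LIMIT 1;

     Limit  (cost=0.00..3.47 rows=1 width=0)
       ->  Seq Scan on measurement  (cost=0.00..1768398.00 rows=509478 width=0)
             Filter: (id_int_sensor_meas_type = 197)

The Seq Scan is costed for a full pass; the Limit node's cost is the fraction of that total corresponding to one row out of the estimated 509478.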
On Tue, 2005-03-29 at 12:31 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > That implies to me that LIMIT queries are not considered correctly in
> > the M&L formula, and thus we are more likely to calculate a too-high cost
> > for using an index in those circumstances... and thus more likely to
> > SeqScan for medium-sized relations?
>
> You misunderstand how LIMIT is handled.

Huh? Well, not this time. (Though my error rate is admittedly high.)

> The plan structure is
>
> 	LIMIT ...
> 		regular plan ...
>
> and so the strategy is to plan and cost the regular plan as though it
> would be carried out in full, and then take an appropriate fraction
> of that at the LIMIT stage.

To cost it as if it would be carried out in full, and then not execute it
in full, is the same as saying the actual execution cost is overestimated.
That can lead to selecting a SeqScan plan when the IndexScan would have
been cheaper, all things considered.

It could work like this instead:

	LIMIT ...
		regular plan (chosen knowing that a LIMIT follows)

so that the LIMIT would be considered in the M&L formula.

Not that I am driven by how other systems work, but both DB2 and Oracle
allow this form of optimization. There's not a huge benefit in sending
LIMIT 1 through on the FK check queries unless it would be taken into
account during planning.

Anyway, I'm not saying I know how to do this yet/ever, just that it is
possible to use the information available to better effect. This looks
like a TODO item to me? Thoughts?

Best Regards, Simon Riggs
> Each value has 1/13th of the table, which is too many rows per value to > make an IndexScan an efficient way of deleting rows from the table. But, the original question was that the delete that was taking a long time was on a different table. I tried to delete 150 rows from a table with 750 rows, which is FK referenced from this large table. If I understand correctly, Tom suggested that the length of time was due to a sequential scan being done on the large table for each value being deleted from the small one. (I have no formal training in database administration nor database theory, so please excuse me if I am being dumb.) For this FK check, there only need be one referring id to invalidate the delete. ISTM that for any delete with a FK reference, the index could always be used to search for a single value in the referring table (excepting very small tables). Why then must a sequential scan be performed in this case, and/or in general? -- Karim Nassar Department of Computer Science Box 15600, College of Engineering and Natural Sciences Northern Arizona University, Flagstaff, Arizona 86011 Office: (928) 523-5868 -=- Mobile: (928) 699-9221
> Well, based upon the evidence so far, the Optimizer got it right:

Agreed. So, this means that the answer to my original question is "that
delete gonna take a long time"?

Seems that there is still something wrong. From what I can tell from
everyone's questions, the FK constraint on measurement is causing
multiple seq scans for each value deleted from int_sensor_meas_type.
However, when deleting a single value, the FK check should use the
index, so my ~190 deletes *should* be fast, no?

> IndexScan, value=1 elapsed= 29ms cost=883881

190 * 29ms is much less than 40 minutes. What am I missing here?

> Karim,
> Please do:
>
> select id_int_sensor_meas_type, count(*)
> from measurement
> group by id_int_sensor_meas_type
> order by count(*) desc;

 id_int_sensor_meas_type | count
-------------------------+--------
                      31 | 509478
                      30 | 509478
                     206 | 509478
                     205 | 509478
                     204 | 509478
                      40 | 509478
                      39 | 509478
                     197 | 509478
                      35 | 509478
                      34 | 509478
                      33 | 509478
                      32 | 509478
                      41 | 509477

This sample dataset has 13 measurements from a weather station over 3
years, hence the even distribution.

Continued thanks,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
On Mon, 28 Mar 2005, Stephan Szabo wrote:
> On Mon, 28 Mar 2005, Simon Riggs wrote:
> > run the EXPLAIN after doing
> > SET enable_seqscan = off
...
> I think you have to prepare with enable_seqscan=off, because it
> affects how the query is planned and prepared.

orfs=# SET enable_seqscan = off;
SET
orfs=# PREPARE test2(int) AS SELECT 1 from measurement where
orfs-# id_int_sensor_meas_type = $1 FOR UPDATE;
PREPARE
orfs=# EXPLAIN ANALYZE EXECUTE TEST2(1); -- non-existent

                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using measurement__id_int_sensor_meas_type_idx on measurement
   (cost=0.00..883881.49 rows=509478 width=6)
   (actual time=29.207..29.207 rows=0 loops=1)
   Index Cond: (id_int_sensor_meas_type = $1)
 Total runtime: 29.277 ms
(3 rows)

orfs=# EXPLAIN ANALYZE EXECUTE TEST2(197); -- existing value

                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using measurement__id_int_sensor_meas_type_idx on measurement
   (cost=0.00..883881.49 rows=509478 width=6)
   (actual time=12.903..37478.167 rows=509478 loops=1)
   Index Cond: (id_int_sensor_meas_type = $1)
 Total runtime: 38113.338 ms
(3 rows)

--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
On Tue, Mar 29, 2005 at 01:48:48 -0700, Karim A Nassar <Karim.Nassar@NAU.EDU> wrote: > > For this FK check, there only need be one referring id to invalidate the > delete. ISTM that for any delete with a FK reference, the index could > always be used to search for a single value in the referring table > (excepting very small tables). Why then must a sequential scan be > performed in this case, and/or in general? First the index needs to exist. It isn't created automatically because not everyone wants such an index. Second, you need to have analyzed the referencing table so that the planner will know it is big enough that using an indexed search is worthwhile. The planner is getting better about dealing with size changes without reanalyzing, but it seems there are still some gotchas in 8.0.
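Putting Bruno's two points together as a recipe (the index name below matches the one visible in Karim's EXPLAIN output upthread, so his schema evidently already has step 1; in a schema without it, both steps apply):

    -- 1. Index the referencing column, so each per-row FK check can use
    --    an index scan instead of a seqscan of the large table:
    CREATE INDEX measurement__id_int_sensor_meas_type_idx
        ON measurement (id_int_sensor_meas_type);

    -- 2. Refresh statistics so the planner knows the table is large
    --    enough for the index to pay off:
    ANALYZE measurement;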
Mark Lewis wrote: > I imported my test dataset > and was almost immediately able to track down the cause of my > performance problem. Why don't you tell us what the problem was :-) ? Regards Gaetano Mendola