Thread: 7.4 - FK constraint performance
PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96

-- about 10 records
CREATE TABLE my.Small
(
  id my.dint NOT NULL,
  code my.dvalue NOT NULL,
  CONSTRAINT pk_1 PRIMARY KEY (id)
) WITHOUT OIDS;

-- about 80M rows
CREATE TABLE my.Large
(
  id my.dlong NOT NULL,
  small_id my.dint NOT NULL,
  value my.value,
  CONSTRAINT pk_2 PRIMARY KEY (id),
  CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.small (id)
    ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;

CREATE INDEX small_fk ON my.Large USING btree (small_id);

---------------------------------------------
The following queries run in less than 40 ms.

1) select 1 from Large where small_id = 239

2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x

3) delete from Small where id = 239

Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG:  statement: SELECT 1 FROM
ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x
Feb 11 18:10:41 srv07 postgres[2091]: [92-1] LOG:  duration: 185273.262 ms

When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Anything
that can be done to fix it?

Thanks
On Wed, 11 Feb 2004, ow wrote:

> PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96
>
> -- about 10 records
> CREATE TABLE my.Small
> (
>   id my.dint NOT NULL,
>   code my.dvalue NOT NULL,
>   CONSTRAINT pk_1 PRIMARY KEY (id)
> ) WITHOUT OIDS;
>
> -- about 80M rows
> CREATE TABLE my.Large
> (
>   id my.dlong NOT NULL,
>   small_id my.dint NOT NULL,
>   value my.value,
>   CONSTRAINT pk_2 PRIMARY KEY (id),
>   CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.small (id)
>     ON UPDATE RESTRICT ON DELETE RESTRICT
> ) WITHOUT OIDS;
>
> CREATE INDEX small_fk ON my.Large USING btree (small_id);
>
> ---------------------------------------------
> The following queries run in less than 40 ms.
>
> 1) select 1 from Large where small_id = 239
> 2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x
> 3) delete from Small where id = 239
>
> Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG:  statement: SELECT 1 FROM
> ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x
> Feb 11 18:10:41 srv07 postgres[2091]: [92-1] LOG:  duration: 185273.262 ms
>
> When I try to delete a record, it takes > 3 min. Why is it taking so long
> when practically the same select query (see (2)) runs very quickly?
> Anything that can be done to fix it?

Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
ow <oneway_111@yahoo.com> writes:
> When I try to delete a record, it takes > 3 min. Why is it taking so long
> when practically the same select query (see (2)) runs very quickly?
> Anything that can be done to fix it?

What are those column datatypes?

			regards, tom lane
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
> case? As a random question, does increasing the statistics target on
> Large.small_id and re-analyzing change its behavior?

Ran analyze, the result is the same. Here's more info:

1) There's 1 row in "Large" for "small_id" = 239
   SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x
   Quick query. Explain shows index scan.

2) There are many rows in "Large" for "small_id" = 1
   SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x
   Runs for about 3 min. Explain shows table scan.

3) delete from Small where id = 239
   Runs for about 3 min. It does appear that a table scan is used for FK
   verification. But why? I am deleting "id = 239", not "id = 1", and the
   query in (1) runs very quickly. Had a suspicion that the wrong id is
   passed during FK verification, but the FK constraint DOES work.

4) Domain types used in the example above:
   my.dint   = int
   my.dlong  = int8
   my.dvalue = varchar(15)

Thanks
On Thu, 12 Feb 2004, ow wrote:

> --- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> > Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
> > case? As a random question, does increasing the statistics target on
> > Large.small_id and re-analyzing change its behavior?
>
> Ran analyze, the result is the same. Here's more info:

You also did the alter table to up the statistics target on the column,
right?

> 3) delete from Small where id = 239
>    Runs for about 3 min. It does appear that a table scan is used for FK
>    verification. But why? I am deleting "id = 239", not "id = 1", and the
>    query in (1) runs very quickly.

It doesn't plan it as id=239 but as id=$1 and then executes it with
$1=239. The plan data gets reused for other id values if it needs the
same fk action again later in the session.

I'd hoped that upping the statistics target and re-analyzing would make
it choose an index scan for the case where it doesn't know what constant
is going to be used. Hmm, what is the estimated cost difference and real
time difference on id=1 between seqscan and index scan (explain analyze
output with and without enable_seqscan=off should show you)?
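(For reference, the comparison suggested above can be run as follows.
enable_seqscan is the standard planner setting, and the query shape is the
one from earlier in the thread.)

  -- Plan and timing with the planner free to choose a seqscan:
  EXPLAIN ANALYZE
  SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x;

  -- Same query with seqscans discouraged, to see the index-scan cost:
  SET enable_seqscan = off;
  EXPLAIN ANALYZE
  SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x;

  SET enable_seqscan = on;  -- restore the default afterwards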
On Thu, 12 Feb 2004, ow wrote:

> --- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> > You also did the alter table to up the statistics target on the column,
> > right?
>
> Not really. I did not change the default stats settings in
> postgresql.conf. Not sure what needs to be changed, can you clarify?

Basically, run something like:

-- 1000 is just an arbitrary choice, but we can lower it later if this
-- works.
ALTER TABLE my.Large ALTER COLUMN small_id SET STATISTICS 1000;
ANALYZE my.Large;

This increases the target for just the column in question, which means it
should have a better idea of the distribution, and that may let it make a
better guess if you've got a somewhat uneven distribution.
ow <oneway_111@yahoo.com> writes:
> When I try to delete a record, it takes > 3 min.

I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?

prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

explain analyze execute foo(239);

When I try it I see an indexscan plan, but maybe there's some aspect of
your setup that's causing problems.

			regards, tom lane
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> You also did the alter table to up the statistics target on the column,
> right?

Not really. I did not change the default stats settings in
postgresql.conf. Not sure what needs to be changed, can you clarify?

Thanks
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ow <oneway_111@yahoo.com> writes:
> > When I try to delete a record, it takes > 3 min.
>
> I think it must be using a seqscan for the foreign key check query.
> Could you try this and show the results?

1) SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = 201 FOR UPDATE OF x;

                                 QUERY PLAN
 Index Scan using small_fk on large x  (cost=0.00..6.01 rows=1 width=6)
   (actual time=0.251..0.251 rows=0 loops=1)
   Index Cond: ((small_id)::integer = 201)
 Total runtime: 0.338 ms

2) prepare foo(my.dint) as
   SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
   explain analyze execute foo(201);

                                 QUERY PLAN
 Seq Scan on large x  (cost=0.00..1787052.30 rows=7893843 width=6)
   (actual time=210566.301..210566.301 rows=0 loops=1)
   Filter: ((small_id)::integer = ($1)::integer)
 Total runtime: 210566.411 ms

Thanks
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, there's the smoking gun all right. Why does it think there are
> going to be 7893843 matching rows!? Could we see the pg_stats row for
> the large.small_id column?

schemaname        | my
tablename         | large
attname           | small_id
null_frac         | 0
avg_width         | 4
n_distinct        | 10
most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
histogram_bounds  |
correlation       | 0.0597573
ow <oneway_111@yahoo.com> writes:
> n_distinct        | 10
> most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
> most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}

According to this entry, your small_id column contains only the ten
values 1..10, roughly evenly distributed. So why are you probing for
239?? The planner is certainly going to estimate a probe for an
unspecified value as retrieving 10% of the table (with ~80M rows, that
matches the ~7.9M-row estimate in the plan), and under that assumption
it's quite right to use a seqscan.

If this estimate is not right, perhaps you could give us a more accurate
view of the column statistics?

			regards, tom lane
ow <oneway_111@yahoo.com> writes:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it must be using a seqscan for the foreign key check query.

> 2) prepare foo(my.dint) as
>    SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
>    explain analyze execute foo(201);
>
>    Seq Scan on large x  (cost=0.00..1787052.30 rows=7893843 width=6)
>      (actual time=210566.301..210566.301 rows=0 loops=1)
>      Filter: ((small_id)::integer = ($1)::integer)
>    Total runtime: 210566.411 ms

Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?

			regards, tom lane
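(For anyone reproducing this: the row Tom asks for can be pulled from the
standard pg_stats view with a query of this shape. The schema, table, and
column names are the ones from this thread.)

  SELECT null_frac, avg_width, n_distinct,
         most_common_vals, most_common_freqs,
         histogram_bounds, correlation
  FROM pg_stats
  WHERE schemaname = 'my'
    AND tablename  = 'large'
    AND attname    = 'small_id';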
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can't see how we optimize your case without pessimizing more-common cases.

My case appears to be pretty common, i.e. one small and one large table
with an RI constraint between them. In order to delete a record from the
small table, the large table must not have records that are dependent on
the deleted row.

I think other RDBMSs simply use a preset value instead of a partial table
scan when there's not enough stat info. Might be a better way.

Thanks
ow <oneway_111@yahoo.com> writes:
> Sounds pretty bad for my case. Any way to avoid the 10% scan?

Can't see how we optimize your case without pessimizing more-common cases.
Sorry.

			regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> According to this entry, your small_id column contains only the ten
> values 1..10, roughly evenly distributed. So why are you probing for
> 239??

Let's say we have City (small) and Person (large) tables. A new city was
added (mistakenly) with id=239; it does not have any "persons" assigned
yet. Hence, we want to remove the wrong "city" record.

In any case, one can't remove a record from "small" unless there are NO
dependent records in "large"; RI will not allow it. The initial problem
was that I tried to delete a record from "small" and it was taking about
3 min to do that.

> The planner is certainly going to estimate a probe for an unspecified
> value as retrieving 10% of the table, and under that assumption it's
> quite right to use a seqscan.

Sounds pretty bad for my case. Any way to avoid the 10% scan?

Thanks
Rod Taylor <rbt@rbt.ca> writes:
> Statistics say there are 10 values. Statistics list the 10 most common
> values (all of them). Given this, would it not be reasonable to assume
> that 239 is a recent addition (if there at all) to the table and not
> very common?

We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.

			regards, tom lane
ow <oneway_111@yahoo.com> writes:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can't see how we optimize your case without pessimizing more-common cases.

> I think other RDBMSs simply use a preset value instead of a partial table
> scan when there's not enough stat info. Might be a better way.

The problem here cannot be described as "not enough stat info". The
available stats are complete, and they point very strongly to the
conclusion that searches in the large table should be seqscans. To do
otherwise would be folly in general, even if it happens to be the correct
thing in this particular example.

			regards, tom lane
On Friday 13 February 2004 04:25, Tom Lane wrote:
> Rod Taylor <rbt@rbt.ca> writes:
> > Statistics say there are 10 values. Statistics list the 10 most common
> > values (all of them). Given this, would it not be reasonable to assume
> > that 239 is a recent addition (if there at all) to the table and not
> > very common?
>
> We don't know that it's 239 when we make the plan. In order to know
> that, we'd have to abandon caching of RI check query plans and re-plan
> for each row. That strikes me as inevitably a losing proposition.

In this precise example, could you not:
1. Check index for value
2. If found, seq-scan

Of course that's only going to be a sensible thing to do if you're
expecting one of two results:
1. Value not there
2. Lengthy seq-scan if it is there

-- 
Richard Huxton
Archonet Ltd
On Thu, 12 Feb 2004, Tom Lane wrote:

> Rod Taylor <rbt@rbt.ca> writes:
> > Statistics say there are 10 values. Statistics list the 10 most common
> > values (all of them). Given this, would it not be reasonable to assume
> > that 239 is a recent addition (if there at all) to the table and not
> > very common?
>
> We don't know that it's 239 when we make the plan. In order to know
> that, we'd have to abandon caching of RI check query plans and re-plan
> for each row. That strikes me as inevitably a losing proposition.

One thing is that IIRC we're going to ask for only one row when we do the
SPI_execp_current. However, unless I misremember, the behavior of FOR
UPDATE and LIMIT means that saying LIMIT 1 is potentially unsafe (if you
block on a row that goes away). Is there any way for us to let the planner
know this?
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> One thing is that IIRC we're going to ask for only one row when we do the
> SPI_execp_current. However, unless I misremember, the behavior of FOR
> UPDATE and LIMIT means that saying LIMIT 1 is potentially unsafe (if you
> block on a row that goes away). Is there any way for us to let the
> planner know this?

I was looking at that last night. It seems like we could add a LIMIT at
least in some contexts. In the case at hand, we're just going to error
out immediately if we find a matching row, and so there's no need for
FOR UPDATE, is there?

However, I'm not sure it would help the OP anyway. With the stats he
had, the planner would still take a seqscan, because it's going to
expect that it can find a match by probing the first ten or so rows of
the first page. With anything close to the normal cost parameters,
that's going to look more expensive than an index probe. Possibly if
the table had a few more values it would work.

But in general it would be a good idea if the planner knew that plan
evaluation would stop after the first row. We could look at passing that
info down out-of-band instead of using LIMIT. There's already support
for this to allow EXISTS() subqueries to be planned properly; see the
tuple_fraction stuff in planner.c. We just can't get at it via SPI ...

			regards, tom lane
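(For concreteness, the LIMIT variant under discussion would look roughly
like the sketch below, built on the prepared RI check shown earlier in the
thread. As the following messages explain, it turns out not to be safe, so
this is illustration only.)

  -- Hypothetical form of the cached RI check with an early-exit LIMIT:
  PREPARE foo(my.dint) AS
    SELECT 1 FROM ONLY "my"."large" x
    WHERE "small_id" = $1 FOR UPDATE OF x
    LIMIT 1;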
On Fri, 13 Feb 2004, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > One thing is that IIRC we're going to ask for only one row when we do
> > the SPI_execp_current. However, unless I misremember, the behavior of
> > FOR UPDATE and LIMIT means that saying LIMIT 1 is potentially unsafe
> > (if you block on a row that goes away). Is there any way for us to let
> > the planner know this?
>
> I was looking at that last night. It seems like we could add a LIMIT at
> least in some contexts. In the case at hand, we're just going to error
> out immediately if we find a matching row, and so there's no need for
> FOR UPDATE, is there?

I think there still is, because a not-yet-committed transaction could have
deleted them all, in which case I think the correct behavior is to wait
and, if that transaction commits, allow the action, and if it rolls back,
to error. Really we'd want a different behavior where we're only blocking
in these cases if all the matching rows are locked by other transactions.

> However, I'm not sure it would help the OP anyway. With the stats he
> had, the planner would still take a seqscan, because it's going to
> expect that it can find a match by probing the first ten or so rows of
> the first page. With anything close to the normal cost parameters,
> that's going to look more expensive than an index probe. Possibly if
> the table had a few more values it would work.

Hmm, that's true. It also doesn't help the real actions (cascade, set *)
since those really do need to get at all the rows, but it probably helps
in a reasonable number of cases.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 13 Feb 2004, Tom Lane wrote:
>> I was looking at that last night. It seems like we could add a LIMIT at
>> least in some contexts. In the case at hand, we're just going to error
>> out immediately if we find a matching row, and so there's no need for
>> FOR UPDATE, is there?

> I think there still is, because a not-yet-committed transaction could
> have deleted them all, in which case I think the correct behavior is to
> wait and, if that transaction commits, allow the action, and if it rolls
> back, to error.

Good point. Okay, we can't put in a LIMIT. But we could still hack the
planner to prefer a fast-start plan by passing an out-of-band tuple
fraction, for those RI plans where it's appropriate. That would not
affect correctness.

>> However, I'm not sure it would help the OP anyway. With the stats he
>> had, the planner would still take a seqscan, because it's going to
>> expect that it can find a match by probing the first ten or so rows of
>> the first page. With anything close to the normal cost parameters,
>> that's going to look more expensive than an index probe.

s/more expensive/less expensive/ ... need more caffeine obviously ...

			regards, tom lane
On Fri, 13 Feb 2004, Tom Lane wrote:

> Good point. Okay, we can't put in a LIMIT. But we could still hack the
> planner to prefer a fast-start plan by passing an out-of-band tuple
> fraction, for those RI plans where it's appropriate. That would not
> affect correctness.

Right, I can try to look through the stuff you pointed at in the previous
message over the weekend.

> s/more expensive/less expensive/ ... need more caffeine obviously ...

Me too apparently, since I knew what you were saying and agreed despite
the wording.
> In this precise example, could you not:
> 1. Check index for value
> 2. If found, seq-scan
>
> Of course that's only going to be a sensible thing to do if you're
> expecting one of two results:
> 1. Value not there
> 2. Lengthy seq-scan if it is there

Most of the queries are going to be for the other values (in which case
you've wasted an index scan), which is minor, but in the event there is a
single 239 you're still taking a big hit. That is an awful lot of work to
handle the non-existent case only.

-- 
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
On Thu, 2004-02-12 at 23:25, Tom Lane wrote:
> Rod Taylor <rbt@rbt.ca> writes:
> > Statistics say there are 10 values. Statistics list the 10 most common
> > values (all of them). Given this, would it not be reasonable to assume
> > that 239 is a recent addition (if there at all) to the table and not
> > very common?
>
> We don't know that it's 239 when we make the plan. In order to know
> that, we'd have to abandon caching of RI check query plans and re-plan
> for each row. That strikes me as inevitably a losing proposition.

Right, sorry, I forgot this was out of a fk trigger.

-- 
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
On Thu, 2004-02-12 at 20:10, Tom Lane wrote:
> ow <oneway_111@yahoo.com> writes:
> > Sounds pretty bad for my case. Any way to avoid the 10% scan?
>
> Can't see how we optimize your case without pessimizing more-common cases.
> Sorry.

Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not very
common?

-- 
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
Thread: CHAR(n) always trims trailing spaces in 7.4

Hello,

I just discovered the following change to CHAR(n) (taken from varlena.com,
General Bits, issue 62). This will cause me serious heart-ache, back-ache
and bug-ache if we upgrade to 7.4. Is there any compatibility flag
available to use pre-7.4 behaviour for CHAR(n)?

Regards, John

==================

In 7.4, one of the changes was that the char(n) type now truncates
trailing spaces. This causes a problem for those of us using ::char(n) to
create fixed-length fields. Creating fixed-length-field batch files is
usually required to interface with legacy systems.

In the example below, psql is called from a shell with tuples only (-t)
and unaligned (-A). The result of the selection creates a fixed-width
file. One of the techniques used to create fixed-length fields in 7.3 and
earlier was to cast the value to char(n). This, along with the various
to_char() functions, used to be able to create fixed-length records. For
example:

% psql -At > batch.out << END
select accountid::char(30),
       to_char(transaction_amount, 'FM000.00'),
       (lastname || ',' || firstname)::char(40),
       bankid::char(15),
       to_char(now(), 'YYYYMMDD');
END

In 7.4 this no longer works. The fields created with the cast are no
longer fixed length. Instead of using the cast to make fixed-length
fields, use rpad() and lpad() to do the same thing:

rpad(string text, length integer [, fill text])
lpad(string text, length integer [, fill text])

The previous selection should now be written as follows:

% psql -At > batch.out << END
select rpad(accountid, 30),
       to_char(transaction_amount, 'FM000.00'),
       rpad((lastname || ',' || firstname), 40),
       rpad(bankid, 15),
       to_char(now(), 'YYYYMMDD');
END
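(For reference, a few standalone examples of the padding functions named
above; the expected output shown in the comments follows the signatures
quoted in the article.)

  SELECT rpad('abc', 10);       -- 'abc       '  (right-padded to 10 chars)
  SELECT rpad('abc', 10, '.');  -- 'abc.......'  (explicit fill character)
  SELECT lpad('42', 6, '0');    -- '000042'      (left-padded)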
"news.postgresql.org" <jlim@natsoft.com.my> writes: > I just discovered the following change to CHAR(n) (taken from varlena.com, > general bits, issue 62). The description you quote doesn't appear to have much of anything to do with the actual behavior of 7.4. 7.4 will trim trailing spaces when converting char(n) to varchar or text, but the example query does not do that. It just coerces query output columns to char(n), and that works the same as it did before. For instance regression=# select 'zit'::char(77); bpchar -------------------------------------------------------------------------------zit (1 row) regards, tom lane
This is an example of the problem. It used to expand the middle thing
to 15.

elein=# select 'x' || ' '::char(15) || 'x';
 ?column?
----------
 xx
(1 row)

On Tue, Feb 17, 2004 at 06:10:56PM -0500, Tom Lane wrote:
> 7.4 will trim trailing spaces when converting char(n) to varchar or
> text, but the example query does not do that. It just coerces query
> output columns to char(n), and that works the same as it did before.
elein <elein@varlena.com> writes:
> This is an example of the problem. It used to expand the middle thing
> to 15.

> elein=# select 'x' || ' '::char(15) || 'x';
>  ?column?
> ----------
>  xx
> (1 row)

Still does, but then the spaces go away again when the value goes into
the concatenation, because concatenation is a text operator.

			regards, tom lane
So the problem is there. But blaming it on char was wrong. It should be
blamed on the varchar change.

Hey, I thought the truncation was for varchar and not text? It was for
both?

It would be semantically tricky to change the operator. The precedence is
to convert to text. Now the implicit conversion of the char(n) to text for
the operator "corrupts" the char() value.

elein

On Tue, Feb 17, 2004 at 06:40:49PM -0500, Tom Lane wrote:
> Still does, but then the spaces go away again when the value goes into
> the concatenation, because concatenation is a text operator.
On Tue, 17 Feb 2004, Tom Lane wrote:

> elein <elein@varlena.com> writes:
> > This is an example of the problem. It used to expand the middle thing
> > to 15.
>
> > elein=# select 'x' || ' '::char(15) || 'x';
> >  ?column?
> > ----------
> >  xx
> > (1 row)
>
> Still does, but then the spaces go away again when the value goes into
> the concatenation, because concatenation is a text operator.

But then this:

  select 'x'||' '||'x'

should produce xx, but it produces x x.
"scott.marlowe" <scott.marlowe@ihs.com> writes: > But then this: > select 'x'||' '||'x' > should produce xx, but it produces x x. No, because the imputed type of those literals is text. You'd have to cast the middle guy to char(n) explicitly to make its trailing spaces go away when it's reconverted to text. The real issue here is that trailing spaces in char(n) are semantically insignificant according to the SQL spec. The spec is pretty vague about which operations should actually honor that insignificance --- it's clear that comparisons should, less clear about other things. I think the 7.4 behavior is more consistent than what we had before, but I'm willing to be persuaded to change it again if someone can give an alternate definition that's more workable than this one. regards, tom lane
> > select 'x'||' '||'x'
> > should produce xx, but it produces x x.

INCORRECT.

This:
  select 'x' || ' '::char || 'x'
should produce xx.

This:
  select 'x' || ' ' || 'x'
is restateable as:
  select 'x' || ' '::text || 'x'
and the || operator for text does not drop the extra spaces, hence
(correctly) x x.

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085
Apparently the ::char is cast to varchar and then text? That explains
x || ' ' || x.

On Tue, Feb 17, 2004 at 05:07:24PM -0700, scott.marlowe wrote:
> On Tue, 17 Feb 2004, Tom Lane wrote:
> > Still does, but then the spaces go away again when the value goes into
> > the concatenation, because concatenation is a text operator.
>
> But then this:
>
>   select 'x'||' '||'x'
>
> should produce xx, but it produces x x.
elein <elein@varlena.com> writes:
> Apparently the ::char is cast to varchar and then text?

No, directly to text, because the || operator is defined as taking text
inputs. But there's no practical difference between text and varchar on
this point.

			regards, tom lane
On Wednesday 18 February 2004 00:25, Tom Lane wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > But then this:
> >   select 'x'||' '||'x'
> > should produce xx, but it produces x x.
>
> No, because the imputed type of those literals is text. You'd have to
> cast the middle guy to char(n) explicitly to make its trailing spaces go
> away when it's reconverted to text.
>
> The real issue here is that trailing spaces in char(n) are semantically
> insignificant according to the SQL spec. The spec is pretty vague about
> which operations should actually honor that insignificance --- it's
> clear that comparisons should, less clear about other things. I think
> the 7.4 behavior is more consistent than what we had before, but I'm
> willing to be persuaded to change it again if someone can give an
> alternate definition that's more workable than this one.

[rant on]

I've never really understood the rationale behind char(n) in SQL
databases (other than as backward compatibility with some old mainframe
DB). Insignificant spaces? If it's not significant, why is it there? You
could have a formatting rule that specifies left-aligned strings
space-padded (as printf), but that's not the same as mucking about
appending and trimming spaces.

The only sensible definition of char(n) that I can see would be:

  A text value of type char(n) is always "n" characters in length. If you
  assign less than "n" characters, it is right-padded with spaces. In all
  other respects it behaves as any other text type of length "n" with
  right-trailing spaces.

[rant off - ah, feel better for that :-]

-- 
Richard Huxton
Archonet Ltd
Richard Huxton <dev@archonet.com> writes:
> I've never really understood the rationale behind char(n) in SQL
> databases (other than as backward compatibility with some old mainframe
> DB).

There are (or were) systems in which the benefit of using fixed-width
columns is a lot higher than it is in Postgres. The spec is evidently
trying to cater to them. Too bad the writers whacked the semantics around
so cruelly to do it :-(

> The only sensible definition of char(n) that I can see would be:
> A text value of type char(n) is always "n" characters in length.

If the SQL spec were willing to leave it at that, I'd be happy. But
we've got this problem that the trailing spaces are supposed to be
insignificant in at least some contexts.

I find the pre-7.4 behavior to be pretty inconsistent. For example,
7.3 and 7.4 agree on this:

regression=# select ('foo   '::char(6)) = ('foo');
 ?column?
----------
 t
(1 row)

Now given the above, wouldn't it stand to reason that

regression=# select ('foo   '::char(6) || 'bar') = ('foo' || 'bar');
 ?column?
----------
 f
(1 row)

or how about

regression=# select ('bar' || 'foo   '::char(6)) = ('bar' || 'foo');
 ?column?
----------
 f
(1 row)

In 7.4 both of these do yield true. A closely related example is

regression=# select ('foo   '::char(6)) = ('foo'::text);

which yields false in 7.3 and true in 7.4.

I don't object to revisiting the behavior again, but 7.3 was not so
ideal that I want to just go back to it.

			regards, tom lane
Also, to make char(n) even more annoying, I had the one-character value
"K" stored in a column that was char(2). When I pulled it from the
database and tried to compare it to a variable with a value of "K", it
came out unequal. Of course in mysql, that was not a problem.

Jeremy
Thread: Inserting NULL into Integer column

Hi,

in mysql I was able to make an insert such as:

INSERT INTO TABLE (integervariable) VALUES ('')

and have it either insert that variable, or insert the default if it had
been assigned. In postgresql it gives an error every time this is
attempted. Since I have so many queries that do this on my site already,
is there any way to set up a table so that it just accepts this sort of
query?

Thanks,
Jeremy
On Wed, 18 Feb 2004, Jeremy Smith wrote:

> in mysql I was able to make an insert such as:
>
> INSERT INTO TABLE (integervariable) VALUES ('')
>
> and have it either insert that variable, or insert the default if it had
> been assigned. In postgresql it gives an error every time this is
> attempted. Since I have so many queries that do this on my site already,
> is there any way to set up a table so that it just accepts this sort of
> query?

First off, the reason for this problem is that Postgresql adheres to the
SQL standard while MySQL heads off on their own, making it up as they go
along. This causes many problems for people migrating from MySQL to
almost ANY database.

Phew, now that that's out of the way, here are the standard ways of doing
it.

Use DEFAULT. If no default is set, a NULL will be inserted; otherwise the
default will be inserted:

insert into table (integervar) values (DEFAULT);

OR leave it out of the list of vars to be inserted:

insert into table (othervars, othervars2) values ('abc', 123);

OR insert a NULL if that's what you want:

insert into table (integervar) values (NULL);

Note that NULL and DEFAULT are not quoted.
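(A minimal end-to-end illustration of the three options above, using a
hypothetical table t rather than anything from the thread; expected
results are in the comments.)

  CREATE TABLE t (othervar text, integervar integer DEFAULT 0);

  INSERT INTO t (othervar, integervar) VALUES ('abc', DEFAULT); -- gets 0
  INSERT INTO t (othervar) VALUES ('def');                      -- gets 0
  INSERT INTO t (othervar, integervar) VALUES ('ghi', NULL);    -- explicit NULL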
So exactly what is the order of casts that produces different results
with:

  'x' || ' ' || 'x'   and   'x' || ' '::char(15) || 'x'

Are operators being invoked both (text,text)? I'm trying to understand
the precedence that causes the different results.

elein

On Tue, Feb 17, 2004 at 10:53:17PM -0500, Tom Lane wrote:
> elein <elein@varlena.com> writes:
> > Apparently the ::char is cast to varchar and then text?
>
> No, directly to text, because the || operator is defined as taking text
> inputs. But there's no practical difference between text and varchar on
> this point.
Scott,

I understand that MySQL's adherence to the standards must be lazy, as I am
running into frequent issues as I transfer my site. Unfortunately I have
over 2500 queries, and many more of them needed to be rewritten than I
ever would have imagined. I guess MySQL is the IE of open source DBs, and
PostgreSQL is Netscape / Mozilla, in more ways than one.

I guess in some sense, since I relied on MySQL's laziness, my code also
became a bit lazy. There are many locations where I accept user input from
a form and then have a process page. And on that process page I might have
hundreds of variables that look like:

$input = $_POST['input'];

and in the old days, if that was an empty value and inserted into a mysql
query, it would just revert to the default. Now it looks like I need to:

$input = $_POST['input'];
if (!$input) {
    $input = DEFAULT;
}

over and over and over and over.... :) I guess I am just looking for a
shortcut, since the site conversion has already taken a week and counting,
when I originally was misguided enough to think it would take hours.

Anyway, the help on this list is much appreciated.

Jeremy
On Wed, 18 Feb 2004, Jeremy Smith wrote:

> I understand that MySQL's adherence to the standards must be lazy, as I
> am running into frequent issues as I transfer my site. Unfortunately I
> have over 2500 queries, and many more of them needed to be rewritten
> than I ever would have imagined. I guess MySQL is the IE of open source
> DBs, and PostgreSQL is Netscape / Mozilla, in more ways than one.

Good comparison.

> I guess in some sense, since I relied on MySQL's laziness, my code also
> became a bit lazy. There are many locations where I accept user input
> from a form and then have a process page. And on that process page I
> might have hundreds of variables that look like:
>
> $input = $_POST['input'];
>
> and in the old days, if that was an empty value and inserted into a
> mysql query, it would just revert to the default. Now it looks like I
> need to:
>
> $input = $_POST['input'];
> if (!$input) {
>     $input = DEFAULT;
> }

I've run into this kind of thing before. It helps if you have an array of
all your fields like:

$fields = array("field1","field3","last_name");

and then you can foreach across the input:

foreach ($fields as $f) {
    if (!$_POST[$f]) {
        $_POST[$f] = 'DEFAULT';
    } else {
        $_POST[$f] = "'" . $_POST[$f] . "'";
    }
}

> over and over and over and over.... :) I guess I am just looking for a
> shortcut, since the site conversion has already taken a week and
> counting, when I originally was misguided enough to think it would take
> hours.

Well, you might find yourself rewriting fair portions of your site, but
usually you wind up with better code and better checking, so it's a bit
of a trade off.
> I've run into this kind of thing before. It helps if you have an array
> of all your fields like:
>
> $fields = array("field1","field3","last_name");
>
> and then you can foreach across the input:
>
> foreach ($fields as $f) {
>     if (!$_POST[$f]) {
>         $_POST[$f] = 'DEFAULT';
>     } else {
>         $_POST[$f] = "'" . $_POST[$f] . "'";
>     }
> }

Wow, great idea. I will definitely do this, thanks a lot.

> Well, you might find yourself rewriting fair portions of your site, but
> usually you wind up with better code and better checking, so it's a bit
> of a trade off.

No doubt that this is true. Of course even without the better code and
error checking, the extra features like stored procedures and automatic
row locking were more than enough to make the switch worth it.

Thanks again!

Jeremy
On Wed, 18 Feb 2004, Jeremy Smith wrote:

> Wow, great idea. I will definitely do this, thanks a lot.
>
> > Well, you might find yourself rewriting fair portions of your site,
> > but usually you wind up with better code and better checking, so it's
> > a bit of a trade off.
>
> No doubt that this is true. Of course even without the better code and
> error checking, the extra features like stored procedures and automatic
> row locking were more than enough to make the switch worth it.
>
> Thanks again!

You're welcome! Enjoy getting to know Postgresql and all the great folks
on the lists, I know I have.
> and then you can foreach across the input:
>
> foreach ($fields as $f) {
>     if (!$_POST[$f]) {
>         $_POST[$f] = 'DEFAULT';
>     } else {
>         $_POST[$f] = "'" . $_POST[$f] . "'";
>     }
> }

Default in quotes isn't going to work, and please tell me you escape
those things with pg_escape_string() at some point.

-- 
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
On Wed, 18 Feb 2004, Rod Taylor wrote:

> Default in quotes isn't going to work, and please tell me you escape
> those things with pg_escape_string() at some point.

Note that the ' marks aren't part of the string, they are the delimiter
of the string, and I always run every server with magic_quotes_gpc on.

Anything else? :-)
> Note that the ' marks aren't part of the string, they are the delimiter
> of the string, and I always run every server with magic_quotes_gpc on.
>
> Anything else? :-)

Good point. I looked at the single quotes of the second line and somehow
the DEFAULT got quoted as well ;)
On Wed, 18 Feb 2004, Rod Taylor wrote:

> Good point. I looked at the single quotes of the second line and somehow
> the DEFAULT got quoted as well ;)

Oh, and I'm stuck using addslashes (or the magic_quotes_gpc thingie)
'cause I'm on a server that's being EOLed in favor of .net, and it's
running PHP 4.0.6... ugh. We really gotta get it upgraded soon.
elein <elein@varlena.com> writes:
> So exactly what is the order of casts that produces different results
> with:
>   'x' || ' ' || 'x'   and   'x' || ' '::char(15) || 'x'
> Are operators being invoked both (text,text)?

The only relevant operator is "text || text" (there are also some ||
operators for arrays, bytea, and BIT, but these will all be discarded
as not the most plausible match). Therefore, in your first example the
unspecified literals will all be presumed to be text, so the space does
not get trimmed.

One of the things we could think about as a way to tweak the behavior is
creating "||" variants that are declared to accept char(n) on one or
both sides. These could actually use the same C implementation function
(textcat) of course. But declaring them that way would suppress the
invocation of rtrim() as the char-to-text conversion.

However, if we did that then "||" would behave differently from other
operators on character strings, so it doesn't seem like a very
attractive option to me.

			regards, tom lane
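(For illustration, such a variant could be sketched at the SQL level as
below. This is hypothetical and untested; it reuses the internal textcat
function exactly as described above, and the name bpchar_textcat is
invented for the example.)

  -- Hypothetical: expose textcat under char(n) argument types so that no
  -- char-to-text cast (and hence no rtrim) fires for "char || char".
  CREATE FUNCTION bpchar_textcat(bpchar, bpchar) RETURNS text
      AS 'textcat' LANGUAGE internal IMMUTABLE STRICT;

  CREATE OPERATOR || (
      LEFTARG   = bpchar,
      RIGHTARG  = bpchar,
      PROCEDURE = bpchar_textcat
  );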
I guess I am asking about the cast sequence from char(n) to text.

  (' '::char(n))::text   trims spaces. This is wrong, imo.
  ' '::text              does not trim spaces.
  ' '::char(n)           does not trim spaces, and pads.

char(n) should not trim spaces, right? And it doesn't on an insert. Text
does not trim spaces. Somewhere the space trimming occurs. If it is in
the operator || then the operator is wrong. If char(n) is properly
defined to not trim spaces then there should be a separate cat for
char(n). It is correct for it to behave differently than cat for text and
varchar because of the different trimming behaviour.

I can do this patch if there is agreement. But I may not be able to do it
immediately.

elein

On Wed, Feb 18, 2004 at 11:58:37PM -0500, Tom Lane wrote:
> One of the things we could think about as a way to tweak the behavior is
> creating "||" variants that are declared to accept char(n) on one or
> both sides. These could actually use the same C implementation function
> (textcat) of course. But declaring them that way would suppress the
> invocation of rtrim() as the char-to-text conversion.
elein <elein@varlena.com> writes:
> Somewhere the space trimming occurs.

The cast from char(n) to text (or varchar) is what's doing the trimming
in 7.4. I think you can mostly revert the change by changing that
pg_cast entry to specify no conversion function instead of rtrim().
However, that would probably result in squirrelly, non-spec behavior for
comparisons.

> If char(n) is properly defined to not trim spaces then there should be
> a separate cat for char(n).

Possibly, but I think that is considering the issue much too narrowly.
Concatenation is not the only textual operator.

			regards, tom lane
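(The catalog change described above would look roughly like this. A sketch
only: direct pg_cast surgery is unsupported and version-specific, so treat
it as an illustration of the idea rather than a recommendation.)

  -- Make the char(n) -> text cast binary-coercible (castfunc = 0 means
  -- "no conversion function"), so the rtrim() step described above is
  -- skipped.
  UPDATE pg_cast
  SET castfunc = 0
  WHERE castsource = 'bpchar'::regtype
    AND casttarget = 'text'::regtype;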