Thread: Possible savepoint bug

Possible savepoint bug

From
Rod Taylor
Date:
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.

Nearly all data inserted into this table is wrapped in a subtransaction,
and is created a single tuple per subtransaction. About 20% of entries
are duplicate, so we catch the UNIQUE VIOLATION and restore to the
savepoint.

I did keep a copy of the table. Compressed it is about 24MB.

After trying everything below, I also gave it a run with vacuum full. It
did not change the output.


ssdb=# select version();                                 version
---------------------------------------------------------------------------PostgreSQL 8.0.3 on sparc-sun-solaris2.9,
compiledby GCC gcc (GCC)
 
3.4.2
(1 row)

ssdb=# \d feature_keyword_supply_google                  Table "public.feature_keyword_supply_google"
Column                   |         Type          |
 
Modifiers
----------------------------------------------+-----------------------+-----------account_instance_id
      | integer               |
 
not nullkeyword_id                                   | integer               |
not nullfeature_keyword_supply_google_score          | natural_number        |
not nullfeature_keyword_supply_google_last_collected | ss_timestamp_recorded |
not null
Indexes:   "feature_keyword_supply_google_pkey" PRIMARY KEY, btree
(account_instance_id, keyword_id)
Foreign-key constraints:   "feature_keyword_supply_google_account_instance_id_fkey" FOREIGN KEY
(account_instance_id, keyword_id) REFERENCES
feature_keyword(account_instance_id, keyword_id) ON UPDATE CASCADE ON
DELETE CASCADE

ssdb=# set enable_indexscan TO off;
SET
ssdb=# select ctid, xmin, cmin, xmax, account_instance_id = 11916,
keyword_id = 1985374 from feature_keyword_supply_google where keyword_id
= 1985374 and account_instance_id = 11916;  ctid    |   xmin    | cmin |   xmax    | ?column? | ?column?
-----------+-----------+------+-----------+----------+----------(4277,60) | 506766160 | 3593 | 744608069 | t        |
t(4277,72)| 397750949 | 4828 | 506766160 | t        | t
 
(2 rows)

ssdb=# set enable_indexscan TO on;
SET
ssdb=# select ctid, xmin, cmin, xmax, account_instance_id = 11916,
keyword_id = 1985374 from feature_keyword_supply_google where keyword_id
= 1985374 and account_instance_id = 11916;  ctid    |   xmin    | cmin |   xmax    | ?column? | ?column?
-----------+-----------+------+-----------+----------+----------(4277,60) | 506766160 | 3593 | 744608069 | t        |
t
(1 row)

ssdb=# vacuum feature_keyword_supply_google;
VACUUM
ssdb=# set enable_indexscan = off;
SET
ssdb=#  select ctid, xmin, cmin, xmax, account_instance_id = 11916,
keyword_id = 1985374 from feature_keyword_supply_google where keyword_id
= 1985374 and account_instance_id = 11916;  ctid    |   xmin    | cmin |   xmax    | ?column? | ?column?
-----------+-----------+------+-----------+----------+----------(4277,60) | 506766160 | 3593 | 744608069 | t        |
t(4277,72)| 397750949 | 4828 | 506766160 | t        | t
 
(2 rows)

[root@DB1 rbt]# pg_controldata --version
pg_controldata (PostgreSQL) 8.0.3
[root@DB1 rbt]# pg_controldata /var/opt/pgsql/data
pg_control version number:            74
Catalog version number:               200411041
Database system identifier:           4769850195962887551
Database cluster state:               in production
pg_control last modified:             Wed Nov 09 10:10:26 2005
Current log file ID:                  860
Next log file segment:                170
Latest checkpoint location:           35C/A14C8D60
Prior checkpoint location:            35C/908D6470
Latest checkpoint's REDO location:    35C/A1440E20
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          745383235
Latest checkpoint's NextOID:          30513944
Time of latest checkpoint:            Wed Nov 09 09:42:53 2005
Database block size:                  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum number of function arguments: 32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US
LC_CTYPE:                             en_US

-- 



Re: Possible savepoint bug

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> As you can see, we have duplicates within the table (heap) of a primary
> key value. The index itself only references one of these tuples.

Can you put together a test case to reproduce this?  It doesn't have to
fail every time, as long as it fails once in awhile ...
        regards, tom lane


Re: Possible savepoint bug

From
Rod Taylor
Date:
On Wed, 2005-11-09 at 14:20 -0500, Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > As you can see, we have duplicates within the table (heap) of a primary
> > key value. The index itself only references one of these tuples.
> 
> Can you put together a test case to reproduce this?  It doesn't have to
> fail every time, as long as it fails once in awhile ...

I'll see what I can put together.

I see I have a similar issue on another structure which involves the same type of process.




Re: Possible savepoint bug

From
Rod Taylor
Date:
On Wed, 2005-11-09 at 14:20 -0500, Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > As you can see, we have duplicates within the table (heap) of a primary
> > key value. The index itself only references one of these tuples.
> 
> Can you put together a test case to reproduce this?  It doesn't have to
> fail every time, as long as it fails once in awhile ...

Seems not. I've done millions of iterations of the same type of
functionality that happens with these structures and haven't produced a
single case. These are fairly low usage structures, so I think I've done
about 3 months worth of work, which in production had 20 bad tuples. I
tried playing with various delays, vacuum schedules, and number of
parallel processes.

Whatever is happening is from interaction not contained within the
structures showing the symptoms.

I'll watch it a bit closer now that I know the problem exists to see if
I can find a pattern.
-- 



Re: Possible savepoint bug

From
Michael Paesold
Date:
Rod Taylor schrieb:
> On Wed, 2005-11-09 at 14:20 -0500, Tom Lane wrote:
>> Rod Taylor <pg@rbt.ca> writes:
>>> As you can see, we have duplicates within the table (heap) of a primary
>>> key value. The index itself only references one of these tuples.
>> Can you put together a test case to reproduce this?  It doesn't have to
>> fail every time, as long as it fails once in awhile ...
> 
> Seems not. I've done millions of iterations of the same type of
> functionality that happens with these structures and haven't produced a
> single case. These are fairly low usage structures, so I think I've done
> about 3 months worth of work, which in production had 20 bad tuples. I
> tried playing with various delays, vacuum schedules, and number of
> parallel processes.

I am seeing a similar unique index bug here...

This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).

We don't use SAVEPOINTs and we don't use autovacuum. It's quite unlikely 
that the problem is directly related to VACUUM since that is only run 
via cron during night hours.

The symptoms are duplicate entries in a unique index.

billing=> \d properties        Table "billing.properties"  Column  |       Type        | Modifiers
----------+-------------------+----------- language | character(2)      | not null key_name | character varying | not
nullvalue    | character varying | not null
 
Indexes:    "pk_properties" PRIMARY KEY, btree ("language", key_name)
Check constraints:    "tc_properties_key_name" CHECK (key_name::text ~ 
'^[a-zA-Z][a-zA-Z0-9_.]+$'::text)    "tc_properties_language" CHECK ("language" = 'de'::bpchar OR 
"language" = 'en'::bpchar)

billing=> reindex table properties;
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.

billing=> select ctid,xmin,xmax,cmin,cmax,language,key_name from 
properties where key_name = 'enum.server_task_log.status.keys';  ctid   |  xmin  | xmax | cmin | cmax | language |
      key_name
 
---------+--------+------+------+------+----------+---------------------------------- (31,64) | 505433 |    0 |    5 |
 0 | de       | 
 
enum.server_task_log.status.keys (31,57) | 505261 |    0 |    7 |    0 | de       | 
enum.server_task_log.status.keys (31,56) | 505261 |    0 |    5 |    0 | en       | 
enum.server_task_log.status.keys
(3 rows)

The state is the effect of only UPDATEs of the rows after a SELECT ... 
FOR UPDATE in the same transaction. It happend twice right now but I 
deleted the other rows... the table should still contain the data. I 
have disabled scheduled vacuums for now.

I could send the index and table files off-list. This is the only 
effected table right now. It is not updated frequently but is rather 
static. I upgraded to 8.1.1 around Dec 21, there should have been near 
zero updates since then until today.

Perhaps it's a problem with multi-column unique indexes?

Best Regards,
Michael Paesold


Re: Possible savepoint bug

From
Tom Lane
Date:
Michael Paesold <mpaesold@gmx.at> writes:
> I am seeing a similar unique index bug here...
> This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).

It looks like the problem is that index entries are being inserted out
of order.  I find this pair that should be in the other order:
Item 124 -- Length:   44  Offset: 2360 (0x0938)  Flags: USED Block Id: 3  linp Index: 55  Size: 44 Has Nulls: 0  Has
Varwidths:16384
 
 0938: 00000300 37002c40 06000000 64650000  ....7.,@....de.. 0948: 1a000000 656e756d 2e626f6f 6b696e67
....enum.booking0958: 2e747970 652e6b65 79730000           .type.keys..    
 
Item 125 -- Length:   40  Offset: 2320 (0x0910)  Flags: USED Block Id: 3  linp Index: 48  Size: 40 Has Nulls: 0  Has
Varwidths:16384
 
 0910: 00000300 30002840 06000000 64650000  ....0.(@....de.. 0920: 17000000 656e756d 2e626f6f 6b696e67
....enum.booking0930: 2e747970 652e4c00                    .type.L.        
 

and likewise here:
Item  60 -- Length:   52  Offset: 5060 (0x13c4)  Flags: USED Block Id: 4  linp Index: 38  Size: 52 Has Nulls: 0  Has
Varwidths:16384
 
 13c4: 00000400 26003440 06000000 64650000  ....&.4@....de.. 13d4: 24000000 656e756d 2e736572 7665725f
$...enum.server_13e4: 7461736b 5f6c6f67 2e737461 7475732e  task_log.status. 13f4: 6b657973
keys           
 
Item  61 -- Length:   56  Offset: 5004 (0x138c)  Flags: USED Block Id: 4  linp Index: 37  Size: 56 Has Nulls: 0  Has
Varwidths:16384
 
 138c: 00000400 25003840 06000000 64650000  ....%.8@....de.. 139c: 27000000 656e756d 2e736572 7665725f
'...enum.server_13ac: 7461736b 5f6c6f67 2e737461 7475732e  task_log.status. 13bc: 52554e4e 494e4700
RUNNING.       
 

All four of the referenced tuples were inserted by XMIN 986, CMIN 0,
which I assume was probably a COPY command.  So the breakage occurred
long before the update operations.

Did you create the index before or after loading the initial data?
If you have the original dump that was loaded, it'd be interesting
to see if re-loading it reproduces the corrupt index.
        regards, tom lane


Re: Possible savepoint bug

From
Tom Lane
Date:
I wrote:
> Michael Paesold <mpaesold@gmx.at> writes:
>> I am seeing a similar unique index bug here...
>> This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).

> It looks like the problem is that index entries are being inserted out
> of order.

After further investigation, it seems that the original sort order of
the index was not C-locale, but something else --- I can reproduce the
current index ordering except for a small number of new-ish tuples if
I sort the data in en_US.

We go out of our way to prevent the backend's locale from changing after
initdb.  Did you do something to override that?

Another theory is that this is a manifestation of the known problem with
plperl sometimes changing the backend's locale setting.  Is it possible
that the index was created in a session that had previously run some
plperl functions?
        regards, tom lane


Re: Possible savepoint bug

From
Michael Paesold
Date:
Tom Lane wrote:
> I wrote:
>> Michael Paesold <mpaesold@gmx.at> writes:
>>> I am seeing a similar unique index bug here...
>>> This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).
> 
>> It looks like the problem is that index entries are being inserted out
>> of order.
> 
> After further investigation, it seems that the original sort order of
> the index was not C-locale, but something else --- I can reproduce the
> current index ordering except for a small number of new-ish tuples if
> I sort the data in en_US.
> 
> We go out of our way to prevent the backend's locale from changing after
> initdb.  Did you do something to override that?

No, I am sure I did not do anything to change the locale itentionally. The 
cluster was initialized with "initdb --no-locale"... (and this is what it 
still is).

> Another theory is that this is a manifestation of the known problem with
> plperl sometimes changing the backend's locale setting.  Is it possible
> that the index was created in a session that had previously run some
> plperl functions?

This is a theory. The whole database was loaded using pg_restore, I still 
have the original dump so I will have a look at the dump now. The database 
actually contains some plperl functions.
Restoring to a file I find some perhaps interesting facts perhaps relevant:

*) SET check_function_bodies = false;
So at least the syntax checking function should not be called.

*) Old plperl call handler:
The dump from 7.4.x created the public.plperl_call_handler() function, 
which I only dropped after the full dump was loaded.

CREATE FUNCTION plperl_call_handler() RETURNS language_handler    AS '$libdir/plperl', 'plperl_call_handler'
LANGUAGEc;
 
ALTER FUNCTION public.plperl_call_handler() OWNER TO postgres;
CREATE TRUSTED PROCEDURAL LANGUAGE plperl HANDLER plperl_call_handler;

*) There is a single plperl function that is only used in a view. (Btw. 
this view is totally unrelated to the given table and should never be used 
in the same backend session.)
From the points above, I don't think the plperl function should have been 
called during load. Perhaps I am mistaken and plperl did really override 
the locale setting.

Looking at the environment set for the "postgres" unix user, which is used 
to run Postgres, I see that LANG is set to the default value of 
en_US.UTF-8. So it seems possible that setting LANG to C here, could fix 
the problem.

This still doesn't explain why the initial sort order is wrong here.

The creation order in the dump is:

CREATE TABLE... (without indexes)
COPY ...
ALTER TABLE ONLY properties ADD CONSTRAINT pk_properties...

Please tell me if you need further information.

Best Regards,
Michael Paesold


plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Michael Paesold <mpaesold@gmx.at> writes:
> This is a theory. The whole database was loaded using pg_restore, I still 
> have the original dump so I will have a look at the dump now. The database 
> actually contains some plperl functions.

OK, I think I have reproduced the problem.  initdb in C locale, then
start postmaster with LANG=en_US.UTF-8 in its environment.  Then:

z1=# create language plperl;
CREATE LANGUAGE
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;?column?
----------t                           -- correct result for C locale
(1 row)

z1=# \c z1
You are now connected to database "z1".
z1=# SET check_function_bodies = false;
SET
z1=# create or replace function perlf() returns text as $$
z1$# return 'foo';
z1$# $$ language plperl;
CREATE FUNCTION
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;?column?
----------f                          -- WRONG result for C locale
(1 row)

So the mere act of defining a plperl function, even with
check_function_bodies = false, is sufficient to send control through
that bit of libperl code that does setlocale(LC_ALL, "").  Ugh.
This is much worse than I thought.

The reason I had not seen it before is that lc_collate_is_c caches its
result, which means that if you do any text/varchar comparisons before
first invoking libperl, you won't see any misbehavior (at least not when
you started in C locale).  The reconnect in the middle of the above test
sequence is essential to reproduce the failure.

We were talking last week about forcing the LANG/LC_ environment
variables to match our desired settings within the postmaster.
I think this example raises the priority of doing that by several
notches :-(

In the meantime, Michael, I'd suggest modifying your postmaster start
script to force LANG=C, and then reindexing all indexes you have on
text/varchar/char columns.  That should get you out of the immediate
problem and prevent it from recurring before we have a fix.  (The
system catalogs should be OK because they use "name" which is not
locale-sensitive.)
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
I wrote:
> So the mere act of defining a plperl function, even with
> check_function_bodies = false, is sufficient to send control through
> that bit of libperl code that does setlocale(LC_ALL, "").  Ugh.
> This is much worse than I thought.

It seems one ingredient in this is that the plperl function validator
fails to honor check_function_bodies, and hence is calling libperl
anyway.  I wonder if that explains the sudden rise in incidents in 8.1?
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> I wrote:
>> So the mere act of defining a plperl function, even with
>> check_function_bodies = false, is sufficient to send control through
>> that bit of libperl code that does setlocale(LC_ALL, "").  Ugh.
>> This is much worse than I thought.
>
> It seems one ingredient in this is that the plperl function validator
> fails to honor check_function_bodies, and hence is calling libperl
> anyway.  I wonder if that explains the sudden rise in incidents in 8.1?
>

That's probably because I was unaware of its existence.

It should certainly be fixed, but surely at best this would only delay
seeing the ugly locale effect - as soon as you call a perl function you'll
be back in the same boat regardless.

cheers

andrew




Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> It should certainly be fixed, but surely at best this would only delay
> seeing the ugly locale effect - as soon as you call a perl function you'll
> be back in the same boat regardless.

Right, I was not suggesting that fixing the validator would avoid the
bug.  It just surprised me that libperl was getting called in the
restore-a-dump scenario.

BTW, I was just about to go change the validator, but if you want to
do it go ahead...
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> It should certainly be fixed, but surely at best this would only delay
>> seeing the ugly locale effect - as soon as you call a perl function
>> you'll be back in the same boat regardless.
>
> Right, I was not suggesting that fixing the validator would avoid the
> bug.  It just surprised me that libperl was getting called in the
> restore-a-dump scenario.
>
> BTW, I was just about to go change the validator, but if you want to do
> it go ahead...
>


no, please do.

cheers

andrew





Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Michael Paesold
Date:
Tom Lane wrote:
> Michael Paesold <mpaesold@gmx.at> writes:
>> This is a theory. The whole database was loaded using pg_restore, I still 
>> have the original dump so I will have a look at the dump now. The database 
>> actually contains some plperl functions.
> 
> OK, I think I have reproduced the problem.  initdb in C locale, then
> start postmaster with LANG=en_US.UTF-8 in its environment.  Then:

I had reproduced the problem here with a stripped down dump file from my 
backup, but your test case is much simpler, as usual. :-)

> In the meantime, Michael, I'd suggest modifying your postmaster start
> script to force LANG=C, and then reindexing all indexes you have on
> text/varchar/char columns.  That should get you out of the immediate
> problem and prevent it from recurring before we have a fix.

I had already reindexed all tables in a clean session and have now added 
"export LANG=C" to the profile of the postgres unix account. I cannot 
reproduce the bug after doing so.

Thank you for your quick help debugging the problem.

Best Regards,
Michael Paesold


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:
Tom Lane wrote:

>Michael Paesold <mpaesold@gmx.at> writes:
>  
>
>>This is a theory. The whole database was loaded using pg_restore, I still 
>>have the original dump so I will have a look at the dump now. The database 
>>actually contains some plperl functions.
>>    
>>
>
>OK, I think I have reproduced the problem.  initdb in C locale, then
>start postmaster with LANG=en_US.UTF-8 in its environment.  Then:
>
>z1=# create language plperl;
>CREATE LANGUAGE
>z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
> ?column?
>----------
> t                           -- correct result for C locale
>(1 row)
>
>z1=# \c z1
>You are now connected to database "z1".
>z1=# SET check_function_bodies = false;
>SET
>z1=# create or replace function perlf() returns text as $$
>z1$# return 'foo';
>z1$# $$ language plperl;
>CREATE FUNCTION
>z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
> ?column?
>----------
> f                          -- WRONG result for C locale
>(1 row)
>
>  
>

Unfortunately we have not fixed this on Windows. I have confirmed the 
effect on 8.1.1, and I still see this effect on HEAD. We have fixed the 
check_function_bodies bit, but if that is on, or if I call a plperl 
func, I get the bad result shown above.

The log message from the commit that was supposed to fix this says:
 Arrange to set the LC_XXX environment variables to match our locale setup.  This protects against undesired changes in
localebehavior if someone carelessly does setlocale(LC_ALL, "") (and we know who you are, perl guys).
 


However, to the best of my knowledge, Windows does NOT consult the environment when set_locale is called. ISTM we
probablyneed to call set_locale ourselves on Windows with the desired values.
 

cheers

andrew




Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> However, to the best of my knowledge, Windows does NOT consult the environment when set_locale is called. ISTM we
probablyneed to call set_locale ourselves on Windows with the desired values.
 

We already do, during process startup.  The question becomes where the
heck is Perl looking for the locale information, when on Windows?
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>However, to the best of my knowledge, Windows does NOT consult the environment when set_locale is called. ISTM we
probablyneed to call set_locale ourselves on Windows with the desired values.
 
>>    
>>
>
>We already do, during process startup.  The question becomes where the
>heck is Perl looking for the locale information, when on Windows?
>  
>

The Windows docs at 
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_setlocale.2c_._wsetlocale.asp 
say:

|setlocale( LC_ALL, "" );|   Sets the locale to the default, which is the user-default ANSI code   page obtained from
theoperating system.
 

Does libperl call this only at interpreter startup? If so, maybe we 
should probably save out the settings and then restore them after the 
interpreter has started.

cheers

andrew

||




Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:

>
>
> Tom Lane wrote:
>
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>
>>
>>> However, to the best of my knowledge, Windows does NOT consult the
>>> environment when set_locale is called. ISTM we probably need to call
>>> set_locale ourselves on Windows with the desired values.
>>>
>>
>>
>> We already do, during process startup.  The question becomes where the
>> heck is Perl looking for the locale information, when on Windows?
>>
>>
>
> The Windows docs at
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_setlocale.2c_._wsetlocale.asp
> say:
>
> |setlocale( LC_ALL, "" );|
>    Sets the locale to the default, which is the user-default ANSI code
>    page obtained from the operating system.
>
> Does libperl call this only at interpreter startup? If so, maybe we
> should probably save out the settings and then restore them after the
> interpreter has started.
>
>

After some analysis of perl's locale.c, I came up with the attached
patch, which seems to cure the previously observed problem on my Windows
box.

The questions are:
a) is this an acceptable solution, and
b) should we do this for all the LC_* settings (on Windows at least)?
Especially, should we do it for LC_CTYPE?


cheers

andrew


Index: plperl.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/pl/plperl/plperl.c,v
retrieving revision 1.98
diff -c -r1.98 plperl.c
*** plperl.c    29 Dec 2005 14:28:31 -0000    1.98
--- plperl.c    8 Jan 2006 00:49:57 -0000
***************
*** 45,50 ****
--- 45,51 ----
  #include <ctype.h>
  #include <fcntl.h>
  #include <unistd.h>
+ #include <stdlib.h>

  /* postgreSQL stuff */
  #include "commands/trigger.h"
***************
*** 60,65 ****
--- 61,72 ----
  extern DLLIMPORT bool check_function_bodies;

  /* perl stuff */
+
+ /* prevent perl headers from hijacking stdio and lots of other stuff */
+ #ifdef WIN32
+ #define WIN32IO_IS_STDIO
+ #endif
+
  #include "EXTERN.h"
  #include "perl.h"
  #include "XSUB.h"
***************
*** 263,268 ****
--- 270,297 ----
          "", "-e", PERLBOOT
      };

+
+     char lang_buf[256], col_buf[256] ;
+     char * env_entry;
+
+     lang_buf[0] = col_buf[0] = '\0';
+
+     env_entry= getenv("LANG");
+
+     if (env_entry && strlen(env_entry) > 0)
+     {
+         snprintf(lang_buf,256, "LANG=%s", env_entry);
+         unsetenv("LANG");
+     }
+
+     env_entry = getenv("LC_COLLATE");
+
+     if (env_entry && strlen(env_entry) > 0)
+     {
+         snprintf(col_buf,256,"LC_COLLATE=%s",env_entry);
+         unsetenv("LC_COLLATE");
+     }
+
      plperl_interp = perl_alloc();
      if (!plperl_interp)
          elog(ERROR, "could not allocate Perl interpreter");
***************
*** 272,277 ****
--- 301,311 ----
      perl_run(plperl_interp);

      plperl_proc_hash = newHV();
+
+     if (strlen(lang_buf) > 0)
+         putenv(lang_buf);
+     if(strlen(col_buf) > 0)
+         putenv(col_buf);
  }



Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

I wrote:

>
>
> After some analysis of perl's locale.c, I came up with the attached 
> patch, which seems to cure the previously observed problem on my 
> Windows box.
>
>

Further testing shows the problem persisting. Back to the drawing board.

cheers

andrew


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

I wrote:

>
> Further testing shows the problem persisting. Back to the drawing board.
>
>

The attached patch against cvs tip does seem to work. Instead of playing
with the environment, we simply allow perl to do its worst and then put
things back the way we wanted them.

Comments?

cheers

andrew
Index: plperl.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plperl/plperl.c,v
retrieving revision 1.98
diff -c -r1.98 plperl.c
*** plperl.c    29 Dec 2005 14:28:31 -0000    1.98
--- plperl.c    8 Jan 2006 16:54:53 -0000
***************
*** 45,50 ****
--- 45,51 ----
  #include <ctype.h>
  #include <fcntl.h>
  #include <unistd.h>
+ #include <locale.h>

  /* postgreSQL stuff */
  #include "commands/trigger.h"
***************
*** 263,268 ****
--- 264,303 ----
          "", "-e", PERLBOOT
      };

+ #ifdef WIN32
+
+     /*
+      * The perl library on startup does horrible things like call
+      * setlocale(LC_ALL,""). We have protected against that on most
+      * platforms by setting the environment appropriately. However, on
+      * Windows, setlocale() does not consult the environment, so we need
+      * to save the excisting locale settings before perl has a chance to
+      * mangle them and restore them after its dirty deeds are done.
+      *
+      * MSDN ref:
+      * http://msdn.microsoft.com/library/en-us/vclib/html/_crt_locale.asp
+      *
+      * It appaers that we only need to do this on interpreter startup, and
+      * subsequent calls to the interpreter don't mess with the locale
+      * settings.
+      */
+
+     char *loc;
+     char *save_collate, *save_ctype, *save_monetary, *save_numeric, *save_time;
+
+     loc = setlocale(LC_COLLATE,NULL);
+     save_collate = loc ? pstrdup(loc) : NULL;
+     loc = setlocale(LC_CTYPE,NULL);
+     save_ctype = loc ? pstrdup(loc) : NULL;
+     loc = setlocale(LC_MONETARY,NULL);
+     save_monetary = loc ? pstrdup(loc) : NULL;
+     loc = setlocale(LC_NUMERIC,NULL);
+     save_numeric = loc ? pstrdup(loc) : NULL;
+     loc = setlocale(LC_TIME,NULL);
+     save_time = loc ? pstrdup(loc) : NULL;
+
+ #endif
+
      plperl_interp = perl_alloc();
      if (!plperl_interp)
          elog(ERROR, "could not allocate Perl interpreter");
***************
*** 272,277 ****
--- 307,328 ----
      perl_run(plperl_interp);

      plperl_proc_hash = newHV();
+
+ #ifdef WIN32
+
+     if (save_collate != NULL)
+       setlocale(LC_COLLATE,save_collate);
+     if (save_ctype != NULL)
+       setlocale(LC_CTYPE,save_ctype);
+     if (save_monetary != NULL)
+       setlocale(LC_MONETARY,save_monetary);
+     if (save_numeric != NULL)
+       setlocale(LC_NUMERIC,save_numeric);
+     if (save_time != NULL)
+       setlocale(LC_TIME,save_time);
+
+ #endif
+
  }



Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The attached patch against cvs tip does seem to work. Instead of playing 
> with the environment, we simply allow perl to do its worst and then put 
> things back the way we wanted them.

Doesn't that screw up Perl, though?  Its idea of what the locale is
will be wrong.

Maybe that's the least bad alternative available, but it doesn't seem
like we're there yet.

(Of course, the *big* problem with this approach is that it's
Perl-specific, and won't do a thing for any other libraries that
might try to do setlocale(LC_ALL, "").)
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>The attached patch against cvs tip does seem to work. Instead of playing 
>>with the environment, we simply allow perl to do its worst and then put 
>>things back the way we wanted them.
>>    
>>
>
>Doesn't that screw up Perl, though?  Its idea of what the locale is
>will be wrong.
>
>Maybe that's the least bad alternative available, but it doesn't seem
>like we're there yet.
>
>(Of course, the *big* problem with this approach is that it's
>Perl-specific, and won't do a thing for any other libraries that
>might try to do setlocale(LC_ALL, "").)
>
>    
>  
>

All true, which is why I tried to avoid this solution.

However, I have not found another one that works. Specifically, 
unsetting LANG and LC_COLLATE did not work as I thought it would.

If anyone can provide a better solution I will be very happy.

cheers

andrew


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Greg Stark
Date:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > The attached patch against cvs tip does seem to work. Instead of playing 
> > with the environment, we simply allow perl to do its worst and then put 
> > things back the way we wanted them.

How does that affect to the API calls you can make from Perl back into the
database? What if you change the locale and then issue a query from within
Perl?

-- 
greg



Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

Greg Stark wrote:

>>Andrew Dunstan <andrew@dunslane.net> writes:
>>    
>>
>>>The attached patch against cvs tip does seem to work. Instead of playing 
>>>with the environment, we simply allow perl to do its worst and then put 
>>>things back the way we wanted them.
>>>      
>>>
>
>How does that affect to the API calls you can make from Perl back into the
>database? What if you change the locale and then issue a query from within
>Perl?
>
>  
>

If you deliberately change the locale settings (especially LC_COLLATE), 
all bets are off, surely. REINDEX will be in your future.

Calling setlocale() is in fact a forbidden operation in trusted plperl.

AFAICT, perl doesn't keep any state about locale settings, it just 
reacts to whatever the current settings are, I think, but I could be wrong.

My main concern has been that we are pushing out a point release that 
advertises a fix for a problem, when the fix doesn't work on Windows. 
Either we need to find a fix (and I tried to supply one) or we need to 
change what we say about the release.

I'm also a bit distressed that nobody else has tested this, and we have 
just assumed that the fix would work, despite what we already know about 
how setlocale() works on Windows.

cheers

andrew





Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> AFAICT, perl doesn't keep any state about locale settings, it just 
> reacts to whatever the current settings are, I think, but I could be wrong.

If that's the case, why would it be bothering to issue setlocale during
startup at all?  If you look in locale.c in the Perl sources, it's
pretty clear that it saves away state about the settings during
Perl_init_i18nl10n().  I'm too lazy to track down where that state is
used or what the consequences are if it's wrong, but it sure looks to
me like *something* will be broken if we just change the locale back
to what we want afterward.
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:
On Mon, 2006-01-09 at 11:03 -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > AFAICT, perl doesn't keep any state about locale settings, it just 
> > reacts to whatever the current settings are, I think, but I could be wrong.
> 
> If that's the case, why would it be bothering to issue setlocale during
> startup at all?  If you look in locale.c in the Perl sources, it's
> pretty clear that it saves away state about the settings during
> Perl_init_i18nl10n().  I'm too lazy to track down where that state is
> used or what the consequences are if it's wrong, but it sure looks to
> me like *something* will be broken if we just change the locale back
> to what we want afterward.
> 


I don't know. Reading that code just makes my head spin ...

I should have thought a library shouldn't make too many assumptions
about locale settings anyway. What is to prevent another library from
doing the same thing. Then we'd have duelling settings, dependent on who
got called last.

I had thought, from reading perl's locale.c, that unsetting LC_COLLATE
and LANG would inhibit the calls to setlocale(LC_ALL,"") and
setlocale(LC_COLLLATE,""). But my testing seemed to prove that wrong. Of
course, it's possible that ActiveState's perl is not doing quite the
same thing. At any rate, what I think we know from that code is that on
Windows, just setting LANG and LC_* is precisely the wrong thing, since
the presence of those values will trigger a call to setlocale(LC_foo,"")
but the relevant environment value will not actually be used.

I'm just about out of ideas and right out of time to spend on this.

cheers

andrew



Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I don't know. Reading that code just makes my head spin ...

Yeah, too many ifdefs :-(.  But I suppose that the initial
"#ifdef LOCALE_ENVIRON_REQUIRED" block is not compiled on sane
platforms, meaning that the first code in the routine is the
unconditional   if (! setlocale(LC_ALL, ""))setlocale_failure = TRUE;

> I should have thought a library shouldn't make too many assumptions
> about locale settings anyway.

Indeed; I think a pretty strong case can be made that this is a Perl
bug.  It's reasonable to be doing the setlocale call in a standalone
Perl executable, but libperl should just work with whatever locale
settings have been chosen by the surrounding program (ie, all these
calls should be setlocale(LC_xxx, NULL) in the libperl case).

> I'm just about out of ideas and right out of time to spend on this.

We could just file a Perl bug report and wait for them to fix it.
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:
On Mon, 2006-01-09 at 12:06 -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > I don't know. Reading that code just makes my head spin ...
> 
> Yeah, too many ifdefs :-(.  But I suppose that the initial
> "#ifdef LOCALE_ENVIRON_REQUIRED" block is not compiled on sane
> platforms, meaning that the first code in the routine is the
> unconditional
>     if (! setlocale(LC_ALL, ""))
>     setlocale_failure = TRUE;
> 


*doh!* I had misread that. Now I see.

On Windows that pretty much gives the game away.



> 
> > I'm just about out of ideas and right out of time to spend on this.
> 
> We could just file a Perl bug report and wait for them to fix it.
> 


What's the data risk?

cheers

andrew



Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On Mon, 2006-01-09 at 12:06 -0500, Tom Lane wrote:
>> We could just file a Perl bug report and wait for them to fix it.

> What's the data risk?

Given that it took us this long to identify the problem, I'm guessing
that it doesn't affect too many people.  For starters you'd have to
run the postmaster under a locale environment different from what
initdb saw (or was told).
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

Tom Lane wrote:

>>I'm just about out of ideas and right out of time to spend on this.
>>    
>>
>
>We could just file a Perl bug report and wait for them to fix it.
>
>    
>  
>

done

cheers

andrew


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Bruce Momjian
Date:
Is there a TODO here, even if the Perl folks are supposed to fix it?

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> 
> >>I'm just about out of ideas and right out of time to spend on this.
> >>    
> >>
> >
> >We could just file a Perl bug report and wait for them to fix it.
> >
> >    
> >  
> >
> 
> done
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is there a TODO here, even if the Perl folks are supposed to fix it?

When and if they fix it, it'd be useful for us to document the gotcha
someplace (not sure where, though).  Maybe we should even go so far as
to refuse to work with older libperls on Windows.
        regards, tom lane


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:
It has probably been sufficiently mitigated on *nix. On Windows, the 
choice seems to be between living with the risk and trying my "put the 
locales back where they were" patch, which as Tom and Greg point out 
might have other consequences. Take your pick.

cheers

andrew

Bruce Momjian wrote:

>Is there a TODO here, even if the Perl folks are supposed to fix it?
>
>---------------------------------------------------------------------------
>
>Andrew Dunstan wrote:
>  
>
>>Tom Lane wrote:
>>
>>    
>>
>>>>I'm just about out of ideas and right out of time to spend on this.
>>>>   
>>>>
>>>>        
>>>>
>>>We could just file a Perl bug report and wait for them to fix it.
>>>
>>>    
>>> 
>>>
>>>      
>>>
>>done
>>
>>    
>>
>


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Bruce Momjian
Date:
I can put it in the Win32 section of the TODO list.  If we have
something not working on Win32, I would like to document it.

Is it:plperl changes the locale in Win32?

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> It has probably been sufficiently mitigated on *nix. On Windows, the 
> choice seems to be between living with the risk and trying my "put the 
> locales back where they were" patch, which as Tom and Greg point out 
> might have other consequences. Take your pick.
> 
> cheers
> 
> andrew
> 
> Bruce Momjian wrote:
> 
> >Is there a TODO here, even if the Perl folks are supposed to fix it?
> >
> >---------------------------------------------------------------------------
> >
> >Andrew Dunstan wrote:
> >  
> >
> >>Tom Lane wrote:
> >>
> >>    
> >>
> >>>>I'm just about out of ideas and right out of time to spend on this.
> >>>>   
> >>>>
> >>>>        
> >>>>
> >>>We could just file a Perl bug report and wait for them to fix it.
> >>>
> >>>    
> >>> 
> >>>
> >>>      
> >>>
> >>done
> >>
> >>    
> >>
> >
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Andrew Dunstan
Date:

Bruce Momjian wrote:

>I can put it in the Win32 section of the TODO list.  If we have
>something not working on Win32, I would like to document it.
>
>Is it:
>    plperl changes the locale in Win32?
>
>  
>

As long as the locale is consistent I think we're OK (is that right, Tom?)

Would that mean not using any of initdb's locale settings?

cheers

andrew


Re: plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> As long as the locale is consistent I think we're OK (is that right, Tom?)

Right.

> Would that mean not using any of initdb's locale settings?

Yeah, you'd want to not use the --locale switch for initdb, and also not
to change the system-wide locale settings afterwards.
        regards, tom lane