Thread: BUG #13442: ISBN doesn't always roundtrip with text
The following bug has been logged on the website: Bug reference: 13442 Logged by: B.Z Email address: bz@mailinator.com PostgreSQL version: 9.4.4 Operating system: Linux think 4.0.0-1-amd64 #1 SMP Debian 4.0.2-1 (2 Description: I have a table containing ISBNs stored as the isbn type provided by the isn extension. In certain cases an ISBN in ISBN10 form is considered unequal to the same ISBN in ISBN13 form. Assuming the isn extension is installed: CREATE TEMP TABLE t(i isbn); INSERT INTO t VALUES('9791020902573'); SELECT * FROM t WHERE i = '9791020902573'; -- Returns 10-209-0257-4 SELECT * FROM t WHERE i = '10-209-0257-4'; -- Returns 0 rows SELECT '9791020902573'::isbn = '10-209-0257-4'::isbn; -- Returns f Other examples of ISBNs with this behavior: select isbn from isbns where isbn::isbn != isbn::isbn::text::isbn; isbn 10-91986-01-0 10-91414-02-5 975891300-X 598687008-5 904072556-X 10-209-0257-4 10-209-0014-8 960838204-1 10-91447-03-9 963208937-5 10-209-0011-3 975297291-8 957849470-X 157607283-5 10-93509-00-7 10-209-0014-8 10-209-0273-6 975574374-X 979984511-4 953173635-9 853390230-1 963930301-1 10-223-0179-9 10-209-0141-1 10-209-0041-5 960218202-4 598340069-X 10-91600-01-5 598340069-X (29 rows)
bz@mailinator.com writes: > I have a table containing ISBNs stored as the isbn type provided by the isn > extension. In certain cases an ISBN in ISBN10 form is considered unequal to > the same ISBN in ISBN13 form. I poked at this a little bit. I think the issue may be that this bit at line 832 in isn.c: case ISBN: memcpy(buf, "978", 3); supposes that all ISBNs should have prefix 978, whereas your example is using prefix 979, which seems to be also valid according to code a few lines above. I don't know enough about the whole ISBN/EAN mess to understand what this should be doing instead, though. regards, tom lane
>> I have a table containing ISBNs stored as the isbn type provided by the isn >> extension. In certain cases an ISBN in ISBN10 form is considered unequal to >> the same ISBN in ISBN13 form. > > I poked at this a little bit. I think the issue may be that this bit > at line 832 in isn.c: > > case ISBN: > memcpy(buf, "978", 3); > > supposes that all ISBNs should have prefix 978, whereas your example is > using prefix 979, which seems to be also valid according to code a few > lines above. I don't know enough about the whole ISBN/EAN mess to > understand what this should be doing instead, though. Hmmm. ISBN-10 -> ISBN-13 is really just adding "978" in front, so there is no casted ISBN-10 number starting with 979. ISTM that "979" is just a recorded prefix for books, just as "978", for ISBN-13. AFAICR the prefix was chosen so that the checksum digit can be kept in the conversion. SELECT '9791020902573'::isbn = '10-209-0257-4'::isbn; -- Returns f Indeed, but this one should return true: SELECT '9781020902574'::isbn = '10-209-0257-4'::isbn; So I would say there is no bug as the number are indeed different, or that I missed something. -- Fabien.
> The key bug, then, is that the isn extension attempts to make this illegal > conversion: > > select '9791020902573'::isbn; --gives 10-209-0257-4 Indeed, this one looks like a bug, it should really be an error. -- Fabien.
(I'm the original reporter). I think this gets to the crux of the matter, although I disagree that no bug exists. ISBN10s, i.e. 10-digit ISBNs, can be converted to ISBN13s, and back again. i.e. they round-trip. ISBN13s with the 979 prefix can't be converted to ISBN10s. (Evidence includes http://pcn.loc.gov/isbncnvt.html failing on ISBN13s with 979- prefixes with "This 13-digit ISBN does not begin with "978" It cannot be converted to a 10-digit ISBN.", and https://247pearsoned.custhelp.com/app/answers/detail/a_id/6922/~/isbn-converter stating "The revised ISBN standard does not provide rules for converting or assigning 10-digit ISBNs to 13-digit ISBNs prefixed with "979". There is no industry approved method for converting ISBN-13s prefixed with "979" to an ISBN-10. Thirteen-digit ISBNs prefixed with "979" should not be represented to trading partners as 10-digit ISBNs. "). The key bug, then, is that the isn extension attempts to make this illegal conversion: select '9791020902573'::isbn; --gives 10-209-0257-4 I suspect the correct behaviour is raising an exception in these circumstances.
On Mon, Jun 15, 2015 at 10:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I poked at this a little bit. I think the issue may be that this bit > at line 832 in isn.c: > > case ISBN: > memcpy(buf, "978", 3); > > supposes that all ISBNs should have prefix 978, whereas your example is > using prefix 979, which seems to be also valid according to code a few > lines above. I actually didn't realize that the ISBN type assumed the "bookland" country code, which is 978. That's terrible, and is really a distinct way that the type is broken (distinct from the range enforcement misfeature that I always go on about). I was planning on suggesting completely removing any enforcement of ISBN ranges by the ISBN type. We could then rely entirely on the check digit, which has virtually no disadvantage relative to "more complete" enforcement. But even that's not going to help here, because the ISBN output function doesn't show the country code or the check digit: postgres=# select '978-0-393-04002-9'::isbn; isbn --------------- 0-393-04002-X (1 row) -- Peter Geoghegan
Fabien COELHO <coelho@cri.ensmp.fr> writes: >>> I have a table containing ISBNs stored as the isbn type provided by the isn >>> extension. In certain cases an ISBN in ISBN10 form is considered unequal to >>> the same ISBN in ISBN13 form. > So I would say there is no bug as the number are indeed different, or that > I missed something. I think this definitely indicates a bug: regression=# select '9791020902573'::isbn; isbn --------------- 10-209-0257-4 (1 row) regression=# select '10-209-0257-4'::isbn; isbn --------------- 1-02-090257-4 (1 row) I don't mind if '9791020902573' gets converted to something canonical, but surely the canonical value ought to reproduce without further change. Furthermore, regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn; ?column? ---------- f (1 row) so apparently the canonicalization conversion wasn't right to begin with. I have no idea which of these forms should be considered valid or canonical, but surely the two behaviors exhibited above are wrong. regards, tom lane
> I think this definitely indicates a bug: > > regression=# select '9791020902573'::isbn; > isbn > --------------- > 10-209-0257-4 > (1 row) Yes, indeed, this one is definitely a bug. > regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn; > ?column? > ---------- > f > (1 row) > > so apparently the canonicalization conversion wasn't right to begin with. I still do not get it, "False" is expected on this last one, as this is not the same book. > I have no idea which of these forms should be considered valid or > canonical, but surely the two behaviors exhibited above are wrong. I think that it is only the first one, i.e. converting the ISBN13 code beginning with 979 to a ISBN10 should have raised an exception. -- Fabien.
On Tuesday, June 16, 2015, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > I think this definitely indicates a bug: >> >> regression=# select '9791020902573'::isbn; >> isbn >> --------------- >> 10-209-0257-4 >> (1 row) >> > > Yes, indeed, this one is definitely a bug. > > regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn; >> ?column? >> ---------- >> f >> (1 row) >> >> so apparently the canonicalization conversion wasn't right to begin with. >> > > I still do not get it, "False" is expected on this last one, as this is > not the same book. > I have no idea which of these forms should be considered valid or >> canonical, but surely the two behaviors exhibited above are wrong. >> > > I think that it is only the first one, i.e. converting the ISBN13 code > beginning with 979 to a ISBN10 should have raised an exception. > > There is only one problem/bug (hopefully) but it is evidenced by the fact that the two queries provided contradict each other. It is basically a different form of proof. You know the first query is wrong using external knowledge but Tom has shown that a bug exists using strictly knowledge local to the system. Without the external knowledge one cannot say which alternative is correct but one must be incorrect. Fixing the first query to become an error brings the system basic into internal consistency since that result is congruous with the second query. And is externally affirmed to be the correct outcome (I trust...). Making the second query return true makes the system internally consistent too...but doesn't actually provide the correct answer (again, I trust) David J.
>> regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn; >> f >> beginning with 979 to a ISBN10 should have raised an exception. > Making the second query return true makes the system internally consistent > too...but doesn't actually provide the correct answer (again, I trust) Yes. On third thoughs, the above query should raise an exception because the cast is illegal, 979* is not an ISBN, so the answer is neither true nor false. The query should be simply written to use the larger type and not cast to the subtype. SELECT EAN13 '9791234567896' = ISBN '123456789X'; -- False SELECT EAN13 '9781234567897' = ISBN '123456789X'; -- True -- Fabien.
>> I think this definitely indicates a bug: >> >> regression=# select '9791020902573'::isbn; >> isbn >> --------------- >> 10-209-0257-4 >> (1 row) > > Yes, indeed, this one is definitely a bug. Attached patch makes isn to distinguish when needed between ISBN and ISBN13. ISBN is the legacy ISBN10 number which is part of ISBN13, but not all ISBN13 numbers are ISBN. The initial version was confusing both types, leading to accept valid ISBN13 number as valid ISBN(10) number although they use the 979 prefix, hence the reported issues. The patch also adds some tests. There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function is changed, as well as some casts. I'm not sure of a safe way to upgrade, because the initial version was accepting invalid ISBN which would now be rejected, thus some store data may be considered corrupted in a database. On the other hand, probably few people would use a legacy ISBN type and put ISBN13 number in them, hopefully? Hmmm... -- Fabien.
On 06/20/2015 10:04 AM, Fabien COELHO wrote: > >>> I think this definitely indicates a bug: >>> >>> regression=# select '9791020902573'::isbn; >>> isbn >>> --------------- >>> 10-209-0257-4 >>> (1 row) >> >> Yes, indeed, this one is definitely a bug. > > Attached patch makes isn to distinguish when needed between ISBN and > ISBN13. ISBN is the legacy ISBN10 number which is part of ISBN13, but not > all ISBN13 numbers are ISBN. > > The initial version was confusing both types, leading to accept valid > ISBN13 number as valid ISBN(10) number although they use the 979 prefix, > hence the reported issues. > > The patch also adds some tests. > > There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function > is changed, as well as some casts. I'm not sure of a safe way to upgrade, > because the initial version was accepting invalid ISBN which would now be > rejected, thus some store data may be considered corrupted in a database. > On the other hand, probably few people would use a legacy ISBN type and > put ISBN13 number in them, hopefully? Hmmm... 10-digit ISBNs are indeed legacy, I don't have much sympathy for an application that still uses the 10-digit isbn datatype in a table. The cast might well be used for display purposes, though. You might do something like "SELECT isbn_column as bisbn13, isbn_column::isbn as isbn FROM ...". So I think your patch is OK, and we should provide an upgrade script, and put a warning in the release notes that you should not be using the 10-digit isbn datatype as a column type anymore. As long as you don't use the isbn datatype in any tables or views, the upgrade script could just DROP the datatype, and recreate it. It's unfortunate that it won't work if you've used it in a view, but it seems acceptable. Could you write an upgrade script like that, please? In the long term, I think we should deprecate all the legacy "short" datatypes, isbn, issn, ismn and upc. As a replacement for display purposes, provide functions to print an ean13 in those legacy forms. I'm not even sure we need the isbn13, issn13 etc. datatypes, to be honest. A single ean13 datatype and a function to check if an EAN is an ISBN, ISSN etc. should be enough. You could then use that in a CHECK constraint or create a domain if you really need to. Oh, and get rid of the "weak input mode" while we're at it - that's just horrible. Perhaps we should add those functions now, so that you can rewrite your application to use them now, and we could remove the legacy datatypes in a future release. - Heikki
Hello Heikki, >> There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function >> is changed, as well as some casts. I'm not sure of a safe way to upgrade, >> because the initial version was accepting invalid ISBN which would now be >> rejected, thus some store data may be considered corrupted in a database. >> On the other hand, probably few people would use a legacy ISBN type and >> put ISBN13 number in them, hopefully? Hmmm... > > 10-digit ISBNs are indeed legacy, I don't have much sympathy for an > application that still uses the 10-digit isbn datatype in a table. The cast > might well be used for display purposes, though. You might do something like > "SELECT isbn_column as bisbn13, isbn_column::isbn as isbn FROM ...". So I > think your patch is OK, and we should provide an upgrade script, and put a > warning in the release notes that you should not be using the 10-digit isbn > datatype as a column type anymore. > > As long as you don't use the isbn datatype in any tables or views, the > upgrade script could just DROP the datatype, and recreate it. It's > unfortunate that it won't work if you've used it in a view, but it seems > acceptable. Could you write an upgrade script like that, please? Probably I could. I'll have a look. It has to drop the type *and* some casts I think. Should it cascade? Probably not, because people will no like loosing their library data:-) > In the long term, I think we should deprecate all the legacy "short" > datatypes, isbn, issn, ismn and upc. As a replacement for display purposes, > provide functions to print an ean13 in those legacy forms. I'm not even sure > we need the isbn13, issn13 etc. datatypes, to be honest. A single ean13 > datatype and a function to check if an EAN is an ISBN, ISSN etc. should be > enough. You could then use that in a CHECK constraint or create a domain if > you really need to. Oh, and get rid of the "weak input mode" while we're at > it - that's just horrible. Hmmm. I like the principle of using a type to embed the constraint check, and I*N13 types are fine, only the short types are indeed legacy and have an issue. > Perhaps we should add those functions now, so that you can rewrite your > application to use them now, and we could remove the legacy datatypes in a > future release. I do not have an application, I just wrote the patch because I used these types some time ago. Also, I'm not sure how to deprecate a type. -- Fabien.
Mon, 27 Jul 2015 11:15:17 +0200 (CEST) Hello Heikki, >> Attached patch makes isn to distinguish when needed between ISBN and >> ISBN13. ISBN is the legacy ISBN10 number which is part of ISBN13, but not >> all ISBN13 numbers are ISBN. >> >> The initial version was confusing both types, leading to accept valid >> ISBN13 number as valid ISBN(10) number although they use the 979 prefix, >> hence the reported issues. >> >> The patch also adds some tests. >> >> There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function >> is changed, as well as some casts. I'm not sure of a safe way to upgrade, >> because the initial version was accepting invalid ISBN which would now be >> rejected, thus some store data may be considered corrupted in a database. >> On the other hand, probably few people would use a legacy ISBN type and >> put ISBN13 number in them, hopefully? Hmmm... > > 10-digit ISBNs are indeed legacy, I don't have much sympathy for an > application that still uses the 10-digit isbn datatype in a table. The cast > might well be used for display purposes, though. You might do something like > "SELECT isbn_column as bisbn13, isbn_column::isbn as isbn FROM ...". So I > think your patch is OK, and we should provide an upgrade script, After some tests, it seems that an upgrade script cannot be provided, or I missed something. Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the ISBN type definition must be modified. However, there is no such thing as ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type belongs to an extension (hey I know that, this is what is being done!). Probably this is some kind of feature/bug of the extension mecanism, that I'm not planing to fix for isn... So the v2 attached provides the two versions (1.0 more or less bug compatible, 2.0 without the confusion bug), without path for an upgrade between the two, can't help it. Maybe providing a 1.0 buggy version is a bad idea, though. > and put a warning in the release notes that you should not be using the > 10-digit isbn datatype as a column type anymore. A warning would be a good think. -- Fabien.
On Mon, Jul 27, 2015 at 2:15 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > After some tests, it seems that an upgrade script cannot be provided, > or I missed something. > > Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the > ISBN type definition must be modified. However, there is no such thing as > ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type > belongs to an extension (hey I know that, this is what is being done!). > You can't test extension upgrade scripts by running the component statements at the command line. You have to make an actual dummy upgrade script and test it via "alter extension ... update ...". When executed in that context, they are exempted from the rule you are running into. (Or maybe there is another way that I don't know of). But I wonder what will happen when you try to drop a type which is in use? Cheers, Jeff
Hello, >> After some tests, it seems that an upgrade script cannot be provided, >> or I missed something. >> >> Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the >> ISBN type definition must be modified. However, there is no such thing as >> ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type >> belongs to an extension (hey I know that, this is what is being done!). > > You can't test extension upgrade scripts by running the component > statements at the command line. You have to make an actual dummy upgrade > script and test it via "alter extension ... update ...". Yep... but I actually tried that. > When executed in that context, they are exempted from the rule you are > running into. With DROP TYPE isbn: # ALTER EXTENSION isn UPDATE; ERROR: cannot drop type isbn because other objects depend on it DETAIL: extension isn depends on type isbn HINT: Use DROP ... CASCADE to drop the dependent objects too. With DROP TYPE isbn CASCADE: # ALTER EXTENSION isn UPDATE; ERROR: cannot drop extension "isn" because it is being modified > (Or maybe there is another way that I don't know of). > But I wonder what will happen when you try to drop a type which is in use? Probably it should drop the dependent objects recursively, but as isn depends on isbn we do not want to drop the extension while upgrading it. -- Fabien.
Fabien COELHO wrote: > After some tests, it seems that an upgrade script cannot be provided, > or I missed something. > > Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the > ISBN type definition must be modified. However, there is no such thing as > ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type > belongs to an extension (hey I know that, this is what is being done!). If I understand this correctly, you need to ALTER EXTENSION .. DROP (which removes the type from the extension), then drop the type, create it fresh, then ALTER EXTENSION ADD to put it back. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the >> ISBN type definition must be modified. However, there is no such thing as >> ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type >> belongs to an extension (hey I know that, this is what is being done!). > > If I understand this correctly, you need to ALTER EXTENSION .. DROP > (which removes the type from the extension), then drop the type, create > it fresh, then ALTER EXTENSION ADD to put it back. Ok, so the upgrade script must do that manually, that is what I was missing. I'll try that. Thanks. -- Fabien.
> If I understand this correctly, you need to ALTER EXTENSION .. DROP > (which removes the type from the extension), then drop the type, create > it fresh, then ALTER EXTENSION ADD to put it back. After some testing with that, I have a conundrum: I would like the update to *fail* if the user actually uses the isbn type so as not to harm data. However, when the script is at: DROP TYPE isbn; It fails, because the isbn_in & out functions depends on it. If you try: DROP FUNCTION isbn_in(cstring); It fails, because the isbn type depends on it. Sure. The usual exit strategy is: DROP TYPE isbn CASCADE; Which works fine... but as a side effect silently remove ISBN columns that may exist in user table. I do not think it is a good idea for an extension update script to do that to a librarian:-) Bar any idea around this issue, I think that an extension upgrade script is really unadvisable, and the user should do a dump data, drop extension, create extension, restore data. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: > The usual exit strategy is: > DROP TYPE isbn CASCADE; > Which works fine... but as a side effect silently remove ISBN columns that > may exist in user table. I do not think it is a good idea for an extension > update script to do that to a librarian:-) Certainly. > Bar any idea around this issue, I think that an extension upgrade script > is really unadvisable, and the user should do a dump data, drop extension, > create extension, restore data. AFAICS, that isn't exactly an improvement. You're telling the user "if you have an ISBN column, you're screwed, and we are going to make it as painful as it can possibly be to get out of that situation". I think we'd be better off trying to migrate to a situation where these type names all still exist but they all act like ISBN13. I'm not sure what the stages on that journey are. regards, tom lane
> AFAICS, that isn't exactly an improvement. You're telling the user > "if you have an ISBN column, you're screwed, and we are going to make it > as painful as it can possibly be to get out of that situation". > > I think we'd be better off trying to migrate to a situation where these > type names all still exist but they all act like ISBN13. I'm not sure > what the stages on that journey are. Hmm... This is more or less what the previous version was doing, including display bugs. So the attached v3: - distinguishes internally between ISBN & ISBN13 (so it is clean), but accepts values of one for the other transparently. - displays ISBN as ISBN if possible, otherwise they are show as ISBN13. This fixes the reported bug, and the application would have to adapt. - adds regression tests, including expected errors. The good news is that there is no extension version update, it is just a behavioral change when ISBN13 numbers are used with the ISBN type. -- Fabien.
On 07/28/2015 10:57 AM, Fabien COELHO wrote: > >> AFAICS, that isn't exactly an improvement. You're telling the user >> "if you have an ISBN column, you're screwed, and we are going to make it >> as painful as it can possibly be to get out of that situation". >> >> I think we'd be better off trying to migrate to a situation where these >> type names all still exist but they all act like ISBN13. I'm not sure >> what the stages on that journey are. > > Hmm... This is more or less what the previous version was doing, including > display bugs. > > So the attached v3: > - distinguishes internally between ISBN & ISBN13 (so it is clean), > but accepts values of one for the other transparently. > - displays ISBN as ISBN if possible, otherwise they are show as > ISBN13. This fixes the reported bug, and the application would > have to adapt. > - adds regression tests, including expected errors. > > The good news is that there is no extension version update, it is just a > behavioral change when ISBN13 numbers are used with the ISBN type. Hmm, I don't think we need or want the separate ISBN and ISBN13 codes internally. They're used interchangeably, anyway. If we just apply your change to ean2ISBN(), we're done. Per the attached. Thanks for the regression tests, that was very badly needed! - Heikki
Attachment
> [...] > > Hmm, I don't think we need or want the separate ISBN and ISBN13 codes > internally. They're used interchangeably, anyway. If we just apply your > change to ean2ISBN(), we're done. Per the attached. The simpler the better. Works for me. -- Fabien.
On 08/02/2015 08:18 PM, Fabien COELHO wrote: > >> [...] >> >> Hmm, I don't think we need or want the separate ISBN and ISBN13 codes >> internally. They're used interchangeably, anyway. If we just apply your >> change to ean2ISBN(), we're done. Per the attached. > > The simpler the better. Works for me. Ok, committed. Thanks! - Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 08/02/2015 08:18 PM, Fabien COELHO wrote: >> The simpler the better. Works for me. > Ok, committed. Thanks! I guess you forgot "git push"? regards, tom lane
On 08/02/2015 10:04 PM, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> On 08/02/2015 08:18 PM, Fabien COELHO wrote: >>> The simpler the better. Works for me. > >> Ok, committed. Thanks! > > I guess you forgot "git push"? Yep, pushed now for real. - Heikki