Thread: [Solved] SQL Server to PostgreSQL
Thanks to everyone on this list, and especially to Jeffrey Rhines, Chris Knight, Chris Bitmead, and Sevo Stille. The solution turned out to be very simple. After catching a SCSI bus speed mismatch that caused an NT Backup restore failure, I discovered that the actual data was in .mdb files! I copied the files to a system running MS Access (Office 97) and was able to export them to a delimited format, which went into PostgreSQL with very few problems; mostly there were split lines that the \copy command didn't like, which I corrected by hand.

I was able to get the table format by using MS Access. The only question left is: what is the corresponding field type in PostgreSQL for a memo field in SQL Server/Access (varchar(nnnn))?

Again, thanks for all the help,
Rod
--
Roderick A. Anderson                     raanders@altoplanos.net
Altoplanos Information Systems, Inc.     Voice: 208.765.6149
212 S. 11th Street, Suite 5              FAX: 208.664.5299
Coeur d'Alene, ID 83814
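For illustration, a minimal sketch of the kind of \copy load described above; the table, columns, and file name here are hypothetical:

    -- psql session; names are made-up examples
    CREATE TABLE customers (
        id     integer,
        name   varchar(100),
        notes  text
    );

    -- \copy reads the file on the client side; the default format is
    -- tab-delimited text, so records whose fields were split across
    -- physical lines must be rejoined before loading.
    \copy customers FROM 'customers.txt'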
On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:

)I was able to get the table format by using MS Access. Only question left
)is what is the corresponding field type in PostgreSQL for a memo field in
)SQL Server/Access (varchar(nnnn))?

'text' type perhaps?

Lionel
I hate it when I do this: see an answer I want and run with it rather than find the real answer. It turned out the data files (.mdb) _didn't_ belong to the database; they were a piece of the database that was used for a report. Back to the old grindstone.

Rod
--
Roderick A. Anderson                     raanders@altoplanos.net
Tressens Lionel <tressens@etud.insa-tlse.fr> writes:
> On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:
> )I was able to get the table format by using MS Access. Only question left
> )is what is the corresponding field type in PostgreSQL for a memo field in
> )SQL Server/Access (varchar(nnnn))?

> 'text' type perhaps ?

Uh ... what's wrong with varchar(n)?

			regards, tom lane
On Tue, 22 Aug 2000, Tom Lane wrote:

> Tressens Lionel <tressens@etud.insa-tlse.fr> writes:
> > On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:
> > )I was able to get the table format by using MS Access. Only question left
> > )is what is the corresponding field type in PostgreSQL for a memo field in
> > )SQL Server/Access (varchar(nnnn))?
>
> > 'text' type perhaps ?
>
> Uh ... what's wrong with varchar(n) ?

How big can our n be for varchar? Judging by his description, I suspect SQL Server allows a large n.

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   http://www.pop4.net
 128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
        Online Campground Directory    http://www.camping-usa.com
       Online Giftshop Superstore      http://www.cloudninegifts.com
==========================================================================
I've wondered that myself, actually. What are the benefits and drawbacks of going with one over the other, besides the obvious 255-character field length limit for varchar? The reasons to stay away from "memo" fields in other serious RDBMSs are typically more difficult maintenance, significantly lower performance, and the need for special function calls to get the data out. Do any of those apply to PG?

Jeff

Tom Lane wrote:
>
> Tressens Lionel <tressens@etud.insa-tlse.fr> writes:
> > On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:
> > )I was able to get the table format by using MS Access. Only question left
> > )is what is the corresponding field type in PostgreSQL for a memo field in
> > )SQL Server/Access (varchar(nnnn))?
>
> > 'text' type perhaps ?
>
> Uh ... what's wrong with varchar(n) ?
>
> 			regards, tom lane
"Jeffrey A. Rhines" <jrhines@email.com> writes: >> Uh ... what's wrong with varchar(n) ? > > I've wondered that myself, actually. What are the benefits and > drawbacks to going with one over the other, besides the obvious 255-char > field length limit for varchar? AFAIK there has *never* been a 255-char limit on char or varchar in pgsql ... you must be thinking of Some Other DBMS. The limit for these datatypes in 7.0 and before is BLCKSZ less some overhead --- ~8000 bytes in a default setup. Beginning in 7.1 it's an essentially arbitrary number. I set it at 10Mb in current sources, but there's no strong reason for that number over any other. In theory it could be up to 1Gb, but as Jan Wieck points out in a nearby thread, you probably wouldn't like the performance of shoving gigabyte-sized text values around. We need to think about offering API functions that will allow reading and writing huge field values in bite-sized chunks. There's no essential performance difference between char(n), varchar(n), and text in Postgres, given the same-sized data value. char(n) truncates or blank-pads to exactly n characters; varchar(n) truncates if more than n characters; text never truncates nor pads. Beyond that they are completely identical in storage requirements. Pick one based on the semantics you want for your application. regards, tom lane
As we are talking about 7.1 and huge field sizes...

MS-SQL has a function that allows you to retrieve part of a field, a kind of mid$(field, start, length). This would be a good addition indeed.

One last problem: PG does not allow storing binary data. I'm not talking about the current implementation of BLOBs, but about what will happen in 7.1. If I want to store an image in a field, I cannot do that in 7.1, because the data sent to and received from 7.1 through libpq must be formatted as ASCII. I haven't played around to see if I could create a user type called varbinary(n), which would output via varbinary_out just the contents of a buffer... Maybe varchar does it already (even if there is a \0?).

I know I should submit this problem to the hackers list, but I don't want to subscribe to hackers just to submit one message...

BTW, is there an alpha/beta release of PG 7.1?

Franck Martin
Database Development Officer
SOPAC South Pacific Applied Geoscience Commission, Fiji
E-mail: franck@sopac.org
Web site: http://www.sopac.org/

This e-mail is intended for its recipients only. Do not forward this e-mail without approval. The views expressed in this e-mail may not necessarily be the views of SOPAC.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, August 23, 2000 3:11 PM
To: Jeffrey A. Rhines
Cc: PostgreSQL::General List
Subject: Re: [GENERAL] [Solved] SQL Server to PostgreSQL

[snip]
I think the ODBC spec limits varchar to 255 bytes. Some ODBC drivers enforce that limit.

Tom Lane wrote:
> "Jeffrey A. Rhines" <jrhines@email.com> writes:
> >> Uh ... what's wrong with varchar(n) ?
> >
> > I've wondered that myself, actually. What are the benefits and
> > drawbacks of going with one over the other, besides the obvious 255-char
> > field length limit for varchar?
>
> AFAIK there has *never* been a 255-char limit on char or varchar in
> pgsql ... you must be thinking of Some Other DBMS.
>
> [snip]
> 			regards, tom lane
Franck Martin wrote:
>
> As we are talking about 7.1 and huge field sizes...
>
> MS-SQL has a function that allows you to retrieve part of a field, a kind of
> mid$(field, start, length). This would be a good addition indeed.

Does substr() not handle this?

> One last problem: PG does not allow storing binary data. I'm not talking about
> the current implementation of BLOBs, but about what will happen in 7.1.
>
> If I want to store an image in a field, I cannot do that in 7.1, because the
> data sent to and received from 7.1 through libpq must be formatted as ASCII.
> I haven't played around to see if I could create a user type called
> varbinary(n), which would output via varbinary_out just the contents of a
> buffer... Maybe varchar does it already (even if there is a \0?).

One thing I've thought about is creating an escape character to delimit this sort of thing: maybe a Ctrl-A, followed by four bytes giving the length, followed by that many bytes of data. Escape \0's with ^A0 and a real ^A with ^A^A. I would love something like this, because I always get tripped up by fields I'm inserting that contain quotes and other characters that need to be escaped.

> I know I should submit this problem to the hackers list, but I don't want to
> subscribe to hackers just to submit one message...

Heh, I know what you mean...

--
Martijn van Oosterhout <kleptog@cupid.suninternet.com>
http://cupid.suninternet.com/~kleptog/
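For what it's worth, both points can be sketched in plain SQL (the table and column names here are invented): substr() covers the mid$-style extraction, and quotes inside a string literal are escaped by doubling them.

    -- substr(field, start, length) is the equivalent of mid$(field, start, length)
    SELECT substr(notes, 5, 20) FROM customers;

    -- A quote is escaped by writing it twice inside the literal
    INSERT INTO customers (notes) VALUES ('O''Reilly said ''hello''');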
*** Tom Lane <tgl@sss.pgh.pa.us> [Tuesday, 22.August.2000, 23:11 -0400]:
> There's no essential performance difference between char(n), varchar(n),
> and text in Postgres, given the same-sized data value. char(n)
> truncates or blank-pads to exactly n characters; varchar(n) truncates
> if more than n characters; text never truncates nor pads. Beyond that
> they are completely identical in storage requirements.

[.rs.] Does varchar(188) take 188 bytes (+ bytes for length storage) every time, no matter whether it contains 'my text' or 'my long 188 char text.....'?

--
radoslaw.stachowiak.........................................http://alter.pl/
>>>>> "Radoslaw" == Radoslaw Stachowiak <radek@alter.pl> writes: Radoslaw> Does varchar(188) takes 188 bytes (+ bytes for length Radoslaw> storage) every time, no matter if it contains 'my text' or Radoslaw> 'my long 188 char text.....' ? The way I understand it varchar(n) is variable-length, while char(n) is fixed-lenght. Thus the behaviour you describe above is that of char(n). Martin -- GPG public key: http://home1.stofanet.dk/factotum/gpgkey.txt
Martin Christensen wrote:
> >>>>> "Radoslaw" == Radoslaw Stachowiak <radek@alter.pl> writes:
> Radoslaw> Does varchar(188) take 188 bytes (+ bytes for length
> Radoslaw> storage) every time, no matter whether it contains 'my text' or
> Radoslaw> 'my long 188 char text.....' ?
>
> The way I understand it, varchar(n) is variable-length, while char(n)
> is fixed-length. Thus the behaviour you describe above is that of
> char(n).

Right, for any pre-7.1 version. From 7.1 on, the system will try to compress all types that are internally stored as variable length (char(), varchar(), text and some more). So the real amount of bytes for a char(188) will be "at maximum 192 - probably less".

Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
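A rough way to see the difference is with pg_column_size(), a function from later PostgreSQL releases (it did not exist in 7.1), applied to casts of the same short string:

    -- char(188) carries the full padded width plus a small header;
    -- varchar(188) and text store only the characters given plus a header.
    SELECT pg_column_size('my text'::char(188))    AS char_size,
           pg_column_size('my text'::varchar(188)) AS varchar_size,
           pg_column_size('my text'::text)         AS text_size;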
> From 7.1 on, the system will try to compress all types
> internally stored as variable length (char(), varchar(), text
> and some more). So the real amount of bytes for a char(188)
> will be "at maximum 192 - probably less".

Don't variable-length records incur a performance overhead? In that case, ought I be able to specify the length for a record if I know ahead of time it will be the same in every case? :o

Ian Turner
Ian Turner <vectro@pipeline.com> writes:
> Don't variable-length records incur a performance overhead?

Only to the extent that the system can't cache offset information for later columns in that table. While someone evidently once thought that was worthwhile, I've never seen the column-access code show up as a particularly hot spot in any profile I've run. I doubt you could actually measure any difference, let alone show it to be important enough to be worth worrying about.

In any case, char(n) will still do what you want for reasonable-size records. The TOAST code only kicks in when the total tuple size exceeds BLCKSZ/4 ... and at that point, compression is a good idea in any case.

Now that you mention it, though, doesn't TOAST break heapam's assumption that char(n) is fixed length? Seems like we'd better either remove that assumption or mark char(n) nontoastable. Any opinions which is better?

			regards, tom lane
Tom Lane wrote:
> Now that you mention it, though, doesn't TOAST break heapam's assumption
> that char(n) is fixed length? Seems like we'd better either remove that
> assumption or mark char(n) nontoastable. Any opinions which is better?

Is the saved overhead from assuming char(n) is fixed really so big that it's worth NOT gaining the TOAST advantages? After the GB benchmarks we know that we have some spare performance to waste on such things :-)

Jan
> Ian Turner <vectro@pipeline.com> writes:
> > Don't variable-length records incur a performance overhead?
>
> Only to the extent that the system can't cache offset information for
> later columns in that table. While someone evidently once thought that
> was worthwhile, I've never seen the column-access code show up as a
> particularly hot spot in any profile I've run. I doubt you could
> actually measure any difference, let alone show it to be important
> enough to be worth worrying about.

It clearly is a hot spot. That monster macro, fastgetattr() in heapam.h, is in there for a reason. It accounts for about 5% in the straight sequential scan case, the last I heard from someone who ran a test.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> Ian Turner <vectro@pipeline.com> writes:
> > Don't variable-length records incur a performance overhead?
>
> Only to the extent that the system can't cache offset information for
> later columns in that table. While someone evidently once thought that
> was worthwhile, I've never seen the column-access code show up as a
> particularly hot spot in any profile I've run. I doubt you could
> actually measure any difference, let alone show it to be important
> enough to be worth worrying about.
>
> In any case, char(n) will still do what you want for reasonable-size
> records. The TOAST code only kicks in when the total tuple size exceeds
> BLCKSZ/4 ... and at that point, compression is a good idea in any case.

My logic is that I use char() when I want the length to be fixed, like 2-letter state codes, and varchar() when I just want a maximum allowed, like last names. I use text for arbitrary-length stuff. Tom is right that even though there is a small performance difference, it is better just to use the right type.

> Now that you mention it, though, doesn't TOAST break heapam's assumption
> that char(n) is fixed length? Seems like we'd better either remove that
> assumption or mark char(n) nontoastable. Any opinions which is better?

I am sure Jan handled that.

--
  Bruce Momjian    pgman@candle.pha.pa.us
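As a small illustration of that rule of thumb (the table and column names are invented for the example):

    CREATE TABLE person (
        state      char(2),       -- two-letter state code, always exactly 2 chars
        last_name  varchar(40),   -- bounded, but usually shorter
        biography  text           -- arbitrary length
    );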
> Tom Lane wrote:
> > Now that you mention it, though, doesn't TOAST break heapam's assumption
> > that char(n) is fixed length? Seems like we'd better either remove that
> > assumption or mark char(n) nontoastable. Any opinions which is better?
>
> Is the saved overhead from assuming char(n) is fixed really
> so big that it's worth NOT gaining the TOAST advantages?
> After the GB benchmarks we know that we have some spare
> performance to waste on such things :-)

Oh, now I get it: some TOAST values may be stored out of line. Can we really throw char() into TOAST? I guess we can. We have to record somewhere that we have toasted that tuple and disable the offset cache for it.

--
  Bruce Momjian    pgman@candle.pha.pa.us
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Tom Lane wrote:
>>>> Now that you mention it, though, doesn't TOAST break heapam's assumption
>>>> that char(n) is fixed length? Seems like we'd better either remove that
>>>> assumption or mark char(n) nontoastable. Any opinions which is better?
> I don't see any more communication on this in my mail archives.

Nothing's been done yet, but we *must* fix this before 7.1. I'm wondering whether it's worthwhile keeping the VARLENA_FIXED_SIZE macro at all. Is there any potential use for it?

			regards, tom lane