Thread: Mysql -> Postgresql pitfalls
Hello- I was exchanging some email with Tom Lane about some problems I was running into converting from mysql to postgresql. He suggested I contact this list. Is there some sort of document describing common mysql->pgsql pitfalls? If not, I'd be happy to start one with the things I've run into. Who would I talk to about that? I'm not on this list, so please CC me if you reply to this message. Thanks, Chad
Have you looked at techdocs.postgresql.org? --------------------------------------------------------------------------- Chad N. Tindel wrote: > Hello- > > I was exchanging some email with Tom Lane about some problems I was running > into converting from mysql to postgresql. He suggested I contact this list. > > Is there some sort of document describing common mysql->pgsql pitfalls? > If not, I'd be happy to start one with the things I've run into. Who > would I talk to about that? > > I'm not on this list, so please CC me if you reply to this message. > > Thanks, > > Chad > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Excellent! That is exactly what I'm talking about. BTW, the "my2pg.pl" link is a dead link. Chad
On Fri, 1 Aug 2003, Chad N. Tindel wrote: > Excellent! That is exactly what I'm talking about. > > BTW, the "my2pg.pl" link is a dead link. This is probably something that could be codified a bit here in the news group. I'd say the pitfalls I'm aware of are: autoincrement fields -> sequences full text indexing -> fts/tsearch etc... enum -> check constraint vacuum / analyze simple performance tuning (out of the box the postgresql.conf settings aren't really all that great, but they let postgresql come up on almost anything.) pg_hba.conf default setting that doesn't allow tcp/ip connections not being able to run postgresql as root (a good thing TM)
> > Excellent! That is exactly what I'm talking about. > > > > BTW, the "my2pg.pl" link is a dead link. > > This is probably something that could be codified a bit here in the news > group. > > I'd say the pitfalls I'm aware of are: > > autoincrement fields -> sequences Yes. The documentation very clearly states using sequences instead of auto-increment, but it doesn't make it clear that inserting the id's into data by hand doesn't cause the sequence to be auto-matically incremented. It'd be nice of postgres had a way to trigger an update of the sequence value after every insert containing an id clumn. > not being able to run postgresql as root (a good thing TM) <RANT>As a programmer, I personally would never write code that kept people from running things as root. I mean, what is the point? If an administrator wants to run postgresql or apache as root, why shouldn't they be allowed to make that conscious decision for themselves? As it is, you have to recompile apache with some BIG_SECURITY_HOLE defined in order to run as root, which means you can't just use the out of the box apache rpm. Its so stupid to write *extra* code that keeps people from doing something that isn't even fundamentally incorrect.</RANT> All that being said, I don't think it causes too big of a problem for postgres installations. Chad
"Chad N. Tindel" <chad@tindel.net> writes: > <RANT>As a programmer, I personally would never write code that kept > people from running things as root. I mean, what is the point? If someone roots your box, it's not our fault. Simple as that. Personally, I wish more net-accessible servers were written with that philosophy. The correct question is NEVER "why can't I run this as root?". The correct question is ALWAYS "how can I avoid running this as root? And if I can't avoid it, why not?". Any other approach leads to Outlook Express. (BTW, have you forgotten the SQL Server worm already?) regards, tom lane
On Fri, Aug 01, 2003 at 04:51:11PM -0400, Chad N. Tindel wrote: > > <RANT>As a programmer, I personally would never write code that kept people from > running things as root. I mean, what is the point? If an administrator Hmmm? The point is something called security. > wants to run postgresql or apache as root, why shouldn't they be allowed > to make that conscious decision for themselves? As it is, you have to recompile Because administrators have too much to worry about. If an application makes a conscious decision to allow itself to run in a knowingly insecure manner, that application is doing the admin a disfavor, and should be ditched. > apache with some BIG_SECURITY_HOLE defined in order to run as root, which means > you can't just use the out of the box apache rpm. Its so stupid to write > *extra* code that keeps people from doing something that isn't even > fundamentally incorrect.</RANT> This has no logic. Security is fundamental. No security is fundamentally incorrect. -Roberto -- +----| Roberto Mello - http://www.brasileiro.net/ |------+ + Computer Science Graduate Student, Utah State University + + USU Free Software & GNU/Linux Club - http://fslc.usu.edu/ + ---------------/ NO INSPIRATION TODAY /-------------------
On Sat, Aug 02, 2003 at 01:10:49PM -0400, Chad N. Tindel wrote: > > Hmmm? > > > > The point is something called security. > > There is no such thing as a "proper amount of security that is correct for > all operating environments". Whoever said there was? I didn't. > > This has no logic. Security is fundamental. No security is fundamentally > > incorrect. > > Well, you could make a box very secure by unplugging all the LAN cables from > it and putting it in a giant safe deposit box. However, I would say that such > a machine would be fundamentally incorrect for most operating environments. Nice strawmen. Too bad it's a logical fallacy. > Many people run their machines with "+ +" in root's .rhosts file because it > eases the task of doing administration. They work in a company where the box is > behind a firewall on some public network and they need there computers to How about people inside the company? Are they all nice people who live in happy valley? > get real work done.... they don't want things like "security" to get in the > way because nobody is trying to hack those machines. I digress. -Roberto -- +----| Roberto Mello - http://www.brasileiro.net/ |------+ + Computer Science Graduate Student, Utah State University + + USU Free Software & GNU/Linux Club - http://fslc.usu.edu/ + Backup not found: (Q)uem mandou usar o Stacker?
> > <RANT>As a programmer, I personally would never write code that kept > > people from running things as root. I mean, what is the point? > > If someone roots your box, it's not our fault. Simple as that. I didn't say "require them to run as a non-root user". I said "Give them the choice to decide what is correct for their environment". In the case of apache, there are many internal webservers that are not exposed to the threat of the public internet; for such servers, it may be appropriate to run apache as root because it simplifies the administration and automation of tasks. But to do so, one has to know how to re-compile apache, which will exclude a lot of your basic garden variety administrators. Its very un-friendly programming. Chad
> Hmmm? > > The point is something called security. There is no such thing as a "proper amount of security that is correct for all operating environments". > > apache with some BIG_SECURITY_HOLE defined in order to run as root, which means > > you can't just use the out of the box apache rpm. Its so stupid to write > > *extra* code that keeps people from doing something that isn't even > > fundamentally incorrect.</RANT> > > This has no logic. Security is fundamental. No security is fundamentally > incorrect. Well, you could make a box very secure by unplugging all the LAN cables from it and putting it in a giant safe deposit box. However, I would say that such a machine would be fundamentally incorrect for most operating environments. Do you worry about whether or not someone snuck into your house at night and installed some sort of keyboard logging device onto your PC so that they can get your root password? Is that a "fundamental" part of your personal security? Probably not, because *that* would be illogical for most people to worry about. Many people run their machines with "+ +" in root's .rhosts file because it eases the task of doing administration. They work in a company where the box is behind a firewall on some public network and they need there computers to get real work done.... they don't want things like "security" to get in the way because nobody is trying to hack those machines. Chad
"Chad N. Tindel" <chad@tindel.net> writes: >> If someone roots your box, it's not our fault. Simple as that. > I didn't say "require them to run as a non-root user". I said "Give them > the choice to decide what is correct for their environment". In the case > of apache, there are many internal webservers that are not exposed to the > threat of the public internet; for such servers, it may be appropriate to run > apache as root because it simplifies the administration and automation of > tasks. But to do so, one has to know how to re-compile apache, which will > exclude a lot of your basic garden variety administrators. Its very > un-friendly programming. If they don't know how to recompile apache, what are the odds that they are truly competent to decide that they can safely run it as root? Semi-competent people administering servers are the Achilles heel of the internet already. We are doing them a favor, not creating a problem, by preventing them from adopting insecure practices. regards, tom lane
On Sunday 03 August 2003 19:03, you wrote: > Many people run their machines with "+ +" in root's .rhosts file because it > eases the task of doing administration. They work in a company where the > box is behind a firewall on some public network and they need there > computers to get real work done.... they don't want things like "security" > to get in the way because nobody is trying to hack those machines. Can you send me the names and current employers of these "many people" so I can be sure I never do business with their companies and / or consider them for employment? Only slightly ;-) Ian Barwick barwick@gmx.net
On Fri, 1 Aug 2003, Chad N. Tindel wrote: > > > Excellent! That is exactly what I'm talking about. > > > > > > BTW, the "my2pg.pl" link is a dead link. > > > > This is probably something that could be codified a bit here in the news > > group. > > > > I'd say the pitfalls I'm aware of are: > > > > autoincrement fields -> sequences > > Yes. The documentation very clearly states using sequences instead of > auto-increment, but it doesn't make it clear that inserting the id's into > data by hand doesn't cause the sequence to be auto-matically incremented. It'd > be nice of postgres had a way to trigger an update of the sequence value after > every insert containing an id clumn. Actually, from a data cohesion point of view, that's an EXTREMELY dangerous thing to do, and is not likely to ever get implemented. However, mentioning that not only does postgresql do it this way, but here's why it's dangerous to do it the MySQL way as well, would be a good idea. Note that what I'm thinking of in a list of these pitfalls is simply a list of them, with links to the paragraphs that cover the pitfalls in the regular docs. > > not being able to run postgresql as root (a good thing TM) > > <RANT>As a programmer, I personally would never write code that kept people from > running things as root. I mean, what is the point? If an administrator > wants to run postgresql or apache as root, why shouldn't they be allowed > to make that conscious decision for themselves? > As it is, you have to recompile > apache with some BIG_SECURITY_HOLE defined in order to run as root, which means > you can't just use the out of the box apache rpm. Its so stupid to write > *extra* code that keeps people from doing something that isn't even > fundamentally incorrect.</RANT> If you don't know why running a non-system service as root is bad, you haven't been running Unix long enough. It is wrong, period, and dangerous, period. Not because you might do something dumb, but because it allows attackers to own your whole box should they compromise one non-system service. Very bad form. > All that being said, I don't think it causes too big of a problem for postgres > installations. No, only with folks who don't understand why running non-system services as root is quite possibly the biggest mistake you can make when configuring a service.
On Sat, 2 Aug 2003, Chad N. Tindel wrote: > > > <RANT>As a programmer, I personally would never write code that kept > > > people from running things as root. I mean, what is the point? > > > > If someone roots your box, it's not our fault. Simple as that. > > I didn't say "require them to run as a non-root user". I said "Give them > the choice to decide what is correct for their environment". In the case > of apache, there are many internal webservers that are not exposed to the > threat of the public internet; for such servers, it may be appropriate to run > apache as root because it simplifies the administration and automation of > tasks. But to do so, one has to know how to re-compile apache, which will > exclude a lot of your basic garden variety administrators. Its very > un-friendly programming. So tell me, what does the sysadmin gain by running postgresql as root. Seriously, what one advantage does he have? Besides allowing him to be lazy, I can't think of one.
Folks, Are we actually arguing about whether or not Postmaster should be allowed to run as root? I thought this question was settled, like, 5 years ago. If migrating MySQL users have trouble with it, maybe we should focus on supplying a battery of sample startup and maintainence scripts for them instead of monkeying with PostgreSQL's security setup? -- Josh Berkus Aglio Database Solutions San Francisco
Chad, Scott: > > Yes. The documentation very clearly states using sequences instead of > > auto-increment, but it doesn't make it clear that inserting the id's into > > data by hand doesn't cause the sequence to be auto-matically incremented. > > It'd be nice of postgres had a way to trigger an update of the sequence > > value after every insert containing an id clumn. Um, how would this be a bennefit? If you're inserting rows 101-259, how does it benefit you to have the system automatically increment the sequence from 601-759? FWIW, in 7.5 we're likely to implement SQL3 IDENTITY columns ... basically just a SERIAL column where you don't have the option of inserting your own value, you have to take what it gives you. -- Josh Berkus Aglio Database Solutions San Francisco
On Mon, 4 Aug 2003, Josh Berkus wrote: > Folks, > > Are we actually arguing about whether or not Postmaster should be allowed to > run as root? > > I thought this question was settled, like, 5 years ago. > > If migrating MySQL users have trouble with it, maybe we should focus on > supplying a battery of sample startup and maintainence scripts for them > instead of monkeying with PostgreSQL's security setup? No, I would consider that to have been a "thread jacking" over the weekend while I was away. I don't read the lists on the weekends (I don't even check my email, I pretty much disappear two days a week from the internet). anyway, the real issue is that there are common issues that we see from migrating MySQL users, and we should probably have a "oh, you're coming from MySQL-land? read this." kind of document. The fact that most mysql users see not running as root as a heavy handed tactic from the postgresql people, and not as a security issue give us a hint on how to write such a document.
On Saturday 02 August 2003 13:04, Chad N. Tindel wrote: > I didn't say "require them to run as a non-root user". I said "Give them > the choice to decide what is correct for their environment". In the case > of apache, there are many internal webservers that are not exposed to the > threat of the public internet; for such servers, it may be appropriate to > run apache as root because it simplifies the administration and automation > of tasks. It also simplifies a bug in apache crashing your box and scribbling all over your disk. Apache does have bugs, you know. (as do MySQL and PostgreSQL, but that's another story). The postmaster will not run as root. That is just the way it is, and has been, for quite some time. It is foolish to run an RDBMS server (or any other server that doesn't need root's permissions) as root when it is not necessary. It is lazy to run things as root to 'simplify' administration; properly administering a box isn't that hard, and user protections and permissions should be used to their intended effect in the course of routine administration. That's just basic Unix sysadmin practice that is well accepted by the vast majority of sysadmins. We just encourage the best practice in a more direct way than other servers, that's all. As to the subject matter of this thread, there are a great number of educational opportunities in such a migration/pitfalls document. The MySQL 'way' and the PostgreSQL 'way' are very different, and at points don't even have a common frame of reference. This issue is one of them; a thorough explanation, written in a direct non-condescending style, of why postmaster won't run as root would be a nice addition. -- Lamar Owen Director of Information Technology Pisgah Astronomical Research Institute
On Mon, 4 Aug 2003, Chad N. Tindel wrote: > > > > Yes. The documentation very clearly states using sequences instead of > > > > auto-increment, but it doesn't make it clear that inserting the id's into > > > > data by hand doesn't cause the sequence to be auto-matically incremented. > > > > It'd be nice of postgres had a way to trigger an update of the sequence > > > > value after every insert containing an id clumn. > > > > Um, how would this be a bennefit? If you're inserting rows 101-259, how does > In mysql, when you insert into an auto_increment field and you specify an id, > all future requests to insert a row without specifying the ID will still work > properly. In postgres, if you specify the id, your next insert without and > id will fail because the sequence won't have been updated. Correct. And MySQL is doing it "wrong" but everyone is used to it. In Postgresql, there are exact functions for setting the increment. Imagine this (T1 and T2 represent two different transactions: T1: begin; T2: begin; T1: select nextval('seqname'); <- returns 55 T2: select nextval('seqname'); <- returns 56 T2: insert into parent_table (id,info) values (56,'information'); T1: insert into parent_table (id,info) values (55,'somemoreinformation'); T1: insert into child_table (p_id,info) values (55,'childinfo'); T2: insert into child_table (p_id,info) values (56,'childinfo'); T1: commit; T2: commit; If Postgresql autoset the field to the value last inserted, then the sequence would be reset back to 55 and be ready to reissue 56 on the next call. Bad news. Or, imagine I delete a parent row then reinsert it, with a lower value, then the sequence is reset. Resetting the sequence automatically on insert is NOT the best way to handle sequences. Setting them by hand during imports is the preferred method because you'll assume that you're the only one on the database making changes. Once you go live, i.e. you've got 200 simultaneous users doing updates, the last thing you want is some user process accidentally setting your sequence number to some low number that's already been used just because they inserted it by hand. Postgresql does things differently than MySQL, and most of the time it does, it's been better thought out in postgresql, in terms of impact on highly paralleled database accesses. MySQL tends to be developed thinking of convenience instead of handling the "whoopsies" situations that can be created by programming for convenience. A lot like Unix versus Windows. Unix isn't just hard for the fun of it, some things in unix are hard so you'll have to understand the underlying reasoning before jumping in with both feet. The Windows methodology tends to be faster to learn and use, but often puts your data at risk. > > FWIW, in 7.5 we're likely to implement SQL3 IDENTITY columns ... basically > > just a SERIAL column where you don't have the option of inserting your own > > value, you have to take what it gives you. > > Interesting... how do you import data from a dump with such columsn? Easy, after you import the last row, you select setval('seqname',lastvalue); on the sequence. Like I said above, it's mostly just a different way of doing things in Postgresql, and often those different ways are less obvious, and quite often, being less obvious is actually safer even if it is a littler harder to learn up front.
On Mon, 4 Aug 2003, Chad N. Tindel wrote: > > > > FWIW, in 7.5 we're likely to implement SQL3 IDENTITY columns ... basically > > > > just a SERIAL column where you don't have the option of inserting your own > > > > value, you have to take what it gives you. > > > > > > Interesting... how do you import data from a dump with such columsn? > > > > Easy, after you import the last row, you > > > > select setval('seqname',lastvalue); > > > > on the sequence. Like I said above, it's mostly just a different way of > > doing things in Postgresql, and often those different ways are less > > obvious, and quite often, being less obvious is actually safer even if > > it is a littler harder to learn up front. > > But you just said that I can't actually include the id column in an insert > query. So how would I import data from a dump and ensure that the id columns > are what I expect them to be? I did not say that, I was talking about serial types. What the guy ahead of ME said was that they were looking at building the SQL3 IDENTITY columns, which are a serial you can't set the value of in an insert. With those, the import would happen "behind it's back" so to speak in the \copy command. I.e. you wouldn't use inserts to load your data, you'd use a bulk copy, which bypassess all the serial / IDENTITY stuff. Basically, with the IDENTITY type, if you try to insert a value, it just ignores it and inserts the next sequence. So, users would have no way of setting the id being inserted. There would still be, I'm sure, a method for setting the sequence number, it just might be limited to superusers / IDENTITY owners. So, I think we were getting Postgresql's CURRENT serial implementation confused with a possible future implementation of IDENTITY type.
> Are we actually arguing about whether or not Postmaster should be allowed to > run as root? > > I thought this question was settled, like, 5 years ago. I don't think my responses have been being posted to the list... they're probably all blocked pending approval of some administrator. No, I'm not arguing to allow postmaster to run as root. Chad
> > I didn't say "require them to run as a non-root user". I said "Give them > > the choice to decide what is correct for their environment". In the case > > of apache, there are many internal webservers that are not exposed to the > > threat of the public internet; for such servers, it may be appropriate to > > run apache as root because it simplifies the administration and automation > > of tasks. > > It also simplifies a bug in apache crashing your box and scribbling all over > your disk. Apache does have bugs, you know. (as do MySQL and PostgreSQL, but > that's another story). True. There are good reasons to not run apache as root. There are also perfectly valid reasons to run it as root, for such users who want to make an informed decision. > The postmaster will not run as root. That is just the way it is, and has > been, for quite some time. It is foolish to run an RDBMS server (or any > other server that doesn't need root's permissions) as root when it is not > necessary. It is lazy to run things as root to 'simplify' administration; > properly administering a box isn't that hard, and user protections and > permissions should be used to their intended effect in the course of routine > administration. That's just basic Unix sysadmin practice that is well > accepted by the vast majority of sysadmins. We just encourage the best > practice in a more direct way than other servers, that's all. My posts to the list are not going through. I am NOT advocating running postmaster as root. I never was. > As to the subject matter of this thread, there are a great number of > educational opportunities in such a migration/pitfalls document. The MySQL > 'way' and the PostgreSQL 'way' are very different, and at points don't even > have a common frame of reference. This issue is one of them; a thorough > explanation, written in a direct non-condescending style, of why postmaster > won't run as root would be a nice addition. I don't think there is a non-condescending way to tell an administrator that you think you know what is better for their operating environment than they do. Everybody understands the pitfalls of running root daemons; I don't think there is any more explanation required. I definitely think there is benefit to starting an official "how to migrate" document. There is already a lot of information up on the techdocs website that someone pointed me to. I've found answers to all the questions that I needed by looking around, but it might be nice to create a more coherent explanation of the "mysql way" vs. the "postgresql way". Chad
> > > Yes. The documentation very clearly states using sequences instead of > > > auto-increment, but it doesn't make it clear that inserting the id's into > > > data by hand doesn't cause the sequence to be auto-matically incremented. > > > It'd be nice of postgres had a way to trigger an update of the sequence > > > value after every insert containing an id clumn. > > Um, how would this be a bennefit? If you're inserting rows 101-259, how does In mysql, when you insert into an auto_increment field and you specify an id, all future requests to insert a row without specifying the ID will still work properly. In postgres, if you specify the id, your next insert without and id will fail because the sequence won't have been updated. > FWIW, in 7.5 we're likely to implement SQL3 IDENTITY columns ... basically > just a SERIAL column where you don't have the option of inserting your own > value, you have to take what it gives you. Interesting... how do you import data from a dump with such columsn? Chad
> > > FWIW, in 7.5 we're likely to implement SQL3 IDENTITY columns ... basically > > > just a SERIAL column where you don't have the option of inserting your own > > > value, you have to take what it gives you. > > > > Interesting... how do you import data from a dump with such columsn? > > Easy, after you import the last row, you > > select setval('seqname',lastvalue); > > on the sequence. Like I said above, it's mostly just a different way of > doing things in Postgresql, and often those different ways are less > obvious, and quite often, being less obvious is actually safer even if > it is a littler harder to learn up front. But you just said that I can't actually include the id column in an insert query. So how would I import data from a dump and ensure that the id columns are what I expect them to be? Chad
> speak in the \copy command. I.e. you wouldn't use inserts to load your > data, you'd use a bulk copy, which bypassess all the serial / IDENTITY > stuff. Basically, with the IDENTITY type, if you try to insert a value, COPY enforces everything that insert does. It's simply a little quicker than insert due to a different string parsing method and avoiding places that are for advanced features (subselects, functions, etc.). A default is still applied if the column has not been provided. Likewise, triggers (constraint triggers anyway) still run. Bumping the start value for an IDENTITY is simple: CREATE TABLE tab ( col integer GENERATED ALWAYS AS IDENTITY(START WITH 42) ); I suppose we'll need a GUC so that GENERATED ALWAYS isn't actually always -- just usually.
Attachment
On Monday 04 August 2003 13:54, Chad N. Tindel wrote: > True. There are good reasons to not run apache as root. There are also > perfectly valid reasons to run it as root, for such users who want to make > an informed decision. Name one. > My posts to the list are not going through. I am NOT advocating running > postmaster as root. I never was. No, you're advocating giving that option. We have chosen to not be the instrument of an admin shooting themselves in the foot. > I don't think there is a non-condescending way to tell an administrator > that you think you know what is better for their operating environment than > they do. Yes, there are. MySQL does it all the time. "You don't need {transactions|foreign keys|subqueries|feature of the day they don't have}." People don't seem to mind that. We have other reasons, IIRC. -- Lamar Owen Director of Information Technology Pisgah Astronomical Research Institute