Thread: InitPostgres and flatfiles question
Hi, I've just run into a race condition with creating a database and connecting to it immediately afterwards. I'm getting a "database %s not found" error just after the first flatfiles check in InitPostgres(). What that FindMyDatabase() there does, besides checking if the database exists, is getting the database and tablespace OIDs, right? Why does the postmaster not pass those instead of the database name? What would be the dangers of doing so? (Besides code different from standalone backend)? Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes: > I've just run into a race condition with creating a database and > connecting to it immediately afterwards. I'm getting a "database %s not > found" error just after the first flatfiles check in InitPostgres(). In what PG version? > What that FindMyDatabase() there does, besides checking if the database > exists, is getting the database and tablespace OIDs, right? Why does the > postmaster not pass those instead of the database name? Because the postmaster doesn't have direct database access. If it did, any corruption of shared memory would risk crashing the postmaster along with the backends. regards, tom lane
Hello Tom, Tom Lane wrote: > In what PG version? Postgres-R 8.3devel ;-) > Because the postmaster doesn't have direct database access. If it did, > any corruption of shared memory would risk crashing the postmaster > along with the backends. Understood, thanks. Most probably I better go another way, than fiddling with postgres startup internals. Is there a way to be sure the flatfile has been written to disk after a CREATE DATABASE? I would like to ensure I can connect to a newly created database. Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes: > Most probably I better go another way, than fiddling with postgres > startup internals. Is there a way to be sure the flatfile has been > written to disk after a CREATE DATABASE? I would like to ensure I can > connect to a newly created database. It should happen automatically at commit of the CREATE DATABASE ... and you'd not be able to see the pg_database row before that anyway. So I'm not clear on what you're worried about. regards, tom lane
Tom Lane wrote: > It should happen automatically at commit of the CREATE DATABASE ... and > you'd not be able to see the pg_database row before that anyway. So I'm > not clear on what you're worried about. Okay, thanks. I'll have to investigate on why exactly I still get the error, then. That's unclear to me, too. Regards Markus
Hi, Tom Lane wrote: > It should happen automatically at commit of the CREATE DATABASE ... and > you'd not be able to see the pg_database row before that anyway. So I'm > not clear on what you're worried about. I've just found the stumbling block: the -c option of psql wraps all in a transaction, as man psql says: If the command string contains multiple SQL commands, they are processed in a single transaction, unless thereare explicit BEGIN/COMMIT commands included in the string to divide it into multiple transactions. This isdifferent from the behavior when the same string is fed to psql’s standard input. Thank you for clarification, I wouldn't have expected that (especially because CREATE DATABASE itself says, it cannot be run inside a transaction block... A transaction block (with BEGIN and COMMIT) seems to be more than just a transaction, right?) Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes: > I've just found the stumbling block: the -c option of psql wraps all in > a transaction, as man psql says: > ... > Thank you for clarification, I wouldn't have expected that (especially > because CREATE DATABASE itself says, it cannot be run inside a > transaction block... A transaction block (with BEGIN and COMMIT) seems > to be more than just a transaction, right?) Hm, that's an interesting point. psql's -c just shoves its whole argument string at the backend in one PQexec(), instead of dividing at semicolons as psql does with normal input. And so it winds up as a single transaction because postgres.c doesn't force a transaction commit until the end of the querystring. But that's not a "transaction block" in the normal sense and so it doesn't trigger the PreventTransactionChain defense in CREATE DATABASE and elsewhere. I wonder whether we ought to change that? The point of PreventTransactionChain is that we don't want the user rolling back the statement post-completion, but it seems thatpsql -c 'CREATE DATABASE foo; ABORT; BEGIN; ...' would bypass the check. regards, tom lane
Am Donnerstag, 4. Januar 2007 16:36 schrieb Tom Lane: > Markus Schiltknecht <markus@bluegap.ch> writes: > Hm, that's an interesting point. psql's -c just shoves its whole > argument string at the backend in one PQexec(), instead of dividing > at semicolons as psql does with normal input. And so it winds up as > a single transaction because postgres.c doesn't force a transaction > commit until the end of the querystring. But that's not a "transaction > block" in the normal sense and so it doesn't trigger the > PreventTransactionChain defense in CREATE DATABASE and elsewhere. > > I wonder whether we ought to change that? The point of > PreventTransactionChain is that we don't want the user rolling back > the statement post-completion, but it seems that > psql -c 'CREATE DATABASE foo; ABORT; BEGIN; ...' > would bypass the check. Maybe not directly related to that problem, but I had a problem with "-c" last month, when I noticed that this will not work: psql -c "set client_encoding=iso-8859-1; select name from customer" (UTF8 database, output is hmmm... broken german umlauts). Best regardsMario Weilguni
Hi, Tom Lane wrote: > Hm, that's an interesting point. psql's -c just shoves its whole > argument string at the backend in one PQexec(), instead of dividing > at semicolons as psql does with normal input. And so it winds up as > a single transaction because postgres.c doesn't force a transaction > commit until the end of the querystring. But that's not a "transaction > block" in the normal sense and so it doesn't trigger the > PreventTransactionChain defense in CREATE DATABASE and elsewhere. Is there a good reason to not let psql -c behave exactly like psql from STDIN? I found this exception to be quite confusing. Of course that could break compatibility with certain scripts, but can this be fixed without doing so? Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes: > Tom Lane wrote: >> Hm, that's an interesting point. psql's -c just shoves its whole >> argument string at the backend in one PQexec(), instead of dividing >> at semicolons as psql does with normal input. > Is there a good reason to not let psql -c behave exactly like psql from > STDIN? Backwards compatibility, mostly --- there seems to be a considerable risk of subtly breaking people's scripts if we change the transactional boundaries for psql -c commands. regards, tom lane
Tom Lane wrote: > Markus Schiltknecht <markus@bluegap.ch> writes: > > Tom Lane wrote: > >> Hm, that's an interesting point. psql's -c just shoves its whole > >> argument string at the backend in one PQexec(), instead of dividing > >> at semicolons as psql does with normal input. > > > Is there a good reason to not let psql -c behave exactly like psql from > > STDIN? > > Backwards compatibility, mostly --- there seems to be a considerable > risk of subtly breaking people's scripts if we change the transactional > boundaries for psql -c commands. True, but if we keep hitting people who don't expect this behavior, I wonder if we should just fix it and mention it in the release notes. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Markus Schiltknecht <markus@bluegap.ch> writes: >>> Is there a good reason to not let psql -c behave exactly like psql from >>> STDIN? >> >> Backwards compatibility, mostly --- there seems to be a considerable >> risk of subtly breaking people's scripts if we change the transactional >> boundaries for psql -c commands. > True, but if we keep hitting people who don't expect this behavior, I > wonder if we should just fix it and mention it in the release notes. One other point is that if we change -c's behavior, there won't be *any* way to submit multiple queries in a single PQexec using plain psql --- it will require hacking up a special test program using libpq directly. Unless we have plans to obsolete multi-queries-per-PQexec altogether, this doesn't seem like a good idea. OTOH, you could argue that forbidding multiple queries in one PQexec isn't a bad idea; it would provide an additional defense against SQL-injection attacks. We did that already in the "extended" query protocol and I've not heard many complaints. I'd be willing to buy into doing both together, perhaps. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> Markus Schiltknecht <markus@bluegap.ch> writes: > >>> Is there a good reason to not let psql -c behave exactly like psql from > >>> STDIN? > >> > >> Backwards compatibility, mostly --- there seems to be a considerable > >> risk of subtly breaking people's scripts if we change the transactional > >> boundaries for psql -c commands. > > > True, but if we keep hitting people who don't expect this behavior, I > > wonder if we should just fix it and mention it in the release notes. > > One other point is that if we change -c's behavior, there won't be > *any* way to submit multiple queries in a single PQexec using plain > psql --- it will require hacking up a special test program using > libpq directly. Unless we have plans to obsolete > multi-queries-per-PQexec altogether, this doesn't seem like a good idea. What value is allowing multiple queies via PQexec() via psql, aside from avoiding BEGIN/END around your -c query string? > OTOH, you could argue that forbidding multiple queries in one PQexec > isn't a bad idea; it would provide an additional defense against > SQL-injection attacks. We did that already in the "extended" query > protocol and I've not heard many complaints. > > I'd be willing to buy into doing both together, perhaps. True. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > What value is allowing multiple queies via PQexec() The only argument I can think of is that it allows applications to be sloppy about parsing a SQL script into individual commands before they send it. (I think initdb may be guilty of exactly that BTW...) At the same time you could argue that such sloppiness is inherently a Bad Idea. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > What value is allowing multiple queries via PQexec() > > The only argument I can think of is that it allows applications to be > sloppy about parsing a SQL script into individual commands before they > send it. (I think initdb may be guilty of exactly that BTW...) At the > same time you could argue that such sloppiness is inherently a Bad Idea. I thought the idea was that psql was going to split multiple -c commands into separate PQexec() calls, so sloppy was OK, and if they want a single transaction, add BEGIN/END to the string. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> The only argument I can think of is that it allows applications to be >> sloppy about parsing a SQL script into individual commands before they >> send it. (I think initdb may be guilty of exactly that BTW...) At the >> same time you could argue that such sloppiness is inherently a Bad Idea. > I thought the idea was that psql was going to split multiple -c commands > into separate PQexec() calls, so sloppy was OK, and if they want a > single transaction, add BEGIN/END to the string. No, psql isn't the point: we can certainly make its behavior match the backend's. What I'm wondering about is the effect on random PG-using applications: should we forbid them from sending multiple SQL commands per PQexec (or equivalent in other client library APIs)? Backwards compatibility says no, but you can make some decent arguments for forbidding it anyway. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> The only argument I can think of is that it allows applications to be > >> sloppy about parsing a SQL script into individual commands before they > >> send it. (I think initdb may be guilty of exactly that BTW...) At the > >> same time you could argue that such sloppiness is inherently a Bad Idea. > > > I thought the idea was that psql was going to split multiple -c commands > > into separate PQexec() calls, so sloppy was OK, and if they want a > > single transaction, add BEGIN/END to the string. > > No, psql isn't the point: we can certainly make its behavior match the > backend's. What I'm wondering about is the effect on random PG-using > applications: should we forbid them from sending multiple SQL commands > per PQexec (or equivalent in other client library APIs)? > > Backwards compatibility says no, but you can make some decent arguments > for forbidding it anyway. Yea, I was trying to separate the psql case from the PQexec() case. For psql, I think it is clear that -c _should_ act like a normal stdin query. That would eliminate confusion, and I don't see a large loss of functionality. The PQexec() case, the problem is we don't know who is using multi-statement PQexec() calls, and users can't always add BEGIN/END to fix them if they are embedded in applications. What we could do it do both and see what pushback we get during beta. We could always revert it before the final release. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> What value is allowing multiple queies via PQexec() > > The only argument I can think of is that it allows applications to be > sloppy about parsing a SQL script into individual commands before they > send it. (I think initdb may be guilty of exactly that BTW...) At the > same time you could argue that such sloppiness is inherently a Bad Idea. Doesn't it also avoid some network(?) overhead when you have a large number of small inserts or updates? I seem to recall a previous company where we had a major performance by concatenating a bunch of updates with ";"s in between and sending them to postgresql as a single command.
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> What value is allowing multiple queies via PQexec() > > The only argument I can think of is that it allows applications to be > sloppy about parsing a SQL script into individual commands before they > send it. (I think initdb may be guilty of exactly that BTW...) At the > same time you could argue that such sloppiness is inherently a Bad Idea. Doesn't it also avoid some network(?) overhead when you have a large number of small inserts or updates? I seem to recall a previous company where we had a major performance by concatenating a bunch of updates with ";"s in between and sending them to postgresql as a single command.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> What value is allowing multiple queies via PQexec() >> >> The only argument I can think of is that it allows applications to be >> sloppy about parsing a SQL script into individual commands before they >> send it. (I think initdb may be guilty of exactly that BTW...) At the >> same time you could argue that such sloppiness is inherently a Bad Idea. > Doesn't it also avoid some network(?) overhead when you have > a large number of small inserts or updates? > I seem to recall a previous company where we had a major performance > by concatenating a bunch of updates with ";"s in between and sending > them to postgresql as a single command. These days you'd probably be better off using a multi-row VALUES() list if relevant. Also, if you really want to send multiple statements like that, there's a cleaner way to do it: use the extended query protocol and don't Sync or wait for a reply until you've sent them all. regards, tom lane
On Jan 4, 2007, at 11:30 PM, Bruce Momjian wrote: >> No, psql isn't the point: we can certainly make its behavior match >> the >> backend's. What I'm wondering about is the effect on random PG-using >> applications: should we forbid them from sending multiple SQL >> commands >> per PQexec (or equivalent in other client library APIs)? >> >> Backwards compatibility says no, but you can make some decent >> arguments >> for forbidding it anyway. > > Yea, I was trying to separate the psql case from the PQexec() > case. For > psql, I think it is clear that -c _should_ act like a normal stdin > query. That would eliminate confusion, and I don't see a large > loss of > functionality. Heh, something I hadn't expected to work: decibel=# select 1 decibel-# ; select 2 ?column? ---------- 1 (1 row) decibel-# ; ?column? ---------- 2 (1 row) > The PQexec() case, the problem is we don't know who is using > multi-statement PQexec() calls, and users can't always add BEGIN/ > END to > fix them if they are embedded in applications. > > What we could do it do both and see what pushback we get during beta. > We could always revert it before the final release. There is one (weak) argument for allowing multiple commands in a single call to the backend; it's going to perform better in an OLTP environment because of fewer round-trips between the client and server.. Actually, there's some cases there that might not fit well into wrapping them into a function, ie: multiple selects issued in one go. So maybe the argument isn't that weak afterall... -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
>> Tom Lane wrote: >>> Bruce Momjian <bruce@momjian.us> writes: >>>> What value is allowing multiple queies via PQexec() >>> >>> The only argument I can think of is that it allows applications to be >>> sloppy about parsing a SQL script into individual commands before they >>> send it. (I think initdb may be guilty of exactly that BTW...) At the >>> same time you could argue that such sloppiness is inherently a Bad >>> Idea. initdb doesn't use libpq at all ... are you saying it should only pass individual statements to the backend? Adding smarts to enable that would add complexity without any benefit I can see. cheers andrew
Ron Mayer wrote: > Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > >> What value is allowing multiple queies via PQexec() > > > > The only argument I can think of is that it allows applications to be > > sloppy about parsing a SQL script into individual commands before they > > send it. (I think initdb may be guilty of exactly that BTW...) At the > > same time you could argue that such sloppiness is inherently a Bad Idea. > > Doesn't it also avoid some network(?) overhead when you have > a large number of small inserts or updates? > > I seem to recall a previous company where we had a major performance > by concatenating a bunch of updates with ";"s in between and sending > them to postgresql as a single command. Added to TODO list: > o Consider parsing the -c string into individual queries so each > is run in its own transaction > > o Consider disallowing multiple queries in PQexec() as an > additional barrier to SQL injection attacks -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Do we need a TODO for this? --------------------------------------------------------------------------- Tom Lane wrote: > Markus Schiltknecht <markus@bluegap.ch> writes: > > I've just found the stumbling block: the -c option of psql wraps all in > > a transaction, as man psql says: > > ... > > Thank you for clarification, I wouldn't have expected that (especially > > because CREATE DATABASE itself says, it cannot be run inside a > > transaction block... A transaction block (with BEGIN and COMMIT) seems > > to be more than just a transaction, right?) > > Hm, that's an interesting point. psql's -c just shoves its whole > argument string at the backend in one PQexec(), instead of dividing > at semicolons as psql does with normal input. And so it winds up as > a single transaction because postgres.c doesn't force a transaction > commit until the end of the querystring. But that's not a "transaction > block" in the normal sense and so it doesn't trigger the > PreventTransactionChain defense in CREATE DATABASE and elsewhere. > > I wonder whether we ought to change that? The point of > PreventTransactionChain is that we don't want the user rolling back > the statement post-completion, but it seems that > psql -c 'CREATE DATABASE foo; ABORT; BEGIN; ...' > would bypass the check. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Fri, Jan 05, 2007 at 08:45:51PM -0500, Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > > Tom Lane wrote: > >> Bruce Momjian <bruce@momjian.us> writes: > >>> What value is allowing multiple queies via PQexec() > >> > >> The only argument I can think of is that it allows applications to be > >> sloppy about parsing a SQL script into individual commands before they > >> send it. (I think initdb may be guilty of exactly that BTW...) At the > >> same time you could argue that such sloppiness is inherently a Bad Idea. > > > Doesn't it also avoid some network(?) overhead when you have > > a large number of small inserts or updates? > > > I seem to recall a previous company where we had a major performance > > by concatenating a bunch of updates with ";"s in between and sending > > them to postgresql as a single command. > > These days you'd probably be better off using a multi-row VALUES() list > if relevant. Also, if you really want to send multiple statements like > that, there's a cleaner way to do it: use the extended query protocol > and don't Sync or wait for a reply until you've sent them all. > > regards, tom lane > In shell scripts that do things in the database I often put >1 statement in the line. Since it is the shell, I want quick results. Usually it is an INSERT/UPDATE followed by a SELECT. It would be very frustrating not to be able to send multiple commands with one -c in psql. --elein
elein wrote: > On Fri, Jan 05, 2007 at 08:45:51PM -0500, Tom Lane wrote: > > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > > > Tom Lane wrote: > > >> Bruce Momjian <bruce@momjian.us> writes: > > >>> What value is allowing multiple queies via PQexec() > > >> > > >> The only argument I can think of is that it allows applications to be > > >> sloppy about parsing a SQL script into individual commands before they > > >> send it. (I think initdb may be guilty of exactly that BTW...) At the > > >> same time you could argue that such sloppiness is inherently a Bad Idea. > > > > > Doesn't it also avoid some network(?) overhead when you have > > > a large number of small inserts or updates? > > > > > I seem to recall a previous company where we had a major performance > > > by concatenating a bunch of updates with ";"s in between and sending > > > them to postgresql as a single command. > > > > These days you'd probably be better off using a multi-row VALUES() list > > if relevant. Also, if you really want to send multiple statements like > > that, there's a cleaner way to do it: use the extended query protocol > > and don't Sync or wait for a reply until you've sent them all. > > > > regards, tom lane > > > In shell scripts that do things in the database I often put >1 statement > in the line. Since it is the shell, I want quick results. Usually it > is an INSERT/UPDATE followed by a SELECT. > > It would be very frustrating not to be able to send multiple commands > with one -c in psql. We aren't going to disable that --- we are considering disabling the backend from treating it as a single transaction. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Do we need a TODO for this? Well, if we *don't* change the backend to disallow multi statements per PQexec, then we'd probably better do something about this. If we do make that change then it's not a problem anymore. >> Hm, that's an interesting point. psql's -c just shoves its whole >> argument string at the backend in one PQexec(), instead of dividing >> at semicolons as psql does with normal input. And so it winds up as >> a single transaction because postgres.c doesn't force a transaction >> commit until the end of the querystring. But that's not a "transaction >> block" in the normal sense and so it doesn't trigger the >> PreventTransactionChain defense in CREATE DATABASE and elsewhere. regards, tom lane
Bruce Momjian <bruce@momjian.us> writes: > We aren't going to disable that --- we are considering disabling the > backend from treating it as a single transaction. Or even more specifically, making sure that that only happens if you explicitly put begin/commit into the -c string. One thing I think we probably ought *not* change is that if any of the statements fail the rest of the -c string is abandoned --- that is, -c would work like a script with ON_ERROR_STOP enabled. regards, tom lane
Tom Lane wrote: > Markus Schiltknecht <markus@bluegap.ch> writes: > > I've just found the stumbling block: the -c option of psql wraps all in > > a transaction, as man psql says: > > ... > > Thank you for clarification, I wouldn't have expected that (especially > > because CREATE DATABASE itself says, it cannot be run inside a > > transaction block... A transaction block (with BEGIN and COMMIT) seems > > to be more than just a transaction, right?) > > Hm, that's an interesting point. psql's -c just shoves its whole > argument string at the backend in one PQexec(), instead of dividing > at semicolons as psql does with normal input. And so it winds up as > a single transaction because postgres.c doesn't force a transaction > commit until the end of the querystring. But that's not a "transaction > block" in the normal sense and so it doesn't trigger the > PreventTransactionChain defense in CREATE DATABASE and elsewhere. > > I wonder whether we ought to change that? The point of > PreventTransactionChain is that we don't want the user rolling back > the statement post-completion, but it seems that > psql -c 'CREATE DATABASE foo; ABORT; BEGIN; ...' > would bypass the check. Added to TODO: o Fix transaction restriction checks for CREATE DATABASE and other commands http://archives.postgresql.org/pgsql-hackers/2007-01/msg00133.php -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +