Thread: PGSQL or other DB?

PGSQL or other DB?

From
durumdara
Date:
Dear PG Users!

I don't want to make a heated debate with this subject, but I wanna ask about your experiences because we need to make a new special site and we wanna know, which DB is the best for this...

This website will supports up to 200 corporations.
The functions are similar, but because of many differents of these corps (datas to keep, way to working them), we wanna make many databases (one DB for one corp).
The site uses one main database that handle all public, and shared services, and store the links to subdatabases.
The website will works with apache/mod_python, and the each of the corp's services are handled with another python module (from another (separated) database). The main structure of the source code already wroted.

But: we need to determine, which database we will use in the future for this project.

The main viewpoints:
 - quick (re)connect - because mod_python basically not store the database connections persistently
 - fast queries
 - easy IDE to use (like pgadmin)
 - the db server can handle many databases (webdb_nnn where nnn is an integer)
 - I can add/modify a table, or a field to a table without "full lock" on the table (like DBISAM restructure). Like in FireBird, where the "add field" change only the table description. I don't know that PG supports this way of the DB modifying.
 - Quick and easy backup/restore system

Another important thing that I don't understand (what as I saw) that the PostGreSQL is store the databases in one, unseparatable file set, in a directory named data.
In another databases, like DBISAM, FireBird, MySQL, the databases are separated to another directories/files.

This "one datadir" is seems to be not too good for us. We used DBISAM in our clients, and many times when we got some filesystem error, we can simply recover the tables - from the files.
When we want to backup or restore one database, we can do it in two way: a.) archive all files b.) make sql dump from database.

If a file get corrupted in a database, then we can restore the datas from files, and this filesystem error causes problems only for this database, not for all.

I very fear from to keep all databases in one place, because if they are corrupted, possible more of them injured (if they are not separated).
I cannot make filesystem based (hard) copy from one db (only SQL dump enabled).

Ok, I saw that pgsql supports "tablespaces", but as I saw, this function can hold only table datas in the another directory, and not the full database can separated with them.

Because I don't used one PGSQL with many databases (up to 200), I don't know, what happening, and which cases possible. But I think you have many experience with it. Please share it with me!

Please help me, because we need to determine which DB to use.

I started the usage of the PG in prev. month, and I liked it except the way of the data storage (one data dir).

I tried the MySQL before I tried PG. InnoDB is seems to be "forcing transaction system on MyISAM". And boolean data type is missing (solved with enum?).
I don't like it all, but it is seems to be fast with little tables, and it is separate the database files to another directories which thing I like. Possible it have many limitations what I don't saw in first time.

Please help me, which DB is good for us, and how to configure, and use PGSQL with these database-set which we need to use.

Thanks for your help:
    dd






Re: PGSQL or other DB?

From
Sam Mason
Date:
On Fri, Jan 30, 2009 at 08:37:02PM +0100, durumdara wrote:
> This website will supports up to 200 corporations.
> The functions are similar, but because of many differents of these corps
> (datas to keep, way to working them), we wanna make many databases (one
> DB for one corp).

Databases or schemas? they are different in PG and they have very
different behaviors which suit different use cases.

> The site uses one main database that handle all public, and shared
> services, and store the links to subdatabases.

Sounds like you want to use schemas.

> The main viewpoints:
>  - quick (re)connect - because mod_python basically not store the
> database connections persistently

?? mod_python can be used to do persistent connections as well as
connection pooling.

>  - fast queries

They're as fast as you write them! PG generally does better with higher
concurrent loads than MySQL (never used DBISAM or FireBird).

>  - the db server can handle many databases (webdb_nnn where nnn is an
> integer)

There's a summary of limits in the about page:

  http://www.postgresql.org/about/

>  - I can add/modify a table, or a field to a table without "full lock"
> on the table (like DBISAM restructure). Like in FireBird, where the "add
> field" change only the table description. I don't know that PG supports
> this way of the DB modifying.

Yes; have a read about MVCC:

  http://www.postgresql.org/docs/current/static/mvcc.html

>  - Quick and easy backup/restore system

pg_dump works well, how quick it is depends on the amount of data you
have obviously

> Another important thing that I don't understand (what as I saw) that the
> PostGreSQL is store the databases in one, unseparatable file set, in a
> directory named data.
> In another databases, like DBISAM, FireBird, MySQL, the databases are
> separated to another directories/files.

PG's scheme works well; one file per table doesn't work very well in
practice. there will always be links between tables (foreign keys being
the obvious one) and hence selectively restoring individual tables on a
file by file basis seems like an *amazing* way of killing your database.

> This "one datadir" is seems to be not too good for us. We used DBISAM in
> our clients, and many times when we got some filesystem error, we can
> simply recover the tables - from the files.

Use a proper backup system that takes consistent snapshots of your data,
anything else will come back and bite you when you really don't want it
to

> I very fear from to keep all databases in one place, because if they are
> corrupted, possible more of them injured (if they are not separated).
> I cannot make filesystem based (hard) copy from one db (only SQL dump
> enabled).

There are much better tools available to solve things than treating
tables as files!  As far as backups go, there are several options
of varying complexity and resilience. pg_dump is simple, you can do
filesystem level backups with a bit of fiddling, you can keep another
database fed with data from the main one with varying levels of lag.

  http://www.postgresql.org/docs/current/static/backup.html

--
  Sam  http://samason.me.uk/

Re: PGSQL or other DB?

From
Alan Hodgson
Date:
> > This "one datadir" is seems to be not too good for us. We used DBISAM
> > in our clients, and many times when we got some filesystem error, we
> > can simply recover the tables - from the files.
>

pg_dump files with the custom format can be used to selectively restore
tables. It's really easy and a lot more reliable than filesystem level
backups - which require snapshotting or shutting down the database or
in-database support to get clean backups of.

> > I very fear from to keep all databases in one place, because if they
> > are corrupted, possible more of them injured (if they are not
> > separated). I cannot make filesystem based (hard) copy from one db
> > (only SQL dump enabled).

If you consistently have hardware problems that cause filesystem corruption,
you should probably consider fixing that before worrying too much about
which software is on top of it being corrupted.

And, honestly, if keeping all the files in once place is a major problem for
you, then don't use PostgreSQL. It's not like it's going to be changed in
this regard.

--
Alan

Re: PGSQL or other DB?

From
ries van Twisk
Date:

On Jan 30, 2009, at 2:37 PM, durumdara wrote:

Dear PG Users!

I don't want to make a heated debate with this subject, but I wanna ask about your experiences because we need to make a new special site and we wanna know, which DB is the best for this...

This website will supports up to 200 corporations.
The functions are similar, but because of many differents of these corps (datas to keep, way to working them), we wanna make many databases (one DB for one corp).
The site uses one main database that handle all public, and shared services, and store the links to subdatabases.
The website will works with apache/mod_python, and the each of the corp's services are handled with another python module (from another (separated) database). The main structure of the source code already wroted.

But: we need to determine, which database we will use in the future for this project.

The main viewpoints:
 - quick (re)connect - because mod_python basically not store the database connections persistently

I don`t know much about python except the animal but could use a SQL proxy to solve this issue 


 - fast queries

Define fast, but remember a DB was never designed to be fast. However it sounds like for the coupld of companies you wouldn't have to much problems there.


 - easy IDE to use (like pgadmin)


 - the db server can handle many databases (webdb_nnn where nnn is an integer)

I don't have experience with PG and hunderds of DB's, but I think it shouldn't be a problem


 - I can add/modify a table, or a field to a table without "full lock" on the table (like DBISAM restructure). Like in FireBird, where the "add field" change only the table description. I don't know that PG supports this way of the DB modifying.

How big are these tables and how frequently do they change? If they change really frequently then you have a problem with your design in general.


 - Quick and easy backup/restore system

pg_dump/pg_restore are your friends. webmin makes this also easy if you want a point and click interface.



Another important thing that I don't understand (what as I saw) that the PostGreSQL is store the databases in one, unseparatable file set, in a directory named data.
In another databases, like DBISAM, FireBird, MySQL, the databases are separated to another directories/files.

This "one datadir" is seems to be not too good for us. We used DBISAM in our clients, and many times when we got some filesystem error, we can simply recover the tables - from the files.
When we want to backup or restore one database, we can do it in two way: a.) archive all files b.) make sql dump from database.

If a file get corrupted in a database, then we can restore the datas from files, and this filesystem error causes problems only for this database, not for all.

I very fear from to keep all databases in one place, because if they are corrupted, possible more of them injured (if they are not separated).
I cannot make filesystem based (hard) copy from one db (only SQL dump enabled).

Why would just one file get corrupt?? You can store your tables(indexes in different places if you want to but if one get's corrupt you need to change/check/replace hardware anyways. PostgreSQL doesn't behave much like MySQL where all of a sudden tables get corrupted and you need to repair them.



Ok, I saw that pgsql supports "tablespaces", but as I saw, this function can hold only table datas in the another directory, and not the full database can separated with them.

Correct...



Because I don't used one PGSQL with many databases (up to 200), I don't know, what happening, and which cases possible. But I think you have many experience with it. Please share it with me!

Don't expect yourself to just set one table and/or DB back. This is plain wrong and you need to make appropriate backups. If you really want to set a DB back as a table then use sqlite, but that defeats your requirements of 'fast'.



Please help me, because we need to determine which DB to use.

I think PG will do very well in your situation, but you have to set your mind off using files



I started the usage of the PG in prev. month, and I liked it except the way of the data storage (one data dir).

I don't know any DB (except may be some Object DB's and sqlite) where you can do that properly, officially and ACID compliant.



I tried the MySQL before I tried PG. InnoDB is seems to be "forcing transaction system on MyISAM". And boolean data type is missing (solved with enum?).
I don't like it all, but it is seems to be fast with little tables, and it is separate the database files to another directories which thing I like. Possible it have many limitations what I don't saw in first time.

Don't want to debate MySQL but PG is a better option IMHO, it recovers VERY well from crashes where you are worried about (I know, I live in Ecuador where power goes down every week or so and it always recovered perfectly).



Please help me, which DB is good for us, and how to configure, and use PGSQL with these database-set which we need to use.

PostgreSQL is good for you as long as you set your mind away from restoring a DB by replacing a fileset. pg_dump/pg_restore are your friends. If you do care about restoring a DB up to a point in time you can do WAL shipping.



Thanks for your help:
    dd









regards, Ries van Twisk


-------------------------------------------------------------------------------------------------
Ries van Twisk
tags: Freelance TYPO3 Glassfish JasperReports JasperETL Flex Blaze-DS WebORB PostgreSQL DB-Architect
Phone: +1-810-476-4196
SIP: +1-747-690-5133






Re: PGSQL or other DB?

From
Tony Caduto
Date:
durumdara wrote:
>
> Another important thing that I don't understand (what as I saw) that
> the PostGreSQL is store the databases in one, unseparatable file set,
> in a directory named data.
> In another databases, like DBISAM, FireBird, MySQL, the databases are
> separated to another directories/files.
>
> This "one datadir" is seems to be not too good for us. We used DBISAM
> in our clients, and many times when we got some filesystem error, we
> can simply recover the tables - from the files.
> When we want to backup or restore one database, we can do it in two
> way: a.) archive all files b.) make sql dump from database.
>

Postgresql uses a database cluster.  In the data directory each database
goes into a directory identified by the database OID, and in this
directory the database resides
in 2 gb chunks(as far as I remember).  You can easily backup the cluster
by using the file system, you just have to make sure you stop the
postmaster and then you can backup the entire data dir or individual
database dir.  You have to restore this to the same version of PG though.
it's the same kind of deal with Firebird, you can't backup a running
database with the file system and even Firebird must be stopped in order
to do a file system copy of the database file.

Firebird also stores all it's transaction data in the same file, so you
end up with HUGE files unless you do a backup and restore to shrink them
down.  You don't have this problem with PostgreSQL as it stores that
info in the WAL.

PostgreSQL will give the the best solution for your project hands down,
just give it a chance and don't worry about what you did with Firebird
or MySQL.

I have never had a PostgreSQL database or dump file become corrupt, but
on Firebird it happens fairly often and it's part of the reason why I
eventually dumped it and moved to PostgreSQL.

Hope that helps you out some.

Tony Caduto
AM Software Design
Home of Lightning Admin for PostgreSQL
http://www.amsoftwaredesign.com

Re: PGSQL or other DB?

From
Tony Caduto
Date:
durumdara wrote:
>
>  - quick (re)connect - because mod_python basically not store the
> database connections persistently
>

You should consider using a Python connection pool with something like
Pylons (http://pylonshq.com), I use dbutils:
http://www.webwareforpython.org/DBUtils/Docs/UsersGuide.html

You can scale PostgreSQL and Python web apps really well with that as
long as you are
not using CGI.  I use Pylons combined with mod_wsgi, but you can use it
with mod_python as well.


Tony Caduto
AM Software Design
Home of Lightning Admin for PostgreSQL
http://www.amsoftwaredesign.com


Re: PGSQL or other DB?

From
Greg Smith
Date:
On Fri, 30 Jan 2009, durumdara wrote:

> - quick (re)connect - because mod_python basically not store the database
> connections persistently

If this turns out to be a problem, you may want to use some sort of
connection pooling software in order to reduce this overhead.
http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling#Connection_Pooling_and_Acceleration

gives a quick intro to this topic with links to some good solutions here.

> This "one datadir" is seems to be not too good for us. We used DBISAM in our
> clients, and many times when we got some filesystem error, we can simply
> recover the tables - from the files.

It's quite easy for ISAM solutions to get corrupted.  Barring a bad
hardware failure, this shouldn't ever happen to a PostgreSQL database.
The write-ahead log implementation nevers leaves you in a state where the
database can't recover itself after a crash or filesystem error.  You
should consider this a positive thing--this whole class of problems that
you think you need a solution to, needing to fix corrupted database files
regularly, shouldn't ever happen in a proper database implementation.

The first two sections of
http://www.postgresql.org/docs/8.3/static/wal.html give a good
introduction to this topic.

Rolling back to a backup should only be necessary after a serious hardware
disaster--the sort of situation where it's unlikely you'll be able to
trust any of your individual tables even if you had the option of
repairing them one at a time.

> I tried the MySQL before I tried PG. InnoDB is seems to be "forcing
> transaction system on MyISAM". And boolean data type is missing (solved with
> enum?).
> I don't like it all, but it is seems to be fast with little tables, and it is
> separate the database files to another directories which thing I like.
> Possible it have many limitations what I don't saw in first time.

http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007
has a lot of commentary in this area.  MySQL+MyISAM will have the same
sort of problems you should be trying to escape.  MySQL+InnoDB would be
much better.  In addition to missing features, the other thing you should
be careful about is making speed measurements with a low client
count--make sure you evaluate any database you're considering with a many
clients going at once as you can simulate.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PGSQL or other DB?

From
Steve Crawford
Date:
Alan Hodgson wrote:
>
> And, honestly, if keeping all the files in once place is a major problem for
> you, then don't use PostgreSQL. It's not like it's going to be changed in
> this regard.
>
>
Um, I think it already did:
http://www.postgresql.org/docs/8.3/interactive/sql-createtablespace.html

Cheers,
Steve


Re: PGSQL or other DB?

From
Scott Marlowe
Date:
On Fri, Jan 30, 2009 at 1:04 PM, Sam Mason <sam@samason.me.uk> wrote:
> On Fri, Jan 30, 2009 at 08:37:02PM +0100, durumdara wrote:
>
>>  - I can add/modify a table, or a field to a table without "full lock"
>> on the table (like DBISAM restructure). Like in FireBird, where the "add
>> field" change only the table description. I don't know that PG supports
>> this way of the DB modifying.
>
> Yes; have a read about MVCC:
>
>  http://www.postgresql.org/docs/current/static/mvcc.html

Just because one uses MVCC doesn't mean it's free to add a new column
to a table.  Adding a nullable column in pgsql is trivial.  Adding an
index can be done concurrently.  InnoDB supports MVCC, but MySQL's way
of doing things results in innodb tables being rewritten from scratch
for both of those for varying reasons, resulting in a locked table the
whole time.

Create index concurrently is a treasure to behold on large production
systems.  I can't stress how important it is that I can create an
index on demand in the middle of the heaviest user activity on my
system and nothing hangs or locks from the user side waiting on it.
Same for adding a column.

>>  - Quick and easy backup/restore system
>
> pg_dump works well, how quick it is depends on the amount of data you
> have obviously

More important that quick and easy is reliable, predictable, and
capable of being run on a live, working database (backup that is).
pg_dump definitely is those things.

>> This "one datadir" is seems to be not too good for us. We used DBISAM in
>> our clients, and many times when we got some filesystem error, we can
>> simply recover the tables - from the files.

Do you get a lot of filesystem errors?  Or corrupted table errors.
Having table corruption be a common problem means something somewhere
is broken, whether it be your hardware or the db you're running.  I've
been running pgsql for a long time, going back to the tailend of the
6.x era.  I've never once had a corrupt table on a production server.
I am not alone.

> Use a proper backup system that takes consistent snapshots of your data,
> anything else will come back and bite you when you really don't want it
> to

I feel like I'm bathed with light and getting religious, but yes, this.

>> I very fear from to keep all databases in one place, because if they are
>> corrupted, possible more of them injured (if they are not separated).
>> I cannot make filesystem based (hard) copy from one db (only SQL dump
>> enabled).

You need to get out of the habit of playing in your databases files.
Any database that requires you to routinely do that is broken by
design.

I get the feeling you've learned how to use a hammer (the file system
level manipulation thing) to fix a problem that's very nail like (file
corruption in mysql) and are looking for a place to use your hammer in
pgsql.  Pgsql has no such nails sticking out of it.

Re: PGSQL or other DB?

From
Erik Jones
Date:
On Jan 30, 2009, at 11:37 AM, durumdara wrote:

> The main viewpoints:
>  - quick (re)connect - because mod_python basically not store the
> database connections persistently

mod_python is not a database adaptor.  Put another way, mod_python
does not make database connections.  If you're using Python then to
connect to Postgres you'll most likely be using psycopg2 to connect to
and query your database.  I'm not sure what your previous database
experience is but it sounds like you may be thinking in terms of PHP's
persistent connections and on that I'll just say that PHP's persistent
connections are not true persistent connections from your top-level
application's perspective.  If you want connection pooling then there
are options available, pgpool-II and pgbouncer are probably the two
most commonly used ones today.

>  - fast queries

Stop and think about that one for a moment.  Do you really expect to
go to a specific database's mailing list(s) with that requirement
expecting anything other than "We got you covered!"

>  - easy IDE to use (like pgadmin)

I think you just answered your own question.

>  - the db server can handle many databases (webdb_nnn where nnn is
> an integer)

I know people hate hearing this answer but that's the wrong question.
What matters isn't how many databases can be run (and, if you ever run
into a database server in this day and age where it does, run) but
rather how many concurrent connections you expect as well as both the
total data set size and how much of that data set will be constantly
worked with, i.e. needs to be in memory.  Let's say you have 10 5G
databases with each averaging 10% of their total size in use (in
memory, hopefully) at any given moment.  That's entirely different
from 10 10G databases with each averaging 10% in memory and both are
different from running 200 databases each with 5% in use at any given
moment.  All of that being said, a properly configured Postgres
provisioned for the workload installation built with a little
knowledge will scale with the best of 'em.

>  - I can add/modify a table, or a field to a table without "full
> lock" on the table (like DBISAM restructure). Like in FireBird,
> where the "add field" change only the table description. I don't
> know that PG supports this way of the DB modifying.

Nope.  PostgreSQL is an all or nothing transactional database.  I'd
never heard of DBISAM before you mentioned it and have never used
Firebird.  After doing a little reading it turns out that if you plan
to use transactions at all (which is very likely given even just the
little you've described about the applications you're building) then
you should realize that altering tables is not compatible with
transactions and doing so will automatically commit any open
transactions on the table.  Looking into Firebird I couldn't find how
it handles (or doesn't) that at all I but I did see that it will
happily let you add a new not null column with no default to a table
by writing nulls for the new attribute for any existing columns.  That
already makes me queasy.

>  - Quick and easy backup/restore system

Well, again, that depends.  I believe pgAdmin has some nice menu
options to handle simple cases (others can either add to or correct me
there) but as to the core Postgres distribution there's still a bit of
ambiguity arising from cases of "there's more than one way to do it"
even for basic tasks as well as some inconsistent UI aspects to the
existing tools (see the pg_dump/pg_restore/psql comments in the recent
Pet Peeves thread).

> Another important thing that I don't understand (what as I saw) that
> the PostGreSQL is store the databases in one, unseparatable file
> set, in a directory named data.
> In another databases, like DBISAM, FireBird, MySQL, the databases
> are separated to another directories/files.

> This "one datadir" is seems to be not too good for us. We used
> DBISAM in our clients, and many times when we got some filesystem
> error, we can simply recover the tables - from the files.

Way off.  While I didn't look into the storage layouts for DBISAM or
Firebird I can tell you that in any MySQL server instance all of the
tables will be under (to some depth) the same top level data directory
directory with each database in each own directory under that and
table files for each database under those.  I think you may be able to
use symlinks MyISAM table files but InnoDB tables will croak if you
try to do that.  What's more, by default InnoDB does *not* use
separate files per table and instead stores all InnoDB tables in one
file with it's own internal file system.  Admittedly, I think most
people turn on the innodb_file_per_table option such that each table
will get it's own data and index files but even then all of a given
table's data indexes will reside in the same file.  Each of those
InnoDB table data+index files is referred to as an InnoDB tablespace
(but that's really different from what most people expect a tablespace
to be, see next paragraph).  One consequence of that is that adding an
index to a table requires that the entire table and all of it's
indexes be rewritten.

PostgreSQL also follows the 1 directory per-database setup.  Each
database directory is located directly underneath the base/ direcotry
in the main Postgres One big difference is that table data and index
data are in separate files and multiple indexes for a given table are
all in separate files.  This can be viewed as the low-level embodiment
of the fact that in PostgreSQL indexes are not part of the tables they
are for but rather separate, but tightly coupled, database objects.
With this scheme adding a new index, which doesn't even require the
table to be locked any more for new indexes, only needs to write the
new index file, not rewrite the table data and previously existing
indexes as well.   What's more, Postgres allows you to create real
tablespaces so that you can place individual persistent database
objects (databases, tables, indexes, and some constraints) on separate
storage.

>
> When we want to backup or restore one database, we can do it in two
> way: a.) archive all files b.) make sql dump from database.

Both options are available with Postgres although, as has been noted
by others, option 'a' requires other tools in order to back up a
consistent data set.

> If a file get corrupted in a database, then we can restore the datas
> from files, and this filesystem error causes problems only for this
> database, not for all.

> I very fear from to keep all databases in one place, because if they
> are corrupted, possible more of them injured (if they are not
> separated).
> I cannot make filesystem based (hard) copy from one db (only SQL
> dump enabled).

Again, I'm not sure what you mean.  There are different kinds of
corruption that can be found ion files.  Database files could be
considered corrupt if the database screws something up and writes bad
data to a file.  That would be localized to the files it wrote to in
the bad operation (and is extremely rare with Postgres) whereas what I
think of as filesystem corruption often doesn't see file boundaries.

> Ok, I saw that pgsql supports "tablespaces", but as I saw, this
> function can hold only table datas in the another directory, and not
> the full database can separated with them.

No, you can put entire databases in/on different tablespaces.

> Because I don't used one PGSQL with many databases (up to 200), I
> don't know, what happening, and which cases possible. But I think
> you have many experience with it. Please share it with me!
>
> Please help me, because we need to determine which DB to use.
>
> I started the usage of the PG in prev. month, and I liked it except
> the way of the data storage (one data dir).
>
> I tried the MySQL before I tried PG. InnoDB is seems to be "forcing
> transaction system on MyISAM". And boolean data type is missing
> (solved with enum?).
> I don't like it all, but it is seems to be fast with little tables,
> and it is separate the database files to another directories which
> thing I like. Possible it have many limitations what I don't saw in
> first time.

Well, if you don't want transaction then stay away from Postgres and
realize that Postgres actually gives you more built-in support for
placing your data at different locations than MySQL.

> Please help me, which DB is good for us, and how to configure, and
> use PGSQL with these database-set which we need to use.

Just realize that when choosing a complex technology solution like a
database there really is no "yes" or "no" answer or "here's the right
way to do it" and "here's the wrong way".  You need to define your
requirements and part of that is asking questions.  For example, 'how
to configure..' -- if you seriously ask a question like that here
you're going  to get more questions (what kind of IO subsystem? how
many concurrent connections do you need to support? how much memory do
you have? how large is our data set? etc...) than you will answers.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






Re: PGSQL or other DB?

From
Scott Marlowe
Date:
On Sat, Jan 31, 2009 at 2:13 AM, Erik Jones <ejones@engineyard.com> wrote:
>
> On Jan 30, 2009, at 11:37 AM, durumdara wrote:

>>  - I can add/modify a table, or a field to a table without "full lock" on
>> the table (like DBISAM restructure). Like in FireBird, where the "add field"
>> change only the table description. I don't know that PG supports this way of
>> the DB modifying.
>
> Nope.  PostgreSQL is an all or nothing transactional database.  I'd never
> heard of DBISAM before you mentioned it and have never used Firebird.  After
> doing a little reading it turns out that if you plan to use transactions at
> all (which is very likely given even just the little you've described about
> the applications you're building) then you should realize that altering
> tables is not compatible with transactions and doing so will automatically
> commit any open transactions on the table.

Are talking about pgsql or some other database?  Everything in pgsql
can be done in a transaction, except create / drop database /
tablespace.

>  Looking into Firebird I couldn't
> find how it handles (or doesn't) that at all I but I did see that it will
> happily let you add a new not null column with no default to a table by
> writing nulls for the new attribute for any existing columns.  That already
> makes me queasy.

That's pretty much what pgsql does.  Why does it make you queasy?

Re: PGSQL or other DB?

From
Russ Brown
Date:
Scott Marlowe wrote:
> On Sat, Jan 31, 2009 at 2:13 AM, Erik Jones <ejones@engineyard.com> wrote:
>> On Jan 30, 2009, at 11:37 AM, durumdara wrote:
>
>>>  - I can add/modify a table, or a field to a table without "full lock" on
>>> the table (like DBISAM restructure). Like in FireBird, where the "add field"
>>> change only the table description. I don't know that PG supports this way of
>>> the DB modifying.
>> Nope.  PostgreSQL is an all or nothing transactional database.  I'd never
>> heard of DBISAM before you mentioned it and have never used Firebird.  After
>> doing a little reading it turns out that if you plan to use transactions at
>> all (which is very likely given even just the little you've described about
>> the applications you're building) then you should realize that altering
>> tables is not compatible with transactions and doing so will automatically
>> commit any open transactions on the table.
>
> Are talking about pgsql or some other database?  Everything in pgsql
> can be done in a transaction, except create / drop database /
> tablespace.
>
>>  Looking into Firebird I couldn't
>> find how it handles (or doesn't) that at all I but I did see that it will
>> happily let you add a new not null column with no default to a table by
>> writing nulls for the new attribute for any existing columns.  That already
>> makes me queasy.
>
> That's pretty much what pgsql does.  Why does it make you queasy?
>

I think the key is that the new column is NOT NULL, so defaulting the
new column's values to NULL results in immediate data integrity
inconsistency.

If I remember rightly, PG doesn't allow this: you have to create the
column as NULL, UPDATE and then add the NOT NULL constraint, or perhaps
(I haven't tried this) create the column with a default and then remove
it immediately afterwards.

Re: PGSQL or other DB?

From
Scott Marlowe
Date:
On Sun, Feb 1, 2009 at 7:33 AM, Russ Brown <pickscrape@gmail.com> wrote:
> Scott Marlowe wrote:
>>
>> On Sat, Jan 31, 2009 at 2:13 AM, Erik Jones <ejones@engineyard.com> wrote:
>>>
>>> On Jan 30, 2009, at 11:37 AM, durumdara wrote:
>>>  Looking into Firebird I couldn't
>>> find how it handles (or doesn't) that at all I but I did see that it will
>>> happily let you add a new not null column with no default to a table by
>>> writing nulls for the new attribute for any existing columns.  That
>>> already
>>> makes me queasy.
>>
>> That's pretty much what pgsql does.  Why does it make you queasy?
>>
>
> I think the key is that the new column is NOT NULL, so defaulting the new
> column's values to NULL results in immediate data integrity inconsistency.
>
> If I remember rightly, PG doesn't allow this: you have to create the column
> as NULL, UPDATE and then add the NOT NULL constraint, or perhaps (I haven't
> tried this) create the column with a default and then remove it immediately
> afterwards.

OK, I completely misunderstood what the other poster meant.  Pgsql
does NOT allow creating the not null column with nulls in place.  That
would make me quesy too. Creating it with a default works in
postgresql.

Re: PGSQL or other DB?

From
Erik Jones
Date:
On Jan 31, 2009, at 9:36 AM, Scott Marlowe wrote:

> On Sat, Jan 31, 2009 at 2:13 AM, Erik Jones <ejones@engineyard.com>
> wrote:
>>
>> On Jan 30, 2009, at 11:37 AM, durumdara wrote:
>
>>> - I can add/modify a table, or a field to a table without "full
>>> lock" on
>>> the table (like DBISAM restructure). Like in FireBird, where the
>>> "add field"
>>> change only the table description. I don't know that PG supports
>>> this way of
>>> the DB modifying.
>>
>> Nope.  PostgreSQL is an all or nothing transactional database.  I'd
>> never
>> heard of DBISAM before you mentioned it and have never used
>> Firebird.  After
>> doing a little reading it turns out that if you plan to use
>> transactions at
>> all (which is very likely given even just the little you've
>> described about
>> the applications you're building) then you should realize that
>> altering
>> tables is not compatible with transactions and doing so will
>> automatically
>> commit any open transactions on the table.
>
> Are talking about pgsql or some other database?  Everything in pgsql
> can be done in a transaction, except create / drop database /
> tablespace.

I was referring to DBISAM there.

>> Looking into Firebird I couldn't
>> find how it handles (or doesn't) that at all I but I did see that
>> it will
>> happily let you add a new not null column with no default to a
>> table by
>> writing nulls for the new attribute for any existing columns.  That
>> already
>> makes me queasy.
>
> That's pretty much what pgsql does.  Why does it make you queasy?

Another poster already beat me to answering this one so I'll not
repeat what they said.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






Re: PGSQL or other DB?

From
durumdara
Date:
Hi!

2009.01.31. 10:13 keltezéssel, Erik Jones írta:
>
> On Jan 30, 2009, at 11:37 AM, durumdara wrote:
>
>> The main viewpoints:
>>  - quick (re)connect - because mod_python basically not store the
>> database connections persistently
>
> mod_python is not a database adaptor.
Yes, I know. But because of mod_python (apache) commonly configured to
"1 interpreter/1 request", I need to reconnect with the adapter.
But I tested it, and it is seems to be have enough speed. If not, I must
use some proxy server to keep them alive.

> rewrite the table data and previously existing indexes as well.
> What's more, Postgres allows you to create real tablespaces so that
> you can place individual persistent database objects (databases,
> tables, indexes, and some constraints) on separate storage.
I wanna ask that if I create a database in another tablespace, is the
database name already stored in "main" data storage (like global metadata)?
And If I makes a table or other object into this database, need I to
define the tablespace name, or it is taken from the database
"tablespace" name?
For example:
create db anything tablespace "new2"

These sqls are different?
create table any_table (a int)
or
create table any_table (a int) tablespace "new2".

So: need I define tablespace instruction in every table/object creation,
or not?

And: if I store this database in another drive, and it is unmounted
(because of any reason - like failure, etc), is it causes any problems
with postgresql main databases (for example: service stops, etc). I ask
about "after restart pg service", not online usage.
So if I remove some tablespace path (directory, drive) in the level of
the filesystem, can pg service up to serve main databases the it can access?
Or is it completely die in this problem?

Thanks for your help:
    dd

Re: PGSQL or other DB?

From
Erik Jones
Date:
On Feb 2, 2009, at 12:23 AM, durumdara wrote:

> Hi!
>
> 2009.01.31. 10:13 keltezéssel, Erik Jones írta:
>>
>> On Jan 30, 2009, at 11:37 AM, durumdara wrote:
>>
>>> The main viewpoints:
>>> - quick (re)connect - because mod_python basically not store the
>>> database connections persistently
>>
>> mod_python is not a database adaptor.
> Yes, I know. But because of mod_python (apache) commonly configured
> to "1 interpreter/1 request", I need to reconnect with the adapter.
> But I tested it, and it is seems to be have enough speed. If not, I
> must use some proxy server to keep them alive.

There are various connection pooling softwares available such as
pgpool or pgbouncer for that.

>> rewrite the table data and previously existing indexes as well.
>> What's more, Postgres allows you to create real tablespaces so that
>> you can place individual persistent database objects (databases,
>> tables, indexes, and some constraints) on separate storage.
> I wanna ask that if I create a database in another tablespace, is
> the database name already stored in "main" data storage (like global
> metadata)?
> And If I makes a table or other object into this database, need I to
> define the tablespace name, or it is taken from the database
> "tablespace" name?
> For example:
> create db anything tablespace "new2"
>
> These sqls are different?
> create table any_table (a int)
> or
> create table any_table (a int) tablespace "new2".
>
> So: need I define tablespace instruction in every table/object
> creation, or not?

No, if you create a database as being in a given tablespace then all
table created in that database will go in that tablespace unless you
specify otherwise.

> And: if I store this database in another drive, and it is unmounted
> (because of any reason - like failure, etc), is it causes any
> problems with postgresql main databases (for example: service stops,
> etc). I ask about "after restart pg service", not online usage.
> So if I remove some tablespace path (directory, drive) in the level
> of the filesystem, can pg service up to serve main databases the it
> can access?
> Or is it completely die in this problem?

IIRC, the database should run fine and will just throw errors if you
try to access a database or table in a tablespace that isn't mounted.

> Thanks for your help:
>   dd

You're welcome :)

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k