Thread: Re: For the ametures. (related to "Are we losing momentum?")

Re: For the ametures. (related to "Are we losing momentum?")

From

"Dave Page"

Date:

17 April 2003, 06:01:50

Hi Ben

> -----Original Message-----
> From: Ben Clewett [mailto:B.Clewett@roadrunner.uk.com]
> Sent: 17 April 2003 10:45
> Cc: pgsql-hackers@postgresql.org
> Subject: [HACKERS] For the ametures. (related to "Are we
> losing momentum?")
>
>
> I am not a hacker of PgSQL, and new to Databases.  I was using MySQL
> under .NET, but was annoyed by their agressive licence agreements and
> immaturity.  (Their sales personel are also very rude.  One girl once
> told me that if I didn't like their licence terms I should just use
> flat-files instead.)

Probably more powerful ;-)

> -    A true Windows version which people can learn their craft on.

Coming with 7.4...

> -    Tools which look like Access, to do row level data
> editing with no SQL.

http://www.pgadmin.org/

It looks more like SQL Server's Enterprise Manager but does most if not
all of what I expect you need.

> -    Centrally located complete documentation in many
> consistent  easy to
> read formats, of the system and *ALL* API's, including
> in-line tutorials
> and examples.

The tarball includes the complete documentation in HTML format, and in
pgAdmin there's a searchable copy in the main chm help file.

> -    Data types like 'ENUM' which appeal to ametures.

Isn't that just syntactic sugar for a column with a check for specific
values on it?

> -    There are no administrative mandatorys.  Eg, VACUUM.
> (A stand-alone
> commercial app, like an Email client, will be contrainted by
> having to
> be an app and a DBA in one.)

PostgreSQL is by no means alone in this requirement. SQL Server for
example has 'optimizations' that are performed usually as part of a
scheduled maintenance plan and are analagous to vacuum in some ways.

> -    The tables (not innodb) are in different files of the
> same name.
> Allowing the OS adminitrator great ability.  EG, putting tables on
> separate partitions and therefore greatly speeding performance.

One reason for not doing this is that a table in PostgreSQL might span
mutiple files if it exceeds a couple of gigs in size.

> -    They have extensive backup support.  Including now,
> concurrent backup
> without user interuption or risk of inconsistency.

So does PostgreSQL (pg_dump/pg_dumpall).

Regards, Dave

PS, it's nice you decided not to go to the Dark Side :-)

Re: For the ametures. (related to "Are we losing momentum?")

From

Ben Clewett

Date:

17 April 2003, 07:42:32

Hi Dave,

A brief defence of my posting.  After which I'll retire to my side of 
the fence :)

>>-    A true Windows version which people can learn their craft on.
> 
> Coming with 7.4...

I look forward to this greatly.  Maybe here I'll have the chance to fix 
some problems for the greater community.

>>-    Tools which look like Access, to do row level data 
>>editing with no SQL.
> 
> 
> http://www.pgadmin.org/
> 
> It looks more like SQL Server's Enterprise Manager but does most if not
> all of what I expect you need.

Sorry, my fault, an excellent program.

> The tarball includes the complete documentation in HTML format, and in
> pgAdmin there's a searchable copy in the main chm help file.

But not the API's.  Not in one central location.  Some of it, the stuff 
I use, is on GBorg, and in inconsistent format.  I have personally found 
some documentation very fragmented.  So a subtle point about an ability 
is lost as I have assumed all comments to be in a few pages, and missed 
something vital or relevent in another sourse.  Eg, see my comment at 
the end.  But it's better than msdn :)

>>-    Data types like 'ENUM' which appeal to ametures.
> 
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

Yes :)  By point is not that PostgreSQL is lacking, only that the 
ameture finds others more friendly and inviting.

Although this may be a point which is irrelevent?

My personal 'gripe' was when reading through the postings, some people 
considered people who have not the time, patience or ability, to learn 
PostgreSQL completelly, somehow not worthy.

I wanted to support us dumb users! :)

>>-    There are no administrative mandatorys.  Eg, VACUUM.  
>>(A stand-alone 
>>commercial app, like an Email client, will be contrainted by 
>>having to 
>>be an app and a DBA in one.)
> 
> PostgreSQL is by no means alone in this requirement. SQL Server for
> example has 'optimizations' that are performed usually as part of a
> scheduled maintenance plan and are analagous to vacuum in some ways.

Is this a weekness in DBMS's that don't require this?  (MySQL, Liant 
etc.)  Is there a way of building a guarbage collector into the system? 
My Windows PC has no 'cron'.

>>-    The tables (not innodb) are in different files of the 
>>same name. 
>>Allowing the OS adminitrator great ability.  EG, putting tables on 
>>separate partitions and therefore greatly speeding performance.
> 
> One reason for not doing this is that a table in PostgreSQL might span
> mutiple files if it exceeds a couple of gigs in size.

They used multile files for tables, with a common pefix of the table 
name.  But they have dropped this them selves now.

I miss the way with MySQL I could delete a table, or move it, or back it 
up, manually using 'rm', 'mv' or 'cp'.

Working with IDE drives on PC's,  you can double the performace of a DB 
just by putting half the tables on a disk on another IDE chain.  Adding 
a DB using 'tar' is very a powerful ability.

But hay, if I missed it that much, I would not have moved! :)

>>-    They have extensive backup support.  Including now, 
>>concurrent backup 
>>without user interuption or risk of inconsistency.
> 
> 
> So does PostgreSQL (pg_dump/pg_dumpall).

I have used this, and it's a great command.

I could not work out from the documentation whether it takes a snapshot 
at the start time, or archives data at the time it find's it.  The 
documentation (app-pg-dump.html).  As the documentation does not clarify 
this very important point, I desided it's not safe to use when the 
system is in use.

Can this command can be used, with users in the system making heavy 
changes, and when takes many hours to complete, does produce a valid and 
consistent backup?

If so, you have all MySQL has here and in a more useful format.

> PS, it's nice you decided not to go to the Dark Side :-)

Thanks, Ben

Re: For the ametures. (related to "Are we losing momentum?")

From

Ian Barwick

Date:

17 April 2003, 08:15:24

On Thursday 17 April 2003 13:44, Ben Clewett wrote:
> Hi Dave,
>
> A brief defence of my posting.  After which I'll retire to my side of
> the fence :)
(snip)

> >>-    Data types like 'ENUM' which appeal to ametures.
> >
> > Isn't that just syntactic sugar for a column with a check for specific
> > values on it?
>
> Yes :)  By point is not that PostgreSQL is lacking, only that the
> ameture finds others more friendly and inviting.
>
> Although this may be a point which is irrelevent?

Probably ;-) because MySQL too lacks a few "user friendly" features
(like boolean datatypes).

Ian Barwick
barwick@gmx.net

Re: For the ametures. (related to "Are we losing momentum?")

From

"Andrew Dunstan"

Date:

17 April 2003, 08:18:44

(please note that the word is "amateur" - it comes from French/Latin,
meaning people who do things for the love of it).

There are lots of cron clones for Windows - try a Google search.

Also, there is a native Windows port of Pg 7.2.1 available - we have been
using it for a couple of months now on a small project without a single
hitch. Search this mailing list for details.

Like you, we eagerly await the official Windows port in 7.4. (then we'll
have shema, for example).

Part of the problem that amateurs often face in dealing with things like a
DBMS is that their lack of formal training leads them to expect things to
work in some intuitive fashion, and they don't (for very good technical
reasons). As someone who in a past life had to teach relational theory and
practice, I can tell you that just getting across the idea of a Cartesian
product can be quite hard. And as a former DBA I can tell you that even
seasoned professional developers often don't/can't take the trouble to
analyse what their queries are doing and why they demand so much in
resources. Running a DBMS (*any* DBMS) which has significant requirements is
unfortunately something that requires both understanding and experience. It
never "just works".

Finally, one of the important things for my particular situation, is that Pg
comes with a BSDish license, which means we have no issues with bundling it.
AFAIK it's pretty much alone in that.

andrew


----- Original Message -----
From: "Ben Clewett" <B.Clewett@roadrunner.uk.com>
To: "Dave Page" <dpage@vale-housing.co.uk>; <pgsql-hackers@postgresql.org>
Sent: Thursday, April 17, 2003 7:44 AM
Subject: Re: [HACKERS] For the ametures. (related to "Are we losing
momentum?")


> Hi Dave,
>
> A brief defence of my posting.  After which I'll retire to my side of
> the fence :)
>
> >>- A true Windows version which people can learn their craft on.
> >
> > Coming with 7.4...
>
> I look forward to this greatly.  Maybe here I'll have the chance to fix
> some problems for the greater community.
>
> >>- Tools which look like Access, to do row level data
> >>editing with no SQL.
> >
> >
> > http://www.pgadmin.org/
> >
> > It looks more like SQL Server's Enterprise Manager but does most if not
> > all of what I expect you need.
>
> Sorry, my fault, an excellent program.
>
> > The tarball includes the complete documentation in HTML format, and in
> > pgAdmin there's a searchable copy in the main chm help file.
>
> But not the API's.  Not in one central location.  Some of it, the stuff
> I use, is on GBorg, and in inconsistent format.  I have personally found
> some documentation very fragmented.  So a subtle point about an ability
> is lost as I have assumed all comments to be in a few pages, and missed
> something vital or relevent in another sourse.  Eg, see my comment at
> the end.  But it's better than msdn :)
>
> >>- Data types like 'ENUM' which appeal to ametures.
> >
> > Isn't that just syntactic sugar for a column with a check for specific
> > values on it?
>
> Yes :)  By point is not that PostgreSQL is lacking, only that the
> ameture finds others more friendly and inviting.
>
> Although this may be a point which is irrelevent?
>
> My personal 'gripe' was when reading through the postings, some people
> considered people who have not the time, patience or ability, to learn
> PostgreSQL completelly, somehow not worthy.
>
> I wanted to support us dumb users! :)
>
> >>- There are no administrative mandatorys.  Eg, VACUUM.
> >>(A stand-alone
> >>commercial app, like an Email client, will be contrainted by
> >>having to
> >>be an app and a DBA in one.)
> >
> > PostgreSQL is by no means alone in this requirement. SQL Server for
> > example has 'optimizations' that are performed usually as part of a
> > scheduled maintenance plan and are analagous to vacuum in some ways.
>
> Is this a weekness in DBMS's that don't require this?  (MySQL, Liant
> etc.)  Is there a way of building a guarbage collector into the system?
> My Windows PC has no 'cron'.
>
> >>- The tables (not innodb) are in different files of the
> >>same name.
> >>Allowing the OS adminitrator great ability.  EG, putting tables on
> >>separate partitions and therefore greatly speeding performance.
> >
> > One reason for not doing this is that a table in PostgreSQL might span
> > mutiple files if it exceeds a couple of gigs in size.
>
> They used multile files for tables, with a common pefix of the table
> name.  But they have dropped this them selves now.
>
> I miss the way with MySQL I could delete a table, or move it, or back it
> up, manually using 'rm', 'mv' or 'cp'.
>
> Working with IDE drives on PC's,  you can double the performace of a DB
> just by putting half the tables on a disk on another IDE chain.  Adding
> a DB using 'tar' is very a powerful ability.
>
> But hay, if I missed it that much, I would not have moved! :)
>
> >>- They have extensive backup support.  Including now,
> >>concurrent backup
> >>without user interuption or risk of inconsistency.
> >
> >
> > So does PostgreSQL (pg_dump/pg_dumpall).
>
> I have used this, and it's a great command.
>
> I could not work out from the documentation whether it takes a snapshot
> at the start time, or archives data at the time it find's it.  The
> documentation (app-pg-dump.html).  As the documentation does not clarify
> this very important point, I desided it's not safe to use when the
> system is in use.
>
> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid and
> consistent backup?
>
> If so, you have all MySQL has here and in a more useful format.
>
> > PS, it's nice you decided not to go to the Dark Side :-)
>
> Thanks, Ben
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

Re: For the ametures. (related to "Are we losing momentum?")

From

"Dave Page"

Date:

17 April 2003, 08:44:13


> -----Original Message-----
> From: Ben Clewett [mailto:B.Clewett@roadrunner.uk.com]
> Sent: 17 April 2003 12:44
> To: Dave Page; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] For the ametures. (related to "Are we
> losing momentum?")
>
>
>
> But not the API's.  Not in one central location.  Some of it,
> the stuff
> I use, is on GBorg,

The API's that ship with the source (libpq et al.) are all documented in
the HTML docs that ship with the code as far as I'm aware.

I suspect what you look at on Gborg will be one or more of psqlODBC,
Npgsql or libpqxx? These are seperate projects, and hence have their own
documentation. I don't know about libpqxx, but the psqlODBC docs are
very old I admit. If anyone cares to work on them, please let me know.
The Npgsql stuff is all very new and very alpha and I guess of all the
Npgsql hackers, I'm probably the only one who hangs around here - that's
how seperate the project is from PostgreSQL itself.

What we could probably use is a page on the main website highlighting
all the programming interfaces - similar to
http://www.postgresql.org/users-lounge/interfaces.html but a bit more
prominent and focused. I'll put my third hat on now and repeat - if
anyone cares to work on this, please let me know :-)

> My personal 'gripe' was when reading through the postings,
> some people
> considered people who have not the time, patience or ability,
> to learn
> PostgreSQL completelly, somehow not worthy.
>
> I wanted to support us dumb users! :)

That's certainly not the case for many of the people here, though you
must remember, the vast majority of us work voluntarily and prefer to
help users who have made an effort to help themselves first rather than
those who expect us to do everything for them for free. Thankfully those
people are few and far between, but they do crop up from time to time.

> > PostgreSQL is by no means alone in this requirement. SQL Server for
> > example has 'optimizations' that are performed usually as part of a
> > scheduled maintenance plan and are analagous to vacuum in some ways.
>
> Is this a weekness in DBMS's that don't require this?  (MySQL, Liant
> etc.)  Is there a way of building a guarbage collector into
> the system?

Potentially I guess, if they are cleaning up and trying to reuse space
on the fly then they could suffer a performance hit.

> My Windows PC has no 'cron'.

No, but it probably has a Scheduled Tasks folder unless it's a really
old version.

>
> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce
> a valid and
> consistent backup?

Yes, pg_dump will give you a consistent backup - this is from section
9.1 of the Administrators Guide in the Backup and Restore section:

Dumps created by pg_dump are internally consistent, that is, updates to
the database while pg_dump is running will not be in the dump. pg_dump
does not block other operations on the database while it is working.
(Exceptions are those operations that need to operate with an exclusive
lock, such as VACUUM FULL.)

Regards, Dave.

Re: For the ametures. (related to "Are we losing momentum?")

From

Andrew Sullivan

Date:

17 April 2003, 08:44:51

On Thu, Apr 17, 2003 at 11:44:07AM +0000, Ben Clewett wrote:
> 
> I miss the way with MySQL I could delete a table, or move it, or back it 
> up, manually using 'rm', 'mv' or 'cp'.

Under most circumstances, you can't do that _anyway_, because doing
so will break stuff unless the postmaster is stopped.  I agree that
bing able to put tables and files on their own platters would be a
Good Thing, but in order to make it really safe, it needs to be
managed by the postmaster.  Making this difficult is sort of a
defence mechanism, therefore: if you make it too easy, people will be
shooting themselves in the foot all the time.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8                                        +1 416 646 3304
x110

Re: For the ametures. (related to "Are we losing momentum?")

From

Bruce Momjian

Date:

17 April 2003, 11:37:59

Dave Page wrote:
> > -    They have extensive backup support.  Including now,
> > concurrent backup
> > without user interuption or risk of inconsistency.
>
> So does PostgreSQL (pg_dump/pg_dumpall).

I have applied the following doc patch to the pg_dump documentation to
more clearly state that it can do consistent backups during concurrent
access --- too many people weren't seeing that capability.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: pg_dump.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.60
diff -c -c -r1.60 pg_dump.sgml
*** pg_dump.sgml    25 Mar 2003 16:15:42 -0000    1.60
--- pg_dump.sgml    17 Apr 2003 15:34:16 -0000
***************
*** 33,74 ****
    </title>

    <para>
!    <application>pg_dump</application> is a utility for saving a
!    <productname>PostgreSQL</productname> database into a script or an
!    archive file.  The script files are in plain-text format and
!    contain the SQL commands required to reconstruct the database to
!    the state it was in at the time it was saved.  To restore these
!    scripts, use <xref linkend="app-psql">.  They can be used to
!    reconstruct the database even on other machines and other
!    architectures, with some modifications even on other SQL database
!    products.
    </para>

    <para>
!    Furthermore, there are alternative archive file formats
!    that are meant to be used with <xref linkend="app-pgrestore"> to
!    rebuild the database, and they also allow
!    <application>pg_restore</application> to be selective about what is
!    restored, or even to reorder the items prior to being restored. The
!    archive files are also designed to be portable across
     architectures.
    </para>

    <para>
     When used with one of the archive file formats and combined with
!    <application>pg_restore</application>, <application>pg_dump</application> provides a
!    flexible archival and
     transfer mechanism. <application>pg_dump</application> can be used to
!    backup an entire database, then <application>pg_restore</application> can
!    be used to examine the archive and/or select which parts of the
!    database are to be restored.
!    The most flexible output file format is the <quote>custom</quote>
!    format (<option>-Fc</option>). It allows for selection and
!    reordering of all archived items, and is compressed by default. The
!    <application>tar</application> format (<option>-Ft</option>) is not
!    compressed and it is not possible to reorder data when loading, but
!    it is otherwise quite flexible; moreover, it can be manipulated with
!    other tools such as <command>tar</command>.
    </para>

    <para>
--- 33,79 ----
    </title>

    <para>
!    <application>pg_dump</application> is a utility for backing up a
!    <productname>PostgreSQL</productname> database. It makes consistent
!    backups even if the database is being used concurrently.
!    <application>pg_dump</application> does not block other users
!    accessing the database (readers or writers).
    </para>

    <para>
!    Dumps can be output in script or archive file formats. The script
!    files are in plain-text format and contain the SQL commands required
!    to reconstruct the database to the state it was in at the time it was
!    saved. To restore these scripts, use <xref linkend="app-psql">. They
!    can be used to reconstruct the database even on other machines and
!    other architectures, with some modifications even on other SQL
!    database products.
!   </para>
!
!   <para>
!    The alternative archive file formats that are meant to be used with
!    <xref linkend="app-pgrestore"> to rebuild the database, and they also
!    allow <application>pg_restore</application> to be selective about
!    what is restored, or even to reorder the items prior to being
!    restored. The archive files are also designed to be portable across
     architectures.
    </para>

    <para>
     When used with one of the archive file formats and combined with
!    <application>pg_restore</application>,
!    <application>pg_dump</application> provides a flexible archival and
     transfer mechanism. <application>pg_dump</application> can be used to
!    backup an entire database, then <application>pg_restore</application>
!    can be used to examine the archive and/or select which parts of the
!    database are to be restored. The most flexible output file format is
!    the <quote>custom</quote> format (<option>-Fc</option>). It allows
!    for selection and reordering of all archived items, and is compressed
!    by default. The <application>tar</application> format
!    (<option>-Ft</option>) is not compressed and it is not possible to
!    reorder data when loading, but it is otherwise quite flexible;
!    moreover, it can be manipulated with other tools such as
!    <command>tar</command>.
    </para>

    <para>
***************
*** 77,88 ****
     light of the limitations listed below.
    </para>

-   <para>
-    <application>pg_dump</application> makes consistent backups even if the
-    database is being used concurrently.  <application>pg_dump</application>
-    does not block other users accessing the database (readers or
-    writers).
-   </para>
   </refsect1>

   <refsect1 id="pg-dump-options">
--- 82,87 ----

Re: For the ametures. (related to "Are we losing momentum?")

From

Doug McNaught

Date:

17 April 2003, 12:38:04

Ben Clewett <B.Clewett@roadrunner.uk.com> writes:


> > So does PostgreSQL (pg_dump/pg_dumpall).
> 
> I have used this, and it's a great command.
> 
> I could not work out from the documentation whether it takes a
> snapshot at the start time, or archives data at the time it find's it.
> The documentation (app-pg-dump.html).  As the documentation does not
> clarify this very important point, I desided it's not safe to use when
> the system is in use.

Ummm, quoting from the pg_dump manpage:
      pg_dump  makes  consistent backups even if the database is      being used concurrently.  pg_dump  does  not
block other      users accessing the database (readers or writers).
 

What part of this isn't clear?

It's safe.  pg_dump does all its work inside a transaction, so MVCC
rules automatically guarantee that it sees a consistent snapshot.

> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid
> and consistent backup?

Absolutely.

> If so, you have all MySQL has here and in a more useful format.

I think MySQL's consistent hot backup has to lock tables, while PG's
doesn't...

-Doug

Re: For the ametures. (related to "Are we losing momentum?")

From

Bruce Momjian

Date:

17 April 2003, 12:40:54

Doug McNaught wrote:
> Ben Clewett <B.Clewett@roadrunner.uk.com> writes:
> 
> 
> > > So does PostgreSQL (pg_dump/pg_dumpall).
> > 
> > I have used this, and it's a great command.
> > 
> > I could not work out from the documentation whether it takes a
> > snapshot at the start time, or archives data at the time it find's it.
> > The documentation (app-pg-dump.html).  As the documentation does not
> > clarify this very important point, I desided it's not safe to use when
> > the system is in use.
> 
> Ummm, quoting from the pg_dump manpage:
> 
>        pg_dump  makes  consistent backups even if the database is
>        being used concurrently.  pg_dump  does  not  block  other
>        users accessing the database (readers or writers).
> 
> What part of this isn't clear?
> 
> It's safe.  pg_dump does all its work inside a transaction, so MVCC
> rules automatically guarantee that it sees a consistent snapshot.

Too many people have missed that point --- it was too far down in the
manual page, after a long discussion about output formats.  Now it is
right in the first paragraph, which should eliminate that question ---
it was almost becoming an FAQ.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: For the ametures. (related to "Are we losing momentum?")

From

Jon Jensen

Date:

17 April 2003, 13:38:09

On Thu, 17 Apr 2003, Dave Page wrote:

> > -    Data types like 'ENUM' which appeal to ametures.
> 
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

I believe it's actually different. In PostgreSQL you'd use a VARCHAR 
column with CHECK constraints, which means there are actual possibly 
lengthy strings in the database. In MySQL's ENUM, the table structure maps 
a particular string to a bit pattern, so if you have two possible values, 
'superdogfood' and 'onetwothreefourfivesixseven', your column will only 
take 1 bit + overhead. Obviously no big deal until you get a few dozen 
possibilities. This is also what allows the SET type to work -- it's a set 
of binary flags for a named list of elements. The docs are here:

http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#ENUM

I don't like the fact that numbers don't really work (being used as 
indices rather than names), that case isn't tolerated, that invalid 
entries go in as empty strings, etc., so I certainly wouldn't want to see 
them emulated exactly in PostgreSQL, but I imagine that ENUM could save a 
lot of disk space in certain circumstances, and SET seems useful.

Jon

Re: For the ametures. (related to "Are we losing momentum?")

From

Peter Eisentraut

Date:

17 April 2003, 13:40:10

Ben Clewett writes:

> I could not work out from the documentation whether it takes a snapshot
> at the start time, or archives data at the time it find's it.  The
> documentation (app-pg-dump.html).  As the documentation does not clarify
> this very important point, I desided it's not safe to use when the
> system is in use.
>
> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid and
> consistent backup?

From the pg_dump reference page:
 <para>  <application>pg_dump</application> makes consistent backups even if the  database is being used concurrently.
<application>pg_dump</application> does not block other users accessing the database (readers or  writers). </para>

From the chapter Backup and Restore:
 <para>  Dumps created by <application>pg_dump</> are internally consistent,  that is, updates to the database while
<application>pg_dump</>is  running will not be in the dump. <application>pg_dump</> does not  block other operations on
thedatabase while it is working.  (Exceptions are those operations that need to operate with an  exclusive lock, such
as<command>VACUUM FULL</command>.) </para>

-- 
Peter Eisentraut   peter_e@gmx.net

Re: For the ametures. (related to "Are we losing momentum?")

From

"Matthew T. O'Connor"

Date:

17 April 2003, 14:46:47

----- Original Message -----
From: "Ben Clewett" <B.Clewett@roadrunner.uk.com>
> >>- There are no administrative mandatorys.  Eg, VACUUM.
> >>(A stand-alone
> >>commercial app, like an Email client, will be contrainted by
> >>having to
> >>be an app and a DBA in one.)
> >
> > PostgreSQL is by no means alone in this requirement. SQL Server for
> > example has 'optimizations' that are performed usually as part of a
> > scheduled maintenance plan and are analagous to vacuum in some ways.
>
> Is this a weekness in DBMS's that don't require this?  (MySQL, Liant
> etc.)  Is there a way of building a guarbage collector into the system?
> My Windows PC has no 'cron'.

Work is being done to build vacuum into the backend so that cron is not
required.  Hopefully will be in 7.4

> >>- The tables (not innodb) are in different files of the
> >>same name.
> >>Allowing the OS adminitrator great ability.  EG, putting tables on
> >>separate partitions and therefore greatly speeding performance.
> >
> > One reason for not doing this is that a table in PostgreSQL might span
> > mutiple files if it exceeds a couple of gigs in size.
>
> Working with IDE drives on PC's,  you can double the performace of a DB
> just by putting half the tables on a disk on another IDE chain.  Adding
> a DB using 'tar' is very a powerful ability.

You can do this using symlinks, but you do have to shut down the postmaster
before you play with the files directly.

> >>- They have extensive backup support.  Including now,
> >>concurrent backup
> >>without user interuption or risk of inconsistency.
> >
> > So does PostgreSQL (pg_dump/pg_dumpall).
>
> I have used this, and it's a great command.
>
> I could not work out from the documentation whether it takes a snapshot
> at the start time, or archives data at the time it find's it.  The
> documentation (app-pg-dump.html).  As the documentation does not clarify
> this very important point, I desided it's not safe to use when the
> system is in use.
>
> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid and
> consistent backup?

Yes it takes a snapshot from when it starts dumping the database, so it's
consistent no matter how much activity is going on after you start pg_dump.

Re: For the ametures. (related to "Are we losing momentum?")

From

"scott.marlowe"

Date:

21 April 2003, 12:47:08

On Thu, 17 Apr 2003, Dave Page wrote:

> Hi Ben
> 
> > -    Data types like 'ENUM' which appeal to ametures.
> 
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

Yes.  But isn't serial just syntactic sugar for create sequence seqname; 
then including that sequence in a default clause?

Personally, i could easily see a use for enum() covering the check in 
constraint.  MySQL users can use enum() at their leisure, postgresql users 
can pick if up if they want to, and underneath it all is a check 
constraint the user can see with /dt just like with serial.

> > -    There are no administrative mandatorys.  Eg, VACUUM.  
> > (A stand-alone 
> > commercial app, like an Email client, will be contrainted by 
> > having to 
> > be an app and a DBA in one.)
> 
> PostgreSQL is by no means alone in this requirement. SQL Server for
> example has 'optimizations' that are performed usually as part of a
> scheduled maintenance plan and are analagous to vacuum in some ways.

But at the same time, it's one of those gotchas that would be nice to get 
rid of for people who just want a simple little database on their 
workstation.  It might be nice to have some kind of lazy auto vacuum 
daemon installed by default and configured to run every hour or so 
according to postgresql.conf. 

> > -    The tables (not innodb) are in different files of the 
> > same name. 
> > Allowing the OS adminitrator great ability.  EG, putting tables on 
> > separate partitions and therefore greatly speeding performance.
> 
> One reason for not doing this is that a table in PostgreSQL might span
> mutiple files if it exceeds a couple of gigs in size.

And let's face it, if we get tablespaces implemented, this kind of thing 
goes awawy.

Re: For the ametures. (related to "Are we losing momentum?")

From

Ben Clewett

Date:

22 April 2003, 04:22:42

Matthew T. O'Connor wrote:

> ----- Original Message -----
> From: "Ben Clewett" <B.Clewett@roadrunner.uk.com>

>>Working with IDE drives on PC's,  you can double the performace of a DB
>>just by putting half the tables on a disk on another IDE chain.
> 
> 
> You can do this using symlinks, but you do have to shut down the postmaster
> before you play with the files directly.

I was hoping this was the case. :)
From my data/base directory, I have a tree structure of numbered files 
of no obvious structure.  As well as some smaller directories, 'global', 
'pg_xlog' and 'pg_clog'.

If I wanted to divide the postmaster read() calls evenly to files 
located over several physical disks, how would you suggest distributing 
the data-space?  Would it be as simple as putting each child directory 
in 'data/base' on a different physical disk in a round-robbin fasion 
using symbolic links:  Or is it more involved...

data/base/1 -> /dev/hda
data/base/2 -> /dev/hdb
data/base/3 -> /dev/hdc
data/base/4 -> /dev/hda
data/base/5 -> /dev/hdb
data/base/6 -> /dev/hdc (etc)

(I have made the assumption that the postmaster serves different 
connections in parallel, otherwise this would have little effect :)

Thanks,

Ben

Re: For the ametures. (related to "Are we losing momentum?")

From

Shridhar Daithankar

Date:

22 April 2003, 04:38:14

On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
> If I wanted to divide the postmaster read() calls evenly to files
> located over several physical disks, how would you suggest distributing
> the data-space?  Would it be as simple as putting each child directory
> in 'data/base' on a different physical disk in a round-robbin fasion
> using symbolic links:  Or is it more involved...
>
> data/base/1 -> /dev/hda
> data/base/2 -> /dev/hdb
> data/base/3 -> /dev/hdc
> data/base/4 -> /dev/hda
> data/base/5 -> /dev/hdb
> data/base/6 -> /dev/hdc (etc)

Don't bother splitting across disks unless you put them on different IDE 
channels as IDE channel bandwidth is shared. 

If you have that many disk, put them on IDE RAID. That is a much simpler 
solution.
Shridhar

Re: For the ametures. (related to "Are we losing

From

"Matthew T. O'Connor"

Date:

22 April 2003, 08:23:27

On Tue, 2003-04-22 at 04:25, Ben Clewett wrote:
> Matthew T. O'Connor wrote:
> From my data/base directory, I have a tree structure of numbered files 
> of no obvious structure.  As well as some smaller directories, 'global', 
> 'pg_xlog' and 'pg_clog'.
> 
> If I wanted to divide the postmaster read() calls evenly to files 
> located over several physical disks, how would you suggest distributing 
> the data-space?  Would it be as simple as putting each child directory 
> in 'data/base' on a different physical disk in a round-robbin fasion 
> using symbolic links:  Or is it more involved...
> 
> data/base/1 -> /dev/hda
> data/base/2 -> /dev/hdb
> data/base/3 -> /dev/hdc
> data/base/4 -> /dev/hda
> data/base/5 -> /dev/hdb
> data/base/6 -> /dev/hdc (etc)
> 
> (I have made the assumption that the postmaster serves different 
> connections in parallel, otherwise this would have little effect :)

Yes connections are served in parallel.  The best way to split the files
is something you have to figure out, probably based on usage.  The round
robin directory method you mentioned above falls down in that it only
splits whole databases into different locations regardless of how much
I/O is related to those databases.  You may wind up with inactive
databases on their own disk which would yield no performance gain.  It's
also probably better to get down to the file / index level rather than
whole databases as you may have a few tables that get 90% of the work.

Hopefully some of that was helpful.

Re: For the ametures. (related to 'Are we losing momentum?')

From

pgsql@mohawksoft.com

Date:

22 April 2003, 10:29:41

> On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
>> If I wanted to divide the postmaster read() calls evenly to files
>> located over several physical disks, how would you suggest
>> distributing the data-space?  Would it be as simple as putting each
>> child directory in 'data/base' on a different physical disk in a
>> round-robbin fasion using symbolic links:  Or is it more involved...
>>
>> data/base/1 -> /dev/hda
>> data/base/2 -> /dev/hdb
>> data/base/3 -> /dev/hdc
>> data/base/4 -> /dev/hda
>> data/base/5 -> /dev/hdb
>> data/base/6 -> /dev/hdc (etc)
>
> Don't bother splitting across disks unless you put them on different
> IDE  channels as IDE channel bandwidth is shared.

While that is electricaly "true" it is not completely true. Modern IDE hard
disks are very advanced with large read-ahead caches. That combined with
IDE-DMA access, low seek times, faster spin rates, means you can get
performance across two IDE drives on the same channel.

For instance, two databases, one on HDA and the other database on HDB.
Successive reads inteleaved HDA/HDB/HDA/HDB etc. will share electical
bandwidth (as would SCSI). AFAIK, there is no standard asynchronous command
structure for IDE, however, the internal read-ahead cache on each drive will
usually have a pretty good guess at the "next" block based on some
predictive caching algorithm.

So, the "next" read from the drive has a good chance at coming from cache.
Plus the OS may "scatter gather" larger requests into smaller successive
requests (so a pure "read-ahead" will work great). Then consider
write-caching (if you dare).

It is very true you want to have one IDE drive per IDE channel, but these
days two drives on a channel are not as bad as it once was.  This is not due
to shared electrical bandwidth of the system (all bus systems suffer this)
but because of the electrical protocol to address the drives. ATA and EIDE
have made strides in this area.

>
> If you have that many disk, put them on IDE RAID. That is a much
> simpler  solution.

A hardware RAID system is obviously an "easier" solution, and
www.infortrend.com makes a very cool system, but spreading multiple
databases across multiple IDE drives and controllers will probably provide
higher overall performance if you have additional IDE channels instead of
forcing all the I/O through one controller (IDE or SCSI) channel.

Pretty good PCI/EIDE-DMA controllers are cheap, $50~$100, and you can fit a
bunch of them into a server system. Provided your OS has a reentrent driver
model, it should be possible for PostgreSQL to be performing as many I/O
operations concurrently as you have drive controllers, where as with an
IDE->SCSI raid controller, you may still be limited to how good your
specific driver handles concurrency within one driver instance.

The "best" solution is one hardware raid per I/O channel per database, but
that is expensive. One IDE driver per IDE channel per database is the next
best thing. Two IDE drives per channel, one drive per database, is very
workable if you make sure that the more active databases are on separate
controllers.

Re: For the ametures. (related to "Are we losing momentum?")

From

Tom Lane

Date:

22 April 2003, 11:18:45

Ben Clewett <B.Clewett@roadrunner.uk.com> writes:
> If I wanted to divide the postmaster read() calls evenly to files 
> located over several physical disks, how would you suggest distributing 
> the data-space?

AFAIK, the single biggest win you can get in this dimension is to put
the WAL log ($PGDATA/pg_xlog/) on a separate spindle from everything
else.  At least for write-intensive databases, that can buy you
something like 2x improvement for the price of one easy symlink.

After that, the conventional wisdom is to put indexes on a third spindle
(separate from base tables and from xlog).  But the bookkeeping and
maintenance effort needed for that is really too high to make it worth
worrying about, IMHO :-(.  Eventually we will have some kind of
tablespace feature to make it easy.

My recommendation at the moment would be: WAL on dedicated spindle,
everything else on the best RAID array you can set up.  And buy as much
RAM as you can afford.

See past discussions in pgsql-performance for more info.
        regards, tom lane

Re: For the ametures. (related to "Are we losing momentum?")

From

"D'Arcy J.M. Cain"

Date:

22 April 2003, 12:32:09

On Tuesday 22 April 2003 11:18, Tom Lane wrote:
> My recommendation at the moment would be: WAL on dedicated spindle,
> everything else on the best RAID array you can set up.  And buy as much
> RAM as you can afford.

Hmm.  Is this safe?  Let's assume for argument's sake that the RAID is 100% 
reliable and that the local spindle will eventually fail.  Can I lose data by 
leaving WAL on the local spindle?  Or are you suggesting two full RAID 
systems?

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Re: For the ametures. (related to "Are we losing momentum?")

From

Tom Lane

Date:

22 April 2003, 12:51:07

"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> Hmm.  Is this safe?  Let's assume for argument's sake that the RAID is 100% 
> reliable and that the local spindle will eventually fail.  Can I lose data by
> leaving WAL on the local spindle?  Or are you suggesting two full RAID 
> systems?

A two-disk mirror for the WAL would be good, yes; it needs redundancy
just as much as the main storage.  But the discussion was about
performance not reliability; the OP hadn't mentioned that he wanted any
redundant storage.
        regards, tom lane

Re: For the ametures. (related to "Are we losing momentum?")

From

"D'Arcy J.M. Cain"

Date:

23 April 2003, 04:37:04

On Tuesday 22 April 2003 12:51, Tom Lane wrote:
> A two-disk mirror for the WAL would be good, yes; it needs redundancy
> just as much as the main storage.  But the discussion was about
> performance not reliability; the OP hadn't mentioned that he wanted any
> redundant storage.

Ah.  When you mentioned RAID I thought you were going after reliability.  So I 
am probably best in my environment to just leave everything on the RAID 
device and let it do it's job as best that it can.  It does seem to work 
faster than using local storage, even SCSI.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.