Thread: Re: For the ametures. (related to "Are we losing momentum?")
Hi Ben

> -----Original Message-----
> From: Ben Clewett [mailto:B.Clewett@roadrunner.uk.com]
> Sent: 17 April 2003 10:45
> Cc: pgsql-hackers@postgresql.org
> Subject: [HACKERS] For the ametures. (related to "Are we
> losing momentum?")
>
> I am not a hacker of PgSQL, and new to databases. I was using MySQL
> under .NET, but was annoyed by their aggressive licence agreements and
> immaturity. (Their sales personnel are also very rude. One girl once
> told me that if I didn't like their licence terms I should just use
> flat files instead.)

Probably more powerful ;-)

> - A true Windows version which people can learn their craft on.

Coming with 7.4...

> - Tools which look like Access, to do row-level data
> editing with no SQL.

http://www.pgadmin.org/

It looks more like SQL Server's Enterprise Manager, but does most if not
all of what I expect you need.

> - Centrally located complete documentation in many
> consistent, easy to
> read formats, of the system and *ALL* APIs, including
> in-line tutorials
> and examples.

The tarball includes the complete documentation in HTML format, and in
pgAdmin there's a searchable copy in the main .chm help file.

> - Data types like 'ENUM' which appeal to amateurs.

Isn't that just syntactic sugar for a column with a check for specific
values on it?

> - There are no administrative mandatories. E.g., VACUUM.
> (A stand-alone
> commercial app, like an email client, will be constrained by
> having to
> be an app and a DBA in one.)

PostgreSQL is by no means alone in this requirement. SQL Server, for
example, has 'optimizations' that are usually performed as part of a
scheduled maintenance plan and are analogous to vacuum in some ways.

> - The tables (not InnoDB) are in different files of the
> same name,
> allowing the OS administrator great flexibility. E.g., putting tables on
> separate partitions and thereby greatly improving performance.

One reason for not doing this is that a table in PostgreSQL might span
multiple files if it exceeds a couple of gigabytes in size.

> - They have extensive backup support. Including now,
> concurrent backup
> without user interruption or risk of inconsistency.

So does PostgreSQL (pg_dump/pg_dumpall).

Regards, Dave

PS, it's nice you decided not to go to the Dark Side :-)
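PPS: to make the ENUM point concrete, here is a minimal sketch of the
CHECK-constraint equivalent (the table, column and values are invented
for the example):

    CREATE TABLE pets (
        name    varchar(40),
        -- MySQL would use: species ENUM('cat', 'dog', 'bird')
        species varchar(10) CHECK (species IN ('cat', 'dog', 'bird'))
    );

    INSERT INTO pets VALUES ('Rex', 'dog');      -- accepted
    INSERT INTO pets VALUES ('Odd', 'hamster');  -- rejected by the CHECK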
Hi Dave,

A brief defence of my posting. After which I'll retire to my side of
the fence :)

>> - A true Windows version which people can learn their craft on.
>
> Coming with 7.4...

I look forward to this greatly. Maybe here I'll have the chance to fix
some problems for the greater community.

>> - Tools which look like Access, to do row-level data
>> editing with no SQL.
>
> http://www.pgadmin.org/
>
> It looks more like SQL Server's Enterprise Manager, but does most if not
> all of what I expect you need.

Sorry, my fault, an excellent program.

> The tarball includes the complete documentation in HTML format, and in
> pgAdmin there's a searchable copy in the main .chm help file.

But not the APIs. Not in one central location. Some of it, the stuff
I use, is on GBorg, and in an inconsistent format. I have personally
found some documentation very fragmented. So a subtle point about a
capability is lost, as I have assumed all comments to be in a few pages,
and missed something vital or relevant in another source. E.g., see my
comment at the end. But it's better than MSDN :)

>> - Data types like 'ENUM' which appeal to amateurs.
>
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

Yes :) My point is not that PostgreSQL is lacking, only that the
amateur finds others more friendly and inviting.

Although this may be a point which is irrelevant?

My personal 'gripe' was when reading through the postings, some people
considered people who have not the time, patience or ability to learn
PostgreSQL completely somehow not worthy.

I wanted to support us dumb users! :)

>> - There are no administrative mandatories. E.g., VACUUM.
>> (A stand-alone
>> commercial app, like an email client, will be constrained by
>> having to
>> be an app and a DBA in one.)
>
> PostgreSQL is by no means alone in this requirement. SQL Server, for
> example, has 'optimizations' that are usually performed as part of a
> scheduled maintenance plan and are analogous to vacuum in some ways.

Is this a weakness in DBMSs that don't require this? (MySQL, Liant,
etc.) Is there a way of building a garbage collector into the system?
My Windows PC has no 'cron'.

>> - The tables (not InnoDB) are in different files of the
>> same name,
>> allowing the OS administrator great flexibility. E.g., putting tables on
>> separate partitions and thereby greatly improving performance.
>
> One reason for not doing this is that a table in PostgreSQL might span
> multiple files if it exceeds a couple of gigabytes in size.

They used multiple files for tables, with a common prefix of the table
name. But they have dropped this themselves now.

I miss the way with MySQL I could delete a table, or move it, or back it
up, manually using 'rm', 'mv' or 'cp'.

Working with IDE drives on PCs, you can double the performance of a DB
just by putting half the tables on a disk on another IDE chain. Adding
a DB using 'tar' is a very powerful ability.

But hey, if I missed it that much, I would not have moved! :)

>> - They have extensive backup support. Including now,
>> concurrent backup
>> without user interruption or risk of inconsistency.
>
> So does PostgreSQL (pg_dump/pg_dumpall).

I have used this, and it's a great command.

I could not work out from the documentation (app-pg-dump.html) whether
it takes a snapshot at the start time, or archives data at the time it
finds it. As the documentation does not clarify this very important
point, I decided it's not safe to use while the system is in use.

Can this command be used with users in the system making heavy changes,
and, when it takes many hours to complete, does it produce a valid and
consistent backup? If so, you have all MySQL has here, and in a more
useful format.

> PS, it's nice you decided not to go to the Dark Side :-)

Thanks, Ben
On Thursday 17 April 2003 13:44, Ben Clewett wrote:
> Hi Dave,
>
> A brief defence of my posting. After which I'll retire to my side of
> the fence :)
(snip)
>>> - Data types like 'ENUM' which appeal to amateurs.
>>
>> Isn't that just syntactic sugar for a column with a check for specific
>> values on it?
>
> Yes :) My point is not that PostgreSQL is lacking, only that the
> amateur finds others more friendly and inviting.
>
> Although this may be a point which is irrelevant?

Probably ;-) because MySQL too lacks a few "user friendly" features
(like boolean data types).

Ian Barwick
barwick@gmx.net
(please note that the word is "amateur" - it comes from French/Latin,
meaning people who do things for the love of it).

There are lots of cron clones for Windows - try a Google search.

Also, there is a native Windows port of Pg 7.2.1 available - we have
been using it for a couple of months now on a small project without a
single hitch. Search this mailing list for details. Like you, we
eagerly await the official Windows port in 7.4 (then we'll have
schemas, for example).

Part of the problem that amateurs often face in dealing with things
like a DBMS is that their lack of formal training leads them to expect
things to work in some intuitive fashion, and they don't (for very good
technical reasons). As someone who in a past life had to teach
relational theory and practice, I can tell you that just getting across
the idea of a Cartesian product can be quite hard. And as a former DBA
I can tell you that even seasoned professional developers often
don't/can't take the trouble to analyse what their queries are doing
and why they demand so much in resources. Running a DBMS (*any* DBMS)
which has significant requirements is unfortunately something that
requires both understanding and experience. It never "just works".

Finally, one of the important things for my particular situation is
that Pg comes with a BSDish license, which means we have no issues with
bundling it. AFAIK it's pretty much alone in that.

andrew
> -----Original Message-----
> From: Ben Clewett [mailto:B.Clewett@roadrunner.uk.com]
> Sent: 17 April 2003 12:44
> To: Dave Page; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] For the ametures. (related to "Are we
> losing momentum?")
>
> But not the APIs. Not in one central location. Some of it,
> the stuff
> I use, is on GBorg,

The APIs that ship with the source (libpq et al.) are all documented in
the HTML docs that ship with the code, as far as I'm aware. I suspect
what you are looking at on GBorg will be one or more of psqlODBC,
Npgsql or libpqxx? These are separate projects, and hence have their
own documentation. I don't know about libpqxx, but the psqlODBC docs
are very old, I admit. If anyone cares to work on them, please let me
know. The Npgsql stuff is all very new and very alpha, and I guess of
all the Npgsql hackers, I'm probably the only one who hangs around here
- that's how separate the project is from PostgreSQL itself.

What we could probably use is a page on the main website highlighting
all the programming interfaces - similar to
http://www.postgresql.org/users-lounge/interfaces.html but a bit more
prominent and focused. I'll put my third hat on now and repeat - if
anyone cares to work on this, please let me know :-)

> My personal 'gripe' was when reading through the postings,
> some people
> considered people who have not the time, patience or ability
> to learn
> PostgreSQL completely somehow not worthy.
>
> I wanted to support us dumb users! :)

That's certainly not the case for many of the people here, though you
must remember that the vast majority of us work voluntarily and prefer
to help users who have made an effort to help themselves first, rather
than those who expect us to do everything for them for free. Thankfully
those people are few and far between, but they do crop up from time to
time.

>> PostgreSQL is by no means alone in this requirement. SQL Server, for
>> example, has 'optimizations' that are usually performed as part of a
>> scheduled maintenance plan and are analogous to vacuum in some ways.
>
> Is this a weakness in DBMSs that don't require this? (MySQL, Liant,
> etc.) Is there a way of building a garbage collector into
> the system?

Potentially, I guess; if they are cleaning up and trying to reuse space
on the fly then they could suffer a performance hit.

> My Windows PC has no 'cron'.

No, but it probably has a Scheduled Tasks folder, unless it's a really
old version.

> Can this command be used with users in the system making heavy
> changes, and, when it takes many hours to complete, does it
> produce a valid and
> consistent backup?

Yes, pg_dump will give you a consistent backup. This is from section
9.1 of the Administrator's Guide, in the Backup and Restore section:

    Dumps created by pg_dump are internally consistent, that is,
    updates to the database while pg_dump is running will not be in the
    dump. pg_dump does not block other operations on the database while
    it is working. (Exceptions are those operations that need to
    operate with an exclusive lock, such as VACUUM FULL.)

Regards, Dave.
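PS: on the 'no cron' point, the usual approach is simply to schedule a
call to vacuumdb. A sketch, assuming a nightly run (the timing is
arbitrary; on Windows the same command line can go into a Scheduled
Task):

    # crontab entry: vacuum and analyze all databases at 3am
    0 3 * * * vacuumdb --all --analyze --quiet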
On Thu, Apr 17, 2003 at 11:44:07AM +0000, Ben Clewett wrote:
>
> I miss the way with MySQL I could delete a table, or move it, or back it
> up, manually using 'rm', 'mv' or 'cp'.

Under most circumstances, you can't do that _anyway_, because doing so
will break stuff unless the postmaster is stopped. I agree that being
able to put tables and files on their own platters would be a Good
Thing, but in order to make it really safe, it needs to be managed by
the postmaster. Making this difficult is therefore sort of a defence
mechanism: if you make it too easy, people will be shooting themselves
in the foot all the time.

A

--
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
Dave Page wrote:
>> - They have extensive backup support. Including now,
>> concurrent backup
>> without user interruption or risk of inconsistency.
>
> So does PostgreSQL (pg_dump/pg_dumpall).

I have applied the following doc patch to the pg_dump documentation to
more clearly state that it can do consistent backups during concurrent
access --- too many people weren't seeing that capability.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  + If your life is a hard drive,   |  13 Roberts Road
  + Christ can be your backup.      |  Newtown Square, Pennsylvania 19073

Index: pg_dump.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.60
diff -c -c -r1.60 pg_dump.sgml
*** pg_dump.sgml        25 Mar 2003 16:15:42 -0000      1.60
--- pg_dump.sgml        17 Apr 2003 15:34:16 -0000
***************
*** 33,74 ****
   </title>

   <para>
!   <application>pg_dump</application> is a utility for saving a
!   <productname>PostgreSQL</productname> database into a script or an
!   archive file.  The script files are in plain-text format and
!   contain the SQL commands required to reconstruct the database to
!   the state it was in at the time it was saved.  To restore these
!   scripts, use <xref linkend="app-psql">.  They can be used to
!   reconstruct the database even on other machines and other
!   architectures, with some modifications even on other SQL database
!   products.
   </para>

   <para>
!   Furthermore, there are alternative archive file formats
!   that are meant to be used with <xref linkend="app-pgrestore"> to
!   rebuild the database, and they also allow
!   <application>pg_restore</application> to be selective about what is
!   restored, or even to reorder the items prior to being restored.  The
!   archive files are also designed to be portable across architectures.
   </para>

   <para>
    When used with one of the archive file formats and combined with
!   <application>pg_restore</application>, <application>pg_dump</application> provides a
!   flexible archival and transfer mechanism. <application>pg_dump</application> can be used to
!   backup an entire database, then <application>pg_restore</application> can
!   be used to examine the archive and/or select which parts of the
!   database are to be restored.
!   The most flexible output file format is the <quote>custom</quote>
!   format (<option>-Fc</option>).  It allows for selection and
!   reordering of all archived items, and is compressed by default.  The
!   <application>tar</application> format (<option>-Ft</option>) is not
!   compressed and it is not possible to reorder data when loading, but
!   it is otherwise quite flexible; moreover, it can be manipulated with
!   other tools such as <command>tar</command>.
   </para>

   <para>
--- 33,79 ----
   </title>

   <para>
!   <application>pg_dump</application> is a utility for backing up a
!   <productname>PostgreSQL</productname> database.  It makes consistent
!   backups even if the database is being used concurrently.
!   <application>pg_dump</application> does not block other users
!   accessing the database (readers or writers).
   </para>

   <para>
!   Dumps can be output in script or archive file formats.  The script
!   files are in plain-text format and contain the SQL commands required
!   to reconstruct the database to the state it was in at the time it was
!   saved.  To restore these scripts, use <xref linkend="app-psql">.  They
!   can be used to reconstruct the database even on other machines and
!   other architectures, with some modifications even on other SQL
!   database products.
!  </para>
!
!  <para>
!   The alternative archive file formats that are meant to be used with
!   <xref linkend="app-pgrestore"> to rebuild the database, and they also
!   allow <application>pg_restore</application> to be selective about
!   what is restored, or even to reorder the items prior to being
!   restored.  The archive files are also designed to be portable across
    architectures.
   </para>

   <para>
    When used with one of the archive file formats and combined with
!   <application>pg_restore</application>,
!   <application>pg_dump</application> provides a flexible archival and
    transfer mechanism. <application>pg_dump</application> can be used to
!   backup an entire database, then <application>pg_restore</application>
!   can be used to examine the archive and/or select which parts of the
!   database are to be restored.  The most flexible output file format is
!   the <quote>custom</quote> format (<option>-Fc</option>).  It allows
!   for selection and reordering of all archived items, and is compressed
!   by default.  The <application>tar</application> format
!   (<option>-Ft</option>) is not compressed and it is not possible to
!   reorder data when loading, but it is otherwise quite flexible;
!   moreover, it can be manipulated with other tools such as
!   <command>tar</command>.
   </para>

   <para>
***************
*** 77,88 ****
    light of the limitations listed below.
   </para>

-  <para>
-   <application>pg_dump</application> makes consistent backups even if the
-   database is being used concurrently.  <application>pg_dump</application>
-   does not block other users accessing the database (readers or
-   writers).
-  </para>
  </refsect1>

  <refsect1 id="pg-dump-options">
--- 82,87 ----
Ben Clewett <B.Clewett@roadrunner.uk.com> writes:

>> So does PostgreSQL (pg_dump/pg_dumpall).
>
> I have used this, and it's a great command.
>
> I could not work out from the documentation (app-pg-dump.html) whether
> it takes a snapshot at the start time, or archives data at the time it
> finds it. As the documentation does not clarify this very important
> point, I decided it's not safe to use while the system is in use.

Ummm, quoting from the pg_dump manpage:

    pg_dump makes consistent backups even if the database is
    being used concurrently. pg_dump does not block other
    users accessing the database (readers or writers).

What part of this isn't clear?

It's safe. pg_dump does all its work inside a transaction, so MVCC
rules automatically guarantee that it sees a consistent snapshot.

> Can this command be used with users in the system making heavy
> changes, and, when it takes many hours to complete, does it produce
> a valid and consistent backup?

Absolutely.

> If so, you have all MySQL has here, and in a more useful format.

I think MySQL's consistent hot backup has to lock tables, while PG's
doesn't...

-Doug
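PS: in concrete terms (the database names are just examples), this is
safe to run while writers are active, and the dump reflects the moment
the command's transaction started:

    pg_dump -Fc mydb > mydb.dump       # custom-format archive
    pg_restore -d newdb mydb.dump      # restore into another database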
Doug McNaught wrote:
> Ben Clewett <B.Clewett@roadrunner.uk.com> writes:
>
>>> So does PostgreSQL (pg_dump/pg_dumpall).
>>
>> I have used this, and it's a great command.
>>
>> I could not work out from the documentation (app-pg-dump.html) whether
>> it takes a snapshot at the start time, or archives data at the time it
>> finds it. As the documentation does not clarify this very important
>> point, I decided it's not safe to use while the system is in use.
>
> Ummm, quoting from the pg_dump manpage:
>
>     pg_dump makes consistent backups even if the database is
>     being used concurrently. pg_dump does not block other
>     users accessing the database (readers or writers).
>
> What part of this isn't clear?
>
> It's safe. pg_dump does all its work inside a transaction, so MVCC
> rules automatically guarantee that it sees a consistent snapshot.

Too many people have missed that point --- it was too far down in the
manual page, after a long discussion about output formats. Now it is
right in the first paragraph, which should eliminate that question ---
it was almost becoming an FAQ.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  + If your life is a hard drive,   |  13 Roberts Road
  + Christ can be your backup.      |  Newtown Square, Pennsylvania 19073
On Thu, 17 Apr 2003, Dave Page wrote:

>> - Data types like 'ENUM' which appeal to amateurs.
>
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

I believe it's actually different. In PostgreSQL you'd use a VARCHAR
column with CHECK constraints, which means there are actual, possibly
lengthy, strings in the database. In MySQL's ENUM, the table structure
maps a particular string to a bit pattern, so if you have two possible
values, 'superdogfood' and 'onetwothreefourfivesixseven', your column
will only take 1 bit + overhead. Obviously no big deal until you get a
few dozen possibilities. This is also what allows the SET type to work
-- it's a set of binary flags for a named list of elements. The docs
are here:

http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#ENUM

I don't like the fact that numbers don't really work (being used as
indices rather than names), that case isn't tolerated, that invalid
entries go in as empty strings, etc., so I certainly wouldn't want to
see them emulated exactly in PostgreSQL, but I imagine that ENUM could
save a lot of disk space in certain circumstances, and SET seems useful.

Jon
Ben Clewett writes:

> I could not work out from the documentation (app-pg-dump.html) whether
> it takes a snapshot at the start time, or archives data at the time it
> finds it. As the documentation does not clarify this very important
> point, I decided it's not safe to use while the system is in use.
>
> Can this command be used with users in the system making heavy
> changes, and, when it takes many hours to complete, does it produce
> a valid and consistent backup?

From the pg_dump reference page:

    <para>
     <application>pg_dump</application> makes consistent backups even if
     the database is being used concurrently.
     <application>pg_dump</application> does not block other users
     accessing the database (readers or writers).
    </para>

From the chapter Backup and Restore:

    <para>
     Dumps created by <application>pg_dump</> are internally consistent,
     that is, updates to the database while <application>pg_dump</> is
     running will not be in the dump.  <application>pg_dump</> does not
     block other operations on the database while it is working.
     (Exceptions are those operations that need to operate with an
     exclusive lock, such as <command>VACUUM FULL</command>.)
    </para>

--
Peter Eisentraut   peter_e@gmx.net
----- Original Message -----
From: "Ben Clewett" <B.Clewett@roadrunner.uk.com>
>>> - There are no administrative mandatories. E.g., VACUUM.
>>> (A stand-alone
>>> commercial app, like an email client, will be constrained by
>>> having to
>>> be an app and a DBA in one.)
>>
>> PostgreSQL is by no means alone in this requirement. SQL Server, for
>> example, has 'optimizations' that are usually performed as part of a
>> scheduled maintenance plan and are analogous to vacuum in some ways.
>
> Is this a weakness in DBMSs that don't require this? (MySQL, Liant,
> etc.) Is there a way of building a garbage collector into the system?
> My Windows PC has no 'cron'.

Work is being done to build vacuum into the backend so that cron is not
required. Hopefully it will be in 7.4.

>>> - The tables (not InnoDB) are in different files of the
>>> same name,
>>> allowing the OS administrator great flexibility. E.g., putting tables on
>>> separate partitions and thereby greatly improving performance.
>>
>> One reason for not doing this is that a table in PostgreSQL might span
>> multiple files if it exceeds a couple of gigabytes in size.
>
> Working with IDE drives on PCs, you can double the performance of a DB
> just by putting half the tables on a disk on another IDE chain. Adding
> a DB using 'tar' is a very powerful ability.

You can do this using symlinks, but you do have to shut down the
postmaster before you play with the files directly.

>>> - They have extensive backup support. Including now,
>>> concurrent backup
>>> without user interruption or risk of inconsistency.
>>
>> So does PostgreSQL (pg_dump/pg_dumpall).
>
> I have used this, and it's a great command.
>
> I could not work out from the documentation (app-pg-dump.html) whether
> it takes a snapshot at the start time, or archives data at the time it
> finds it. As the documentation does not clarify this very important
> point, I decided it's not safe to use while the system is in use.
>
> Can this command be used with users in the system making heavy
> changes, and, when it takes many hours to complete, does it produce
> a valid and consistent backup?

Yes, it takes a snapshot from when it starts dumping the database, so
it's consistent no matter how much activity is going on after you start
pg_dump.
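PS: a rough sketch of the symlink trick, for the curious. All names and
paths here are placeholders; run the queries first, then stop the
postmaster before touching any files:

    -- find the directory (database oid) and file (relfilenode) involved
    SELECT oid FROM pg_database WHERE datname = 'mydb';
    SELECT relfilenode FROM pg_class WHERE relname = 'mytable';

    # then, as the postgres user, with the postmaster stopped:
    mv $PGDATA/base/<dboid>/<relfilenode> /mnt/disk2/
    ln -s /mnt/disk2/<relfilenode> $PGDATA/base/<dboid>/<relfilenode>

Beware that operations which rewrite the table can recreate the file in
the original location, silently undoing the symlink.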
On Thu, 17 Apr 2003, Dave Page wrote:

> Hi Ben
>
>> - Data types like 'ENUM' which appeal to amateurs.
>
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

Yes. But isn't serial just syntactic sugar for "create sequence
seqname;" and then including that sequence in a default clause?

Personally, I could easily see a use for enum() covering the
CHECK (... IN ...) constraint. MySQL users can use enum() at their
leisure, PostgreSQL users can pick it up if they want to, and
underneath it all is a check constraint the user can see with \d, just
like with serial.

>>> - There are no administrative mandatories. E.g., VACUUM.
>>> (A stand-alone
>>> commercial app, like an email client, will be constrained by
>>> having to
>>> be an app and a DBA in one.)
>>
>> PostgreSQL is by no means alone in this requirement. SQL Server, for
>> example, has 'optimizations' that are usually performed as part of a
>> scheduled maintenance plan and are analogous to vacuum in some ways.

But at the same time, it's one of those gotchas that would be nice to
get rid of for people who just want a simple little database on their
workstation. It might be nice to have some kind of lazy auto-vacuum
daemon installed by default and configured to run every hour or so
according to postgresql.conf.

>>> - The tables (not InnoDB) are in different files of the
>>> same name,
>>> allowing the OS administrator great flexibility. E.g., putting tables on
>>> separate partitions and thereby greatly improving performance.
>>
>> One reason for not doing this is that a table in PostgreSQL might span
>> multiple files if it exceeds a couple of gigabytes in size.

And let's face it, if we get tablespaces implemented, this kind of
thing goes away.
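For reference, the serial expansion looks roughly like this (the table
and sequence names are invented):

    CREATE SEQUENCE t_id_seq;
    CREATE TABLE t (
        id integer DEFAULT nextval('t_id_seq')
    );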
Matthew T. O'Connor wrote:
> ----- Original Message -----
> From: "Ben Clewett" <B.Clewett@roadrunner.uk.com>
>> Working with IDE drives on PCs, you can double the performance of a DB
>> just by putting half the tables on a disk on another IDE chain.
>
> You can do this using symlinks, but you do have to shut down the
> postmaster before you play with the files directly.

I was hoping this was the case. :)

From my data/base directory, I have a tree structure of numbered files
of no obvious structure, as well as some smaller directories: 'global',
'pg_xlog' and 'pg_clog'.

If I wanted to divide the postmaster read() calls evenly over files
located on several physical disks, how would you suggest distributing
the data-space? Would it be as simple as putting each child directory
in 'data/base' on a different physical disk in a round-robin fashion
using symbolic links, or is it more involved?

data/base/1 -> /dev/hda
data/base/2 -> /dev/hdb
data/base/3 -> /dev/hdc
data/base/4 -> /dev/hda
data/base/5 -> /dev/hdb
data/base/6 -> /dev/hdc (etc)

(I have made the assumption that the postmaster serves different
connections in parallel, otherwise this would have little effect :)

Thanks, Ben
On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
> If I wanted to divide the postmaster read() calls evenly over files
> located on several physical disks, how would you suggest distributing
> the data-space? Would it be as simple as putting each child directory
> in 'data/base' on a different physical disk in a round-robin fashion
> using symbolic links, or is it more involved?
>
> data/base/1 -> /dev/hda
> data/base/2 -> /dev/hdb
> data/base/3 -> /dev/hdc
> data/base/4 -> /dev/hda
> data/base/5 -> /dev/hdb
> data/base/6 -> /dev/hdc (etc)

Don't bother splitting across disks unless you put them on different
IDE channels, as IDE channel bandwidth is shared. If you have that many
disks, put them on IDE RAID. That is a much simpler solution.

Shridhar
On Tue, 2003-04-22 at 04:25, Ben Clewett wrote:
> Matthew T. O'Connor wrote:
>
> From my data/base directory, I have a tree structure of numbered files
> of no obvious structure, as well as some smaller directories: 'global',
> 'pg_xlog' and 'pg_clog'.
>
> If I wanted to divide the postmaster read() calls evenly over files
> located on several physical disks, how would you suggest distributing
> the data-space? Would it be as simple as putting each child directory
> in 'data/base' on a different physical disk in a round-robin fashion
> using symbolic links, or is it more involved?
>
> data/base/1 -> /dev/hda
> data/base/2 -> /dev/hdb
> data/base/3 -> /dev/hdc
> data/base/4 -> /dev/hda
> data/base/5 -> /dev/hdb
> data/base/6 -> /dev/hdc (etc)
>
> (I have made the assumption that the postmaster serves different
> connections in parallel, otherwise this would have little effect :)

Yes, connections are served in parallel. The best way to split the
files is something you have to figure out, probably based on usage. The
round-robin directory method you mentioned above falls down in that it
only splits whole databases into different locations, regardless of how
much I/O is related to those databases. You may wind up with inactive
databases on their own disk, which would yield no performance gain.
It's also probably better to get down to the file / index level rather
than whole databases, as you may have a few tables that get 90% of the
work.

Hopefully some of that was helpful.
> On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
>> If I wanted to divide the postmaster read() calls evenly over files
>> located on several physical disks, how would you suggest
>> distributing the data-space? Would it be as simple as putting each
>> child directory in 'data/base' on a different physical disk in a
>> round-robin fashion using symbolic links, or is it more involved?
>>
>> data/base/1 -> /dev/hda
>> data/base/2 -> /dev/hdb
>> data/base/3 -> /dev/hdc
>> data/base/4 -> /dev/hda
>> data/base/5 -> /dev/hdb
>> data/base/6 -> /dev/hdc (etc)
>
> Don't bother splitting across disks unless you put them on different
> IDE channels, as IDE channel bandwidth is shared.

While that is electrically "true", it is not completely true. Modern
IDE hard disks are very advanced, with large read-ahead caches. That,
combined with IDE DMA access, low seek times and faster spin rates,
means you can get good performance across two IDE drives on the same
channel.

For instance, take two databases, one on HDA and the other on HDB, with
successive reads interleaved HDA/HDB/HDA/HDB etc. They will share
electrical bandwidth (as would SCSI). AFAIK, there is no standard
asynchronous command structure for IDE; however, the internal
read-ahead cache on each drive will usually have a pretty good guess at
the "next" block based on some predictive caching algorithm. So the
"next" read from the drive has a good chance of coming from cache. Plus
the OS may "scatter gather" larger requests into smaller successive
requests (so a pure read-ahead will work great). Then consider write
caching (if you dare).

It is very true that you want to have one IDE drive per IDE channel,
but these days two drives on a channel are not as bad as they once
were. This is not due to the shared electrical bandwidth of the system
(all bus systems suffer this) but because of the electrical protocol
used to address the drives. ATA and EIDE have made strides in this
area.

> If you have that many disks, put them on IDE RAID. That is a much
> simpler solution.

A hardware RAID system is obviously an "easier" solution, and
www.infortrend.com makes a very cool system, but spreading multiple
databases across multiple IDE drives and controllers will probably
provide higher overall performance if you have additional IDE channels,
instead of forcing all the I/O through one controller (IDE or SCSI)
channel. Pretty good PCI EIDE-DMA controllers are cheap, $50~$100, and
you can fit a bunch of them into a server system.

Provided your OS has a reentrant driver model, it should be possible
for PostgreSQL to be performing as many I/O operations concurrently as
you have drive controllers, whereas with an IDE->SCSI RAID controller
you may still be limited by how well your specific driver handles
concurrency within one driver instance.

The "best" solution is one hardware RAID per I/O channel per database,
but that is expensive. One IDE drive per IDE channel per database is
the next best thing. Two IDE drives per channel, one drive per
database, is very workable if you make sure that the more active
databases are on separate controllers.
Ben Clewett <B.Clewett@roadrunner.uk.com> writes:
> If I wanted to divide the postmaster read() calls evenly over files
> located on several physical disks, how would you suggest distributing
> the data-space?

AFAIK, the single biggest win you can get in this dimension is to put
the WAL log ($PGDATA/pg_xlog/) on a separate spindle from everything
else. At least for write-intensive databases, that can buy you
something like 2x improvement for the price of one easy symlink.

After that, the conventional wisdom is to put indexes on a third
spindle (separate from base tables and from xlog). But the bookkeeping
and maintenance effort needed for that is really too high to make it
worth worrying about, IMHO :-(. Eventually we will have some kind of
tablespace feature to make it easy.

My recommendation at the moment would be: WAL on dedicated spindle,
everything else on the best RAID array you can set up. And buy as much
RAM as you can afford.

See past discussions in pgsql-performance for more info.

                        regards, tom lane
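PS: the "one easy symlink", spelled out. A sketch only; /mnt/waldisk is
a placeholder for wherever the dedicated spindle is mounted, and the
postmaster must be stopped first:

    pg_ctl stop
    mv $PGDATA/pg_xlog /mnt/waldisk/pg_xlog
    ln -s /mnt/waldisk/pg_xlog $PGDATA/pg_xlog
    pg_ctl start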
On Tuesday 22 April 2003 11:18, Tom Lane wrote:
> My recommendation at the moment would be: WAL on dedicated spindle,
> everything else on the best RAID array you can set up. And buy as much
> RAM as you can afford.

Hmm. Is this safe? Let's assume for argument's sake that the RAID is
100% reliable and that the local spindle will eventually fail. Can I
lose data by leaving WAL on the local spindle? Or are you suggesting
two full RAID systems?

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > Hmm. Is this safe? Let's assume for argument's sake that the RAID is 100% > reliable and that the local spindle will eventually fail. Can I lose data by > leaving WAL on the local spindle? Or are you suggesting two full RAID > systems? A two-disk mirror for the WAL would be good, yes; it needs redundancy just as much as the main storage. But the discussion was about performance not reliability; the OP hadn't mentioned that he wanted any redundant storage. regards, tom lane
On Tuesday 22 April 2003 12:51, Tom Lane wrote:
> A two-disk mirror for the WAL would be good, yes; it needs redundancy
> just as much as the main storage. But the discussion was about
> performance, not reliability; the OP hadn't mentioned that he wanted
> any redundant storage.

Ah. When you mentioned RAID I thought you were going after reliability.
So I am probably best in my environment to just leave everything on the
RAID device and let it do its job as best it can. It does seem to work
faster than using local storage, even SCSI.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.