Thread: Perceived weaknesses of postgres
http://www.theserverside.com/news/thread.tss?thread_id=48339

The interesting part is where somebody asks why NOT use postgres, and
its answers could give some additional hints to those interested in
what people find missing from postgres to adopt it.

Just to summarize some of the answers:
* major PITA to upgrade between major versions;
* executing a single query on multiple cpus/cores;
* no direct table cache control;
* accent-insensitive text comparisons;
* fast select count(*);

Wrong assumptions (but people seem to be sure it's like this):
* no hotbackup except pg_dump (wrong: there are in fact a few
  different ways to do that);
* pg_dump the only way to cleanly upgrade (wrong: Slony is good for
  that);
* missing 2 phase commit (wrong: it is now implemented);
* inadequate performance with really large databases (wrong: there are
  known examples of really large postgres DBs);

There are other claims like (quoting): "RAC, enterprise backup
capabilities, database on raw partition, compatibility with enterprise
storage (SAN, ...)" which I don't know if there are adequate solutions
for in postgres or not.

Cheers,
Csaba.
On Feb 13, 2008 10:49 AM, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> http://www.theserverside.com/news/thread.tss?thread_id=48339
>
> The interesting part is where somebody asks why NOT use postgres, and
> its answers could give some additional hints to those interested in
> what people find missing from postgres to adopt it.
>
> Just to summarize some of the answers:
> * major PITA to upgrade between major versions;

Would be nice, though I must say that while the data migration is a
pain, the SQL compatibility PostgreSQL provides is a blessing. With
other open source RDBMSs, changing queries in applications is a major
PITA. ;-)))

> * executing a single query on multiple cpus/cores;

I wonder if our most popular open source rival can do it. I have heard
people claiming so, but I would not consider them authoritative. :)

> * no direct table cache control;

Could you elaborate more on this one?

> * accent-insensitive text comparisons;

Yesss. That would be quite useful. A more generic problem would be
multiple collations in one cluster. Hopefully at column level.

> * fast select count(*);
> Wrong assumptions (but people seem to be sure it's like this):
> * no hotbackup except pg_dump (wrong: there are in fact a few
>   different ways to do that);

...and pg_dump is not exactly hot. And PITR using log archiving is a
great thing. I've set up backups using Tivoli Storage Manager, and it
works quite well. BTW, there is an urban legend that Oracle stops
writing to its datafiles when you ALTER TABLESPACE into backup mode.
This is not true; they modify the files much the same as we do. :-)

> * pg_dump the only way to cleanly upgrade (wrong: Slony is good for
>   that);

Slony is good as long as there are no DDLs issued. And it's easy to
shoot oneself in the foot if one is not careful (some time ago I lost
all the triggers while upgrading from 8.1 to 8.2; it was my fault
since I did pg_dump -s on a slave database, not on the master...).
This could be solved with the help of DDL/DCL triggers though, which
would be helpful for other things too, I guess.

> * missing 2 phase commit (wrong: it is now implemented);
> * inadequate performance with really large databases (wrong: there
>   are known examples of really large postgres DBs);
>
> There are other claims like (quoting): "RAC, enterprise backup
> capabilities, database on raw partition, compatibility with
> enterprise storage (SAN, ...)" which I don't know if there are
> adequate solutions for in postgres or not.

Regards,
Dawid
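For reference, the log-archiving setup Dawid mentions is driven by two
settings in postgresql.conf plus an online base backup. A minimal
sketch, assuming a local archive directory /mnt/archive/wal (the paths
and the copy command are placeholders, not a recommendation):

    # postgresql.conf
    archive_mode = on                             # 8.3+; older versions only need archive_command
    archive_command = 'cp %p /mnt/archive/wal/%f' # %p = WAL segment path, %f = file name

    -- base backup while the cluster keeps serving queries
    SELECT pg_start_backup('nightly');
    -- copy $PGDATA with whatever tool you like (tar, rsync, TSM, ...)
    SELECT pg_stop_backup();

Recovery then replays the archived segments via restore_command in
recovery.conf, optionally stopping at a given point in time.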
On Wed, 2008-02-13 at 13:29 +0100, Dawid Kuroczko wrote:
> > * no direct table cache control;
>
> Could you elaborate more on this one?

Well, I was just summarizing what other people wrote :-) But I guess
they refer to table-level control of how much cache memory to use. I
think there are DBMSs where you can tune the cache usage per table or
table group, so you can make sure the important tables (where you want
instant response even if the table was not accessed in the last few
hours) are always in cache.

Currently that's not possible in postgres. It is true that the cache
memory will be used in a way that is optimal for mean performance, but
sometimes mean performance is good enough even when not optimal, and
you would like to eliminate worst-case scenarios for some queries. In
that case more control (over the cache strategy too) is better...

Cheers,
Csaba.
On Wed, 2008-02-13 at 13:39 +0100, Csaba Nagy wrote:
> On Wed, 2008-02-13 at 13:29 +0100, Dawid Kuroczko wrote:
> > > * no direct table cache control;
> >
> > Could you elaborate more on this one?

OK, re-reading what I just wrote makes me think it was not clear
enough: I think they mean you can _reserve_ some cache memory directly
for specific tables/table groups, so that the reserved cache is used
only by those tables and is thus not sensitive to any activity other
than access to those tables. In particular, a sequential scan on
another, big table will not touch that reserved cache, nor will any
other big sweep of data access on other tables.

Not sure I actually made it clearer this time though :-)

Cheers,
Csaba.
Csaba Nagy wrote:
> http://www.theserverside.com/news/thread.tss?thread_id=48339
>
> The interesting part is where somebody asks why NOT use postgres, and
> its answers could give some additional hints to those interested in
> what people find missing from postgres to adopt it.
>
> Just to summarize some of the answers:
> * major PITA to upgrade between major versions;

I wonder which other databases (also commercial) can do that.

> * executing a single query on multiple cpus/cores;
> * no direct table cache control;
> * accent-insensitive text comparisons;

Multiple collations would be nice... just accent-insensitive would not
make much sense imho.

> * fast select count(*);

SELECT count(*) WHERE False; is very fast for me ;)

Greets :-)
Tino
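On the count(*) point: when an estimate is good enough, the row count
that VACUUM/ANALYZE maintains in the catalog can be read instantly. A
sketch, assuming a hypothetical table called my_table and reasonably
fresh statistics:

    -- approximate row count, no table scan
    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relname = 'my_table';

It is only as accurate as the last VACUUM/ANALYZE, so it is no
substitute for an exact count, but it answers the usual "roughly how
big is this table?" question immediately.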
Dawid Kuroczko wrote:
> On Feb 13, 2008 10:49 AM, Csaba Nagy <nagy@ecircle-ag.com> wrote:
>> http://www.theserverside.com/news/thread.tss?thread_id=48339
>>
>> The interesting part is where somebody asks why NOT use postgres,
>> and its answers could give some additional hints to those interested
>> in what people find missing from postgres to adopt it.
>>
>> Just to summarize some of the answers:
>> * major PITA to upgrade between major versions;
>
> Would be nice, though I must say that while the data migration is a
> pain, the SQL compatibility PostgreSQL provides is a blessing. With
> other open source RDBMSs, changing queries in applications is a major
> PITA. ;-)))

I don't think these people are comparing to other open source ones...
They're comparing to the commercial ones (at least in this case).

>> * executing a single query on multiple cpus/cores;
>
> I wonder if our most popular open source rival can do it. I have
> heard people claiming so, but I would not consider them
> authoritative. :)

Again, the commercial ones do, and that's what we're compared to...

>> * no direct table cache control;
>
> Could you elaborate more on this one?

I would guess they're referring to the ability to "pin" a table into
memory, so that it always stays in the cache regardless of what else
the database is doing. There is a narrow use-case where this can be
very useful, but it can also be a very dangerous tool (hint: if you
pin a table that grows up to say 80-90% of your RAM size, your
database will not be fast for anything else).

>> * pg_dump the only way to cleanly upgrade (wrong: Slony is good for
>>   that);
>
> Slony is good as long as there are no DDLs issued. And it's easy to
> shoot oneself in the foot if one is not careful (some time ago I lost
> all the triggers while upgrading from 8.1 to 8.2; it was my fault
> since I did pg_dump -s on a slave database, not on the master...).

You can do DDL with Slony, but it requires some planning (DDLSCRIPT).
It doesn't solve all problems, but it does solve some.

//Magnus
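The DDLSCRIPT route Magnus mentions is Slony's EXECUTE SCRIPT slonik
command, which applies a DDL file on all nodes at a consistent point in
the replication stream instead of running it directly on each database.
A rough sketch, assuming replication set 1, event node 1, and a
hypothetical file /tmp/add_column.sql (cluster name and conninfo
strings are placeholders):

    cluster name = my_cluster;
    node 1 admin conninfo = 'dbname=app host=master';
    node 2 admin conninfo = 'dbname=app host=slave';

    execute script (
        set id = 1,
        filename = '/tmp/add_column.sql',
        event node = 1
    );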
On Wed, 2008-02-13 at 13:56 +0100, Magnus Hagander wrote:
> I don't think these people are comparing to other open source ones...
> They're comparing to the commercial ones (at least in this case)

Yes, that's definitely the case. And that can actually be taken as a
compliment to the already attained status of postgres...

> I would guess they're referring to the ability to "pin" a table into
> memory, so that it always stays in the cache regardless of what else
> the database is doing. There is a narrow use-case where this can be
> very useful, but it can also be a very dangerous tool (hint: if you
> pin a table that grows up to say 80-90% of your RAM size, your
> database will not be fast for anything else)

If you could not only pin the complete table, but instead allocate a
fixed amount of memory for caching _only that table_, you would get a
compromise better than the two extremes: either the table is cached in
a mixed bag together with all the rest, so it is effectively 0-100%
cached depending on all the activity on the DB, or it is always 100%
cached independently of any activity on the DB. The intermediate
solution is: the table is always cached using x amount of memory,
independently of what happens on the DB. That makes sure the table is
completely cached if the allocated memory is enough, and that at least
the most used records stay cached if the table grows bigger than the
allocated cache, while there is no risk of overrunning the available
memory. This could also be applied to groups of tables.

Cheers,
Csaba.
Dawid Kuroczko wrote:
> Slony is good as long as there are no DDLs issued. And it's easy to
> shoot oneself in the foot if one is not careful (some time ago I lost
> all the triggers while upgrading from 8.1 to 8.2; it was my fault
> since I did pg_dump -s on a slave database, not on the master...).

It can also be a very time consuming process. It often also involves
having to juggle resources around, find extra disk space floating
around somewhere to build extra replicas, or abandon your failover
target for a period of time while you upgrade it.

BTW, the trigger issue goes away with 8.3 and Slony 2.0.

Brad.
Magnus Hagander wrote:
> I would guess they're referring to the ability to "pin" a table into
> memory, so that it always stays in the cache regardless of what else
> the database is doing. There is a narrow use-case where this can be
> very useful, but it can also be a very dangerous tool (hint: if you
> pin a table that grows up to say 80-90% of your RAM size, your
> database will not be fast for anything else)

I know that MS removed this ability in SQL Server 2005 for pretty much
this reason; its usefulness was greatly outweighed by people screwing
up their systems by not calculating things correctly.
Stephen Cook wrote:
> Magnus Hagander wrote:
>> I would guess they're referring to the ability to "pin" a table into
>> memory, so that it always stays in the cache regardless of what else
>> the database is doing. There is a narrow use-case where this can be
>> very useful, but it can also be a very dangerous tool (hint: if you
>> pin a table that grows up to say 80-90% of your RAM size, your
>> database will not be fast for anything else)
>
> I know that MS removed this ability in SQL Server 2005 for pretty
> much this reason; its usefulness was greatly outweighed by people
> screwing up their systems by not calculating things correctly.

What they removed was "dbcc pintable", which would specify that data
pages for a table should be pinned in the general buffer cache as they
are requested. It didn't allow you to divide up your buffer cache, and
so this rather pointless feature went away in SQL Server 2005. A few
large, active pinned tables are obviously going to really wreck
performance for most databases.

What SQL Server never had is more like what you get with Sybase, where
you can partition your buffer cache into different regions of whatever
fraction of the overall buffer cache you wish. This is IMHO a far more
useful implementation. You can specify which regions a particular (set
of) tables should use. You can further specify different page sizes
for each buffer pool (say 2k pages for the intensely OLTP stuff, and
16k pages for the more OLAP-y things). You don't end up trying to keep
*every* page of a table in memory this way, since LRU (or whatever
method) will still recycle pages as needed within a named cache.

This was all there in version 11 of the product, which was the last
one I ever worked with (and is very dated at this point). This feature
never made it to SQL Server since Microsoft went off and did their own
thing well before this. It's more this Sybase-type implementation I
assumed the original poster was asking about?

You can do something kind of similar in MySQL with the MyISAM storage
engine, but I've not heard of too many shops actually doing this (who
knows). The MySQL manual seems to strongly recommend it, anyway.

Paul
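For the curious, the Sybase mechanism described above is configured
with stored procedures along these lines (cache names, sizes and the
table are illustrative, recalled from ASE 11.x-era syntax, so treat it
as a sketch rather than a reference):

    -- create a 200 MB named cache, add a 16K buffer pool to it,
    -- then bind one table to that cache
    exec sp_cacheconfig 'hot_cache', '200M'
    exec sp_poolconfig  'hot_cache', '50M', '16K'
    exec sp_bindcache   'hot_cache', 'mydb', 'dbo.orders'

The MySQL/MyISAM analogue keeps only index blocks (not data pages) in
a dedicated key cache, again with placeholder names:

    SET GLOBAL hot_cache.key_buffer_size = 128*1024*1024;
    CACHE INDEX orders IN hot_cache;
    LOAD INDEX INTO CACHE orders;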