Thread: Limitations on PGSQL
At 12:04 05/11/01 +0530, you wrote:
> Hi,
> Can someone please specify the features and, more importantly, the
> limitations of PostgreSQL? More info regarding performance etc. would be
> of immense help.
> Regards, Bv :-)

Hello Balaji,

There are no real limitations when using PostgreSQL's smart programming
features: views, triggers, rules, types and the PL/pgSQL server-side
language. For example:

1) FAST READS: triggers can store display values instead of performing
several LEFT JOINs or calling PL/pgSQL functions at query time. Similarly,
you can use triggers to perform complex initialization or maintain
consistency when adding or modifying a record. Cron jobs and functions can
run queries and store their results for instant retrieval (e.g. statistics
tables). This makes the database very fast for complex reads (e.g. in a web
environment). This concept of storing values is the basis of optimization.

2) SAFETY: PostgreSQL is a real transactional system. With a combination of
views and rules, you can control data modification very precisely. For
example, you can define a view over a sub-select of a table and so control
the scope of queries. This is very important in a commercial environment
where your data is valuable and must not be deleted or modified except under
a set of rules.

3) CODING: server-side coding is mainly performed in PL/pgSQL, a very easy
and powerful server-side language.

This is paradise if you are a programmer. IMHO, the only drawbacks are:

1) TABLE DEFINITION: it is impossible to delete a column or to change a
column's type. You have to drop the table and import the old values into a
new one. This makes life harder when working on large databases: you are
always afraid of losing your data. Even with backups, it is always
heart-breaking to modify a table, and you have to run tests to ensure all
data is still there and safe.

2) VIEWS/TRIGGERS cannot be modified in place. You have to drop them and
create them again, which makes programming a little tricky. Furthermore, if
you create a view such as "SELECT a.*, b.* FROM table1 a LEFT JOIN table2 b
ON a.oid = b.oida", the resulting view displays all fields, making it harder
for a non-programmer to read its contents.

These are very small drawbacks compared to the power and reliability of
PostgreSQL.

Best regards,
Jean-Michel POURE
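A rough sketch of the trigger-based denormalization in point 1 - the
customers/orders tables and column names are hypothetical, and the function
returns opaque as 7.x-era PL/pgSQL trigger functions do:

CREATE FUNCTION orders_set_customer_name() RETURNS opaque AS '
BEGIN
    -- copy the display value into the row being inserted or updated,
    -- so listing orders never needs a join against customers
    SELECT INTO NEW.customer_name name
        FROM customers
        WHERE customer_id = NEW.customer_id;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER orders_set_customer_name
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_set_customer_name();

Reads of orders can then show the customer name directly from the stored
copy.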
IMHO, Postgres' drawbacks are the following:

- Severely limited access/grants system - Postgres gives little or no
control over anything beyond controlling access to whole tables. Yes, you
can create views, but views have a couple of drawbacks too... This is
especially a problem with regard to functions (no trusted functions).

- Long connection time - if you are using the web you will have to use some
sort of persistent scheme, e.g. Apache::DBI, otherwise you will handle
around 5 requests per second on a decent computer. I wonder whether it would
be possible for it to either reconnect, keeping the connection, to a new
database or user, or to reuse its kids - like Apache.

- No schemas/tablespaces/cross-database access (and it's listed as EXOTIC
:(). You can emulate some of these features, yet it's not the same.

- Error messages take a long time to get used to, and generally figuring
things out may take some time (at least for me).

- If you create a function/trigger/view/rule etc. which accesses a table,
and then you drop and recreate that table, you may have to recreate the
function etc.

Its advantages are:

- Runs on practically any platform (I run OpenBSD, so it matters).

- Supports triggers, rules (statement-level triggers), views and stored
procedures!

- Fast - my queries, which may be quite complex at times, are generally
fast, and if they are not I can always speed them up with EXPLAIN, indexes,
triggers creating derived tables and so on. Did I say stored procedures?

- License - do ANYTHING you want with it (more or less), not as communistic
as the ubiquitous GPL.

- Price - depending on your internet connection, generally less than
$0.02...

- Great community - does not mind answering questions and seems to forgive
quickly as well.

- Write-Ahead Logging, and many other features I haven't really exploited
yet.

Regards,

Aasmund
Aasmund Midttun Godal aasmund@godal.com - http://www.godal.com/ +47 40 45 20 46
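A sketch of the view-based workaround for the coarse grants system mentioned
above - the table, view, column and user names are all hypothetical:

CREATE VIEW active_customers AS
    SELECT customer_id, name, zipcode
    FROM customers
    WHERE active;

-- deny direct access to the base table, allow reads through the view only
REVOKE ALL ON customers FROM PUBLIC;
GRANT SELECT ON active_customers TO webuser;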
> Long connection time - if you are using the web you will have to use some
> sort of persistent scheme, e.g. Apache::DBI, otherwise you will handle
> around 5 requests per second on a decent computer. I wonder whether it
> would be possible for it to either reconnect, keeping the connection, to a
> new database or user, or to reuse its kids - like Apache.

PHP allows persistent connections, don't you think?
http://uk.php.net/manual/en/configuration.php#ini.sect.pgsql

Postgres configuration directives:

pgsql.allow_persistent (boolean) - whether to allow persistent Postgres
connections.
pgsql.max_persistent (integer) - the maximum number of persistent Postgres
connections per process.
pgsql.max_links (integer) - the maximum number of Postgres connections per
process, including persistent connections.

Best regards,
Jean-Michel POURE
First-time user of PostgreSQL... After creating the database, how do I check
which foreign keys (REFERENCES constraints, in PostgreSQL terms) were
created? I looked around using "psql" and "pgaccess", but had no success.

Thanks for the help,
Sam
On Mon, 21 Jan 2002, Sam Cao wrote:
> First-time user of PostgreSQL... After creating the database, how do I
> check which foreign keys (REFERENCES constraints, in PostgreSQL terms)
> were created? I looked around using "psql" and "pgaccess", but had no
> success.

The best thing to look at is probably the "Referential Integrity Tutorial &
Hacking the Referential Integrity Tables" tutorial at
http://techdocs.postgresql.org/ - I believe it includes a view definition
that gets a lot of that information out.
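For a quick look without the tutorial, something along these lines should
work against a 7.x-era catalog, where each foreign key is stored as a set of
RI constraint triggers in pg_trigger (the RI_FKey_check_ins filter picks out
one trigger per foreign key):

SELECT t.tgconstrname AS constraint_name,
       c.relname      AS referencing_table,
       r.relname      AS referenced_table
FROM pg_trigger t, pg_class c, pg_class r, pg_proc p
WHERE t.tgrelid = c.oid
  AND t.tgconstrrelid = r.oid
  AND t.tgfoid = p.oid
  AND p.proname = 'RI_FKey_check_ins';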
I'm running Postgres 7.1.2 on a FreeBSD machine - a Celeron 500 with 128 MB
of RAM (256 MB swap). Not the best for real gut-wrenching work, but it was
what was around to get a feel for what was wanted.

A question was asked which I threw at the database to see how it would
handle it, and it failed: after 50 minutes of processing it flopped to the
ground, killed: out of swap space. Granted, the query was a large one
(explanation below), but a few questions:

Is there a way to predict the resources a system would need to handle a
query of a specific size/complexity? (And how?)

Is there a way to pull this type of query off on this system? Is there a
solution other than throwing more RAM/swap at it? (Handling it in chunks
would be one; other suggestions are welcome.)

What would this type of query need to execute? How about to execute well?

Table and query explanations follow. The query joins three tables, which I
know is not quite a good idea, but I didn't see much of another way. The
question was to find all the subcategories all customers have ordered from a
company.

The history table (history of orders) contains the id, date, cost, and
orderid and has 838,500 records. The ordered table (line items of orders)
contains the orderid and a sku and has 2,670,000 records. The subcategories
table (insub) has the sku and subcategory and has 20,000 records. Each
customer can have many orders, which can have many items, which can have
many subcategories.

The query was posed as:

SELECT history.id, sub
FROM insub
WHERE history.orderid = ordered.orderid
  AND ordered.items = insub.sku
ORDER BY id;

Any help would be greatly appreciated. Thanks in advance.

.jtp
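One rough way to predict the size of the problem before running such a query
is EXPLAIN, which prints the planner's estimated row counts without
executing anything. A sketch with the FROM list spelled out (the query above
names only insub; 7.x adds the missing FROM-clause entries implicitly, with
a NOTICE):

EXPLAIN
SELECT history.id, sub
FROM history, ordered, insub
WHERE history.orderid = ordered.orderid
  AND ordered.items = insub.sku
ORDER BY id;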
jtp <john@akadine.com> writes:
> A question was asked which I threw at the database to see how it would
> handle it, and it failed: after 50 minutes of processing it flopped to the
> ground, killed: out of swap space.

My guess is that what actually bombed out was psql, which tries to buffer
the entire result of a query. (Well, actually it's libpq not psql that does
that, but anyway the client side is what's failing.) I suspect that your
query is insufficiently constrained and will return many millions of rows
--- are you sure you have the WHERE clauses right?

If you actually do need to process a query that returns gazillions of rows,
the best bet is to declare a cursor so you can fetch the result in bite-size
chunks, say a few hundred rows at a time.

			regards, tom lane
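A sketch of the cursor approach, reusing the table and column names from the
original message and assuming the intended three-way join; each FETCH pulls
another bite-size chunk until no rows come back:

BEGIN;

DECLARE subcat_cur CURSOR FOR
    SELECT history.id, sub
    FROM history, ordered, insub
    WHERE history.orderid = ordered.orderid
      AND ordered.items = insub.sku
    ORDER BY id;

FETCH 500 FROM subcat_cur;   -- repeat until it returns no rows
FETCH 500 FROM subcat_cur;

CLOSE subcat_cur;
COMMIT;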
Hi, just a general design question, and wondering how Postgres would handle
either situation. I have a gob of information (400,000+ records) on
individual accounts. I need to store all of their personal information
(name, address, etc.) as well as all of their more dynamic company
information (last purchase, times ordered, etc.).

One: all their dynamic information can be rebuilt from other tables, but it
will be called upon rather frequently, so the redundancy, so as to not have
to rebuild it on every call, seems acceptable to me. (Smack me if I'm
wrong.)

Two: there is only a one-to-one relation between an account (personal
information) and that account's account information (makes sense, eh?). But
does it make sense to keep this information in the same table or to break it
up? I estimate about 20 fields in two separate tables or 40 in one big one.
The personal information will almost always be index-searched by name or
zipcode, whereas the other information they (the proverbial they) will
probably want sorted in weirdass ways the design was never intended for.
Basically, it will be subjected to more sequential scans than something with
close to half a million records should be.

My basic question ends up being: does Postgres handle sequential scans
across tables with fewer fields better? Is there any performance increase
from separating this into two tables?

Thanks for any hints you could give me.

.jtp
On Fri, 19 Apr 2002, jtp wrote:

> One: all their dynamic information can be rebuilt from other tables, but
> it will be called upon rather frequently, so the redundancy, so as to not
> have to rebuild it on every call, seems acceptable to me. (Smack me if I'm
> wrong.)

It's quite reasonable to keep a summary table of information for fast
reference. The only difficulty you have to deal with is how you keep it up
to date. (Update every time the summarized data change? Update once an hour?
Once a day? That kind of thing. It depends on your application.)

> My basic question ends up being: does Postgres handle sequential scans
> across tables with fewer fields better?

Definitely. Given the same number of rows, a narrower table (fewer columns,
shorter data types, that kind of thing) will always be scanned faster than a
wider one, simply because you need to read less data from the disk. This is
database-independent, in fact.

Since vacuuming also effectively involves a sequential scan, you'll also
vacuum faster on a narrower table. So it makes sense to separate frequently
updated data from less frequently updated data, and vacuum the frequently
updated table more often, I would think.

However, for tables that are already narrow, you may get little performance
gain, or in some cases performance may even get worse, not to mention your
data size blowing up bigger. Postgres has a quite high per-tuple overhead
(31 bytes or more), so splitting small tables can actually cause growth and
make things slower if you frequently access both tables.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
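A sketch of the summary-table idea - all table and column names here are
hypothetical, and the rebuild could equally be driven by triggers on the
order tables instead of a periodic job:

CREATE TABLE account_stats (
    account_id    integer PRIMARY KEY,
    last_purchase date,
    times_ordered integer
);

-- periodic rebuild, e.g. from a nightly cron job
BEGIN;
DELETE FROM account_stats;
INSERT INTO account_stats (account_id, last_purchase, times_ordered)
    SELECT account_id, max(order_date), count(*)
    FROM orders
    GROUP BY account_id;
COMMIT;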
Curt Sampson <cjs@cynic.net> writes:
> However, for tables that are already narrow, you may get little
> performance gain, or in some cases performance may even get worse, not to
> mention your data size blowing up bigger. Postgres has a quite high
> per-tuple overhead (31 bytes or more), so splitting small tables can
> actually cause growth and make things slower if you frequently access both
> tables.

Right. The *minimum* row overhead in Postgres is 36 bytes (32-byte tuple
header plus 4-byte line pointer). Moreover, the actual data space will be
rounded up to the next MAXALIGN boundary, either 4 or 8 bytes depending on
your platform. On an 8-byte-MAXALIGN platform like mine, a table containing
a single int4 column will actually occupy 44 bytes per row. Ouch.

So database designs involving lots of narrow tables are not to be preferred
over designs with a few wide tables.

AFAIK, all databases have nontrivial per-row overheads; PG might be a bit
worse than average, but this is a significant issue no matter which DB you
use.

			regards, tom lane
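A quick way to see this overhead on a real table - a rough sketch that
assumes the default 8 kB block size and statistics refreshed by a recent
VACUUM:

SELECT relname,
       relpages * 8192 AS total_bytes,
       reltuples       AS approx_rows,
       CASE WHEN reltuples > 0
            THEN (relpages * 8192) / reltuples
       END             AS approx_bytes_per_row
FROM pg_class
WHERE relname = 'accounts';   -- substitute the table of interest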
On Fri, 19 Apr 2002, Tom Lane wrote:

> Right. The *minimum* row overhead in Postgres is 36 bytes (32-byte tuple
> header plus 4-byte line pointer).

Ah, right! The line pointer is four bytes because it includes the length of
the tuple. But I'm not sure why we need this length, possibly because I
don't understand the function of the LP_USED and LP_DELETED flags in the
line pointer. (I'm guessing that if LP_USED is not set, the line pointer
does not point to any data, and that if LP_DELETED is set, it points to a
chunk of free space.)

Why could we not just make all unallocated space be pointed to by LP_DELETED
pointers, and then when we need space, use it from those (splitting and
joining as necessary)? That gets rid of the need for a length. Then we could
declare that all tuples must be aligned on a four-byte boundary, use the top
14 bits of a 16-bit line pointer as the address, and the bottom two bits for
the LP_USED and LP_DELETED flags. This would slightly simplify the code for
determining the flags, and incidentally boost the maximum page size to 64K.

If you're willing to use a mask and shift to determine the address, rather
than just a mask, you could make the maximum page size 128K, use the top 15
bits of the line pointer as the address, and use the remaining bit as the
LP_USED flag, since I don't see why we would then need the LP_DELETED flag
at all. Or am I smoking crack here?

> AFAIK, all databases have nontrivial per-row overheads; PG might be a bit
> worse than average, but this is a significant issue no matter which DB you
> use.

For certain types of tables, such as the sort of table joining two others
(for which I forget the proper term):

CREATE TABLE folder_contents (
    folder_id int NOT NULL,
    item_id int NOT NULL,
    PRIMARY KEY (folder_id, item_id));

some databases are much better. In MS SQL Server, for example, since there
are no variable-length columns, the tuple format will be:

    1 byte   status bits A
    1 byte   status bits B
    2 bytes  fixed-length columns data length
    4 bytes  DATA: folder_id
    4 bytes  DATA: item_id
    2 bytes  number of columns
    1 byte   null bitmap (unfortunately doesn't go away in SQL Server even
             when there are no nullable columns)

(If there were variable-length columns, you would have after this: two bytes
for the number of columns, 2 bytes per column for the data offsets within
the tuple, and then the variable data.)

So in Postgres this would take, what, 44 bytes per tuple? But in SQL Server
this takes 17 bytes per tuple (including the two-byte line pointer in what
they call the page's "row offset array"), or about 40% of the space.

Needless to say, in my last job, where I was dealing with a table like this
with 85 million rows, I was happy for it to be a 1.3 GB table instead of a
3.5 GB table. Not that this made much performance difference in that
application anyway, since, with a clustered index and typical folder sizes
of a couple of dozen to a hundred or so items, I was basically never going
to read more than one or two pages from disk to find the contents of a
folder.

Hm. I guess this really should be on hackers, shouldn't it?

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> writes:
> ... Then we could declare that all tuples must be aligned on a four-byte
> boundary, use the top 14 bits of a 16-bit line pointer as the address, and
> the bottom two bits for the LP_USED and LP_DELETED flags. This would
> slightly simplify the code for determining the flags, and incidentally
> boost the maximum page size to 64K.

Hmm. Maybe, but the net effect would only be to reduce the minimum row
overhead from 36 to 34 bytes. Not sure it's worth worrying about.
Eliminating redundancy from the item headers has its downside, too, in terms
of ability to detect problems.

> ... I don't see why we would then need the LP_DELETED flag at all.

I believe we do want to distinguish three states: live tuple, dead tuple,
and empty space. Otherwise there will be cases where you're forced to move
data immediately to collapse empty space, when there's not a good reason to,
except that your representation can't cope.

> Hm. I guess this really should be on hackers, shouldn't it?

Yup...

			regards, tom lane
On Sat, Apr 20, 2002 at 01:55:38PM +0900, Curt Sampson wrote:
> > AFAIK, all databases have nontrivial per-row overheads; PG might be a
> > bit worse than average, but this is a significant issue no matter which
> > DB you use.
>
> For certain types of tables, such as the sort of table joining two others
> (for which I forget the proper term):
>
> CREATE TABLE folder_contents (
>     folder_id int NOT NULL,
>     item_id int NOT NULL,
>     PRIMARY KEY (folder_id, item_id));
>
> some databases are much better. In MS SQL Server, for example, since there
> are no variable-length columns, the tuple format will be:
>
>     1 byte   status bits A
>     1 byte   status bits B
>     2 bytes  fixed-length columns data length
>     4 bytes  DATA: folder_id
>     4 bytes  DATA: item_id
>     2 bytes  number of columns
>     1 byte   null bitmap (unfortunately doesn't go away in SQL Server even
>              when there are no nullable columns)

Where is the information needed to determine visibility for transactions? In
Postgres that's at least 16 bytes (cmin, cmax, xmin, xmax). How does SQL
Server do that?

> (If there were variable-length columns, you would have after this: two
> bytes for the number of columns, 2 bytes per column for the data offsets
> within the tuple, and then the variable data.)

In Postgres, variable-length columns don't cost anything if you don't use
them. An int is always 4 bytes, even if there are variable-length columns
elsewhere. The only other overhead is 4 bytes for the OID and 6 bytes for
the CTID, which I guess may be unnecessary.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America