Thread: Limitations on PGSQL

Limitations on PGSQL

From
Jean-Michel POURE
Date:
At 12:04 05/11/01 +0530, you wrote:
>Hi,
>    Can someone please specify the features and, more importantly, the
> limitations of using
>PostgreSQL. More info regarding performance etc. would be of immense help.
>Regards
>Bv :-)

Hello Balaji,

There are no real limitations when using PostgreSQL's smart programming
features: views, triggers, rules, types and the PL/pgSQL server-side language.

For example:

1) FAST READS: triggers can store display values instead of performing
several LEFT JOINs or calling PL/pgSQL functions at query time. Similarly,
you can use triggers to perform complex initialization or maintain
consistency when adding/modifying a record. Cron jobs and functions can run
queries and store the results for instant reads (ex: statistics tables).
This makes your database very fast for complex reads (ex: in a web
environment). This concept of storing values is the basis of optimization
(see the trigger sketch after this list).
2) SAFETY: PostgreSQL is a real transactional system. Using a combination of
views and rules, you can control data modification very neatly. Example: you
can define a sub-select of a table and control the scope of queries (see the
view/rule sketch after this list). This is very important in a commercial
environment where your data is valuable and must not be deleted or modified
except under a set of rules.
3) CODING: server-side coding is mainly performed in PL/pgSQL, a very easy
and powerful server-side language.
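
For instance, a minimal sketch of the trigger idea from point 1. All table,
column and function names here are made up, and the function is declared
RETURNS opaque as PL/pgSQL trigger functions were on the 7.x releases of the
time (newer releases spell it RETURNS trigger):

    -- Hypothetical schema: orders carries a denormalized customer_name so
    -- that reads need no JOIN; the trigger keeps the copy up to date.
    CREATE TABLE customers (
        customer_id integer PRIMARY KEY,
        name        text NOT NULL
    );

    CREATE TABLE orders (
        order_id      integer PRIMARY KEY,
        customer_id   integer NOT NULL REFERENCES customers,
        customer_name text    -- display value filled in by the trigger
    );

    CREATE FUNCTION orders_fill_customer_name() RETURNS opaque AS '
    BEGIN
        SELECT name INTO NEW.customer_name
          FROM customers
         WHERE customer_id = NEW.customer_id;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER orders_customer_name
        BEFORE INSERT OR UPDATE ON orders
        FOR EACH ROW EXECUTE PROCEDURE orders_fill_customer_name();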
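
And a sketch of the view-plus-rule idea from point 2, again with invented
names: the view exposes only part of the table, and a rule turns a DELETE
through the view into a soft delete on the base table:

    CREATE TABLE invoices (
        invoice_id integer PRIMARY KEY,
        amount     numeric NOT NULL,
        archived   boolean NOT NULL DEFAULT false
    );

    -- Clients only ever see (and touch) the non-archived rows.
    CREATE VIEW live_invoices AS
        SELECT invoice_id, amount
          FROM invoices
         WHERE NOT archived;

    -- A DELETE against the view merely flags the row instead of removing it.
    CREATE RULE live_invoices_delete AS
        ON DELETE TO live_invoices DO INSTEAD
        UPDATE invoices SET archived = true
         WHERE invoice_id = OLD.invoice_id;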

This is paradise if you are a programmer. IMHO, the only real drawbacks are:

1) TABLE DEFINITION: it is impossible to delete a column or to
promote/demote a column type. You have to drop the table and import the old
values into a new table. This makes life harder when working on large
databases. You are always afraid of losing your data. Even with backups,
it is always 'heart-breaking' to modify a table. You have to perform tests
to ensure all data is there and safe.

2) VIEWS/TRIGGERS cannot be modified. You have to drop them and create them
again. This makes programming a little bit tricky. Furthermore, if you
create a view, let's say "SELECT a.*, b.* FROM table1 a LEFT JOIN
table2 b ON a.oid = b.oida", the resulting view displays all fields, which
makes it harder for a non-programmer to read the view's content.

These are small drawbacks compared to the power and reliability of PostgreSQL.

Best regards,
Jean-Michel POURE

Re: Limitations on PGSQL

From
"Aasmund Midttun Godal"
Date:
IMHO Postgres' drawbacks are the following:

Severely limited access/grants system - Postgres gives little or no control over anything beyond access to whole
tables. Yes, you can create views, but views have a couple of drawbacks too... This is especially a problem with
regard to functions (no trusted functions).

Long connection time - if you are using the web you will have to use some sort of persistent scheme, e.g. Apache::DBI,
otherwise you will handle around 5 requests per second on a decent computer. I wonder whether it would be possible for
it to either reconnect, keeping the connection to a new database or user, or reuse its children - like Apache.

No schemas/tablespaces/cross-database access (- and it's listed under EXOTIC :()
- You can emulate some of these features, yet it's not the same.

Error messages take a long time to get used to, and generally figuring things out may take some time (at least for me).

If you create a function/trigger/view/rule etc. which accesses a table, and you then drop that table and recreate it,
you may have to recreate the function etc.

Its advantages are:

Runs on practically any platform (I run OpenBSD, so it matters).

Supports triggers, rules (statement-level triggers), views and stored procedures!

Fast - my queries, which may be quite complex at times, are generally fast, and if they are not I can always speed
them up with EXPLAIN, indexes, triggers creating derived tables and so on.

Did I say stored procedures?

License - Do ANYTHING you want with it (more or less); not as communistic as the ubiquitous GPL.

Price - Depending on your internet connection, generally less than $0.02...

Great community - Does not mind answering questions and seems to forgive quickly as well.

Write-Ahead Logging, and many other features I haven't really exploited yet.

Regards,

Aasmund

Aasmund Midttun Godal

aasmund@godal.com - http://www.godal.com/
+47 40 45 20 46

Re: Limitations on PGSQL

From
Jean-Michel POURE
Date:
>Long connection time - if you are using the web you will have to use some
>sort of persistent scheme, e.g. Apache::DBI, otherwise you will handle
>around 5 requests per second on a decent computer. I wonder whether it would
>be possible for it to either reconnect, keeping the connection to a new
>database or user, or reuse its children - like Apache.

PHP allows persistent connections, doesn't it?

http://uk.php.net/manual/en/configuration.php#ini.sect.pgsql
Postgres Configuration Directives
pgsql.allow_persistent boolean
Whether to allow persistent Postgres connections.
pgsql.max_persistent integer
The maximum number of persistent Postgres connections per process.
pgsql.max_links integer
The maximum number of Postgres connections per process, including
persistent connections.
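
For example, a hypothetical php.ini fragment (the values are only
illustrative, not a recommendation):

    ; allow pg_pconnect() to keep pooled connections open
    pgsql.allow_persistent = On
    ; at most 4 persistent links per PHP process
    pgsql.max_persistent = 4
    ; at most 8 PostgreSQL links per process, persistent ones included
    pgsql.max_links = 8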

Best regards,
Jean-Michel POURE

Foreign Key?

From
Sam Cao
Date:
First-time user of PostgreSQL....
After creating the database, how do I check which foreign keys (constraint
references, in PostgreSQL terms) were created? I looked around using
"psql" and "pgaccess", but with no success.

Thanks for the help,

Sam,

Re: Foreign Key?

From
Stephan Szabo
Date:
On Mon, 21 Jan 2002, Sam Cao wrote:

> First-time user of PostgreSQL....
> After creating the database, how do I check which foreign keys (constraint
> references, in PostgreSQL terms) were created? I looked around using
> "psql" and "pgaccess", but with no success.

The best thing to look at is probably the "Referential Integrity Tutorial &
Hacking the Referential Integrity Tables" tutorial at
http://techdocs.postgresql.org/

I believe that includes a view definition that gets a lot of that
information out.
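
As a rough sketch of the same idea (this is not the tutorial's exact view,
and it assumes the 7.x catalogs of the time, where foreign keys are stored
as RI triggers; later releases expose them through pg_constraint and
information_schema):

    SELECT t.tgconstrname AS constraint_name,
           c1.relname     AS referencing_table,
           c2.relname     AS referenced_table
      FROM pg_trigger t, pg_class c1, pg_class c2, pg_proc p
     WHERE t.tgisconstraint
       AND t.tgrelid       = c1.oid
       AND t.tgconstrrelid = c2.oid
       AND t.tgfoid        = p.oid
       AND p.proname       = 'RI_FKey_check_ins';  -- the INSERT-side check trigger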



killed select?

From
jtp
Date:
I'm running Postgres 7.1.2 on a FreeBSD machine -- a Celeron 500 with 128
megs of RAM (256 swap). Not the best for real gut-wrenching work, but it was
what was around to get a feel for what was wanted.

A question was asked which I threw at the database to see how it would
handle it, and it failed . . . after 50 minutes of processing it flopped
to the ground, killed: out of swap space.

Granted, the query was a large one (explanation below), but a few
questions:

Is there a way to predict the requirements a system would need to handle a
query of specific size / complexity?  (and how?)

Is there a way to pull this type of query off on this system?  (Is there a
solution other than throwing more RAM / swap at it? One easy option would be
to handle it in chunks, but other suggestions are welcome.)

What would this type of query need to execute?  How about to execute well?

Table and query explanations follow...

The query joins three tables, which I know is not quite a good idea, but I
didn't see much of another way.  The question posed was to find all the
subcategories all customers have ordered from a company.

The history table (history of orders) contains the id, date, cost,
and orderid and has 838500 records.

The ordered table (line items of orders) contains the orderid and a sku
and has 2670000 records.

The subcategories table has the sku and subcategory and has 20000 records.

Each customer can have many orders, which can have many items, which can
have many subcategories.

The query was posed as:
  SELECT history.id, sub
     FROM history, ordered, insub
     WHERE history.orderid = ordered.orderid
       AND ordered.items = insub.sku
       ORDER BY id;

Any help would be greatly appreciated.
Thanks in advance.

.jtp



Re: killed select?

From
Tom Lane
Date:
jtp <john@akadine.com> writes:
> A question was asked which I threw at the database to see how it would
> handle it, and it failed . . . after 50 minutes of processing it flopped
> to the ground, killed: out of swap space.

My guess is that what actually bombed out was psql, which tries to
buffer the entire result of a query.  (Well, actually it's libpq not
psql that does that, but anyway the client side is what's failing.)

I suspect that your query is insufficiently constrained and will return
many millions of rows --- are you sure you have the WHERE clauses right?

If you actually do need to process a query that returns gazillions of
rows, the best bet is to declare a cursor so you can fetch the result
in bite-size chunks, say a few hundred rows at a time.
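
A minimal sketch of that approach, reusing the identifiers from the query
above (cursors must run inside a transaction block):

    BEGIN;
    DECLARE bigq CURSOR FOR
        SELECT history.id, sub
          FROM history, ordered, insub
         WHERE history.orderid = ordered.orderid
           AND ordered.items = insub.sku
         ORDER BY id;
    FETCH 500 FROM bigq;   -- repeat until it returns no more rows
    CLOSE bigq;
    COMMIT;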

            regards, tom lane

general design question

From
jtp
Date:

Hi, just a general design question, and I'm wondering how Postgres would
handle either situation.

I have a gob of information (400000+ records) on individual accounts.  I
need to store all of their personal information (name, address, etc.) as
well as all of their more dynamic company information (last purchase,
times ordered, etc.).

One:  All their dynamic information can be rebuilt from other tables, but
it will be called upon rather frequently, so the redundancy, to avoid
rebuilding on every call, seems acceptable to me. (Smack me if I'm wrong.)

Two:  There is only a one-to-one relationship between an account (personal
information) and that account's account information (makes sense, eh?). But
does it make sense to keep this information in the same table or to break it
up?  I estimate about 20 fields in two separate tables or 40 in one big one.
The personal information will almost always be index-searched by name or
zipcode, whereas the other information they (the proverbial they) will
probably want sorted in weirdass ways that the design was never intended
for.  Basically, it will be subjected to more sequential scans than
something with close to half a million records should be.  My basic question
ends up being: does Postgres handle sequential scans across tables with
fewer fields better?  Is there any performance increase from separating this
into two tables?

Thanks for any hints you could give me.
.jtp


Re: general design question

From
Curt Sampson
Date:
On Fri, 19 Apr 2002, jtp wrote:

> One:  All their dynamic information can be rebuilt from other tables, but
> it will be called upon rather frequently, so the redundancy, to avoid
> rebuilding on every call, seems acceptable to me. (Smack me if I'm wrong.)

It's quite reasonable to keep a summary table of information for
fast reference. The only difficulty you have to deal with is how
you keep it up to date. (Update every time the summarized data
change? Update once an hour? Once a day? That kind of thing. It
depends on your application.)
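
A sketch of the periodic-refresh variant, with made-up table and column
names (run from cron, or wrap the refresh in a PL/pgSQL function):

    CREATE TABLE account_summary (
        account_id    integer PRIMARY KEY,
        last_purchase date,
        times_ordered integer
    );

    -- The periodic refresh itself:
    BEGIN;
    DELETE FROM account_summary;
    INSERT INTO account_summary
        SELECT account_id, max(order_date), count(*)
          FROM orders
         GROUP BY account_id;
    COMMIT;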

> My basic question ends up being: does Postgres handle sequential scans
> across tables with fewer fields better?

Definitely. Given the same number of rows, a narrower table (fewer
columns, shorter data types, that kind of thing) will always be
scanned faster than a wider one simply because you need to read
less data from the disk. This is database-independent, in fact.

Since vacuuming also effectively involves a sequential scan, you'll
also vacuum faster on a narrower table. So it makes sense to separate
frequently updated data from less frequently updated data, and
vacuum the frequently updated table more often, I would think.
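
For instance, the split described above might look roughly like this (the
column names are invented for illustration):

    CREATE TABLE account_person (       -- rarely updated, index-searched
        account_id integer PRIMARY KEY,
        name       text,
        address    text,
        zipcode    text
    );

    CREATE TABLE account_activity (     -- updated often, vacuumed often
        account_id    integer PRIMARY KEY REFERENCES account_person,
        last_purchase date,
        times_ordered integer
    );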

However, for tables that are already narrow, you may get little
performance gain, or in some cases performance may even get worse,
not to mention your data size blowing up bigger. Postgres has a
quite high per-tuple overhead (31 bytes or more) so splitting small
tables can actually cause growth and make things slower, if you
frequently access both tables.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC


Re: general design question

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> However, for tables that are already narrow, you may get little
> performance gain, or in some cases performance may even get worse,
> not to mention your data size blowing up bigger. Postgres has a
> quite high per-tuple overhead (31 bytes or more) so splitting small
> tables can actually cause growth and make things slower, if you
> frequently access both tables.

Right.  The *minimum* row overhead in Postgres is 36 bytes (32-byte
tuple header plus 4-byte line pointer).  More, the actual data space
will be rounded up to the next MAXALIGN boundary, either 4 or 8 bytes
depending on your platform.  On an 8-byte-MAXALIGN platform like mine,
a table containing a single int4 column will actually occupy 44 bytes
per row.  Ouch.  So database designs involving lots of narrow tables
are not to be preferred over designs with a few wide tables.
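
Spelled out, that 44-byte figure is:

     4 bytes   line pointer in the page's item array
    32 bytes   tuple header
     4 bytes   int4 user data
     4 bytes   padding up to the next 8-byte MAXALIGN boundary
    --------
    44 bytes   per row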

AFAIK, all databases have nontrivial per-row overheads; PG might be
a bit worse than average, but this is a significant issue no matter
which DB you use.

            regards, tom lane

Re: general design question

From
Curt Sampson
Date:
On Fri, 19 Apr 2002, Tom Lane wrote:

> Right.  The *minimum* row overhead in Postgres is 36 bytes (32-byte
> tuple header plus 4-byte line pointer).

Ah, right! The line pointer is four bytes because it includes the length
of the tuple.

But I'm not sure why we need this length, possibly because I don't
understand the function of the LP_USED and LP_DELETED flags in the line
pointer. (I'm guessing that if LP_USED is not set, the line pointer does
not point to any data, and that if LP_DELETED is set, it points to a
chunk of free space.)

Why could we not just make all unallocated space be pointed to by
LP_DELETED pointers, and then when we need space, use it from those
(splitting and joining as necessary)? That gets rid of the need for
a length. Then we could declare that all tuples must be aligned on a
four-byte boundary, use the top 14 bits of a 16-bit line pointer as the
address, and the bottom two bits for the LP_USED and LP_DELETED flags.
This would slightly simplify the code for determining the flags, and
incidentally boost the maximum page size to 64K.

If you're willing to use a mask and shift to determine the address,
rather than just a mask, you could make the maximum page size 128K,
use the top 15 bits of the line pointer as the address, and use the
remaining bit as the LP_USED flag, since I don't see why we would then
need the LP_DELETED flag at all.

Or am I smoking crack here?

> AFAIK, all databases have nontrivial per-row overheads; PG might be
> a bit worse than average, but this is a significant issue no matter
> which DB you use.

For certain types of tables, such as the sort of table joining two
others, for which I forget the proper term:

    CREATE TABLE folder_contents (
        folder_id    int NOT NULL,
        item_id        int NOT NULL,
        PRIMARY KEY (folder_id, item_id))

some databases are much better. In MS SQL server, for example, since
there are no variable length columns, the tuple format will be:

    1 byte        status bits A
    1 byte        status bits B
    2 bytes        fixed-length columns data length
    4 bytes        DATA: folder_id
    4 bytes        DATA: item_id
    2 bytes        number of columns
    1 byte        null bitmap (unfortunately doesn't go away in SQL
            server even when there are no nullable columns)

(If there were variable length columns, you would have after this:
two bytes for the number of columns, 2 bytes per column for the
data offsets within the tuple, and then the variable data.)

So in Postgres this would take, what, 44 bytes per tuple? But in
SQL Server this takes 17 bytes per tuple (including the two-byte
line pointer in what they call the page's "row offset array"), or
about 40% of the space.

Needless to say, in my last job, where I was dealing with a table
like this with 85 million rows, I was happy for this to be a 1.3
GB table instead of a 3.5 GB table. Not that this made much
performance difference in that application anyway, since, with a
clustered index and typical folder sizes at a couple of dozen to
a hundred or so items, I was basically never going to read more
than one or two pages from disk to find the contents of a folder.

Hm. I guess this really should be on hackers, shouldn't it?

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC


Re: general design question

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> ... Then we could declare that all tuples must be aligned on a
> four-byte boundary, use the top 14 bits of a 16-bit line pointer as the
> address, and the bottom two bits for the LP_USED and LP_DELETED flags.
> This would slightly simplify the code for determining the flags, and
> incidentally boost the maximum page size to 64K.

Hmm.  Maybe, but the net effect would only be to reduce the minimum row
overhead from 36 to 34 bytes.  Not sure it's worth worrying about.
Eliminating redundancy from the item headers has its downside, too,
in terms of ability to detect problems.

> ... I don't see why we would then
> need the LP_DELETED flag at all.

I believe we do want to distinguish three states: live tuple, dead
tuple, and empty space.  Otherwise there will be cases where you're
forced to move data immediately to collapse empty space, when there's
not a good reason to except that your representation can't cope.

> Hm. I guess this really should be on hackers, shouldn't it?

Yup...

            regards, tom lane

Re: general design question

From
Martijn van Oosterhout
Date:
On Sat, Apr 20, 2002 at 01:55:38PM +0900, Curt Sampson wrote:
> > AFAIK, all databases have nontrivial per-row overheads; PG might be
> > a bit worse than average, but this is a significant issue no matter
> > which DB you use.
>
> For certain types of tables, such as the sort of table joining two
> others, for which I forget the proper term:
>
>     CREATE TABLE folder_contents (
>         folder_id    int NOT NULL,
>         item_id        int NOT NULL,
>         PRIMARY KEY (folder_id, item_id))
>
> some databases are much better. In MS SQL server, for example, since
> there are no variable length columns, the tuple format will be:
>
>     1 byte        status bits A
>     1 byte        status bits B
>     2 bytes        fixed-length columns data length
>     4 bytes        DATA: folder_id
>     4 bytes        DATA: item_id
>     2 bytes        number of columns
>     1 byte        null bitmap (unfortunately doesn't go away in SQL
>             server even when there are no nullable columns)

Where is the information needed to determine visibility for transactions? In
Postgres that's at least 16 bytes (cmin,cmax,xmin,xmax). How does SQL server
do that?

> (If there were variable length columns, you would have after this:
> two bytes for the number of columns, 2 bytes per column for the
> data offsets within the tuple, and then the variable data.)

In postgres, variable length columns don't cost anything if you don't use
them. An int is always 4 bytes, even if there are variable length columns
elsewhere. The only other overhead is 4 bytes for the OID and 6 bytes for
the CTID, which I guess may be unnecessary.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America