Thread: Tool for database design documentation?

Tool for database design documentation?

From
"Jon Christian Ottersen"
Date:

We are trying to find a good way to document our database design – what is the rationale behind each table/field, what kind of information is each field supposed to contain, perhaps also something about the linking between the tables, etc. Does anybody have experience with this? Are the built-in ‘comments’ fields in pgsql the best tool, or are there any special tools that would be recommendable?
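
For context, by the built-in comments I mean something along these lines - the table and column names are just made up for illustration:

    COMMENT ON TABLE customer IS 'One row per billing account';
    COMMENT ON COLUMN customer.status IS 'A = active, C = closed, S = suspended';

psql then shows these descriptions via \dd and \d+ customer.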

Re: Tool for database design documentation?

From
Steve Manes
Date:
Jon Christian Ottersen wrote:
> We are trying to find a good way to document our database design – what
> is the rationale behind each table/field, what kind of information is
> each field supposed to contain, perhaps also something about the linking
> between the tables etc. Is there anybody who has some experience on
> this? Is the built in ‘comments’ fields in pgsql the best tool or are
> there any special tools that would be recommendable?

I prefer to use a third-party tool like Case Studio 2
(http://www.casestudio.com) for this.  Import your schema into CS,
document your various entities in the table/element/key/etc. popups, then
export everything to HTML.  CS generates pretty complete documentation.


Re: Tool for database design documentation?

From
Jeffrey Melloy
Date:

On Jul 30, 2005, at 2:30 PM, Jon Christian Ottersen wrote:

We are trying to find a good way to document our database design – what is the rationale behind each table/field, what kind of information is each field supposed to contain, perhaps also something about the linking between the tables etc. Is there anybody who has some experience on this? Is the built in ‘comments’ fields in pgsql the best tool or are there any special tools that would be recommendable?



Druid (http://druid.sourceforge.net) works fairly well.  (Though the interface is abysmal, it does its job).  It makes javadoc-style documentation for tables and columns.

Jeff

Re: Tool for database design documentation?

From
William Bug
Date:
I use a combination of the gnome-based diagramming application Dia (http://www.gnome.org/projects/dia/) and the tedia2sql diagram-to-DDL SQL translation script, which is written in Perl (http://tedia2sql.tigris.org/).

It's a bit awkward, but:
    1) Dia is a wonderful, open source diagramming tool I can run on Windows, Mac OS X (via Fink) or Linux - nearly all of the commercial tools have traditionally been Windows only, though this is slowly changing;
    2) Dia comes with a set of UML diagram objects which work quite well for laying out a detailed data model.  There are also ERD diagram objects, but they don't seem to be of much use;
    3) tedia2sql does a very good job translating most standard SQL-92 (and some additional SQL-99 extensions) from UML objects to SQL objects;
    4) Dia saves its diagrams in XML format, which makes it relatively straightforward to add to the SQL translations being performed (see the many tedia2sql-like translators on the Dia links page - http://www.gnome.org/projects/dia/links.html).  Some of the PostgreSQL-specific features such as Inheritance and Rules can be diagrammed using the UML formalism, and one can write extensions in Perl to tedia2sql to turn that into DDL SQL;

There are other SQL to Dia diagram translators (e.g., PostgreSQL AutoDoc [http://www.rbt.ca/autodoc/]) which allow for reverse-engineering a data model diagram from an existing PostgreSQL DDL SQL schema.  There are also other Dia-to-SQL tools (see the Dia links page), some of them even more specifically tailored to PostgreSQL than tedia2sql is.  I came to using tedia2sql over these because I found it supported more of the general SQL entities and relations I needed and it was relatively easy to use.

I should add that this approach of sticking with open source tools that are relatively straightforward to modify and extend is not for the faint of heart.  You absolutely must have a solid grasp of what a database Entity-Relation Diagram (ERD) is.  Understanding the translation process (XML Dia diagram objects to DDL SQL code generation) can be quite helpful in making the best use of tedia2sql, though it is not necessary for getting started turning out DDL SQL from your Dia data models.  A thorough understanding of this process and knowledge of XML document parsing IS required to extend the translator.

This more "home grown" solution doesn't do as good a job separating logical data models (implementable in any RDBMS) from physical data models (version of the logical model designed for a specific RDBMS).  I've found to easiest and quickest to simply create the physical model diagram and focus on how to get that translate properly in PostgreSQL DDL SQL.  The Dia data model diagrams I create aren't easily re-usuable in other RDBMSs, but then again, I don't intend to stop using PostgreSQL, unless I'm driven from it for some applications-specific reason.

I'm working on a port of tedia2sql to Ruby (my scripting language of choice of late), mainly because I've got Object-Relational mapping libraries there which will make it easier to create a complete, "round trip" ERD system - i.e., be able both to generate Dia diagrams from existing PostgreSQL DDL SQL schemas and to turn Dia diagrams into DDL SQL that can be used to directly instantiate the schema in PostgreSQL (via DBI code).

Cheers,
Bill


feeding big script to psql

From
Havasvölgyi Ottó
Date:
Hi,

I generated a big SQL script (about 20 MB) and fed it to psql. I was very
surprised that within a minute psql became quite slow. There were stretches
where fewer than 10 rows were inserted per second.
This is on a WinXP machine with local server 8.0.3, and only I use it.
Looking at the PG log files, the commands themselves execute fast.
I have monitored psql with the Task Manager, and it was at 45-50% CPU (this is
Hyperthreaded, so 50% is the max for one thread), and the memory
allocation/deallocation was very active, around ±2 MB/sec.
The commands were simple CREATE TABLE and INSERT statements, split across
several lines like this:

CREATE TABLE aeloleg(
vevo CHAR(6),
szallito INTEGER,
datum DATE,
hatarido DATE,
vevo_nev CHAR(30),
ir_szam INTEGER,
helyseg CHAR(20),
cim CHAR(20),
befizetes INTEGER,
terheles INTEGER,
hitel INTEGER,
rendeles INTEGER,
jel CHAR(1),
trans INTEGER,
szoveg TEXT,
storno BOOLEAN) WITHOUT OIDS;

The insert commands for one table were surrounded by BEGIN and COMMIT like
this:

CREATE
BEGIN
INSERT
INSERT
...
INSERT
COMMIT
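
A typical insert looks something like this (the values here are just made up for illustration):

    INSERT INTO aeloleg VALUES ('V00001', 12, '2005-07-01', '2005-07-15',
    'Kovacs Kft.', 1111, 'Budapest', 'Fo utca 1.',
    10000, 0, 0, 42, 'A', 7, 'megjegyzes', FALSE);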


I know it would be faster with COPY, but this is extremely slow, and the
bottleneck is psql.
What is the problem?
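
For comparison, the COPY form of the same load would presumably look something like this (the data file name is made up):

    \copy aeloleg from 'aeloleg.dat'

but I would still like to understand why the INSERT script slows down.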

Regards,
Otto



Re: feeding big script to psql

From
Tom Lane
Date:
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> I know it would be faster with COPY, but this is extremly slow, and the
> bottleneck is psql.
> What is the problem?

Hmm, does the Windows port have readline support, and if so does adding
the "-n" switch to the psql invocation fix the problem?  Or you could
try feeding the script with -f switch or \i rather than "psql <script".
Readline adds a fair amount of overhead, which is completely invisible
at human typing speeds but can be annoying when reading scripts.
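
That is, something along these lines (the file and database names are just examples):

    psql -n -f bigscript.sql yourdb

or, from an interactive session:

    \i bigscript.sql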

            regards, tom lane

Re: feeding big script to psql

From
Havasvölgyi Ottó
Date:
Tom,

Thanks for the suggestion. I have just applied both switches, -f (which I had
also applied in the previous case) and -n, but it becomes slow again. At
the beginning it reads about 300 KB a second, and once it has read 1.5 MB,
it reads only about 10 KB a second; it slows down gradually. Maybe others
should also try this scenario. Can I help with anything?

Best Regards,
Otto





Re: feeding big script to psql

From
Havasvölgyi Ottó
Date:
Hi,

The effect is the same even if I redirect the output to a file with the -o
switch.
At the beginning it is 200 KB/sec; at 1.5 MB the speed is less than 20 KB/sec.

Best Regards,
Otto






Re: feeding big script to psql

From
Havasvölgyi Ottó
Date:
Hi,

Now I am at 7 MB, and the reading speed is 3-4KB/sec.

Best Regards,
Otto





Re: feeding big script to psql

From
Dan Sugalski
Date:
At 1:57 PM +0200 8/2/05, Havasvölgyi Ottó wrote:
>Hi,
>
>Now I am at 7 MB, and the reading speed is 3-4KB/sec.

Have you checked to see if you're swapping as
this goes on, either in the client or on the
server?



--
                Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk

Re: feeding big script to psql

From
Peter Wilson
Date:
Dan Sugalski wrote:
> At 1:57 PM +0200 8/2/05, Havasvölgyi Ottó wrote:
>> Hi,
>>
>> Now I am at 7 MB, and the reading speed is 3-4KB/sec.
>
> Have you checked to see if you're swapping as this goes on, either in
> the client or on the server?
>

Have you tried inserting VACUUM commands into the script every now and then?
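
Something like this, purely as a sketch (the batch size is arbitrary):

    BEGIN;
    -- a few thousand INSERT statements
    COMMIT;
    VACUUM ANALYZE;  -- VACUUM can't run inside a transaction block
    BEGIN;
    -- next batch of INSERTs
    COMMIT;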

I found a while ago that after inserting a lot of rows into a clean
Postgres table it would take several minutes just to analyse a command,
not even starting the execution. That was on version 8.0. On version
7.4.x the query never returned at all.

Pete
--
Peter Wilson - YellowHawk Ltd : http://www.yellowhawk.co.uk

Re: feeding big script to psql

From
Scott Marlowe
Date:
On Tue, 2005-08-02 at 04:24, Havasvölgyi Ottó wrote:
> Tom,
>
> Thanks for the suggestion. I have just applied both switch , -f (I have
> applied this in the previous case too) and -n, but it becomes slow again. At
> the beginning it reads about 300 KB a second, and when it has read 1.5 MB,
> it reads only about 10 KB a second, it slows down gradually. Maybe others
> should also try this scenario. Can I help anything?

I bet you've got an issue where a seq scan on an fk field or something
works fine for the first few thousand rows.  At some point, pgsql should
switch to an index scan, but it just doesn't know it.

Try wrapping every 10,000 or so inserts with

begin;
<insert 10,000 rows>
commit;
analyze;
begin;
rinse, wash repeat.

You probably won't need an analyze after the first one though.

Re: feeding big script to psql

From
Tom Lane
Date:
Peter Wilson <petew@yellowhawk.co.uk> writes:
> I found a while ago that after inserting a lot of rows into a clean
> Postgres table it would take several minutes just to analyse a command,
> not even starting the execution.

Oh?  Could you provide a test case for this?  I can certainly believe
that the planner might choose a bad plan if it has no statistics, but
it shouldn't take a long time to do it.

            regards, tom lane

Re: feeding big script to psql

From
Peter Wilson
Date:
I was a little busy with deadlines at the time, but I saved the database
in its slow configuration so I could investigate during a quieter period.

I'll do a restore now and see whether I can remember back to April when
I came across this issue.

Pete



Re: feeding big script to psql

From
Havasvölgyi Ottó
Date:
Scott,

There were no foreign keys (not even any indices) during data import, and none of
the tables had more than 4000 records. And I have checked the log for
durations, and all insert statements were 0.000 ms. So it seems that the
problem is not at the server.
During the process no other application did anything. No other HDD activity
either.

Best Regards,
Otto






Re: feeding big script to psql

From
Peter Wilson
Date:
Tom Lane wrote:
 > Peter Wilson <petew@yellowhawk.co.uk> writes:
 >> I found a while ago that after inserting a lot of rows into a clean
 >> Postgres table it would take several minutes just to analyse a command,
 >> not even starting the execution.
 >
 > Oh?  Could you provide a test case for this?  I can certainly believe
 > that the planner might choose a bad plan if it has no statistics, but
 > it shouldn't take a long time to do it.
 >
 >             regards, tom lane
 >

Hi Tom,
I've spent half an hour restoring the database from the backup I took. This was back in December '04. The database dump is about 95 Mbytes; none of the tables are particularly big.

On investigation the problem occurs on 'EXPLAIN ANALYZE' - which is what pgadminIII does when you press the explain button. According to the manual, this actually executes the command and gives actual times (I think pgadmin is wrong here - if you ask it to explain a command you want the plan it's going to use - not wait for it to finish!)

That said - I'll include the details here, at least partly because the tables are small in database terms, and I don't understand how this query, even with lots of scanning, can take 100% CPU for over 10 minutes (654 seconds for explain/analyze, 651 seconds for execute).

OK - the query is:

SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname FROM contacts c
    WHERE c.client_id = 'gadget'
      AND c.instance = '0'
      AND (    c.type = 'COMMUNITY'
            OR c.type = 'OU'
            OR c.type = 'INDIVIDUAL'
          )
      AND c.contact_id in (
             SELECT subb.community_id  FROM contact_att subb
                WHERE subb.client_id = 'gadget'
                  AND subb.instance = '0'
                  AND subb.contact_id = 3854.000000
         ) ;

By itself the sub-select executes in 235ms and yields a set that does not vary depending on the outer select - it's a constant set in effect. It would seem that in the worst case, assuming no index or size information, the planner should spot the invariance of the subselect, and a sequential scan of the 'contacts' table would be the worst result I would expect.

There are two tables involved in this query.

    'contacts' contains 3728 records.
    'contact_att' contains 19217 records.

The query plan yields :

Unique  (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
   ->  Sort  (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
         Sort Key: c.client_id, c.instance, c.contact_id, c.uname
         ->  Nested Loop IN Join  (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
               Join Filter: ("outer".contact_id = "inner".community_id)
               ->  Index Scan using cos_uk on contacts c  (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
                     Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                     Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
               ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
                     Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                     Filter: ((contact_id)::numeric = 3854.000000)
Total runtime: 654492.320 ms

Definitions for the two relevant tables are:

create table contacts (
     INSTANCE CHARACTER (1)  NOT NULL ,
     client_id varchar (50) not null ,
     contact_id bigint default nextval('contacts_contact_id_seq'::text),
     UNAME VARCHAR (32)  NOT NULL ,
     TYPE VARCHAR (20)  NOT NULL DEFAULT 'INDIVIDUAL',
     parent bigint,
     NAME VARCHAR (240) ,
     PHONE VARCHAR (32) ,
     FAX VARCHAR (32) ,
     EMAIL VARCHAR (240) ,
     BASIC_PW VARCHAR (128) ,
     DESCRIPTION VARCHAR (240),
     custom_data varchar(8192),
CONSTRAINT COS_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID)
)

create table contact_att(
     instance character(1),
     client_id varchar(50) not null,
     contact_id bigint,
     community_id bigint,
     inherited smallint,
CONSTRAINT CA_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID,COMMUNITY_ID)
)

CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);

--
Hope you find that useful! If there is anything else I can provide you with then please do let me know.

Pete


------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk



Re: feeding big script to psql

From
Peter Wilson
Date:
Tom Lane wrote:
> Peter Wilson <petew@yellowhawk.co.uk> writes:
>> I found a while ago that after inserting a lot of rows into a clean
>> Postgres table it would take several minutes just to analyse a command,
>> not even starting the execution.
>
> Oh?  Could you provide a test case for this?  I can certainly believe
> that the planner might choose a bad plan if it has no statistics, but
> it shouldn't take a long time to do it.
>
>             regards, tom lane
>
After a vacuum the query plan becomes:

Unique  (cost=1438.65..1438.66 rows=1 width=39) (actual time=260.473..260.489 rows=3 loops=1)
   ->  Sort  (cost=1438.65..1438.65 rows=1 width=39) (actual time=260.468..260.471 rows=3 loops=1)
         Sort Key: c.client_id, c.instance, c.contact_id, c.uname
         ->  Nested Loop  (cost=1434.14..1438.64 rows=1 width=39) (actual time=260.007..260.306 rows=3 loops=1)
               ->  HashAggregate  (cost=1434.14..1434.14 rows=1 width=8) (actual time=259.666..259.686 rows=3 loops=1)
                     ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
                           Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                           Filter: ((contact_id)::numeric = 3854.000000)
               ->  Index Scan using cos_pk on contacts c  (cost=0.00..4.48 rows=1 width=39) (actual time=0.178..0.182 rows=1 loops=3)
                     Index Cond: ((c.instance = '0'::bpchar) AND ((c.client_id)::text = 'gadget'::text) AND (c.contact_id = "outer".community_id))
                     Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
Total runtime: 260.886 ms

whitebeam_slow=> \d contacts
                                       Table "public.contacts"
    Column    |          Type           |                         Modifiers
-------------+-------------------------+-----------------------------------------------------------
  instance    | character(1)            | not null
  client_id   | character varying(50)   | not null
  contact_id  | bigint                  | not null default nextval('contacts_contact_id_seq'::text)
  uname       | character varying(32)   | not null
  type        | character varying(20)   | not null default 'INDIVIDUAL'::character varying
  parent      | bigint                  |
  name        | character varying(240)  |
  phone       | character varying(32)   |
  fax         | character varying(32)   |
  email       | character varying(240)  |
  basic_pw    | character varying(128)  |
  description | character varying(240)  |
  custom_data | character varying(8192) |
Indexes:
     "cos_pk" PRIMARY KEY, btree (instance, client_id, contact_id)
     "cos_uk" UNIQUE, btree (instance, client_id, uname)
     "co_pa_ind" btree (parent)
     "co_ty_ind" btree ("type")

whitebeam_slow-> \d contact_att
             Table "public.contact_att"
     Column    |         Type          | Modifiers
--------------+-----------------------+-----------
  instance     | character(1)          | not null
  client_id    | character varying(50) | not null
  contact_id   | bigint                | not null
  community_id | bigint                | not null
  inherited    | smallint              |
Indexes:
     "ca_pk" PRIMARY KEY, btree (instance, client_id, contact_id, community_id)
Foreign-key constraints:
     "ca_cos_comm" FOREIGN KEY (instance, client_id, community_id) REFERENCES contacts(instance, client_id, contact_id)
ONUPDATE RESTRICT ON DELETE  
RESTRICT
     "ca_cos_fk" FOREIGN KEY (instance, client_id, contact_id) REFERENCES contacts(instance, client_id, contact_id) ON
UPDATERESTRICT ON DELETE RESTRICT 


------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk

Re: feeding big script to psql

From
Tom Lane
Date:
Peter Wilson <petew@yellowhawk.co.uk> writes:
> Tom Lane wrote:
>>> Oh?  Could you provide a test case for this?  I can certainly believe
>>> that the planner might choose a bad plan if it has no statistics, but
>>> it shouldn't take a long time to do it.

> On investigation the problems occurs on 'EXPLAIN ANALYZE' - which is
> what pgadminIII does when you press the explain button.

Ah.  Well, this is an ideal example of why you need statistics ---
without 'em, the planner is more or less flying blind about the number
of matching rows.  The original plan had

>                ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
>                      Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
>                      Filter: ((contact_id)::numeric = 3854.000000)

while your "after a vacuum" (I suppose really a vacuum analyze) plan has

>                      ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
>                            Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
>                            Filter: ((contact_id)::numeric = 3854.000000)

This is the identical scan plan ... but now that the planner realizes
it's going to be pretty expensive, it arranges the join in a way that
requires only one scan of contact_att and not 2791 of 'em.

The key point here is that the index condition on instance/client_id
is not selective --- it'll pull out a lot of rows.  All but 3 of 'em are
then discarded by the contact_id condition, but the damage in terms
of runtime was already done.  With stats, the planner can realize this
--- without stats, it has no chance.

Looking at your table definition, I suppose you were expecting the
contact_id condition to be used with the index, but since contact_id is
bigint, comparing it to a numeric-type constant is not considered indexable.
You want to lose the ".000000" in the query.
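
That is, write the condition as something like

    AND subb.contact_id = 3854

(or with an explicit cast, subb.contact_id = 3854::bigint) so the comparison stays within the integer types and can be used with the index.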

            regards, tom lane

Re: feeding big script to psql

From
Tom Lane
Date:
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> Thanks for the suggestion. I have just applied both switch , -f (I have
> applied this in the previous case too) and -n, but it becomes slow again. At
> the beginning it reads about 300 KB a second, and when it has read 1.5 MB,
> it reads only about 10 KB a second, it slows down gradually. Maybe others
> should also try this scenario. Can I help anything?

Well, I don't see it happening here.  I made up a script consisting of a
whole lot of repetitions of

    insert into t1 values(1,2,3);

with one of these inserted every 1000 lines:

    \echo 1000 `date`

so I could track the performance.  I created a table by hand:

    create table t1(f1 int, f2 int, f3 int);

and then started the script with

    psql -q -f big.sql testdb

At the beginning I was seeing about two echoes per second.  I let it run
for an hour, and I was still seeing about two echoes per second.  That's
something close to 170MB of script file read (over 5.7 million rows
inserted by the time I stopped it).

So, either this test case is too simple to expose your problem, or
there's something platform-specific going on.  I don't have a windows
machine to try it on ...

            regards, tom lane

Re: feeding big script to psql

From
Peter Wilson
Date:
Tom Lane wrote:
> Looking at your table definition, I suppose you were expecting the
> contact_id condition to be used with the index, but since contact_id is
> bigint, comparing it to a numeric-type constant is not considered indexable.
> You want to lose the ".000000" in the query.
>
>             regards, tom lane
Thanks for that Tom - especially the bit about removing the .00000 from
the numbers. I'm pretty new to some of this database stuff - even newer at
trying to optimise queries and 'think like the planner'. It never occurred to
me that the number format would have that effect.

Removing the zeroes actually knocked a few ms off the execution times in
real-life queries :-)

Just out of interest - is there an opportunity for the planner to realise
that the sub-select is basically invariant for the outer query and execute it
once, regardless of stats? Seems like the loop-invariant optimisation in a 'C'
compiler. If you have to do something once v. doing it 2791 times then
I'd plump for the once!

Thanks again Tom, much appreciated for that little nugget
Pete
--
Peter Wilson. YellowHawk Ltd, http://www.yellowhawk.co.uk

Re: feeding big script to psql

From
Havasvölgyi Ottó
Date:
Tom,

My queries were written in multi-line mode like this:

insert into t1 values(1,
2,
3);

I don't know what effect this has on performance...

Regards,
Otto






Re: feeding big script to psql

From
Tom Lane
Date:
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> My queries were written in multi-line mode like this:

> insert into t1 values(1,
> 2,
> 3);

> I don't know, what effect this has to performace..

I tried my test again that way, and it made no difference at all.

Can anyone else replicate this problem?

            regards, tom lane

Re: Tool for database design documentation?

From
"littlebutty"
Date:
Not a free utility, but a good one:
http://www.datanamic.com/dezign/index.html