Thread: Tool for database design documentation?
We are trying to find a good way to document our database design – what is the rationale behind each table/field, what kind of information is each field supposed to contain, and perhaps also something about the linking between the tables. Does anybody have experience with this? Are the built-in ‘comments’ fields in pgsql the best tool, or are there special tools you would recommend?
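For reference, the built-in comments the question refers to are attached with COMMENT ON and show up in psql's \d+ output; a minimal sketch (the table and column names are made up for illustration):

    COMMENT ON TABLE customers IS
      'One row per billing customer; rationale and linking notes go here';
    COMMENT ON COLUMN customers.status IS
      'Lifecycle state: NEW, ACTIVE or CLOSED';
    -- in psql:  \d+ customers   shows the column comments
    --           \dt+            shows the table comment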
Jon Christian Ottersen wrote:
> We are trying to find a good way to document our database design [...]
> Is the built in ‘comments’ fields in pgsql the best tool or are
> there any special tools that would be recommendable?

I prefer to use a third-party tool like Case Studio 2 (http://www.casestudio.com) for this. Import your schema into CS, document your various entities in the table/element/key/etc. popups, then export everything to HTML. CS generates pretty complete documentation.
On Jul 30, 2005, at 2:30 PM, Jon Christian Ottersen wrote:
> We are trying to find a good way to document our database design [...]
> are there any special tools that would be recommendable?
Druid (http://druid.sourceforge.net) works fairly well. (Though the interface is abysmal, it does its job.) It makes javadoc-style documentation for tables and columns.

Jeff
Thread: feeding big script to psql

Hi,

I generated a big SQL script (about 20 MB) and fed it to psql. I was very surprised that within a minute psql became quite slow: there were stretches where fewer than 10 rows were inserted per second. This is on a WinXP machine with a local 8.0.3 server, and only I use it. Looking at the PostgreSQL log files, the commands themselves execute fast. I monitored psql with the Task Manager: it was at 45-50% CPU (this is a Hyper-Threaded machine, so 50% is the maximum for one thread), and memory allocation/deallocation was very active, about ±2 MB/sec. The commands were simple CREATE TABLE and INSERT statements split across several lines, like this:

    CREATE TABLE aeloleg(
        vevo CHAR(6), szallito INTEGER, datum DATE, hatarido DATE,
        vevo_nev CHAR(30), ir_szam INTEGER, helyseg CHAR(20), cim CHAR(20),
        befizetes INTEGER, terheles INTEGER, hitel INTEGER, rendeles INTEGER,
        jel CHAR(1), trans INTEGER, szoveg TEXT, storno BOOLEAN) WITHOUT OIDS;

The INSERT commands for each table were surrounded by BEGIN and COMMIT, like this:

    CREATE
    BEGIN
    INSERT
    INSERT
    ...
    INSERT
    COMMIT

I know it would be faster with COPY, but this is extremely slow, and the bottleneck is psql. What is the problem?

Regards,
Otto
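For comparison, the COPY approach Otto mentions would collapse each table's thousands of INSERTs into a single statement; a minimal sketch (the data file name is illustrative, and the file would need to be in COPY's default tab-separated text format):

    BEGIN;
    -- one COPY per table instead of one INSERT per row;
    -- \copy reads the file on the client side, so it works over any connection
    \copy aeloleg FROM 'aeloleg.dat'
    COMMIT;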
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> I know it would be faster with COPY, but this is extremely slow, and the
> bottleneck is psql.
> What is the problem?

Hmm, does the Windows port have readline support, and if so does adding the "-n" switch to the psql invocation fix the problem? Or you could try feeding the script with the -f switch or \i rather than "psql <script". Readline adds a fair amount of overhead, which is completely invisible at human typing speeds but can be annoying when reading scripts.

regards, tom lane
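Concretely, the alternatives Tom suggests look like this (database and file names are illustrative):

    psql -n -f bigscript.sql mydb     # -f reads the file directly; -n disables readline
    psql -n mydb                      # or start psql and read the file with \i
    mydb=> \i bigscript.sql

instead of:

    psql mydb < bigscript.sql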
Tom,

Thanks for the suggestion. I have just applied both switches, -f (which I had also used in the previous case) and -n, but it becomes slow again. At the beginning it reads about 300 KB a second, and by the time it has read 1.5 MB it reads only about 10 KB a second; it slows down gradually. Maybe others should also try this scenario. Can I help with anything?

Best Regards,
Otto
Hi,

The effect is the same even if I redirect the output to a file with the -o switch. At the beginning 200 KB/sec; at 1.5 MB the speed is less than 20 KB/sec.

Best Regards,
Otto
Hi,

Now I am at 7 MB, and the reading speed is 3-4 KB/sec.

Best Regards,
Otto
At 1:57 PM +0200 8/2/05, Havasvölgyi Ottó wrote:
> Now I am at 7 MB, and the reading speed is 3-4 KB/sec.

Have you checked to see if you're swapping as this goes on, either in the client or on the server?

--
Dan Sugalski
dan@sidhe.org
Dan Sugalski wrote:
> Have you checked to see if you're swapping as this goes on, either in
> the client or on the server?

Have you tried inserting VACUUM commands into the script every now and then?
I found a while ago that after inserting a lot of rows into a clean Postgres table it would take several minutes just to analyse a command, not even starting the execution. That was on version 8.0. On version 7.4.x the query never returned at all. Pete -- Peter Wilson - YellowHawk Ltd : http://www.yellowhawk.co.uk
On Tue, 2005-08-02 at 04:24, Havasvölgyi Ottó wrote:
> Thanks for the suggestion. I have just applied both switches, -f and -n,
> but it becomes slow again. At the beginning it reads about 300 KB a
> second, and when it has read 1.5 MB it reads only about 10 KB a second.

I bet you've got an issue where a seq scan on an fk field or something works fine for the first few thousand rows. At some point, pgsql should switch to an index scan, but it just doesn't know it. Try wrapping every 10,000 or so inserts with:

    begin;
    <insert 10,000 rows>
    commit;
    analyze;
    begin;

Rinse, wash, repeat. You probably won't need an analyze after the first one though. A fleshed-out example follows below.
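Spelled out against the script from the start of the thread, the batching Scott describes would make the generated file look roughly like this (a sketch; the column list and values are illustrative):

    BEGIN;
    INSERT INTO aeloleg (vevo, szallito, datum) VALUES ('ABC123', 42, '2005-07-01');
    -- ... roughly 10,000 more INSERTs for this table ...
    COMMIT;
    ANALYZE aeloleg;   -- refresh planner statistics once the table has real data
    BEGIN;
    -- ... next batch of ~10,000 INSERTs ...
    COMMIT;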
Peter Wilson <petew@yellowhawk.co.uk> writes: > I found a while ago that after inserting a lot of rows into a clean > Postgres table it would take several minutes just to analyse a command, > not even starting the execution. Oh? Could you provide a test case for this? I can certainly believe that the planner might choose a bad plan if it has no statistics, but it shouldn't take a long time to do it. regards, tom lane
I was a little busy with deadlines at the time, but I saved the database in its slow configuration so I could investigate during a quieter period. I'll do a restore now and see whether I can remember back to April when I came across this issue.

Pete
Scott,

There were no foreign keys (and not even indexes) during the data import, and none of the tables had more than 4000 records. I have also checked the log for durations, and all insert statements were 0.000 ms, so it seems the problem is not on the server. During the process no other application did anything; no other HDD activity either.

Best Regards,
Otto
Tom Lane wrote:
> Oh? Could you provide a test case for this? I can certainly believe
> that the planner might choose a bad plan if it has no statistics, but
> it shouldn't take a long time to do it.

Hi Tom,

I've spent half an hour restoring the database from the backup I took. This was back in December '04. The database dump is about 95 MB; none of the tables are particularly big.

On investigation the problem occurs on 'EXPLAIN ANALYZE' - which is what pgAdmin III does when you press the explain button. According to the manual, this actually executes the command and gives actual times. (I think pgAdmin is wrong here - if you ask it to explain a command, you want the plan it's going to use, not to wait for it to finish!)

That said, I'll include the details here, at least partly because the tables are small in database terms, and I don't understand how this query, even with lots of scanning, can take 100% CPU for over 10 minutes (654 seconds for explain/analyze, 651 seconds for execute).

OK - the query is:

    SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname
    FROM contacts c
    WHERE c.client_id = 'gadget'
      AND c.instance = '0'
      AND (c.type = 'COMMUNITY' OR c.type = 'OU' OR c.type = 'INDIVIDUAL')
      AND c.contact_id in (
          SELECT subb.community_id
          FROM contact_att subb
          WHERE subb.client_id = 'gadget'
            AND subb.instance = '0'
            AND subb.contact_id = 3854.000000
      );

By itself the sub-select executes in 235 ms and yields a set that does not vary depending on the outer select - it's a constant set in effect. It would seem that in the worst case, assuming no index or size information, the planner should spot the invariance of the sub-select, and a sequential scan of the 'contacts' table would be the worst result I would expect.

There are two tables involved in this query. 'contacts' contains 3728 records. 'contact_att' contains 19217 records.
The query plan yields:

    Unique  (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
      ->  Sort  (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
            Sort Key: c.client_id, c.instance, c.contact_id, c.uname
            ->  Nested Loop IN Join  (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
                  Join Filter: ("outer".contact_id = "inner".community_id)
                  ->  Index Scan using cos_uk on contacts c  (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
                        Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                        Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
                  ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
                        Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                        Filter: ((contact_id)::numeric = 3854.000000)
    Total runtime: 654492.320 ms

Definitions for the two relevant tables are:

    create table contacts (
        INSTANCE    CHARACTER(1) NOT NULL,
        client_id   varchar(50) not null,
        contact_id  bigint default nextval('contacts_contact_id_seq'::text),
        UNAME       VARCHAR(32) NOT NULL,
        TYPE        VARCHAR(20) NOT NULL DEFAULT 'INDIVIDUAL',
        parent      bigint,
        NAME        VARCHAR(240),
        PHONE       VARCHAR(32),
        FAX         VARCHAR(32),
        EMAIL       VARCHAR(240),
        BASIC_PW    VARCHAR(128),
        DESCRIPTION VARCHAR(240),
        custom_data varchar(8192),
        CONSTRAINT COS_PK PRIMARY KEY (INSTANCE, CLIENT_ID, CONTACT_ID)
    );

    create table contact_att (
        instance     character(1),
        client_id    varchar(50) not null,
        contact_id   bigint,
        community_id bigint,
        inherited    smallint,
        CONSTRAINT CA_PK PRIMARY KEY (INSTANCE, CLIENT_ID, CONTACT_ID, COMMUNITY_ID)
    );

    CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
    CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);

Hope you find that useful! If there is anything else I can provide you with then please do let me know.

Pete

Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk
Tom Lane wrote:
> Oh? Could you provide a test case for this? [...]

After a vacuum the query plan becomes:

    Unique  (cost=1438.65..1438.66 rows=1 width=39) (actual time=260.473..260.489 rows=3 loops=1)
      ->  Sort  (cost=1438.65..1438.65 rows=1 width=39) (actual time=260.468..260.471 rows=3 loops=1)
            Sort Key: c.client_id, c.instance, c.contact_id, c.uname
            ->  Nested Loop  (cost=1434.14..1438.64 rows=1 width=39) (actual time=260.007..260.306 rows=3 loops=1)
                  ->  HashAggregate  (cost=1434.14..1434.14 rows=1 width=8) (actual time=259.666..259.686 rows=3 loops=1)
                        ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
                              Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                              Filter: ((contact_id)::numeric = 3854.000000)
                  ->  Index Scan using cos_pk on contacts c  (cost=0.00..4.48 rows=1 width=39) (actual time=0.178..0.182 rows=1 loops=3)
                        Index Cond: ((c.instance = '0'::bpchar) AND ((c.client_id)::text = 'gadget'::text) AND (c.contact_id = "outer".community_id))
                        Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
    Total runtime: 260.886 ms

    whitebeam_slow=> \d contacts
                                  Table "public.contacts"
        Column    |          Type           |                         Modifiers
     -------------+-------------------------+-----------------------------------------------------------
      instance    | character(1)            | not null
      client_id   | character varying(50)   | not null
      contact_id  | bigint                  | not null default nextval('contacts_contact_id_seq'::text)
      uname       | character varying(32)   | not null
      type        | character varying(20)   | not null default 'INDIVIDUAL'::character varying
      parent      | bigint                  |
      name        | character varying(240)  |
      phone       | character varying(32)   |
      fax         | character varying(32)   |
      email       | character varying(240)  |
      basic_pw    | character varying(128)  |
      description | character varying(240)  |
      custom_data | character varying(8192) |
     Indexes:
         "cos_pk" PRIMARY KEY, btree (instance, client_id, contact_id)
         "cos_uk" UNIQUE, btree (instance, client_id, uname)
         "co_pa_ind" btree (parent)
         "co_ty_ind" btree ("type")

    whitebeam_slow=> \d contact_att
              Table "public.contact_att"
        Column     |         Type          | Modifiers
     --------------+-----------------------+-----------
      instance     | character(1)          | not null
      client_id    | character varying(50) | not null
      contact_id   | bigint                | not null
      community_id | bigint                | not null
      inherited    | smallint              |
     Indexes:
         "ca_pk" PRIMARY KEY, btree (instance, client_id, contact_id, community_id)
     Foreign-key constraints:
         "ca_cos_comm" FOREIGN KEY (instance, client_id, community_id) REFERENCES contacts(instance, client_id, contact_id) ON UPDATE RESTRICT ON DELETE RESTRICT
         "ca_cos_fk" FOREIGN KEY (instance, client_id, contact_id) REFERENCES contacts(instance, client_id, contact_id) ON UPDATE RESTRICT ON DELETE RESTRICT

Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk
Peter Wilson <petew@yellowhawk.co.uk> writes:
> Tom Lane wrote:
>>> Oh? Could you provide a test case for this? I can certainly believe
>>> that the planner might choose a bad plan if it has no statistics, but
>>> it shouldn't take a long time to do it.

> On investigation the problem occurs on 'EXPLAIN ANALYZE' - which is
> what pgAdmin III does when you press the explain button.

Ah. Well, this is an ideal example of why you need statistics --- without 'em, the planner is more or less flying blind about the number of matching rows. The original plan had

>   ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
>         Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
>         Filter: ((contact_id)::numeric = 3854.000000)

while your "after a vacuum" (I suppose really a vacuum analyze) plan has

>   ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
>         Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
>         Filter: ((contact_id)::numeric = 3854.000000)

This is the identical scan plan ... but now that the planner realizes it's going to be pretty expensive, it arranges the join in a way that requires only one scan of contact_att and not 2791 of 'em.

The key point here is that the index condition on instance/client_id is not selective --- it'll pull out a lot of rows. All but 3 of 'em are then discarded by the contact_id condition, but the damage in terms of runtime was already done. With stats, the planner can realize this --- without stats, it has no chance.

Looking at your table definition, I suppose you were expecting the contact_id condition to be used with the index, but since contact_id is bigint, comparing it to a numeric-type constant is not considered indexable. You want to lose the ".000000" in the query.

regards, tom lane
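To make Tom's last point concrete, here is the sub-select from the earlier query with the predicate rewritten so it stays indexable (a sketch based on the query posted above; only the constant changes):

    SELECT subb.community_id
    FROM contact_att subb
    WHERE subb.client_id = 'gadget'
      AND subb.instance  = '0'
      AND subb.contact_id = 3854;   -- or 3854::bigint; the numeric literal
                                    -- 3854.000000 forced contact_id to be cast
                                    -- to numeric, defeating the index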
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> Thanks for the suggestion. I have just applied both switches, -f (which I
> had also used in the previous case) and -n, but it becomes slow again. At
> the beginning it reads about 300 KB a second, and when it has read 1.5 MB,
> it reads only about 10 KB a second; it slows down gradually. Maybe others
> should also try this scenario. Can I help with anything?

Well, I don't see it happening here. I made up a script consisting of a whole lot of repetitions of

    insert into t1 values(1,2,3);

with one of these inserted every 1000 lines:

    \echo 1000 `date`

so I could track the performance. I created a table by hand:

    create table t1(f1 int, f2 int, f3 int);

and then started the script with

    psql -q -f big.sql testdb

At the beginning I was seeing about two echoes per second. I let it run for an hour, and I was still seeing about two echoes per second. That's something close to 170 MB of script file read (over 5.7 million rows inserted by the time I stopped it).

So, either this test case is too simple to expose your problem, or there's something platform-specific going on. I don't have a Windows machine to try it on ...

regards, tom lane
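Tom did not post how big.sql was produced; one way to generate an equivalent script on a Unix-like system is sketched below (the row count and database name are chosen for illustration):

    # emit ~1,000,000 identical INSERTs with an \echo progress marker every 1000 lines
    awk 'BEGIN { for (i = 1; i <= 1000000; i++) {
                   print "insert into t1 values(1,2,3);";
                   if (i % 1000 == 0) print "\\echo " i " `date`" } }' > big.sql

    psql -c "create table t1(f1 int, f2 int, f3 int)" testdb
    psql -q -f big.sql testdb

The marker here prints the running line count rather than the literal 1000, but the structure matches what Tom describes.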
Tom Lane wrote:
> Looking at your table definition, I suppose you were expecting the
> contact_id condition to be used with the index, but since contact_id is
> bigint, comparing it to a numeric-type constant is not considered indexable.
> You want to lose the ".000000" in the query.

Thanks for that Tom - especially the bit about removing the .000000 from the numbers. I'm pretty new to some of this database stuff - even newer at trying to optimise queries and 'think like the planner'. It never occurred to me that the number format would have that effect. Removing the zeroes actually knocked a few ms off the execution times in real-life queries :-)

Just out of interest - is there an opportunity for the planner to realise the sub-select is basically invariant for the outer query and execute it once, regardless of stats? It seems like the loop-invariant optimisation in a 'C' compiler. If you have to do something once v. doing it 2791 times then I'd plump for the once!

Thanks again Tom, much appreciated for that little nugget.

Pete

--
Peter Wilson. YellowHawk Ltd, http://www.yellowhawk.co.uk
Tom,

My queries were written across multiple lines, like this:

    insert into t1 values(1,
    2,
    3);

I don't know what effect this has on performance.

Regards,
Otto
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> My queries were written across multiple lines, like this:
> insert into t1 values(1,
> 2,
> 3);
> I don't know what effect this has on performance.

I tried my test again that way, and it made no difference at all. Can anyone else replicate this problem?

regards, tom lane
Not a free utility, but a good one: http://www.datanamic.com/dezign/index.html