Thread: How does the estimated row count work?

How does the estimated row count work?

From: Hervé Piedvache
Hi,

I have a database with one table with about 22 million records.
I have a script every day which inserts about 200,000 records ...
Sometimes my script takes 4 hours to insert the data, sometimes 1 hour.

Then I have a question: does PostgreSQL use the estimated number of rows
anywhere when inserting data ... ? I would think not ... but ...

For example ...

mybases=# ANALYZE VERBOSE my_table;
INFO:  analyzing "public.my_table"
INFO:  "my_table": 434342 pages, 30000 rows sampled, 22585030 estimated total
rows
ANALYZE
Time: 173317.410 ms
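
If I understand it correctly, the "estimated total rows" that ANALYZE prints
is the value it stores in pg_class.reltuples, so it can be compared with an
exact count, which is of course slow on a table this size:

mybases=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'my_table';
mybases=# SELECT count(*) FROM my_table;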

I insert the 200,000 records ... 4 hours ...

Then I go back to the same situation (another server with the same database,
in the same state as before the insert ... (same ANALYZE results as above)) ...
I do the same ANALYZE ... then afterwards I do a VACUUM ANALYZE like the one
below ... Why does the number of row versions reported for the first index not
match the estimated total rows from the ANALYZE done just before? Why is the
total at the end of the VACUUM also different? And why does the insert take
only 1 hour after the VACUUM ANALYZE??

mybases=# VACUUM VERBOSE ANALYZE my_table;
INFO:  vacuuming "public.my_table"
INFO:  index "ix_my_table_compte" now contains 21897465 row versions in 113659
pages
DETAIL:  270724 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 12.23s/20.53u sec elapsed 824.63 sec.
INFO:  index "ix_my_table_idcontract" now contains 21897465 row versions in
65647 pages
DETAIL:  270724 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 4.32s/14.52u sec elapsed 337.45 sec.
INFO:  index "ix_my_table_arrete_week" now contains 21897465 row versions in
87723 pages
DETAIL:  270724 index row versions were removed.
2780 index pages have been deleted, 1689 are currently reusable.
CPU 8.68s/11.40u sec elapsed 754.39 sec.
INFO:  "my_table": removed 270724 row versions in 5209 pages
DETAIL:  CPU 0.27s/0.74u sec elapsed 15.40 sec.
INFO:  "my_table": found 270724 removable, 21897465 nonremovable row versions
in 434342 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 416707 unused item pointers.
0 pages are entirely empty.
CPU 44.14s/52.89u sec elapsed 2247.39 sec.
INFO:  "my_table": truncated 434342 to 431553 pages
DETAIL:  CPU 0.06s/0.03u sec elapsed 0.08 sec.
INFO:  vacuuming "pg_toast.pg_toast_917442"
INFO:  index "pg_toast_917442_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_917442": found 0 removable, 0 nonremovable row versions in 0
pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  analyzing "public.my_table"
INFO:  "my_table": 431553 pages, 30000 rows sampled, 22440007 estimated total
rows
VACUUM
Time: 2493147.338 ms

Thanks for your replies,
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

Re: How does the estimated row count work?

From: "scott.marlowe"
On Wed, 7 Apr 2004, Hervé Piedvache wrote:

> Hi,
>
> I have a database with one table with about 22 million records.
> I have a script every day which inserts about 200,000 records ...
> Sometimes my script takes 4 hours to insert the data, sometimes 1 hour.

Are you wrapping all your inserts into one or several large transactions?
This can make inserting data MUCH faster in postgresql.
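
For example, something along these lines (the column names here are made up,
since I don't know your table definition) commits the whole batch once instead
of once per row:

BEGIN;
INSERT INTO my_table (id, label) VALUES (1, 'a');
INSERT INTO my_table (id, label) VALUES (2, 'b');
-- ... and so on for the rest of the batch ...
COMMIT;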

Do you have fk/pk relationships?  If so, are they of the same types?

I wouldn't worry about the vacuum / analyze numbers being a little off.  A
small variance of a few percent is no biggie; it's when things are off by
factors that the planner gets confused and makes bad decisions.
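
If you want to see whether the estimates are off by enough to matter, something
like this (I'm guessing a column name from one of your index names) shows the
planner's estimated row count next to the actual one:

EXPLAIN ANALYZE SELECT * FROM my_table WHERE idcontract = 42;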


Re: How does the estimated row count work?

From: Hervé Piedvache
Scott,

On Wednesday 7 April 2004 at 22:13, scott.marlowe wrote:
> On Wed, 7 Apr 2004, Hervé Piedvache wrote:
> >
> > I have a database with one table with about 22 million records.
> > I have a script every day which inserts about 200,000 records ...
> > Sometimes my script takes 4 hours to insert the data, sometimes 1 hour.
>
> Are you wrapping all your inserts into one or several large transactions?
> This can make inserting data MUCH faster in postgresql.

It's a single transaction ... because it's an INSERT from a SELECT ...
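
Roughly like this (the source table and column names are just placeholders),
so the whole load already runs as one statement in one transaction:

INSERT INTO my_table (id, label)
SELECT id, label FROM staging_table;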

> Do you have fk/pk relationships?  If so, are they of the same types?

No, no fk/pk relationships ...

> I wouldn't worry about the vacuum / analyze numbers being a little off.  A
> small variance of a few percent is no biggie; it's when things are off by
> factors that the planner gets confused and makes bad decisions.

Yes, but it is off by about 150,000 rows ... which is small compared to the 22
million ... but it happens just after an ANALYZE ... so why doesn't the ANALYZE
pick up that information??
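
If it is only a sampling issue, is the per-column statistics target the right
knob? As far as I understand, ANALYZE samples 300 rows per unit of statistics
target (the 30000 rows sampled above would correspond to a target of 100) and
extrapolates the total from the sampled pages, so something like this (column
name guessed from an index name) should make the sample larger and the
estimate tighter, at the cost of a slower ANALYZE:

ALTER TABLE my_table ALTER COLUMN idcontract SET STATISTICS 200;
ANALYZE my_table;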

Regards,
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902