Thread: slow seqscan after vacuum analyze

slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
I have a simple query that scans every record, like this: select * from utente where
luogorilasciodoc='ciao'
The execution time BEFORE vacuum is 1203 ms; the execution time AFTER vacuum is 6656 ms!!!
What is going on? I thought vacuuming the db was supposed to give better performance!

PostgreSQL Ver. 7.3.4 on linux

Thank you
Edoardo


Re: slow seqscan after vacuum analyze

From
"scott.marlowe"
Date:
On Wed, 4 Feb 2004, Edoardo Ceccarelli wrote:

> I have a simple query that scans every record, like this: select * from utente where
> luogorilasciodoc='ciao'
> The execution time BEFORE vacuum is 1203 ms; the execution time AFTER vacuum is 6656 ms!!!
> What is going on? I thought vacuuming the db was supposed to give better performance!
>
> PostgreSQL Ver. 7.3.4 on linux

Can you show us the output of "explain analyze select * from ..." for both
before and after a vacuum?   i.e. with the difference in performance.  If
not, just an explain analyze of the way it runs now might be enough.


R: slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
> -----Original Message-----
> From: scott.marlowe [mailto:scott.marlowe@ihs.com]
> Sent: Wednesday, 4 February 2004 22:45
> To: Edoardo Ceccarelli
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] slow seqscan after vacuum analyze
>
> On Wed, 4 Feb 2004, Edoardo Ceccarelli wrote:
>
> > I have a simple query that scans every record, like this: select * from
> > utente where luogorilasciodoc='ciao'
> > The execution time BEFORE vacuum is 1203 ms; the execution time AFTER
> > vacuum is 6656 ms!!!
> > What is going on? I thought vacuuming the db was supposed to give better
> > performance!
> >
> > PostgreSQL Ver. 7.3.4 on linux
>
> Can you show us the output of "explain analyze select * from ..." for both
> before and after a vacuum?  i.e. with the difference in performance.  If
> not, just an explain analyze of the way it runs now might be enough.
>


BEFORE (copy of the db without having run the vacuum):

explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on utente  (cost=0.00..40947.90 rows=1826 width=724) (actual time=131.17..856.96 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 857.06 msec
(3 rows)


AFTER (actual db)
portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 6458.29 msec
(3 rows)

Things are worse only for seqscans; when it uses an index scan, timing is good.



Re: R: slow seqscan after vacuum analyze

From
Sam Barnett-Cormack
Date:
On Thu, 5 Feb 2004, Edoardo Ceccarelli wrote:

> BEFORE (copy of the db without having run the vacuum):
>
> explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
>                                                 QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Seq Scan on utente  (cost=0.00..40947.90 rows=1826 width=724) (actual time=131.17..856.96 rows=15 loops=1)
>    Filter: (luogorilasciodoc = 'ciao'::bpchar)
>  Total runtime: 857.06 msec
> (3 rows)
>
>
> AFTER (actual db)
> portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
>                                                 QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
>    Filter: (luogorilasciodoc = 'ciao'::bpchar)
>  Total runtime: 6458.29 msec
> (3 rows)
>
> Things are worse only for seqscans; when it uses an index scan, timing is good.

The only thing I can think of is if the storage method had been changed. Not
sure if that would even affect it, or if it could change by itself.
Just brainstorming.

--

Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University

R: R: slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
> >  Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
> >    Filter: (luogorilasciodoc = 'ciao'::bpchar)
> >  Total runtime: 6458.29 msec
> > (3 rows)
> >
> > Things are worse only for seqscans; when it uses an index scan, timing is good.
>
> Only thing I can think of is if storage method had been
> changed. Not sure if that would even affect it, or if it
> could do that by itself.
> Just brainstorming.
>

Do you know how I can check whether the storage method has changed?
I was thinking that the primary goal of a vacuum operation is to reclaim disk space
- this might imply that performance is worse for a seqscan - maybe it's normal.
Anyway, I am running a VACUUM FULL ANALYZE right now to see if things get better.

Thanks for your hints
Edoardo


Re: R: slow seqscan after vacuum analyze

From
"David F. Skoll"
Date:
On Thu, 5 Feb 2004, Edoardo Ceccarelli wrote:

> Things are worse only for seqscans; when it uses an index scan, timing is good.

It might just be that running VACUUM flushed the operating system's
buffer cache.

I always try running two or three EXPLAIN ANALYZE's in a row for the
same query.  Usually, the first one takes much longer than the rest.

Regards,

David.
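
As a minimal illustration of David's suggestion (using the table from this thread; the point is only the repetition, not the particular query), the same statement can be run back to back in one psql session:

-- If the first run is noticeably slower than the following ones, the
-- difference is likely the OS buffer cache warming up, not the plan itself.
EXPLAIN ANALYZE SELECT * FROM utente WHERE luogorilasciodoc = 'ciao';
EXPLAIN ANALYZE SELECT * FROM utente WHERE luogorilasciodoc = 'ciao';
EXPLAIN ANALYZE SELECT * FROM utente WHERE luogorilasciodoc = 'ciao';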

R: R: slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
Yes, you are right, but that wasn't the case this time; I have run the explain plenty of
times with the same results. I think the reason was that I did a plain VACUUM;
after a VACUUM FULL ANALYZE (1h!!) things are ok.

Best Regards
Edoardo

> -----Original Message-----
> From: David F. Skoll [mailto:dfs@roaringpenguin.com]
> Sent: Thursday, 5 February 2004 3:17
> To: Edoardo Ceccarelli
> Cc: pgsql-admin@postgresql.org
> Subject: Re: R: [ADMIN] slow seqscan after vacuum analyze
>
> On Thu, 5 Feb 2004, Edoardo Ceccarelli wrote:
>
> > Things are worse only for seqscans; when it uses an index scan,
> > timing is good.
>
> It might just be that running VACUUM flushed the operating
> system's buffer cache.
>
> I always try running two or three EXPLAIN ANALYZE's in a row
> for the same query.  Usually, the first one takes much longer
> than the rest.
>
> Regards,
>
> David.
>
>


Re: R: slow seqscan after vacuum analyze

From
"Iain"
Date:

> Yes, you are right, but that wasn't the case this time; I have run the
> explain plenty of times with the same results. I think the reason was that
> I did a plain VACUUM; after a VACUUM FULL ANALYZE (1h!!) things are ok.

It's reasonable to expect that a seq scan will perform faster after a full
vacuum, as the physical size and organization of the table has been changed.

I wouldn't expect a plain vacuum to have any tangible effect on performance,
for better or for worse.

I'd like to know more about the possibility of plain vacuums harming
performance. This is the first I've heard of it. Vacuum full is not always
an option in a production environment.


Re: R: R: slow seqscan after vacuum analyze

From
Sam Barnett-Cormack
Date:
On Thu, 5 Feb 2004, Edoardo Ceccarelli wrote:

> after a VACUUM FULL ANALYZE (1h!!) things are ok

Only an hour - lucky you ;)

--

Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University

Re: R: R: slow seqscan after vacuum analyze

From
Christopher Browne
Date:
In the last exciting episode, eddy@axa.it ("Edoardo Ceccarelli") wrote:
> Yes, you are right but it wasn't the case this time, I have run the
> explain plenty of times with same results. I think that the reason
> was that I made a simple VACUUM, after a VACUUM FULL ANALYZE (1h!!)
> things are ok

It sounds as though you weren't vacuuming ("just plain vacuum") often
enough.

What tends to go wrong is when the table winds up with a LOT of empty
space due to there being a lot of updates to the table without dead
tuples being cleaned out.  The table winds up big, with no way to
shrink it without the cost of a VACUUM FULL.

If you vacuumed more often, the size of the table would likely stay
smaller which is sure to be helpful.

Another factor worth considering: If a few values are very common in
the field you are selecting on, then the query optimizer can get
convinced (wrongly) that a Seq Scan is the best choice.  Using ALTER
TABLE T ALTER COLUMN C SET STATISTICS [some value] to increase the
number of "bins" can be helpful in such cases.  (My pet theory is that
the present default value of 10 is a little low, and that a lot of
optimizer errors might be resolved by bumping it up a bit...)
--
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://www.ntlug.org/~cbbrowne/sgml.html
But  what can  you  do with  it?   -- ubiquitous  cry from  Linux-user
partner.  -- Andy Pearce, <ajp@hpopd.pwd.hp.com>
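
As a concrete sketch of the SET STATISTICS suggestion above, applied to the column from this thread (the target value 100 is only an illustrative choice, not something recommended by the posters):

-- Raise the number of statistics "bins" kept for this column (the 7.3
-- default is 10), then re-gather statistics so the planner sees the
-- finer-grained histogram.
ALTER TABLE utente ALTER COLUMN luogorilasciodoc SET STATISTICS 100;
ANALYZE utente;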

Re: R: slow seqscan after vacuum analyze

From
Christopher Browne
Date:
Centuries ago, Nostradamus foresaw when iain@mst.co.jp ("Iain") would write:
> I'd like to know more about the possibility of plain vacuums harming
> performance. This is the first I've heard of it. Vacuum full is not always
> an option in a production environment.

There certainly are known cases where systems where the I/O bus is
already fairly much saturated will suffer BADLY when a big vacuum is
thrown at them.

The problem in such cases is that the vacuum draws the pages that it
is working on into the buffer cache, pushing out data that is actually
useful to cache.

There are experimental patches for 7.4, 7.3, and even, I believe, 7.2,
for a "sleeping vacuum" that tries to limit the damage by sleeping
every so often so that the vacuum does not dominate, and so that
"ordinary traffic" gets a chance to reclaim cache.  And there are
efforts in 7.5 to improve cache management, so that pages brought in
by VACUUM would be put at the opposite end of the "LRU" queue.  That
way, instead of them being treated as Most Recently Used, pushing
everything the least bit older towards being dropped from the buffer
cache, the vacuumed pages would be treated as if they were LRU, so
they would get evicted FIRST.

But if the Original Poster is encountering that the database is doing
Seq Scans when it would be better to do an Index Scan, that is a
separate problem, and focusing on the VACUUM may distract from the
_real_ problem...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/advocacy.html
Rules of the Evil Overlord #195. "I will not use hostages as bait in a
trap.  Unless  you're going  to use them  for negotiation or  as human
shields, there's no point in taking them."
<http://www.eviloverlord.com/>

Re: R: slow seqscan after vacuum analyze

From
Andrew Biagioni
Date:
Iain wrote:

> > Yes, you are right, but that wasn't the case this time; I have run the
> > explain plenty of times with the same results. I think the reason was that
> > I did a plain VACUUM; after a VACUUM FULL ANALYZE (1h!!) things are ok.
>
> It's reasonable to expect that a seq scan will perform faster after a full
> vacuum, as the physical size and organization of the table has been changed.
>
> I wouldn't expect a plain vacuum to have any tangible effect on performance,
> for better or for worse.

A plain VACUUM (without ANALYZE) will change the layout of the data without
refreshing the optimizer information, so that anything that DOES use the
optimizer will often be negatively affected; VACUUM ANALYZE does a vacuum AND
refreshes the optimizer information.

I usually run VACUUM ANALYZE on an hourly basis on our production system, and
it's fast enough and unobtrusive enough that I can't tell that it's running
(except by looking at the log, of course).

> I'd like to know more about the possibility of plain vacuums harming
> performance. This is the first I've heard of it. Vacuum full is not always
> an option in a production environment.
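
For anyone wanting to schedule the same thing, a sketch of a system crontab entry that runs VACUUM ANALYZE at the top of every hour (the database name, OS user, and log path are placeholders):

# /etc/crontab -- hourly VACUUM ANALYZE; -z adds ANALYZE to the vacuum.
0 * * * *  postgres  vacuumdb -z portaportese >> /var/log/vacuumdb.log 2>&1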



R: R: slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
> But if the Original Poster is encountering that the database
> is doing Seq Scans when it would be better to do an Index
> Scan, that is a separate problem, and focusing on the VACUUM
> may distract from the _real_ problem...
> --

I have only noticed that after a VACUUM ANALYZE of the db, the time of a seq scan (when
a seqscan is required) has increased by a factor of ten; this is my primary focus,
otherwise I would have posted something like "my select is really slow ...".

To be more precise:
Given a query that HAS to be executed with a seqscan, I have noticed an increase in
time when comparing before and after the vacuum.


Anyway, I am working to recreate the same situation so that I can post some output from
the verbose option of vacuum.

Edoardo


VACUUM Question

From
"Stefan Sturm"
Date:
Hello,

I have a question about vacuum. There are three versions of vacuum:
1.) plain vacuum
2.) vacuum analyze
3.) full vacuum

Is it right that a vacuum analyze includes the plain vacuum and the full
vacuum includes the vacuum analyze? And which vacuum locks the tables?

And is there a strategy for when to use which vacuum? I think I should do a full
vacuum every night and a vacuum analyze a few times a day.

Is this OK?

Greetings,
Stefan Sturm



Re: R: R: slow seqscan after vacuum analyze

From
Tom Lane
Date:
Christopher Browne <cbbrowne@acm.org> writes:
> Another factor worth considering: If a few values are very common in
> the field you are selecting on, then the query optimizer can get
> convinced (wrongly) that a Seq Scan is the best choice.  Using ALTER
> TABLE T ALTER COLUMN C SET STATISTICS [some value] to increase the
> number of "bins" can be helpful in such cases.  (My pet theory is that
> the present default value of 10 is a little low, and that a lot of
> optimizer errors might be resolved by bumping it up a bit...)

I also suspect that 10 is a lowball figure, but no one has done any work
to establish what might be a more reasonable default.  Larger values
have real costs in both pg_statistic space and planner runtime, so I
don't want to push it up without some kind of evidence.

BTW, if you think a higher default might be appropriate, you can set it
in postgresql.conf instead of running around and manually ALTERing all
your tables.

            regards, tom lane
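
The setting Tom refers to is default_statistics_target (assuming the 7.3/7.4 parameter name); a sketch of the postgresql.conf line, where the value 50 is only an example and a fresh ANALYZE is needed before it takes effect:

# postgresql.conf
default_statistics_target = 50    # default is 10; larger values mean more
                                  # pg_statistic space and planner work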

Re: VACUUM Question

From
Sam Barnett-Cormack
Date:
On Thu, 5 Feb 2004, Stefan Sturm wrote:

> Hello,
>
> I have a question about vacuum. There are three versions of vacuum:
> 1.) plain vacuum
> 2.) vacuum analyze
> 3.) full vacuum
>
> Is it right that a vacuum analyze includes the plain vacuum and the full
> vacuum includes the vacuum analyze? And which vacuum locks the tables?

'full' is an option to vacuum, essentially, as is 'analyze', so there
are four variants (ignoring the freeze option):

vacuum
vacuum analyze
vacuum full
vacuum full analyze

It's all pretty obvious from there.

--

Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University
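
Spelled out as SQL, with the locking behaviour Stefan asked about noted in comments (this reflects the generally documented behaviour of these commands, not anything measured in this thread):

VACUUM;               -- reclaims dead space for reuse; runs alongside normal
                      -- reads and writes (no exclusive table lock)
VACUUM ANALYZE;       -- as above, plus refreshes the planner statistics
VACUUM FULL;          -- compacts the table files; takes an exclusive lock on
                      -- each table while it is being processed
VACUUM FULL ANALYZE;  -- compacts and refreshes statistics in one pass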

Re: R: R: slow seqscan after vacuum analyze

From
Tom Lane
Date:
"Edoardo Ceccarelli" <eddy@axa.it> writes:
> Given a query that HAS to be executed with a seqscan I have noticed an
> increase in time comparing before and after the vacuum.

This is really hard to believe --- I cannot think of any mechanism that
would result in that effect.  Unless the vacuum were flushing the
kernel's disk cache, but the effects of that would only persist for one
scan.  You did say that the increased time is repeatable if you do
multiple seqscans after the vacuum?

            regards, tom lane

Re: R: R: slow seqscan after vacuum analyze

From
"Edoardo"
Date:
On Thu, 05 Feb 2004 14:33:45 -0500, Tom Lane wrote
> "Edoardo Ceccarelli" <eddy@axa.it> writes:
> > Given a query that HAS to be executed with a seqscan I have noticed an
> > increase in time comparing before and after the vacuum.
>
> This is really hard to believe --- I cannot think of any mechanism that
> would result in that effect.  Unless the vacuum were flushing the
> kernel's disk cache, but the effects of that would only persist for one
> scan.  You did say that the increased time is repeatable if you do
> multiple seqscans after the vacuum?
>
>             regards, tom lane


Yes, I can assure you that it was repeatable and disappeared only after a
VACUUM FULL ANALYZE.

It was something really stable in its bad behaviour. I am going to run some
tests soon and I will post the results here.

Best Regards
Edoardo

--
The net spotter (http://expot.it)
Open WebMail Project (http://openwebmail.org)


Drop indexes inside transaction?

From
Steve Lane
Date:
Hello all:

Question here about how best to optimize a large data load. Data load is ad
hoc and so needs to be able to run during production hours.

We have a piece of software written in a desktop RAD environment (FileMaker
Pro). Our users are using this tool to generate data that we need to get
into postgresql. We're sending it through a middleware layer written in PHP.

A single data load will generate from 10K-100K rows.

Initially we did everything procedurally through PHP. 100K inserts, each one
called through several layers of PHP abstraction. Bad idea.

Current idea is to have PHP dump the data to a file, and suck the whole file
in at once somehow. So far, so good: PHP can create the file in 6 minutes
for 100K rows. That's actually acceptable.

Now we want to use COPY to bring the data in. The target table has 6
indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
indexes, 46 seconds. (oddly, ONE index contributes 40+ seconds of that, yet
they're all similar, single-column indexes. Anyway, that's another story).

Normally, in a batch-like environment, I'd feel free to drop the indexes,
load, reindex. That's perfectly fast. But the environment needs to be live,
and those indexes are vital to a reporting engine that can be hit at any
time. So we can't just drop them, even briefly.

So I hit on the idea of doing the same thing, but inside a transaction. In
theory that should affect no one else. To my delight, the transaction
drop-copy-reindex ran in 7 seconds.

I guess I'm just wondering how that's possible. I hate to sound like a
superstitious goof, but it sounds too good to be true. At best, I figured to
pay the whole penalty at the time of COMMIT -- that it would be fast up to
that point, and then of course need to do exactly the same work as the
transactionless version, as far as reconciling indexes or whatever the more
technically accurate term is.

So: is this too good to be true? Or is this actually a good way to do this?

Any other observations on the whole process? Is there a better or different
approach, or other things we should consider?

Any and all thoughts are welcome.

-- sgl


=======================================================
Steve Lane

Vice President
The Moyer Group
14 North Peoria St Suite 2H
Chicago, IL 60607

Voice: (312) 433-2421       Email: slane@moyergroup.com
Fax:   (312) 850-3930       Web:   http://www.moyergroup.com
=======================================================
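
A sketch of the transactional drop/COPY/recreate sequence described above, using the table and one of the index names that appear in the follow-up messages (the file path is a placeholder, and note Tom Lane's caveat below about the exclusive locks this holds until COMMIT):

BEGIN;
-- DROP INDEX takes an exclusive lock, so other sessions are blocked from
-- using the table for the duration of the transaction.
DROP INDEX id_item_test_key;                  -- ...and likewise the other five
COPY test_responses_2 FROM '/path/to/load_file';
CREATE INDEX id_item_test_key ON test_responses_2 (id_item);
COMMIT;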


Index (re)-creation speed

From
Steve Lane
Date:
All:

Been looking at speed issues related to loading large data sets, and
experimenting with dropping and rebuilding indexes. I noticed some
interesting behavior on the table in question. Here's the table:

             Table "test_responses_2"
     Attribute     |       Type        | Modifier
-------------------+-------------------+----------
 id_response       | integer           |
 id_code           | integer           |
 id_survey         | integer           |
 id_topic          | integer           |
 id_item           | integer           |
 id_administration | integer           |
 response_int      | smallint          |
 response_txt      | character varying |
 rec_handle        | character varying |
Indices: id_administration_test_key,
         id_code_test_key,
         id_item_test_key,
         id_response_test_key,
         id_survey_test_key,
         id_topic_test_key


When I drop and rebuild the indexes, they take oddly varying amounts of time
to rebuild. I rebuilt them in the following order, with the following rough
times. I took a guess that the speed of the rebuild might be related to the
number of distinct values in the column -- this seems true in some cases but
not in others. Here are the times:

id_response       38 secs (86000 distinct)
id_topic          33 secs (6 distinct)
id_survey         13 secs (1 distinct)
id_code           39 secs (1444 distinct)
id_item           40 secs (65 distinct)
id_administration 13 secs (1 distinct)

Is there anything I should be learning from this?

-- sgl

=======================================================
Steve Lane

Vice President
The Moyer Group
14 North Peoria St Suite 2H
Chicago, IL 60607

Voice: (312) 433-2421       Email: slane@moyergroup.com
Fax:   (312) 850-3930       Web:   http://www.moyergroup.com
=======================================================


Re: Drop indexes inside transaction?

From
Tom Lane
Date:
Steve Lane <slane@moyergroup.com> writes:
> Now we want to use COPY to bring the data in. The target table has 6
> indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
> indexes, 46 seconds. (oddly, ONE index contributes 40+ seconds of that, yet
> they're all similar, single-column indexes. Anyway, that's another story).

No, I'd say that's the key part of the story.  Details?

> So I hit on the idea of doing the same thing, but inside a transaction. In
> theory that should affect no one else.

... other than locking them out of the table while the transaction runs.
That doesn't sound like what you want to do.  In any case, reindexing
the table will get slower and slower as the pre-existing data in the
table expands.

            regards, tom lane

Re: Index (re)-creation speed

From
Tom Lane
Date:
Steve Lane <slane@moyergroup.com> writes:
> When I drop and rebuild the indexes, they take oddly varying amounts of time
> to rebuild. I rebuilt them in the following order, with the following rough
> times. I took a guess that the speed of the rebuild might be related to the
> number of distinct values in the column -- this seems true in some cases but
> not in others. Here are the times:

> id_response       38 secs (86000 distinct)
> id_topic          33 secs (6 distinct)
> id_survey         13 secs (1 distinct)
> id_code           39 secs (1444 distinct)
> id_item           40 secs (65 distinct)
> id_administration 13 secs (1 distinct)

How many rows altogether in this table?  What sort_mem are you using
(and have you tried altering it)?

When I read your previous message I was wondering about different
datatypes having different comparison costs, but since these are all
integer columns that's clearly not the issue here.  I think you may
be seeing something related to the number of initial runs created in the
sorting algorithm --- with only one distinct value, there'd always be
just one run and no need for any merge passes.

(Why are you bothering to index columns with only one or a few values in
them, anyway?  Such an index will be useless for searches ...)

            regards, tom lane
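
A sketch of the sort_mem experiment suggested above, raising it for a single session before rebuilding one of the indexes (65536 is an arbitrary example value; on 7.x sort_mem is specified in kilobytes, so this is 64 MB):

-- More sort memory lets the index build do more of its sorting in RAM
-- instead of spilling to temporary files.
SET sort_mem = 65536;
CREATE INDEX id_item_test_key ON test_responses_2 (id_item);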

Re: Drop indexes inside transaction?

From
Steve Lane
Date:
On 2/6/04 12:23 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Steve Lane <slane@moyergroup.com> writes:
>> Now we want to use COPY to bring the data in. The target table has 6
>> indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
>> indexes, 46 seconds. (oddly, ONE index contributes 40+ seconds of that, yet
>> they're all similar, single-column indexes. Anyway, that's another story).
>
> No, I'd say that's the key part of the story.  Details?

I have another post out there called "index re-creation speed" that delves
into this. Here's the table:

             Table "test_responses_2"
     Attribute     |       Type        | Modifier
-------------------+-------------------+----------
 id_response       | integer           |
 id_code           | integer           |
 id_survey         | integer           |
 id_topic          | integer           |
 id_item           | integer           |
 id_administration | integer           |
 response_int      | smallint          |
 response_txt      | character varying |
 rec_handle        | character varying |
Indices: id_administration_test_key,
         id_code_test_key,
         id_item_test_key,
         id_response_test_key,
         id_survey_test_key,
         id_topic_test_key

Id_item_test_key is the one that drives up the speed of the COPY
dramatically. 65 distinct values in that column for the given data set.
>
>> So I hit on the idea of doing the same thing, but inside a transaction. In
>> theory that should affect no one else.
>
> ... other than locking them out of the table while the transaction runs.

Ha! I knew that that tasty-looking lunch had to have a price tag.

> That doesn't sound like what you want to do.  In any case, reindexing
> the table will get slower and slower as the pre-existing data in the
> table expands.

Yes, I've been running some tests and it stops being very acceptable around
3-5M rows.

So am I more or less out of luck? The issue, again, is how to insert at most
80-100K rows into a running system, at a time determined by users, into a
fairly heavily-indexed table.

I thought of dropping the indexes and deferring index recreation -- maybe
even having a very frequent cron job rebuild the indexes. All that buys me
is being able to return control to the initiating user quickly. It still has
the issue of expanding reindex time and, I'm guessing, the issue of locking
out other users as well.

Am I simply asking too much of my tools here?

-- sgl



Re: Index (re)-creation speed

From
Steve Lane
Date:
On 2/6/04 12:30 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Steve Lane <slane@moyergroup.com> writes:
>> When I drop and rebuild the indexes, they take oddly varying amounts of time
>> to rebuild. I rebuilt them in the following order, with the following rough
>> times. I took a guess that the speed of the rebuild might be related to the
>> number of distinct values in the column -- this seems true in some cases but
>> not in others. Here are the times:
>
>> id_response       38 secs (86000 distinct)
>> id_topic          33 secs (6 distinct)
>> id_survey         13 secs (1 distinct)
>> id_code           39 secs (1444 distinct)
>> id_item           40 secs (65 distinct)
>> id_administration 13 secs (1 distinct)
>
> How many rows altogether in this table?  What sort_mem are you using
> (and have you tried altering it)?

Sorry, left out important info. I cleaned the table first (it's a temporary
copy), then imported one run of data, 86,000 rows.

I haven't checked sort_mem, will do so.
>
> When I read your previous message I was wondering about different
> datatypes having different comparison costs, but since these are all
> integer columns that's clearly not the issue here.  I think you may
> be seeing something related to the number of initial runs created in the
> sorting algorithm --- with only one distinct value, there'd always be
> just one run and no need for any merge passes.
>
> (Why are you bothering to index columns with only one or a few values in
> them, anyway?  Such an index will be useless for searches ...)

Again, I left out some useful details. Id_survey and id_administration will
be identical for a single run of data (which can generate 10-100K inserts).
And my tests were just one data set, imported into a cleaned table. Still,
the distinctness of these columns will be low -- on the order of one
distinct value per 10^4 rows. The others have levels of distinctness
proportional to what the above chart shows -- id_response is unique,
id_topic will have 4-8 values per 10^4 records, and so on.

And still, none of this explains to me why the index on id_item drove the
COPY cost up so apparently dramatically. I tried the COPY again both with
and without that one index and in both cases it caused COPY to go from 5 to
40+ seconds.

-- sgl


Re: Index (re)-creation speed

From
Tom Lane
Date:
Steve Lane <slane@moyergroup.com> writes:
> And still, none of this explains to me why the index on id_item drove the
> COPY cost up so apparently dramatically. I tried the COPY again both with
> and without that one index and in both cases it caused COPY to go from 5 to
> 40+ seconds.

That doesn't make a lot of sense to me either, particularly seeing that
id_item seems to be an intermediate case as far as uniqueness goes.
Do you still get 30+ seconds to COPY if that is the *only* index?

If the data you're using isn't too sensitive, would you mind sending me
a copy (off-list)?  I'm interested to try profiling this behavior.

            regards, tom lane

R: R: R: slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
Well, maybe it was more, can't remember, it was at 3am! :)

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org] On behalf of Sam Barnett-Cormack
> Sent: Thursday, 5 February 2004 16:00
> To: Edoardo Ceccarelli
> Cc: 'David F. Skoll'; pgsql-admin@postgresql.org
> Subject: Re: R: R: [ADMIN] slow seqscan after vacuum analyze
>
> On Thu, 5 Feb 2004, Edoardo Ceccarelli wrote:
>
> > after a VACUUM FULL ANALYZE (1h!!) things are ok
>
> Only an hour - lucky you ;)
>
> --
>
> Sam Barnett-Cormack
> Software Developer                           |  Student of
> Physics & Maths
> UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University
>


R: R: slow seqscan after vacuum analyze

From
"Edoardo Ceccarelli"
Date:
> > AFTER (actual db)
> > portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
> >                                                 QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------
> >  Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
> >    Filter: (luogorilasciodoc = 'ciao'::bpchar)
> >  Total runtime: 6458.29 msec
> > (3 rows)

After a VACUUM FULL ANALYZE:

portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on utente  (cost=0.00..41329.21 rows=154 width=724) (actual time=91.61..751.28 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 751.35 msec


Things are better now! :) It surely was because I had used a plain vacuum before...