Thread: why postgresql is so slow?
hello, all. I have a table about 2 million rows. when I run "select * from table1" in psql, it will take me about 10 minutes to get theresult. I wonder if postgresql can immediately return result like db2. After that I create a index on a column named id. The time executing "selct * from table1 where id=10" in psql is much fasterthan that of executing "select * from table1 where id <10" in psql. why? By the way I'm using postgresql 7.3. Who can tell me the reason. 3x. Jinqiang Han
On Saturday 20 September 2003 10:38, Jinqiang Han wrote: > hello, all. This isn't really a hackers question - perhaps try the "general","sql" etc lists in future. This list is for questions about the source-code of PG. > I have a table about 2 million rows. when I run "select * from table1" in > psql, it will take me about 10 minutes to get the result. I wonder if > postgresql can immediately return result like db2. PG is returning *all* the rows you requested. If you want one row at a time, perhaps use a cursor. > After that I create a index on a column named id. The time executing > "selct * from table1 where id=10" in psql is much faster than that of > executing "select * from table1 where id <10" in psql. why? Because there are more records returned? You don't supply enough information. If you'd like to repost to the general list, along with details of the tables and the output of EXPLAIN ANALYSE for your queries, I'll be glad to go into detail there. -- Richard Huxton Archonet Ltd
Jinqiang Han wrote: > hello, all. > > I have a table about 2 million rows. when I run "select * from table1" in > psql, it will take me about 10 minutes to get the result. I wonder if > postgresql can immediately return result like db2. If you're executing in psql, it's probably trying to load the whole result set into RAM at once, and then try to format it, and then pipe it through a pager (i.e. more). That's not really a test of PostgreSQL, I would recommend using a cursor or something so that psql doesn't give you a false impression. > > After that I create a index on a column named id. The time executing > "selct * from table1 where id=10" in psql is much faster than that of > executing "select * from table1 where id <10" in psql. why? > Well, that depends. First, turn on stats collecting and run "VACUUM ANALYZE". That will collect some data about your data which helps the planner make a good choice. My initial guess is that the second query is not using the index. That may happen because if "id" is an INT column, than negatives are allowed, and the planner figures that "id<10" only narrows it down to about half the possible rows (in which case it would choose sequential scan). However, we humans know that the "id" column is really a positive INT (or at least I assume it is, based on your implication that "id<10" should be about as fast as "id=10"), therefore "id<10" really means a max 10 rows out of 2M (which means it should do an index scan). Turning on stats and running VACUUM ANALYZE will give the planner the information that it needs to make an intelligent choice. If that doesn't help you, give us the table schema and run: EXPLAIN ANALYZE SELECT * FROM table1 WHERE id=10 and then run: EXPLAIN ANALYZE SELECT * FROM table1 WHERE id<10 and give us the output. Then run "VACUUM ANALYZE" every once in a while (depending on how fast your data changes), like every night for instance. Regards, Jeff Davis
Jeff <jdavis-pgsql@empires.org> wrote: > Then run "VACUUM ANALYZE" every once in a while (depending on how > fast your data changes), like every night for instance. Consider VACUUM and ANALYZE somewhat separately. You need to ANALYZE any time the distribution of the data changes. You need to VACUUM any time a lot of data is invalidated by UPDATEs or DELETEs. (And when you VACUUM, it's cheap to throw in an ANALYZE.) That turns into a variety of policies depending on what the pattern of activity for a particular table is: - If a table only ever gets INSERTs, you need only ever ANALYZE it, as VACUUM should be useless. (Caveat: If INSERTs couldget rolled back due to other processing nuking transactions, that leaves dead tuples for VACUUM to work with...) - A table that sees enormous numbers of updates (for instance, a list of account balances) should be vacuumed REALLY frequently. - Any time you do a VACUUM, you might as well also do an ANALYZE. pg_autovacuum applies these policies, so it's a reasonable thing to use in 7.3/7.4... -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://www3.sympatico.ca/cbbrowne/ "One of the most dangerous things in the universe is an ignorant people with real grievances. That is nowhere near as dangerous, however, as an informed and intelligent society with grievances. The damage that vengeful intelligence can wreak, you cannot even imagine." -- Miles Teg, Heretics of Dune
On Sat, 2003-09-20 at 06:14, Jeff wrote: > Well, that depends. First, turn on stats collecting and run "VACUUM > ANALYZE". That will collect some data about your data which helps the > planner make a good choice. The statistics collector and the statistics collected by ANALYZE have nothing to do with one another -- stats collector stats aren't used by the query planner. So just running ANALYZE or VACUUM ANALYZE should be sufficient. -Neil