Re: [HACKERS] \dt and disk access - Mailing list pgsql-hackers
| From | Bruce Momjian |
|---|---|
| Subject | Re: [HACKERS] \dt and disk access |
| Date | |
| Msg-id | 008816d319784df7d3ea835790456e2f Whole thread Raw |
| In response to | [HACKERS] \dt and disk access (Bruce Momjian <maillist@candle.pha.pa.us>) |
| List | pgsql-hackers |
> >
> > Can someone suggest why running the \dt comand in psql causes so many
> > disk accesses?
> >
> > I am seeing some queries causing a lot of disk access that doesn't make
> > sense to me.
> >
> > I am running my server with fsync off.
>
> I now realize the unusual disk activity is caused by any query that uses
> an ORDER BY. A query that returns three rows takes MUCH longer with the
> ORDER BY than without it, 0.20 seconds vs. 0.61 seconds, three times
> longer.
>
> The query is:
>
> select * from matter
>
> AND
>
> select * from matter order by matter
>
> The table has three short text columns.
>
>
> Would anyone like to comment on why ORDER BY is doing this? It
> certainly slows down the query. I am running BSD/OS.
>
> This is of course using PostgreSQL 6.1.
OK, I think I have some more information on this.
To test:
create table ordertest (x integer);
insert into ordertest values (3);
Then try:
select * from ordertest;
AND
select * from ordertest order by x;
I test it by:
echo "select * from ordertest;" | time psql test
OK, now for some times. On my machine (PP200 running BSD/OS), the
startup/shutdown time for the database is 0.15 seconds, so I will
subtract that time to show the incremental elapsed wallclock time for
each command:
Select without order by: 0.01
Select with order by: 0.30
Create Table: 0.11
Create Index: 0.54
Insert into Table: 0.01
These are all times on a table with one integer column, and one row.
As you can see, the ORDER BY has slowed the SELECT by a factor of 30.
You can actually create three tables in the same time, or insert 30
rows.
This looks like a pretty serious performance problem.
Without the ORDER BY, performance is great. With the ORDER BY,
performance is poor, and I can see some mSQL comparisons where we will
look very bad. Let's face it, most multi-row SELECT statements use
ORDER BY, and it is a waste to use 30x the time to sort the data as it
took to get the data out of the database.
FASTBUILD is not used in the ORDER BY operations.
How is ORDER BY performed? The output of the SELECT statement is placed
in a temp table, and the temp table is then sorted. A comment from
ExecSort.c says:
/* ----------------------------------------------------------------
* ExecSort
*
* old comments
* Retrieves tuples fron the outer subtree and insert them into a
* temporary relation. The temporary relation is then sorted and
* the sorted relation is stored in the relation whose ID is indicated
* in the 'tempid' field of this node.
* Assumes that heap access method is used.
ExecSort calls psort() to do the work. psort() uses a Knuth algorithm.
I removed the psort() call from ExecSort, and the execution time was:
Select with order by using psort(): 0.30
Select with order by NOT using psort(): 0.07
Now of course, not using psort() causes the data not to be sorted, but
there must be some inefficienty if sorting one row takes 0.23 seconds on
a PP200 with 64MB of RAM. I suspect the other 0.06 seconds is caused by
the creation of the temp table and moving the rows around.
Any comments?
- --
Bruce Momjian
maillist@candle.pha.pa.us
------------------------------
pgsql-hackers by date: