Thread: questions about disk configurations

questions about disk configurations

From

Hubert depesz Lubaczewski

Date:

09 December 2002, 06:36:10

hi
i have a question about best harddisk configuration for postgresql
performance.
of course i know that:
- scsi is better than ide
- 2 disks are better than 1
- 3 disks are better than 2

i know that with 3 disks one should move xlog to one drive, index files
to second and tables to third.
that's clear.

but:
will making software raid on this discs provide performance increase or
decrease?
which raid (0,1,5,10?) is best for postgresql? maybe it differs when it
comes to different datatypes (i.e. raid "X" is best for indices, but "Y"
best for tables).

i'd like to know what are the options to store all this information
(xlog, indices and tables). what configurations are best, what medium
and what should be avoided at all cost.

hope you can help me, and sorry for my english.

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
Mój Boże, spraw abym milczał, dopóki się nie upewnię,  że  naprawdę  mam
coś do powiedzenia.                                      (c) 1998 depesz

Re: questions about disk configurations

From

Adrian 'Dagurashibanipal' von Bidder

Date:

09 December 2002, 07:02:15

On Mon, 2002-12-09 at 12:32, Hubert depesz Lubaczewski wrote:
> hi
> i have a question about best harddisk configuration for postgresql
> performance.
[...]

Yo!

A bit more data is needed before anybody can give you more help:
 - what is your budget?
 - how big will your databases be?
 - what's the read/write ratio?

Even then you'll not get any good recipes, because there aren't any.
You'll have to do benchmarks yourself. A few fundamental things that are
probably true for most:

 - more RAM is always good. Independent from the disc architecture - if
an access isn't going to the disc at all, it's always good. (if you're
mostly writing this may be lessened).
 - always carefully tune the postgres installation (random page cost,
sort mem, shared buffers, ... - all depend on your system and you
application)
 - as you correctly said: distribute the load on many spindles. On a
busy database, 4*20G is probably faster than 1*80G

beyound this, experiences vary. RAID1 and RAID5 are rated differently by
different people - and especially with RAID5 there are (I think) really
performance differencies between the various products. RAID0 is fastest,
of course, but you probably care for your data.

For equally good implementations, RAID1 and RAID5 may have similar
speed, especially if the RAID controller for RAID5 has enough RAM. If
the active dataset on a RAID5 is bigger than the available caching RAM,
write performance sucks as a single block write requires 2 reads and 2
writes. If the RAID5 controller has enough RAM (and a decent
implementation), write performance can be almost equal to RAID1 (2
writes for a single block write).

So far
-- vbi

--
this email is protected by a digital signature:  http://fortytwo.ch/gpg

NOTE: keyserver bugs! get my key here: https://fortytwo.ch/gpg/92082481

Attachment

signature.asc

Re: questions about disk configurations

From

Hubert depesz Lubaczewski

Date:

09 December 2002, 14:49:32

On Mon, Dec 09, 2002 at 01:01:58PM +0100, Adrian 'Dagurashibanipal' von Bidder wrote:
> A bit more data is needed before anybody can give you more help:
>  - what is your budget?
>  - how big will your databases be?
>  - what's the read/write ratio?

my question as for now is purely theoretical. i'm not asking about any
specific situation, but me may talk about medium sized web size. budget
is irrelevant (i'd like to talk *only* about harddrives, not memory,
architescure and so on).

>  - as you correctly said: distribute the load on many spindles. On a
> busy database, 4*20G is probably faster than 1*80G

as i said: i know that 3 disks are bettar than 1 (as for postgres
installation, because system data and swap should be on 4th disc - but
this is obvious).

> beyound this, experiences vary. RAID1 and RAID5 are rated differently by
> different people - and especially with RAID5 there are (I think) really
> performance differencies between the various products. RAID0 is fastest,
> of course, but you probably care for your data.

that's exactly what i'm asking about: which raid is best suited for
which data amongst out 3 sets (xlog, tables, indices). or maybe for some
types of data single disc is better than raid for some strange reason?
is it better to (when having 2 discs) setup raid 0/1 or to use tham
separatelly as xlog/tables?

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
Mój Boże, spraw abym milczał, dopóki się nie upewnię,  że  naprawdę  mam
coś do powiedzenia.                                      (c) 1998 depesz

Re: questions about disk configurations

From

Josh Berkus

Date:

09 December 2002, 14:51:23

Depesz,

> i have a question about best harddisk configuration for postgresql
> performance.
> of course i know that:
> - scsi is better than ide
> - 2 disks are better than 1
> - 3 disks are better than 2
>
> i know that with 3 disks one should move xlog to one drive, index files
> to second and tables to third.
> that's clear.

Er, no, it's not.  In fact, for a 3-disk config, I reccommend:

Disk 1: OS, swap, system logs
Disk 2: Data + Indexes
Disk 3: Transaction Log

> but:
> will making software raid on this discs provide performance increase or
> decrease?

Hardware RAID can improve *read* performance, particilarly RAIDs 1, 01, and
10.  For writing, the best you can do is having it not inhibit performance.
The general testament is that *software* RAID does not improve things at all;
actually, the best that can be said for Linux Software RAID 1 is that it does
not harm performance much.

> i'd like to know what are the options to store all this information
> (xlog, indices and tables). what configurations are best, what medium
> and what should be avoided at all cost.

Ask specific questions.  If you want the full performance tutorial, you'd have
to pay a steep fee for 1-3 days of training.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: questions about disk configurations

From

"Keith Bottner"

Date:

09 December 2002, 15:00:34

I don't know whether you have read this link but it was helpful to me.

http://www.ca.postgresql.org/docs/momjian/hw_performance/0.html

It discusses PostgreSQL Hardware Performance Tuning.

Hope it helps!

Keith Bottner

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Hubert
depesz Lubaczewski
Sent: Monday, December 09, 2002 7:05 AM
To: Adrian 'Dagurashibanipal' von Bidder;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] questions about disk configurations

On Mon, Dec 09, 2002 at 01:01:58PM +0100, Adrian 'Dagurashibanipal' von
Bidder wrote:
> A bit more data is needed before anybody can give you more help:
>  - what is your budget?
>  - how big will your databases be?
>  - what's the read/write ratio?

my question as for now is purely theoretical. i'm not asking about any
specific situation, but me may talk about medium sized web size. budget
is irrelevant (i'd like to talk *only* about harddrives, not memory,
architescure and so on).

>  - as you correctly said: distribute the load on many spindles. On a
> busy database, 4*20G is probably faster than 1*80G

as i said: i know that 3 disks are bettar than 1 (as for postgres
installation, because system data and swap should be on 4th disc - but
this is obvious).

> beyound this, experiences vary. RAID1 and RAID5 are rated differently
> by different people - and especially with RAID5 there are (I think)
> really performance differencies between the various products. RAID0 is

> fastest, of course, but you probably care for your data.

that's exactly what i'm asking about: which raid is best suited for
which data amongst out 3 sets (xlog, tables, indices). or maybe for some
types of data single disc is better than raid for some strange reason?
is it better to (when having 2 discs) setup raid 0/1 or to use tham
separatelly as xlog/tables?

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
Mój Boże, spraw abym milczał, dopóki się nie upewnię,  że  naprawdę  mam
coś do powiedzenia.                                      (c) 1998 depesz

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: questions about disk configurations

From

Ron Johnson

Date:

09 December 2002, 15:47:30

On Mon, 2002-12-09 at 07:05, Hubert depesz Lubaczewski wrote:
> On Mon, Dec 09, 2002 at 01:01:58PM +0100, Adrian 'Dagurashibanipal' von Bidder wrote:
> > A bit more data is needed before anybody can give you more help:
> >  - what is your budget?
> >  - how big will your databases be?
> >  - what's the read/write ratio?
>
> my question as for now is purely theoretical. i'm not asking about any
> specific situation, but me may talk about medium sized web size. budget
> is irrelevant (i'd like to talk *only* about harddrives, not memory,
> architescure and so on).

What is "medium sized web"?  The *system* *is* important!!  Stuffing
your box with RAM may, in fact, override your disks, if the RAM caches
enough.

> >  - as you correctly said: distribute the load on many spindles. On a
> > busy database, 4*20G is probably faster than 1*80G
>
> as i said: i know that 3 disks are bettar than 1 (as for postgres
> installation, because system data and swap should be on 4th disc - but
> this is obvious).
>
> > beyound this, experiences vary. RAID1 and RAID5 are rated differently by
> > different people - and especially with RAID5 there are (I think) really
> > performance differencies between the various products. RAID0 is fastest,
> > of course, but you probably care for your data.
>
> that's exactly what i'm asking about: which raid is best suited for
> which data amongst out 3 sets (xlog, tables, indices). or maybe for some
> types of data single disc is better than raid for some strange reason?
> is it better to (when having 2 discs) setup raid 0/1 or to use tham
> separatelly as xlog/tables?

These are *GENERALITIES*!!!!  _All_ is dependent on which SCSI
controller you choose, and how much cache it has!!!!!!!!

- RAID0 does *great* at both reading and writing, but everyone knows
that it is insecure.
- RAID1 does better than JBOD at reading and writing, but not as
good as RAID0.
- RAID01 and RAID10 do just about as good as RAID0.
- RAID5 does great with reads, but bad with writes, *unless* the
controller has *lots* of cache.  Then, write speeds are great.

Slightly off topic: if I have Important Data, then I would not trust
a caching controlller unless it has a battery backup.  Unfortunately,
the only "caching controlllers with battery backup" that I've seen
are pretty expensive...

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "they love our milk and honey, but preach about another    |
|  way of living"                                            |
|    Merle Haggard, "The Fighting Side Of Me"                |
+------------------------------------------------------------+

Re: questions about disk configurations

From

eric soroos

Date:

09 December 2002, 16:51:54

>
> Er, no, it's not.  In fact, for a 3-disk config, I reccommend:
>
> Disk 1: OS, swap, system logs
> Disk 2: Data + Indexes
> Disk 3: Transaction Log

What is the accepted way of splitting the data from pg_xlog?

I've been testing some configurations for low budget performance, and I haven't been able to make this help vs. one
disk.(under osx, ymmv) 

I rsync'd the pg_xlog directory to another disk, then set up a symlink pointing from the data/pg_xlog to
/other/disk/pg_xlog. 

I then got tps numbers that were 2/3 of the single ide drive speed. The only explanation I can come up with is that
somethingis seeking to the symlink, then doing the actual write on the other drive.  

I'm going to try this under linux using mount points, but I need to shuffle hardware first.


any ideas?

eric

Re: questions about disk configurations

From

Josh Berkus

Date:

09 December 2002, 17:15:28

Eric,

> I'm going to try this under linux using mount points, but I need to shuffle
hardware first.

This is the only way I've done it.   I'm not sure what the Mac problem is.


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: questions about disk configurations

From

"scott.marlowe"

Date:

09 December 2002, 17:36:43

On Mon, 9 Dec 2002, eric soroos wrote:

> >
> > Er, no, it's not.  In fact, for a 3-disk config, I reccommend:
> >
> > Disk 1: OS, swap, system logs
> > Disk 2: Data + Indexes
> > Disk 3: Transaction Log
>
> What is the accepted way of splitting the data from pg_xlog?

You really can't split it so to speak.  It all needs to be in one place.
Or do you mean splitting the load?  Maybe putting it onto a RAID0
partition, but that's chancy.

> I've been testing some configurations for low budget performance, and I
> haven't been able to make this help vs. one disk. (under osx, ymmv)

I haven't found anything that helps much either, except for fast drives.

You can, however, turn on the noatime mounting option under Linux (BSD has
something similar) and it should help speed things up on any file system.
You can also try turning on the async option, but I'm not sure this is a
problem or not for data integrity on a transaction log file system.
Comments?

> I rsync'd the pg_xlog directory to another disk, then set up a symlink
> pointing from the data/pg_xlog to /other/disk/pg_xlog.
>
> I then got tps numbers that were 2/3 of the single ide drive speed. The
> only explanation I can come up with is that something is seeking to the
> symlink, then doing the actual write on the other drive.

rsync isn't still running is it?  you can just use the cp command while
the database is shut down to move the pg_xlog dir.  like so:

pg_ctl stop
mkdir /mnt/bigdog/pg_xlog
chown postgres.postgres /mnt/bigdog/pg_xlog
chmod 700 /mnt/bigdog/pg_xlog
cd $PGDATA
cp -Rfp pg_xlog/* /mnt/bigdog/pg_xlog/
mv pg_xlog pg_xlog.old (I always keep stuff till I'm sure I really don't
need it.)
ln -s /mnt/bigdog/pg_xlog pg_xlog
pg_ctl start

Don't forget, noatime in the mount options, makes a big difference.

Re: questions about disk configurations

From

eric soroos

Date:

09 December 2002, 17:59:37


> > I've been testing some configurations for low budget performance, and I
> > haven't been able to make this help vs. one disk. (under osx, ymmv)
>
> I haven't found anything that helps much either, except for fast drives.


>
> You can, however, turn on the noatime mounting option under Linux (BSD has
> something similar) and it should help speed things up on any file system.

I don't think that's an option for osx/hfs mounts, at least mount doesn't list it. (not that mount really works on
10.1.x,but whatever) 

> You can also try turning on the async option, but I'm not sure this is a
> problem or not for data integrity on a transaction log file system.
> Comments?

from man mount:

            async   All I/O to the file system should be done asynchronously.
                     This is a dangerous flag to set, and should not be used
                     unless you are prepared to recreate the file system
                     should your system crash.


I'd guess that this is about the same as fsync = off, except that it's your os lying to you instead of your database.

>
> > I rsync'd the pg_xlog directory to another disk, then set up a symlink
> > pointing from the data/pg_xlog to /other/disk/pg_xlog.
> >
> > I then got tps numbers that were 2/3 of the single ide drive speed. The
> > only explanation I can come up with is that something is seeking to the
> > symlink, then doing the actual write on the other drive.
>
> rsync isn't still running is it?  you can just use the cp command while
> the database is shut down to move the pg_xlog dir.  like so:

rsync == copy, it's just that I remember the command line switches for it.

> pg_ctl stop
> mkdir /mnt/bigdog/pg_xlog
> chown postgres.postgres /mnt/bigdog/pg_xlog
> chmod 700 /mnt/bigdog/pg_xlog
> cd $PGDATA
> cp -Rfp pg_xlog/* /mnt/bigdog/pg_xlog/
> mv pg_xlog pg_xlog.old (I always keep stuff till I'm sure I really don't
> need it.)
> ln -s /mnt/bigdog/pg_xlog pg_xlog
> pg_ctl start
>

This is about what I did, except that /mnt/bigdog/pg_xlog == /Volumes/scsi1.

Where you can do something different is mount bigdog at data/pg_xlog, instead of using the symlinks. Given the
interestingstate of filesystem tools under osx, I can't really do that. (at least under 10.1.5, looks like the laptop
running10.2 has a little more info. not that the laptop has room for a 3.5" 10k rpm scsi drive & pci scsi card for
testing...)

eric

Re: questions about disk configurations

From

"scott.marlowe"

Date:

09 December 2002, 18:06:44

On Mon, 9 Dec 2002, Josh Berkus wrote:

>
> Depesz,
>
> > i have a question about best harddisk configuration for postgresql
> > performance.
> > of course i know that:
> > - scsi is better than ide
> > - 2 disks are better than 1
> > - 3 disks are better than 2
> >
> > i know that with 3 disks one should move xlog to one drive, index files
> > to second and tables to third.
> > that's clear.
>
> Er, no, it's not.  In fact, for a 3-disk config, I reccommend:
>
> Disk 1: OS, swap, system logs
> Disk 2: Data + Indexes
> Disk 3: Transaction Log

Actually, first I'd try one big RAID 5 and see how it runs.  THEN I'd
spend time mucking around with different configs if that wasn't fast
enough.  If you need x performance and get 10x with a RAID 5 then move on
to more interesting problems.

> > but:
> > will making software raid on this discs provide performance increase or
> > decrease?
>
> Hardware RAID can improve *read* performance, particilarly RAIDs 1, 01, and
> 10.  For writing, the best you can do is having it not inhibit performance.
> The general testament is that *software* RAID does not improve things at all;
> actually, the best that can be said for Linux Software RAID 1 is that it does
> not harm performance much.

Not in my experience.  I'd estimate my test box with dual 18 Gig UW scsis
runs about 1.5 to 1.8 times faster with the two drives in a RAID1 as if
a single one is used.  Bonnie confirms this.  single drive can read about
25 Megs a second, a pair in a RAID1 reads at about 48 Megs a second.

But as you pointed out in your reply, it's more important to look at how
he's gonna drive the database.  If it has to input hundreds of short
queries a second, that's a whole different problem than a data warehouse
with 500 people throwing 8 way joins at the data all day.

Re: questions about disk configurations

From

Josh Berkus

Date:

09 December 2002, 18:28:12

Scott,

> Actually, first I'd try one big RAID 5 and see how it runs.  THEN I'd
> spend time mucking around with different configs if that wasn't fast
> enough.  If you need x performance and get 10x with a RAID 5 then move on
> to more interesting problems.

Depends on how much time you have to spend re-installing.   IMHO, RAID 5 is
slower that straight disks for Postgres, especially with large numbers of
writes.  This may not be true for $1000 RAID controllers, but I have yet to
use one.

I have a box with a low-end RAID 5 controller, and it drives like a single IDE
drive on large UPDATE queries.   Slower, somethimes.

> Not in my experience.  I'd estimate my test box with dual 18 Gig UW scsis
> runs about 1.5 to 1.8 times faster with the two drives in a RAID1 as if
> a single one is used.  Bonnie confirms this.  single drive can read about
> 25 Megs a second, a pair in a RAID1 reads at about 48 Megs a second.

This is Linux software RAID?

> But as you pointed out in your reply, it's more important to look at how
> he's gonna drive the database.  If it has to input hundreds of short
> queries a second, that's a whole different problem than a data warehouse
> with 500 people throwing 8 way joins at the data all day.

Definitely.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco

Re: questions about disk configurations

From

Tom Lane

Date:

09 December 2002, 19:13:40

Josh Berkus <josh@agliodbs.com> writes:
>> I'm going to try this under linux using mount points, but I need to shuffle
> hardware first.

> This is the only way I've done it.   I'm not sure what the Mac problem is.

It sounds like OS X fails to optimize repeated lookups of the same
symlink.  I haven't tried to do any performance measurement of this
myself, but if true a gripe to Apple would be in order.  Most of the
designs I've seen for clean tablespace handling will depend on symlinks
much more than we do today, so a performance penalty for symlinks will
*really* hurt further down the road.

            regards, tom lane

Re: questions about disk configurations

From

"scott.marlowe"

Date:

09 December 2002, 19:19:47

On Mon, 9 Dec 2002, Josh Berkus wrote:

>
> Scott,
>
> > Actually, first I'd try one big RAID 5 and see how it runs.  THEN I'd
> > spend time mucking around with different configs if that wasn't fast
> > enough.  If you need x performance and get 10x with a RAID 5 then move on
> > to more interesting problems.
>
> Depends on how much time you have to spend re-installing.   IMHO, RAID 5 is
> slower that straight disks for Postgres, especially with large numbers of
> writes.  This may not be true for $1000 RAID controllers, but I have yet to
> use one.

Even the fastest RAID 5 boxes aren't superfast, but a RAID5 of 15k drives
with a lot of drive in it does OK, since it can 1: spread small writes
around on many different drives (i.e. if you have 12 drives, and a lot of
small writes, a lot of them will be on different drives.) as well as
spreading out random reads, while providing good large reads, i.e.
sequential scans.

The key to good RAID 5 is to throw as many drives as you possibly can at a
problem, preferably across several SCSI interfaces.  Or FC-AL.

> I have a box with a low-end RAID 5 controller, and it drives like a single IDE
> drive on large UPDATE queries.   Slower, somethimes.

Many low end RAID 5 controllers are pretty slow.  The adaptec AIC133
series (I think that's the right number) are total dogs.  The older AMI
Mega raids were fast for their day, but any decent 350 MHz machine with a
dual channed SymBIOS card will outrun it at RAID 5.

> > Not in my experience.  I'd estimate my test box with dual 18 Gig UW scsis
> > runs about 1.5 to 1.8 times faster with the two drives in a RAID1 as if
> > a single one is used.  Bonnie confirms this.  single drive can read about
> > 25 Megs a second, a pair in a RAID1 reads at about 48 Megs a second.
>
> This is Linux software RAID?

Yep.  The kernel level drivers are quite fast in my experience, but they
don't seem to give any improvement when layered (i.e. 1+0 or 0+1) over
whatever is the slowest of the two layers.  I.e. setting up a RAID5 of
RAID0s results in almost the exact same performance as if you'd just setup
the same number of drives under RAID 5 as you had mirror sets in RAID0.
Since this is the case, you get better performance just going to RAID 5
with twice the disks and twice (-1n) the space.

Re: questions about disk configurations

From

Hubert depesz Lubaczewski

Date:

10 December 2002, 16:17:00

i'm replying to my own letter to gather all replies in one mail.

i got some replies. some of them are useful. some aren't really. i'm not
really sure how the usage can modify "what is best for some part of
database files".
can you explain me how comes that for some uses it's best
(performance-wise) to keep xlog's on straight disc, and tables on raid5
with lots' of disks, and for some other uses it's better to keep xlog on
raid0 and tables on raid 10?

anyway: what i understood is that usually the best (performance-wise),
would be to put:
xlog - separate - unraid'ed disk, or raid0
tables - any raid, but not raid 1
indices - any raid, but not raid 1

thanks for all replies.

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
Mój Boże, spraw abym milczał, dopóki się nie upewnię,  że  naprawdę  mam
coś do powiedzenia.                                      (c) 1998 depesz

Attachment

msg-22763-537.dat

Re: questions about disk configurations

From

Ron Johnson

Date:

10 December 2002, 18:13:37

On Tue, 2002-12-10 at 15:16, Hubert depesz Lubaczewski wrote:
[snip]
> anyway: what i understood is that usually the best (performance-wise),
> would be to put:
> xlog - separate - unraid'ed disk, or raid0

Unless your data is *easily* recreatable, NEVER RAID0!!

> tables - any raid, but not raid 1
> indices - any raid, but not raid 1

Why not RAID1 (mirroring)?  It speeds up both reads and writes.

Things are also dependent on the RAID controller, since they
all have different strengths and weaknesses.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "My advice to you is to get married: If you find a good wife, |
| you will be happy; if not, you will become a philosopher."    |
|    Socrates                                                   |
+---------------------------------------------------------------+

Which of the solution is better?

From

Wei Weng

Date:

11 December 2002, 00:20:28

I have two tables A and B where A is a huge table with thousands of rows, B
is a small table with only a couple of entries.

I want to do something like

SELECT
    A.ID
    A.Name
FROM
    A JOIN B ON (A.ID = B.ID)

And on the other hand I can have something like this

SELECT
    A.ID
    A.Name
FROM
    A
WHERE
    A.ID IN (B_Id_List)

B_Id_List is a string concatenation of B.ID. (ie, 1,2,3,4,5 ...)

Which one is faster, more efficient?

And if you could, which one is faster/more efficient under MS SQL Server 7?
I am trying to develop a cross platform query, that is why I need to
concern with performance under different databases.

Thanks a lot!

Wei

Re: Which of the solution is better?

From

Ron Johnson

Date:

11 December 2002, 00:46:04

On Tue, 2002-12-10 at 23:20, Wei Weng wrote:
> I have two tables A and B where A is a huge table with thousands of rows, B
> is a small table with only a couple of entries.
>
> I want to do something like
>
> SELECT
>     A.ID
>     A.Name
> FROM
>     A JOIN B ON (A.ID = B.ID)

How is this query any different from:
SELECT
    A.ID,
    A.Name
FROM
    A,
    B
WHERE
    A.ID = B.ID

> And on the other hand I can have something like this
>
> SELECT
>     A.ID
>     A.Name
> FROM
>     A
> WHERE
>     A.ID IN (B_Id_List)
>
> B_Id_List is a string concatenation of B.ID. (ie, 1,2,3,4,5 ...)
>
> Which one is faster, more efficient?
>
> And if you could, which one is faster/more efficient under MS SQL Server 7?
> I am trying to develop a cross platform query, that is why I need to
> concern with performance under different databases.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "My advice to you is to get married: If you find a good wife, |
| you will be happy; if not, you will become a philosopher."    |
|    Socrates                                                   |
+---------------------------------------------------------------+

Re: Which of the solution is better?

From

Wei Weng

Date:

11 December 2002, 10:28:40

I don't think there is any. It is just another way to write an outer
join.

On Wed, 2002-12-11 at 00:46, Ron Johnson wrote:
> On Tue, 2002-12-10 at 23:20, Wei Weng wrote:
> > I have two tables A and B where A is a huge table with thousands of rows, B
> > is a small table with only a couple of entries.
> >
> > I want to do something like
> >
> > SELECT
> >     A.ID
> >     A.Name
> > FROM
> >     A JOIN B ON (A.ID = B.ID)
>
> How is this query any different from:
> SELECT
>     A.ID,
>     A.Name
> FROM
>     A,
>     B
> WHERE
>     A.ID = B.ID
>
> > And on the other hand I can have something like this
> >
> > SELECT
> >     A.ID
> >     A.Name
> > FROM
> >     A
> > WHERE
> >     A.ID IN (B_Id_List)
> >
> > B_Id_List is a string concatenation of B.ID. (ie, 1,2,3,4,5 ...)
> >
> > Which one is faster, more efficient?
> >
> > And if you could, which one is faster/more efficient under MS SQL Server 7?
> > I am trying to develop a cross platform query, that is why I need to
> > concern with performance under different databases.
--
Wei Weng
Network Software Engineer
KenCast Inc.

Re: Which of the solution is better?

From

Andrew Sullivan

Date:

11 December 2002, 10:48:25

On Wed, Dec 11, 2002 at 11:26:20AM -0500, Wei Weng wrote:
> I don't think there is any. It is just another way to write an outer
> join.

That's not exactly true.  Doing A JOIN B ON (A.ID=B.ID) constrains
the planner.  See the section on explicit join order in the
PostgreSQL manual.

The IN locution, by the way, is almost always bad in Postgres.  Avoid
it.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110