Thread: Which query is less expensive / faster?

Which query is less expensive / faster?

From
Postgres Coder
Date:
Hi all,

I have a couple different ways to get results out of my table
structure, but I'm unsure which way is faster or less expensive to
run:

Option 1:  Query a table of 4 million rows, on 4 indexed columns.  It
will return 1 row:

SELECT field1, field2, field3, field4
FROM tablea
WHERE field1 = $1 AND field2 = $2 AND field3 = $3 AND field4 = $4

Option 2: Query a table of 200,000 rows on 1 indexed column.  It will
return 800 rows:

SELECT *
FROM tableb
WHERE field1 = $1

Which one is going to return results the fastest, with the least
expense to the database server?

Re: Which query is less expensive / faster?

From
Tzahi Fadida
Date:
I am not an expert to postgres internals but in General:
if you have a btree multicolumn index on the 4 fields
then it should take around logF(4M). lets guess the F
to be 5 so its around 10 ios +1 block read.
for the same thing for a hashtable its about the same or less.

if you have any subset of the fields indexed with a btree
it costs logF(4M) + all the blocks with those subset which is still
better than a sequential scan.

another possibility which requires careful analyze of the frequencies
is intersecting all the rows from the 4 separate indices and finding 1
that matches.

In any case, when in doubt run the EXPLAIN on your query.
see the documentation.

Regards,
    tzahi.

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> Postgres Coder
> Sent: Friday, February 25, 2005 1:46 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Which query is less expensive / faster?
>
>
> Hi all,
>
> I have a couple different ways to get results out of my table
> structure, but I'm unsure which way is faster or less expensive to
> run:
>
> Option 1:  Query a table of 4 million rows, on 4 indexed
> columns.  It will return 1 row:
>
> SELECT field1, field2, field3, field4
> FROM tablea
> WHERE field1 = $1 AND field2 = $2 AND field3 = $3 AND field4 = $4
>
> Option 2: Query a table of 200,000 rows on 1 indexed column.
> It will return 800 rows:
>
> SELECT *
> FROM tableb
> WHERE field1 = $1
>
> Which one is going to return results the fastest, with the
> least expense to the database server?
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>
>



Re: Which query is less expensive / faster?

From
"Jon Asher"
Date:
Thanks for the reply... but which query will be faster and less expensive?
I don't have a table now with 4 million rows, but I'm thinking of building
such a table.  Querying it would return 1 row.  The alternative is to query
an existing table of 200k rows, and return 800 rows.

Option 1:  Query a table of 4 million rows, on 4 indexed columns.  It
will return 1 row:

SELECT field1, field2, field3, field4 FROM tablea WHERE field1 = $1
AND field2 = $2 AND field3 = $3 AND field4 = $4

Option 2: Query a table of 200,000 rows on 1 indexed column.
It will return 800 rows:

SELECT *
FROM tableb
WHERE field1 = $1

 Which one is going to return results the fastest, with the least
 expense to the database server?

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tzahi Fadida
Sent: Thursday, February 24, 2005 5:18 PM
To: 'Postgres Coder'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Which query is less expensive / faster?

I am not an expert to postgres internals but in General:
if you have a btree multicolumn index on the 4 fields then it should take
around logF(4M). lets guess the F to be 5 so its around 10 ios +1 block
read.
for the same thing for a hashtable its about the same or less.

if you have any subset of the fields indexed with a btree it costs logF(4M)
+ all the blocks with those subset which is still better than a sequential
scan.

another possibility which requires careful analyze of the frequencies is
intersecting all the rows from the 4 separate indices and finding 1 that
matches.

In any case, when in doubt run the EXPLAIN on your query.
see the documentation.

Regards,
    tzahi.

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Postgres
> Coder
> Sent: Friday, February 25, 2005 1:46 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Which query is less expensive / faster?
>
>
> Hi all,
>
> I have a couple different ways to get results out of my table
> structure, but I'm unsure which way is faster or less expensive to
> run:
>
> Option 1:  Query a table of 4 million rows, on 4 indexed columns.  It
> will return 1 row:
>
> SELECT field1, field2, field3, field4 FROM tablea WHERE field1 = $1
> AND field2 = $2 AND field3 = $3 AND field4 = $4
>
> Option 2: Query a table of 200,000 rows on 1 indexed column.
> It will return 800 rows:
>
> SELECT *
> FROM tableb
> WHERE field1 = $1
>
> Which one is going to return results the fastest, with the least
> expense to the database server?
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>
>



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)



Re: Which query is less expensive / faster?

From
Richard Huxton
Date:
Jon Asher wrote:
> Thanks for the reply... but which query will be faster and less expensive?
> I don't have a table now with 4 million rows, but I'm thinking of building
> such a table.  Querying it would return 1 row.  The alternative is to query
> an existing table of 200k rows, and return 800 rows.
>
> Option 1:  Query a table of 4 million rows, on 4 indexed columns.  It
> will return 1 row:
>
> SELECT field1, field2, field3, field4 FROM tablea WHERE field1 = $1
> AND field2 = $2 AND field3 = $3 AND field4 = $4
>
> Option 2: Query a table of 200,000 rows on 1 indexed column.
> It will return 800 rows:
>
> SELECT *
> FROM tableb
> WHERE field1 = $1
>
>  Which one is going to return results the fastest, with the least
>  expense to the database server?

  30 seconds anyway.The answer can only be "that depends". Maybe option1
has very wide columns for indexing and a poor spread of values. Maybe
option 2 has only three small columns, the indexed column is very
discriminating and the whole table plus index fits in RAM. Maybe you
have an array of 15,000rpm SCSI disks, maybe you have a laptop IDE
drive, maybe the client is at the end of a 56k modem line and fetching
800 rows will take  Of course, that's all without taking into account
whatever else the database is doing, not to mention the rest of the machine.

Now, in general fetching one row will be quicker, but without real
information no-one can say. The best and quickest solution? Mock up some
tables with test-data and try it with your actual setup.  Make sure your
test-data has a realistic distribution of values and number of rows. I
got caught out a couple of weeks ago where one small and insignificant
table crippled a big stats recalculation - all because my testing had
focused on the typical case of a few dozen rows and not the occasional
case of a thousand or so.

--
   Richard Huxton
   Archonet Ltd

Re: Which query is less expensive / faster?

From
Richard Huxton
Date:
Jon Asher wrote:
> Thanks for the reply... but which query will be faster and less expensive?
> I don't have a table now with 4 million rows, but I'm thinking of building
> such a table.  Querying it would return 1 row.  The alternative is to query
> an existing table of 200k rows, and return 800 rows.

Actually, as a side-issue to my last reply...

How does one row contain the same information as 800?

--
   Richard Huxton
   Archonet Ltd