Thread: cost of CREATE VIEW ... AS SELECT DISTINCT

cost of CREATE VIEW ... AS SELECT DISTINCT

From
T E Schmitz
Date:
Hello,
How expensive would it be to maintain the following VIEW:

CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion

if there is in index on transaktion.origin; the table transaktion has 
thousands of records and there are only a few distinct origin?

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Scott Marlowe
Date:
On Mon, 2005-03-28 at 15:43, T E Schmitz wrote:
> Hello,
> How expensive would it be to maintain the following VIEW:
> 
> CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion
> 
> if there is in index on transaktion.origin; the table transaktion has 
> thousands of records and there are only a few distinct origin?

The cost will only be encurred when running the view.  if you want
materialized views (which WOULD have maintenance costs) you'll have to
do a google search for postgresql materialized views and implement them
by hand, so to speak.

The cost of executing that view should be the same as the cost of
running the query by hand.


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
T E Schmitz
Date:
Hello Scott,

Scott Marlowe wrote:
> On Mon, 2005-03-28 at 15:43, T E Schmitz wrote:
> 
>>How expensive would it be to maintain the following VIEW:
>>
>>CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion
>>
>>if there is in index on transaktion.origin; the table transaktion has 
>>thousands of records and there are only a few distinct origin?
> 
> 
> The cost will only be encurred when running the view.  if you want
> materialized views (which WOULD have maintenance costs) you'll have to

Thank you for the pointer - that might come in handy for another scenario.

> The cost of executing that view should be the same as the cost of
> running the query by hand.

I did an EXPLAIN ANALYZE and a sequential scan was carried out despite 
the index I had on the column. Maybe this is because I only have very 
few records in my test DB.

Would the "SELECT DISTINCT origin" always cause a sequential table scan 
regardless whether there is an index on the origin column or not?



-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Sean Davis
Date:
On Mar 29, 2005, at 5:07 AM, T E Schmitz wrote:

> Hello Scott,
>
> Scott Marlowe wrote:
>> On Mon, 2005-03-28 at 15:43, T E Schmitz wrote:
>>> How expensive would it be to maintain the following VIEW:
>>>
>>> CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion
>>>
>>> if there is in index on transaktion.origin; the table transaktion 
>>> has thousands of records and there are only a few distinct origin?
>> The cost will only be encurred when running the view.  if you want
>> materialized views (which WOULD have maintenance costs) you'll have to
>
> Thank you for the pointer - that might come in handy for another 
> scenario.
>
>> The cost of executing that view should be the same as the cost of
>> running the query by hand.
>
> I did an EXPLAIN ANALYZE and a sequential scan was carried out despite 
> the index I had on the column. Maybe this is because I only have very 
> few records in my test DB.
>
> Would the "SELECT DISTINCT origin" always cause a sequential table 
> scan regardless whether there is an index on the origin column or not?
>

I think you are right.  If this is expensive and run often, you could 
always normalize further and create a table of "unique_origin" that 
would have only unique origins and set transaktion to have a foreign 
key referring to the unique_origin table and then just query the 
unique_origin table when you need to do the query above.

Sean



Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Bruno Wolff III
Date:
On Tue, Mar 29, 2005 at 11:07:20 +0100, T E Schmitz <mailreg@numerixtechnology.de> wrote:
> 
> Would the "SELECT DISTINCT origin" always cause a sequential table scan 
> regardless whether there is an index on the origin column or not?

It's worse than that, SELECT DISTINCT cannot use a hash aggregate plan
and will need to do a sort to eliminate duplicates. Unless the view
is used in a way that restricts the candidate rows, this probably isn't going
to be very fast. You might be better off changing the view to use GROUP BY
instead of DISTINCT.


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
T E Schmitz
Date:
Bruno Wolff III wrote:
> On Tue, Mar 29, 2005 at 11:07:20 +0100,
>   T E Schmitz <mailreg@numerixtechnology.de> wrote:
> 
>>Would the "SELECT DISTINCT origin" always cause a sequential table scan 
>>regardless whether there is an index on the origin column or not?
> 
> 
> It's worse than that, SELECT DISTINCT cannot use a hash aggregate plan
> and will need to do a sort to eliminate duplicates. Unless the view
> is used in a way that restricts the candidate rows, this probably isn't going
> to be very fast. You might be better off changing the view to use GROUP BY
> instead of DISTINCT.

As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will lead 
to a sequentail scan. Is that correct?

If that's the case, I should come up with a different concept to obtain 
a list of ORIGINs.

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Alvaro Herrera
Date:
On Tue, Mar 29, 2005 at 02:21:15PM +0100, T E Schmitz wrote:
> Bruno Wolff III wrote:
> >On Tue, Mar 29, 2005 at 11:07:20 +0100,
> >  T E Schmitz <mailreg@numerixtechnology.de> wrote:
> >
> >>Would the "SELECT DISTINCT origin" always cause a sequential table
> >>scan regardless whether there is an index on the origin column or
> >>not?
> >
> >It's worse than that, SELECT DISTINCT cannot use a hash aggregate
> >plan and will need to do a sort to eliminate duplicates. Unless the
> >view is used in a way that restricts the candidate rows, this
> >probably isn't going to be very fast. You might be better off
> >changing the view to use GROUP BY instead of DISTINCT.
> 
> As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will
> lead to a sequentail scan. Is that correct?

That (GROUP BY using a seqscan) may be caused by the small size of the
table.  Try populating it some more.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"¿Qué importan los años?  Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
T E Schmitz
Date:
Alvaro Herrera wrote:
> On Tue, Mar 29, 2005 at 02:21:15PM +0100, T E Schmitz wrote:
> 
>>Bruno Wolff III wrote:
>>
>>>On Tue, Mar 29, 2005 at 11:07:20 +0100,
>>> T E Schmitz <mailreg@numerixtechnology.de> wrote:
>>>
>>>
>>>>Would the "SELECT DISTINCT origin" always cause a sequential table
>>>>scan regardless whether there is an index on the origin column or
>>>>not?
>>>
>>>It's worse than that, SELECT DISTINCT cannot use a hash aggregate
>>>plan and will need to do a sort to eliminate duplicates. Unless the
>>>view is used in a way that restricts the candidate rows, this
>>>probably isn't going to be very fast. You might be better off
>>>changing the view to use GROUP BY instead of DISTINCT.
>>
>>As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will
>>lead to a sequentail scan. Is that correct?
> 
> That (GROUP BY using a seqscan) may be caused by the small size of the
> table.  Try populating it some more.
> 

Shall do.
But am I correct in assuming that I should place an index on the group 
by (TRANSAKTION.ORIGIN) column?


-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Bruno Wolff III
Date:
On Tue, Mar 29, 2005 at 14:21:15 +0100, T E Schmitz <mailreg@numerixtechnology.de> wrote:
> 
> As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will lead 
> to a sequentail scan. Is that correct?

If you need to read the whole table yes. However if you join the view
to something else that might not be necessary.
> If that's the case, I should come up with a different concept to obtain 
> a list of ORIGINs.

That may be a good idea, especially if there are lots of rows for each
origin value.


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Bruno Wolff III
Date:
On Tue, Mar 29, 2005 at 15:12:24 +0100, T E Schmitz <mailreg@numerixtechnology.de> wrote:
> 
> Shall do.
> But am I correct in assuming that I should place an index on the group 
> by (TRANSAKTION.ORIGIN) column?

This will mainly help when joining the view to another table. This would
also allow using an index scan instead of a sort in for DISTINCT which
may be faster in some cases.


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
Tom Lane
Date:
T E Schmitz <mailreg@numerixtechnology.de> writes:
> As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will lead 
> to a sequentail scan. Is that correct?

I'm not sure why you expect something different.  The query requires
visiting every row of the table in any case --- else it might miss
values that occur only once --- therefore a seqscan is most efficient.

The planner will consider plans that involve an indexscan, but only
as a substitute for doing an explicit sort before a uniq-style grouping
step.  Generally the explicit sort will win that comparison.  And a
hash grouping step probably dominates them both.
        regards, tom lane


Re: cost of CREATE VIEW ... AS SELECT DISTINCT

From
T E Schmitz
Date:
Bruno Wolff III wrote:
> On Tue, Mar 29, 2005 at 14:21:15 +0100,
>   T E Schmitz <mailreg@numerixtechnology.de> wrote:

>>If that's the case, I should come up with a different concept to obtain 
>>a list of ORIGINs.
> 
> 
> That may be a good idea, especially if there are lots of rows for each
> origin value.

That's what I will do. Thank you for all your input.

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz