Thread: cost of CREATE VIEW ... AS SELECT DISTINCT
Hello, How expensive would it be to maintain the following VIEW: CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion if there is in index on transaktion.origin; the table transaktion has thousands of records and there are only a few distinct origin? -- Regards/Gruß, Tarlika Elisabeth Schmitz
On Mon, 2005-03-28 at 15:43, T E Schmitz wrote: > Hello, > How expensive would it be to maintain the following VIEW: > > CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion > > if there is in index on transaktion.origin; the table transaktion has > thousands of records and there are only a few distinct origin? The cost will only be encurred when running the view. if you want materialized views (which WOULD have maintenance costs) you'll have to do a google search for postgresql materialized views and implement them by hand, so to speak. The cost of executing that view should be the same as the cost of running the query by hand.
Hello Scott, Scott Marlowe wrote: > On Mon, 2005-03-28 at 15:43, T E Schmitz wrote: > >>How expensive would it be to maintain the following VIEW: >> >>CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion >> >>if there is in index on transaktion.origin; the table transaktion has >>thousands of records and there are only a few distinct origin? > > > The cost will only be encurred when running the view. if you want > materialized views (which WOULD have maintenance costs) you'll have to Thank you for the pointer - that might come in handy for another scenario. > The cost of executing that view should be the same as the cost of > running the query by hand. I did an EXPLAIN ANALYZE and a sequential scan was carried out despite the index I had on the column. Maybe this is because I only have very few records in my test DB. Would the "SELECT DISTINCT origin" always cause a sequential table scan regardless whether there is an index on the origin column or not? -- Regards/Gruß, Tarlika Elisabeth Schmitz
On Mar 29, 2005, at 5:07 AM, T E Schmitz wrote: > Hello Scott, > > Scott Marlowe wrote: >> On Mon, 2005-03-28 at 15:43, T E Schmitz wrote: >>> How expensive would it be to maintain the following VIEW: >>> >>> CREATE VIEW origin AS SELECT DISTINCT origin FROM transaktion >>> >>> if there is in index on transaktion.origin; the table transaktion >>> has thousands of records and there are only a few distinct origin? >> The cost will only be encurred when running the view. if you want >> materialized views (which WOULD have maintenance costs) you'll have to > > Thank you for the pointer - that might come in handy for another > scenario. > >> The cost of executing that view should be the same as the cost of >> running the query by hand. > > I did an EXPLAIN ANALYZE and a sequential scan was carried out despite > the index I had on the column. Maybe this is because I only have very > few records in my test DB. > > Would the "SELECT DISTINCT origin" always cause a sequential table > scan regardless whether there is an index on the origin column or not? > I think you are right. If this is expensive and run often, you could always normalize further and create a table of "unique_origin" that would have only unique origins and set transaktion to have a foreign key referring to the unique_origin table and then just query the unique_origin table when you need to do the query above. Sean
On Tue, Mar 29, 2005 at 11:07:20 +0100, T E Schmitz <mailreg@numerixtechnology.de> wrote: > > Would the "SELECT DISTINCT origin" always cause a sequential table scan > regardless whether there is an index on the origin column or not? It's worse than that, SELECT DISTINCT cannot use a hash aggregate plan and will need to do a sort to eliminate duplicates. Unless the view is used in a way that restricts the candidate rows, this probably isn't going to be very fast. You might be better off changing the view to use GROUP BY instead of DISTINCT.
Bruno Wolff III wrote: > On Tue, Mar 29, 2005 at 11:07:20 +0100, > T E Schmitz <mailreg@numerixtechnology.de> wrote: > >>Would the "SELECT DISTINCT origin" always cause a sequential table scan >>regardless whether there is an index on the origin column or not? > > > It's worse than that, SELECT DISTINCT cannot use a hash aggregate plan > and will need to do a sort to eliminate duplicates. Unless the view > is used in a way that restricts the candidate rows, this probably isn't going > to be very fast. You might be better off changing the view to use GROUP BY > instead of DISTINCT. As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will lead to a sequentail scan. Is that correct? If that's the case, I should come up with a different concept to obtain a list of ORIGINs. -- Regards/Gruß, Tarlika Elisabeth Schmitz
On Tue, Mar 29, 2005 at 02:21:15PM +0100, T E Schmitz wrote: > Bruno Wolff III wrote: > >On Tue, Mar 29, 2005 at 11:07:20 +0100, > > T E Schmitz <mailreg@numerixtechnology.de> wrote: > > > >>Would the "SELECT DISTINCT origin" always cause a sequential table > >>scan regardless whether there is an index on the origin column or > >>not? > > > >It's worse than that, SELECT DISTINCT cannot use a hash aggregate > >plan and will need to do a sort to eliminate duplicates. Unless the > >view is used in a way that restricts the candidate rows, this > >probably isn't going to be very fast. You might be better off > >changing the view to use GROUP BY instead of DISTINCT. > > As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will > lead to a sequentail scan. Is that correct? That (GROUP BY using a seqscan) may be caused by the small size of the table. Try populating it some more. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "¿Qué importan los años? Lo que realmente importa es comprobar que a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
Alvaro Herrera wrote: > On Tue, Mar 29, 2005 at 02:21:15PM +0100, T E Schmitz wrote: > >>Bruno Wolff III wrote: >> >>>On Tue, Mar 29, 2005 at 11:07:20 +0100, >>> T E Schmitz <mailreg@numerixtechnology.de> wrote: >>> >>> >>>>Would the "SELECT DISTINCT origin" always cause a sequential table >>>>scan regardless whether there is an index on the origin column or >>>>not? >>> >>>It's worse than that, SELECT DISTINCT cannot use a hash aggregate >>>plan and will need to do a sort to eliminate duplicates. Unless the >>>view is used in a way that restricts the candidate rows, this >>>probably isn't going to be very fast. You might be better off >>>changing the view to use GROUP BY instead of DISTINCT. >> >>As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will >>lead to a sequentail scan. Is that correct? > > That (GROUP BY using a seqscan) may be caused by the small size of the > table. Try populating it some more. > Shall do. But am I correct in assuming that I should place an index on the group by (TRANSAKTION.ORIGIN) column? -- Regards/Gruß, Tarlika Elisabeth Schmitz
On Tue, Mar 29, 2005 at 14:21:15 +0100, T E Schmitz <mailreg@numerixtechnology.de> wrote: > > As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will lead > to a sequentail scan. Is that correct? If you need to read the whole table yes. However if you join the view to something else that might not be necessary. > If that's the case, I should come up with a different concept to obtain > a list of ORIGINs. That may be a good idea, especially if there are lots of rows for each origin value.
On Tue, Mar 29, 2005 at 15:12:24 +0100, T E Schmitz <mailreg@numerixtechnology.de> wrote: > > Shall do. > But am I correct in assuming that I should place an index on the group > by (TRANSAKTION.ORIGIN) column? This will mainly help when joining the view to another table. This would also allow using an index scan instead of a sort in for DISTINCT which may be faster in some cases.
T E Schmitz <mailreg@numerixtechnology.de> writes: > As far as I can see (via EXPLAIN), both DISTINCT and GROUP BY will lead > to a sequentail scan. Is that correct? I'm not sure why you expect something different. The query requires visiting every row of the table in any case --- else it might miss values that occur only once --- therefore a seqscan is most efficient. The planner will consider plans that involve an indexscan, but only as a substitute for doing an explicit sort before a uniq-style grouping step. Generally the explicit sort will win that comparison. And a hash grouping step probably dominates them both. regards, tom lane
Bruno Wolff III wrote: > On Tue, Mar 29, 2005 at 14:21:15 +0100, > T E Schmitz <mailreg@numerixtechnology.de> wrote: >>If that's the case, I should come up with a different concept to obtain >>a list of ORIGINs. > > > That may be a good idea, especially if there are lots of rows for each > origin value. That's what I will do. Thank you for all your input. -- Regards/Gruß, Tarlika Elisabeth Schmitz