Thread: effective SELECT from child tables
<div class="Section1"><p class="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">Let table A be inherited by A1, A2, A3.</span></font><p class="MsoNormal"><font face="Arial" size="2"><spanlang="EN-US" style="font-size: 10.0pt;font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">How to select from A records where actual relations are A1, A2 ?</span></font><p class="MsoNormal"><fontface="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">I found a way somewhere, it sounds like SELECT … WHERE tableoid IN (a1.oid, a2.oid),</span></font><pclass="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">but tableoid checks actually do seq scan.</span></font><p class="MsoNormal"><font face="Arial"size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">Like: SELECT * FROM sometable WHERE tableoid =anything will do seq. scan on sometable..</span></font><pclass="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">So such way seems very ineffective: it seq scans and filters records..</span></font><p class="MsoNormal"><fontface="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span lang="EN-US" style="font-size: 10.0pt;font-family:Arial">Maybe new constraint_exclusion staff could help to exclude non-matching tables from inheritancequery ?</span></font></div>
Ilia, > Maybe new constraint_exclusion staff could help to exclude non-matching > tables from inheritance query ? Yes, that's exactly what it's for. Your testing is welcome. Download 8.1 and try it today. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
"Ilia Kantor" <ilia@obnovlenie.ru> writes: > Let table A be inherited by A1, A2, A3. > How to select from A records where actual relations are A1, A2 ? Why not just select directly from the child tables? I can't get excited about optimizing the case you propose. regards, tom lane
>> Maybe new constraint_exclusion staff could help to exclude non-matching >> tables from inheritance query ? > Yes, that's exactly what it's for. Your testing is welcome. Download 8.1 > and try it today. Great, I'm developing on 8.1b2 now... But could you be more particular about the solution ? Only the way I can think of is to add "relname" field into parent table, add "BEFORE INSERT" trigger to each child that will set it appropriately and CHECK (relname=<here goes table name>). It works in this case.. More than that.. I can create indexes on each table with "WHERE relname!=<table name>", then they are used with DELETE/UPDATE WHERE relname IN(..). But the whole idea to adding an extra field, trigger and duplicating table name multiple times.. Feels a bit stinky to me.. Could you suggest another path ?
>> Let table A be inherited by A1, A2, A3. >> How to select from A records where actual relations are A1, A2 ? >Why not just select directly from the child tables? I can't get excited >about optimizing the case you propose. Because "WHERE concrete_class IN (a,b,c)" is much more convenient and flexible way of forming select then manually split request into many unions. Also, this query runs on top of "abstract class", so inheritance really assists me here.
On Tue, Sep 27, 2005 at 09:30:55PM +0400, Ilia Kantor wrote: > Let table A be inherited by A1, A2, A3. > How to select from A records where actual relations are A1, A2 ? If A1 and A2 will be naturally together, where compared to A, or A3, why not introduce an intermediate table? A would be inherited by A12, and A3. A12 would be inherited by A1, and A2. You can do the UNION yourself, as well. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Wed, 2005-09-28 at 12:13 +0400, Ilia Kantor wrote: > >> Let table A be inherited by A1, A2, A3. > >> How to select from A records where actual relations are A1, A2 ? > > >Why not just select directly from the child tables? I can't get excited > >about optimizing the case you propose. > > Because "WHERE concrete_class IN (a,b,c)" is much more convenient and > flexible way of forming select then manually split request into many unions. > Also, this query runs on top of "abstract class", so inheritance really > assists me here. > If you treat the sub-class Discriminator as a data item rather than some additional syntax for class membership then you will find this works very straightforwardly for you. Include the Discriminator as a column in A and it will be inherited by all A1, A2, A3. e.g. concrete_class char(1) not null then setup constraints on each table like so ALTER TABLE A1 ADD CHECK (concrete_class = 'A') ALTER TABLE A2 ADD CHECK (concrete_class = 'B') ALTER TABLE A3 ADD CHECK (concrete_class = 'C') then when you run a query like SELECT * FROM A WHERE concrete_class IN ('A','B') you will find that table A3, which corresponds to concrete_class C has been excluded from your query. Presumably A, B, C are all mutually exclusive, so the end result will be the same as if you had used a UNION ALL set query. This will add 1 byte per row in your superclass... and requires no index. You can even add this as a DEFAULT value for each child table, so the actual column concrete_class need not be mentioned in an INSERT statement. (I've got plans to add an ABSTRACT keyword to tables to follow along the same set of OO terminology in describing this situation. In next release, not 8.1) Best Regards, Simon Riggs
> If you treat the sub-class Discriminator as a data item rather than some > additional syntax for class membership then you will find this works > very straightforwardly for you. Your suggestion is essentially the same as mine.. There exists tableoid, pretty much suited to tell between tables in the case of inheritance.. I can't see a "real" need to add a special "class classifier" to each table.. This solution is a workaround. It will work, just can't make myself love it.
On Wed, Sep 28, 2005 at 11:00:46PM +0400, Ilia Kantor wrote: > Your suggestion is essentially the same as mine.. > There exists tableoid, pretty much suited to tell between tables in the case > of inheritance.. I can't see a "real" need to add a special "class > classifier" to each table.. > This solution is a workaround. It will work, just can't make myself love it. I wonder if it would be possible to tweak the constraints exclusion code so that if it sees something of the form "tableoid = X" to exclude other tables... You know, assume each table has a constraint "tableoid = OID". Still, it is a fairly unusual feature. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Ilia, Well, Simon is still writing the CE docs. In the meantime: http://archives.postgresql.org/pgsql-hackers/2005-07/msg00461.php --josh
On Wed, Sep 28, 2005 at 10:24:18PM +0200, Martijn van Oosterhout wrote: > On Wed, Sep 28, 2005 at 11:00:46PM +0400, Ilia Kantor wrote: > > Your suggestion is essentially the same as mine.. > > There exists tableoid, pretty much suited to tell between tables in the case > > of inheritance.. I can't see a "real" need to add a special "class > > classifier" to each table.. > > This solution is a workaround. It will work, just can't make myself love it. > > I wonder if it would be possible to tweak the constraints exclusion > code so that if it sees something of the form "tableoid = X" to exclude > other tables... > > You know, assume each table has a constraint "tableoid = OID". > > Still, it is a fairly unusual feature. Well, it's possibly a good way to do list partitioning where you can drop the partitioning key out of each partition, something I would love to have for stats.distributed.net (I'm actually working on a project that does exactly this with a UNION ALL view and rules...) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: > Include the Discriminator as a column in A and it will be inherited by > all A1, A2, A3. > e.g. concrete_class char(1) not null <snip> > This will add 1 byte per row in your superclass... and requires no I thought char was actually stored variable-length...? I know there's a type that actually acts like char does on most databases, but I can't remember what it is off-hand (it should be mentioned in docs 8.3...) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, 2005-09-28 at 22:24 +0200, Martijn van Oosterhout wrote: > On Wed, Sep 28, 2005 at 11:00:46PM +0400, Ilia Kantor wrote: > > Your suggestion is essentially the same as mine.. > > There exists tableoid, pretty much suited to tell between tables in the case > > of inheritance.. I can't see a "real" need to add a special "class > > classifier" to each table.. > > This solution is a workaround. It will work, just can't make myself love it. > > I wonder if it would be possible to tweak the constraints exclusion > code so that if it sees something of the form "tableoid = X" to exclude > other tables... > > You know, assume each table has a constraint "tableoid = OID". > > Still, it is a fairly unusual feature. As I pointed out, the solution I proposed works with CE, in comparison with selecting a tableoid, which does not. It also costs 1 byte per row. Jim Nasby requested a similar feature a couple of months back. Essentially this request reduces to the idea of having attributes that are constant for all rows of a table. That doesn't have any side or additional benefits AFAICS, so improving that case isn't going to be at the top of a worthwhile-improvements list for a long time yet, especially since the CE list already has at least 10 items on it. Best Regards, Simon Riggs
On Fri, 2005-09-30 at 18:30 -0500, Jim C. Nasby wrote: > I thought char was actually stored variable-length...? I know there's a > type that actually acts like char does on most databases, but I can't > remember what it is off-hand (it should be mentioned in docs 8.3...) You are correct on that point, but IMHO that doesn't change the situation sufficiently for the main part of this thread. Best Regards, Simon Riggs
On Fri, Sep 30, 2005 at 06:30:10PM -0500, Jim C. Nasby wrote: > On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: > > Include the Discriminator as a column in A and it will be inherited by > > all A1, A2, A3. > > e.g. concrete_class char(1) not null > <snip> > > This will add 1 byte per row in your superclass... and requires no > > I thought char was actually stored variable-length...? I know there's a > type that actually acts like char does on most databases, but I can't > remember what it is off-hand (it should be mentioned in docs 8.3...) IIRC, this is the difference between "char" and char(1). The latter is variable length and can store any character per current encoding, hence the variable length. "char" on the other hand is a one byte (presumably ASCII) character. It's used mainly in the system catalogs... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On Fri, Sep 30, 2005 at 09:54:39PM +0100, Simon Riggs wrote: > On Wed, 2005-09-28 at 22:24 +0200, Martijn van Oosterhout wrote: > > I wonder if it would be possible to tweak the constraints exclusion > > code so that if it sees something of the form "tableoid = X" to exclude > > other tables... > > You know, assume each table has a constraint "tableoid = OID". > > Still, it is a fairly unusual feature. > As I pointed out, the solution I proposed works with CE, in comparison > with selecting a tableoid, which does not. It also costs 1 byte per row. I can't see the use of an extra char column in a row, that is static for an entire table, as anything except an ugly hack. It might work. It doesn't make it right. > Jim Nasby requested a similar feature a couple of months back. > Essentially this request reduces to the idea of having attributes that > are constant for all rows of a table. That doesn't have any side or > additional benefits AFAICS, so improving that case isn't going to be at > the top of a worthwhile-improvements list for a long time yet, > especially since the CE list already has at least 10 items on it. It has the 'side or additional benefit' being requested here. The ability to filter the child table by some attribute. For example, if the child tables are used for partitioning, and the attribute were to keep a date range, the field restriction optimization could be used to automatically determine the set of tables to use for the date range specified. With such a change, it would even work automatically if the date ranges overlapped for some reason. Selecting a table name by date is hacky. This sort of solution would be a general solution to the problem. If the original code suggested - the use of the table oid, and comparing it to the rows table oid, is supported by PostgreSQL (I have never tried it), it seems like a miss for the optimizer not to understand what it means. Each row has one table that it is associated with. It's quite obviously criteria that could be used to restrict the table search, and belongs in any COMPLETE implementation of a restriction optimizer. I still prefer Jim Nasby's model, though. I think it easily extends such that the table row oid IS an automatic constant for all rows of a table. It's a natural extension, and takes away the exceptional nature of the table row oid. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Sat, Oct 01, 2005 at 10:05:22AM -0400, mark@mark.mielke.cc wrote: > It has the 'side or additional benefit' being requested here. The ability > to filter the child table by some attribute. For example, if the child > tables are used for partitioning, and the attribute were to keep a date > range, the field restriction optimization could be used to automatically > determine the set of tables to use for the date range specified. With > such a change, it would even work automatically if the date ranges > overlapped for some reason. Selecting a table name by date is hacky. This > sort of solution would be a general solution to the problem. This is what "Constraint Exclusion" does. It uses CHECK constraints on a table to filter out tables that obviously don't apply to a query. It's just the the specific case of "tableoid = XXX" is not supported right now. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
To clarify, this is a hard-coded implementation of what I'm asking for: http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: CREATE TABLE log_other ( project_id smallint NOT NULL ... ) CREATE TABLE log_8 ( -- No project_id ... ) CREATE TABLE log_24, log_25, log_5... CREATE VIEW log AS SELECT * FROM log_other UNION ALL SELECT 8 AS project_id, * FROM log_8 ... So the end result is that for cases where project_id is 5, 8, 24, or 25, the data will be stored in tables that don't have the project_id. If I were to use this on the main table for http://stats.distributed.net, which has ~130M rows, I would be able to save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb will have many times that number of rows, so the savings will be even larger. Note that this technique wouldn't help at all for something like date partitioning, because you have to store the date in the partitioned table. On Sat, Oct 01, 2005 at 04:35:49PM +0200, Martijn van Oosterhout wrote: > On Sat, Oct 01, 2005 at 10:05:22AM -0400, mark@mark.mielke.cc wrote: > > It has the 'side or additional benefit' being requested here. The ability > > to filter the child table by some attribute. For example, if the child > > tables are used for partitioning, and the attribute were to keep a date > > range, the field restriction optimization could be used to automatically > > determine the set of tables to use for the date range specified. With > > such a change, it would even work automatically if the date ranges > > overlapped for some reason. Selecting a table name by date is hacky. This > > sort of solution would be a general solution to the problem. > > This is what "Constraint Exclusion" does. It uses CHECK constraints on > a table to filter out tables that obviously don't apply to a query. > It's just the the specific case of "tableoid = XXX" is not supported > right now. > > Have a nice day, > -- > Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > tool for doing 5% of the work and then sitting around waiting for someone > > else to do the other 95% so you can sue them. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sat, Oct 01, 2005 at 09:43:03AM +0100, Simon Riggs wrote: > On Fri, 2005-09-30 at 18:30 -0500, Jim C. Nasby wrote: > > > I thought char was actually stored variable-length...? I know there's a > > type that actually acts like char does on most databases, but I can't > > remember what it is off-hand (it should be mentioned in docs 8.3...) > > You are correct on that point, but IMHO that doesn't change the > situation sufficiently for the main part of this thread. Depends on how many tuples and how wide they are. 4 extra bytes over a couple hundred million rows isn't something to sneeze at. :) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sat, Oct 01, 2005 at 02:13:09PM +0200, Martijn van Oosterhout wrote: > On Fri, Sep 30, 2005 at 06:30:10PM -0500, Jim C. Nasby wrote: > > On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: > > > Include the Discriminator as a column in A and it will be inherited by > > > all A1, A2, A3. > > > e.g. concrete_class char(1) not null > > <snip> > > > This will add 1 byte per row in your superclass... and requires no > > > > I thought char was actually stored variable-length...? I know there's a > > type that actually acts like char does on most databases, but I can't > > remember what it is off-hand (it should be mentioned in docs 8.3...) > > IIRC, this is the difference between "char" and char(1). The latter is > variable length and can store any character per current encoding, hence > the variable length. "char" on the other hand is a one byte (presumably > ASCII) character. It's used mainly in the system catalogs... According to the docs, char == char(1). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: >On Sat, Oct 01, 2005 at 02:13:09PM +0200, Martijn van Oosterhout wrote: > > >>On Fri, Sep 30, 2005 at 06:30:10PM -0500, Jim C. Nasby wrote: >> >> >>>On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: >>> >>> >>>>Include the Discriminator as a column in A and it will be inherited by >>>>all A1, A2, A3. >>>>e.g. concrete_class char(1) not null >>>> >>>> >>><snip> >>> >>> >>>>This will add 1 byte per row in your superclass... and requires no >>>> >>>> >>>I thought char was actually stored variable-length...? I know there's a >>>type that actually acts like char does on most databases, but I can't >>>remember what it is off-hand (it should be mentioned in docs 8.3...) >>> >>> >>IIRC, this is the difference between "char" and char(1). The latter is >>variable length and can store any character per current encoding, hence >>the variable length. "char" on the other hand is a one byte (presumably >>ASCII) character. It's used mainly in the system catalogs... >> >> > >According to the docs, char == char(1). > > The docs also say: The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a poor-man's enumeration type. cheers andrew
Hmm, I'm trying to understand here. If every row in log_8 should have the same project_id, couldn't this be acheived by having each row in log_other contain the tableoid of the table it refers to. Then a join will return the info you're looking for. Or am I missing something? On Sat, Oct 01, 2005 at 10:57:27AM -0500, Jim C. Nasby wrote: > To clarify, this is a hard-coded implementation of what I'm asking for: > http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: > > CREATE TABLE log_other ( > project_id smallint NOT NULL > ... > ) > > CREATE TABLE log_8 ( > -- No project_id > ... > ) > CREATE TABLE log_24, log_25, log_5... > CREATE VIEW log AS > SELECT * FROM log_other > UNION ALL SELECT 8 AS project_id, * FROM log_8 > ... > > So the end result is that for cases where project_id is 5, 8, 24, or 25, > the data will be stored in tables that don't have the project_id. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On Sat, Oct 01, 2005 at 06:28:03PM +0200, Martijn van Oosterhout wrote: > Hmm, I'm trying to understand here. If every row in log_8 should have > the same project_id, couldn't this be acheived by having each row in log_other > contain the tableoid of the table it refers to. Then a join will return > the info you're looking for. > > Or am I missing something? log_other will only contain rows where project_id NOT IN (5,8,24,25). The UNION ALL view is the general purpose way to read data from the tables. (Actually, since it has rules on it, it's also the way to write data to the tables as well). > On Sat, Oct 01, 2005 at 10:57:27AM -0500, Jim C. Nasby wrote: > > To clarify, this is a hard-coded implementation of what I'm asking for: > > http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: > > > > CREATE TABLE log_other ( > > project_id smallint NOT NULL > > ... > > ) > > > > CREATE TABLE log_8 ( > > -- No project_id > > ... > > ) > > CREATE TABLE log_24, log_25, log_5... > > CREATE VIEW log AS > > SELECT * FROM log_other > > UNION ALL SELECT 8 AS project_id, * FROM log_8 > > ... > > > > So the end result is that for cases where project_id is 5, 8, 24, or 25, > > the data will be stored in tables that don't have the project_id. > -- > Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > tool for doing 5% of the work and then sitting around waiting for someone > > else to do the other 95% so you can sue them. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote: > To clarify, this is a hard-coded implementation of what I'm asking for: > http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: > > CREATE TABLE log_other ( > project_id smallint NOT NULL > ... > ) > > CREATE TABLE log_8 ( > -- No project_id > ... > ) > CREATE TABLE log_24, log_25, log_5... > CREATE VIEW log AS > SELECT * FROM log_other > UNION ALL SELECT 8 AS project_id, * FROM log_8 > ... > > So the end result is that for cases where project_id is 5, 8, 24, or 25, > the data will be stored in tables that don't have the project_id. > > If I were to use this on the main table for > http://stats.distributed.net, which has ~130M rows, I would be able to > save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb > will have many times that number of rows, so the savings will be even > larger. > > Note that this technique wouldn't help at all for something like date > partitioning, because you have to store the date in the partitioned > table. Jim, Your idea was noted before and actually; I mentioned it to show that I listen and take note of ideas from any source. For everybody, I would note that the current behaviour is exactly the way that List Partitioning works on other systems. The cost of this technique is only paid if you choose to partition on something that you would not otherwise have included in your table. In many cases, you'll choose a column that would have been in the table if you created one big table so the additional cost is zero. In your example, I would expect to see project_id in a superclass table and so there would be no cost. The idea is neat, but IMHO the potential saving of this idea is not big enough for me to prioritise that very highly over other items at this time. Best Regards, Simon Riggs
On L, 2005-10-01 at 19:59 +0100, Simon Riggs wrote: > Jim, > > Your idea was noted before and actually; I mentioned it to show that I > listen and take note of ideas from any source. > > For everybody, I would note that the current behaviour is exactly the > way that List Partitioning works on other systems. > > The cost of this technique is only paid if you choose to partition on > something that you would not otherwise have included in your table. In > many cases, you'll choose a column that would have been in the table if > you created one big table so the additional cost is zero. If we had not disabled SELECT rules on ordinary tables some time back (reserving them exclusively for VIEWs), then most of the benefit of not storing static tables would have been obtained by storing NULL in the constant column (via RULE or TRIGGER) and creating an ON SELECT rule on the subtable that returns the desired constant value. I also often wish that this would be possible when someone adds a column with a default value to a multi-million row table on a 24/7 production system and insists on filling all existing columns with the default. A rule "ON SELECT FROM table_x WHERE col_x IS NULL return col_x=default_for_col_x" would solve that nicely. This would even not require adding null bitmap to existing tuples with no null values. > In your example, I would expect to see project_id in a superclass table > and so there would be no cost. > > The idea is neat, but IMHO the potential saving of this idea is not big > enough for me to prioritise that very highly over other items at this > time. I think that bringing the ON SELECT rules of form "ON SELECT ... RETURN DEFAUL FOR COLUMN x" would be the cleanest and easiest way to do this. Another use of SELECT rules would be introducing computed columns, which can also be done by a NULL-filled column and ON SELECT rule using a function. We could additionally require the column on which this is defined to have a "MUST BE NULL" constraint :) -- Hannu Krosing <hannu@skype.net>
Hannu Krosing <hannu@skype.net> writes: > I also often wish that this would be possible when someone adds a column > with a default value to a multi-million row table on a 24/7 production > system and insists on filling all existing columns with the default. > > A rule "ON SELECT FROM table_x WHERE col_x IS NULL return > col_x=default_for_col_x" would solve that nicely. Of course that only works if the reason they want to set fill the rows with the default value isn't precisely because NULL is a perfectly reasonable thing for the column to have (but not what they want for the existing rows). -- greg
On Sat, Oct 01, 2005 at 04:35:49PM +0200, Martijn van Oosterhout wrote: > On Sat, Oct 01, 2005 at 10:05:22AM -0400, mark@mark.mielke.cc wrote: > > It has the 'side or additional benefit' being requested here. The ability > > to filter the child table by some attribute. For example, if the child > > tables are used for partitioning, and the attribute were to keep a date > > range, the field restriction optimization could be used to automatically > > determine the set of tables to use for the date range specified. With > > such a change, it would even work automatically if the date ranges > > overlapped for some reason. Selecting a table name by date is hacky. This > > sort of solution would be a general solution to the problem. > This is what "Constraint Exclusion" does. It uses CHECK constraints on > a table to filter out tables that obviously don't apply to a query. Good point. I'll have to invent another use case. :-) > It's just the the specific case of "tableoid = XXX" is not supported > right now. Yes. This is what I was looking for Simon to admit. :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On P, 2005-10-02 at 01:24 -0400, Greg Stark wrote: > Hannu Krosing <hannu@skype.net> writes: > > > I also often wish that this would be possible when someone adds a column > > with a default value to a multi-million row table on a 24/7 production > > system and insists on filling all existing columns with the default. > > > > A rule "ON SELECT FROM table_x WHERE col_x IS NULL return > > col_x=default_for_col_x" would solve that nicely. > > Of course that only works if the reason they want to set fill the rows with > the default value isn't precisely because NULL is a perfectly reasonable thing > for the column to have (but not what they want for the existing rows). Sure. What would be needed for adding new colums with default filling would be some end-of-tuple marker or stored column count or tuple version nr, and then a rule (or just default behaviour) of showing default value for *missing* columns (colno > nr of stored columns). -- Hannu Krosing <hannu@skype.net>
On Sun, Oct 02, 2005 at 03:57:37PM +0300, Hannu Krosing wrote: > On P, 2005-10-02 at 01:24 -0400, Greg Stark wrote: > > Of course that only works if the reason they want to set fill the rows with > > the default value isn't precisely because NULL is a perfectly reasonable thing > > for the column to have (but not what they want for the existing rows). > > Sure. > > What would be needed for adding new colums with default filling would be > some end-of-tuple marker or stored column count or tuple version nr, and > then a rule (or just default behaviour) of showing default value for > *missing* columns (colno > nr of stored columns). Actually, PostgreSQL does know the number of columns in a tuple. It would be possible get change heap_getattr to return the default value. However, from a semantic point of view, it would be a bit strange. If you added a column, updated some rows then set a default, that default might end up applying to every row, except the ones you already modified. With careful coding you may be able to get around this. However, a good argument can be made that setting the DEFAULT for a column shouldn't change data anywhere. What about if I want to change the default for new values but not for old ones. That wouldn't work if the database starts adding values randomly, depending on when they are read... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > However, from a semantic point of view, it would be a bit strange. If > you added a column, updated some rows then set a default, that default > might end up applying to every row, except the ones you already > modified. With careful coding you may be able to get around this. I suppose the update would have to check for these new columns and fill in the default value when first updated. But what do you do if the DEFAULT expression isn't immutable. Volatile DEFAULT expressions are extremely common in databases... namely sequences. It's too bad though. An efficient way to add a new column with a default would clear up one of the missing feature of Postgres. Virtually every time I add a new column I have to do this ALTER TABLE, ALTER COLUMN SET DEFAULT, UPDATE TABLE, VACUUM FULL, ALTER COLUMN SET NOT NULL dance. It would be real nice if you could add a new NOT NULL column with a default with a single command and just move on. -- greg
If one defines a restriction such that 'COLUMN = VALUE' for a specific table, in a theoretical sort of model that completely ignores implementation difficulty, or changes to the restriction, I think it would be safe to not store COLUMN in the tuple. If the tuple is stored, then COLUMN = VALUE, so when fetched, the value will be VALUE. Back to the real world, this would be difficult to implement without treating the column special from the point of table creation, and preventing the restriction from being altered without re-building the table... :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Sun, Oct 02, 2005 at 11:51:27PM +0300, Hannu Krosing wrote: > Right. Actually the "default" value returned for missing columns should > be different from the default for new values and should be settable only > once, when adding the column, else issues would become really really > weird. Right, the only way I could really imagine it working is have a flag, attautodefault which if set would return the default instead of NULL. It would only ever be set if the ADD COLUMN SET DEFAULT happened together. But does this mean you have one magic default fixed for the life of the column and the actual default which can be changed anytime? Seems messy, though possible... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On P, 2005-10-02 at 15:30 +0200, Martijn van Oosterhout wrote: > On Sun, Oct 02, 2005 at 03:57:37PM +0300, Hannu Krosing wrote: > > On P, 2005-10-02 at 01:24 -0400, Greg Stark wrote: > > > Of course that only works if the reason they want to set fill the rows with > > > the default value isn't precisely because NULL is a perfectly reasonable thing > > > for the column to have (but not what they want for the existing rows). > > > > Sure. > > > > What would be needed for adding new colums with default filling would be > > some end-of-tuple marker or stored column count or tuple version nr, and > > then a rule (or just default behaviour) of showing default value for > > *missing* columns (colno > nr of stored columns). > > Actually, PostgreSQL does know the number of columns in a tuple. It > would be possible get change heap_getattr to return the default value. Good. (I have not looked at tuple layout for a few years :) > However, from a semantic point of view, it would be a bit strange. If > you added a column, updated some rows then set a default, that default > might end up applying to every row, except the ones you already > modified. With careful coding you may be able to get around this. If I do the add column and set default in the same transaction, this should not be a problem. > However, a good argument can be made that setting the DEFAULT for a > column shouldn't change data anywhere. What about if I want to change > the default for new values but not for old ones. That wouldn't work if > the database starts adding values randomly, depending on when they are > read... Right. Actually the "default" value returned for missing columns should be different from the default for new values and should be settable only once, when adding the column, else issues would become really really weird. -- Hannu Krosing <hannu@skype.net>
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sun, Oct 02, 2005 at 11:51:27PM +0300, Hannu Krosing wrote: > > Right. Actually the "default" value returned for missing columns should > > be different from the default for new values and should be settable only > > once, when adding the column, else issues would become really really > > weird. > > Right, the only way I could really imagine it working is have a flag, > attautodefault which if set would return the default instead of NULL. No, I think redefining NULL is a non-starter. This whole thing only has legs if Postgres can distinguish between a column that has never been set and a column that has NULL. Actually the only case where I don't see bizarre semantic consequences is the case of a newly created column. It would be nice to be able to do: ALTER TABLE ADD foo integer DEFAULT 1 And there's no question of what what the semantics of this are. No question of columns magically acquiring a value where they were NULL before. The *main* reason I would be happy about this is that it would let me add NOT NULL columns efficiently. I would expect NOT NULL to be allowed whenever a default is provided since there's obviously no way it can cause a problem. (I don't follow the "NULL is evil all the time" philosophy but I do try to set columns NOT NULL wherever I can. It always annoys me that when adding a new column I have to choose between a massive disk intensive batch job or compromising the design.) On the other hand if you do ALTER TABLE ADD foo integer and then later do ALTER TABLE ALTER foo SET DEFAULT 1 then there is a window where all those foos are NULL and then they magically become 1? That doesn't seem tenable. -- greg
Greg Stark <gsstark@mit.edu> writes: > It would be nice to be able to do: > ALTER TABLE ADD foo integer DEFAULT 1 > And there's no question of what what the semantics of this are. Sure, but you can only optimize this if the default expression is immutable... > On the other hand if you do > ALTER TABLE ADD foo integer > and then later do > ALTER TABLE ALTER foo SET DEFAULT 1 > then there is a window where all those foos are NULL and then they magically > become 1? That doesn't seem tenable. It'd also be contrary to the SQL spec, AFAICS. Here's another interesting case to think about: ALTER TABLE ADD foo integer DEFAULT 1...ALTER TABLE ALTER foo SET DEFAULT 2 You'll have to pay the table-traversal cost on one step or the other. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Here's another interesting case to think about: > > ALTER TABLE ADD foo integer DEFAULT 1 > ... > ALTER TABLE ALTER foo SET DEFAULT 2 > > You'll have to pay the table-traversal cost on one step or the other. A good point. I wonder if this could be solved by recording the xid of the ALTER transaction along with the default value. I imagine that might get complicated quickly though. -- greg
On P, 2005-10-02 at 22:49 -0400, Greg Stark wrote: > No, I think redefining NULL is a non-starter. This whole thing only has legs > if Postgres can distinguish between a column that has never been set and a > column that has NULL. > > Actually the only case where I don't see bizarre semantic consequences is the > case of a newly created column. It would be nice to be able to do: > > ALTER TABLE ADD foo integer DEFAULT 1 Probably a different syntax would be better here, perhaps ALTER TABLE ADD foo integer AS 1 WHEN MISSING; or somesuch. > And there's no question of what what the semantics of this are. No question of > columns magically acquiring a value where they were NULL before. > > The *main* reason I would be happy about this is that it would let me add NOT > NULL columns efficiently. I would expect NOT NULL to be allowed whenever a > default is provided since there's obviously no way it can cause a problem. > (I don't follow the "NULL is evil all the time" philosophy but I do try to set > columns NOT NULL wherever I can. It always annoys me that when adding a new > column I have to choose between a massive disk intensive batch job or > compromising the design.) > > > On the other hand if you do > > ALTER TABLE ADD foo integer > > and then later do > > ALTER TABLE ALTER foo SET DEFAULT 1 > > then there is a window where all those foos are NULL and then they magically > become 1? That doesn't seem tenable. Not if you require these two to happen in one transaction to affect added/missing values. -- Hannu Krosing <hannu@skype.net>
On P, 2005-10-02 at 23:00 -0400, Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > > It would be nice to be able to do: > > ALTER TABLE ADD foo integer DEFAULT 1 > > And there's no question of what what the semantics of this are. > > Sure, but you can only optimize this if the default expression is > immutable... > > > On the other hand if you do > > ALTER TABLE ADD foo integer > > and then later do > > ALTER TABLE ALTER foo SET DEFAULT 1 > > then there is a window where all those foos are NULL and then they magically > > become 1? That doesn't seem tenable. > > It'd also be contrary to the SQL spec, AFAICS. > > Here's another interesting case to think about: > > ALTER TABLE ADD foo integer DEFAULT 1 > ... > ALTER TABLE ALTER foo SET DEFAULT 2 > > You'll have to pay the table-traversal cost on one step or the other. The second, ALTER ... SET DEFAULT, would only set default for newly inserted columns, not the ones which are missing due to tuples being created before the column existed. But completely different syntax may be more clear. ALTER TABLE ADD foo integer WITH DEFAULT 1; Or whatever -- Hannu Krosing <hannu@skype.net>
Hannu Krosing <hannu@skype.net> writes: > Probably a different syntax would be better here, perhaps > > ALTER TABLE ADD foo integer AS 1 WHEN MISSING; > > or somesuch. Uhm, if you're adding the column they're *all* "missing". That's the whole point. If you start inventing a new user-visible concept "missing" and try to distinguish it from NULL you're going to have a hell of a time defining the semantics. The goal has to be to provide the *exact* same user-visible semantics as actually setting the default. That means setting all the existing rows if you're adding a new column. It also unfortunately means tackling the much trickier gotcha that Tom raised about what happens if you want to later change the default. -- greg
Hannu Krosing <hannu@skype.net> writes: > On P, 2005-10-02 at 23:00 -0400, Tom Lane wrote: > > > > Here's another interesting case to think about: > > > > ALTER TABLE ADD foo integer DEFAULT 1 > > ... > > ALTER TABLE ALTER foo SET DEFAULT 2 > > > > You'll have to pay the table-traversal cost on one step or the other. > > The second, ALTER ... SET DEFAULT, would only set default for newly > inserted columns, not the ones which are missing due to tuples being > created before the column existed. Hm. So you're saying there are only ever exactly two types of defaults. The "initial" default that applies to all tuples that were created before the column was added. And the "current" default that only ever applies to newly created tuples. That does seem to cleanly close this hole. -- greg
On Mon, 2005-10-03 at 23:24 -0400, Greg Stark wrote: > Hannu Krosing <hannu@skype.net> writes: > > > On P, 2005-10-02 at 23:00 -0400, Tom Lane wrote: > > > > > > Here's another interesting case to think about: > > > > > > ALTER TABLE ADD foo integer DEFAULT 1 > > > ... > > > ALTER TABLE ALTER foo SET DEFAULT 2 > > > > > > You'll have to pay the table-traversal cost on one step or the other. > > > > The second, ALTER ... SET DEFAULT, would only set default for newly > > inserted columns, not the ones which are missing due to tuples being > > created before the column existed. > > Hm. So you're saying there are only ever exactly two types of defaults. The > "initial" default that applies to all tuples that were created before the > column was added. And the "current" default that only ever applies to newly > created tuples. > > That does seem to cleanly close this hole. I don't think so. ALTER TABLE tab ADD foo integer DEFAULT 1; INSERT INTO tab DEFAULT VALUES; ALTER TABLE tab ALTER foo SET DEFAULT 2 INSERT INTO tab DEFAULT VALUES; ALTER TABLE tab ALTER foo SET DEFAULT 3 INSERT INTO tab DEFAULT VALUES; SELECT foo FROM tab; --
On Mon, 2005-10-03 at 23:51 -0400, Rod Taylor wrote: > On Mon, 2005-10-03 at 23:24 -0400, Greg Stark wrote: > > Hannu Krosing <hannu@skype.net> writes: > > > > > On P, 2005-10-02 at 23:00 -0400, Tom Lane wrote: > > > > > > > > Here's another interesting case to think about: > > > > > > > > ALTER TABLE ADD foo integer DEFAULT 1 > > > > ... > > > > ALTER TABLE ALTER foo SET DEFAULT 2 > > > > > > > > You'll have to pay the table-traversal cost on one step or the other. > > > > > > The second, ALTER ... SET DEFAULT, would only set default for newly > > > inserted columns, not the ones which are missing due to tuples being > > > created before the column existed. > > > > Hm. So you're saying there are only ever exactly two types of defaults. The > > "initial" default that applies to all tuples that were created before the > > column was added. And the "current" default that only ever applies to newly > > created tuples. > > > > That does seem to cleanly close this hole. > > I don't think so. Ignore me. The thread seems to be about allowing fast addition of columns, not decreasing storage space. For some reason I was thinking of a bitmap like the NULL bitmap for compressing out all default values. --
Rod Taylor <pg@rbt.ca> writes: > > Hm. So you're saying there are only ever exactly two types of defaults. The > > "initial" default that applies to all tuples that were created before the > > column was added. And the "current" default that only ever applies to newly > > created tuples. > > > > That does seem to cleanly close this hole. > > I don't think so. > > ALTER TABLE tab ADD foo integer DEFAULT 1; > INSERT INTO tab DEFAULT VALUES; This inserts a physical "1" in the record (the "current" default"). > ALTER TABLE tab ALTER foo SET DEFAULT 2 > INSERT INTO tab DEFAULT VALUES; This inserts a physical "2" in the record. > ALTER TABLE tab ALTER foo SET DEFAULT 3 > INSERT INTO tab DEFAULT VALUES; This inserts a physical "3" in the record. > SELECT foo FROM tab; This checks for any old records that predate the column and use the "initial" default of 1 for those records. The three records above all postdate the column addition so they have values present, namely 1, 2, and 3. -- greg
Is there enough for a TODO here? On Mon, Oct 03, 2005 at 11:24:30PM -0400, Greg Stark wrote: > > Hannu Krosing <hannu@skype.net> writes: > > > On P, 2005-10-02 at 23:00 -0400, Tom Lane wrote: > > > > > > Here's another interesting case to think about: > > > > > > ALTER TABLE ADD foo integer DEFAULT 1 > > > ... > > > ALTER TABLE ALTER foo SET DEFAULT 2 > > > > > > You'll have to pay the table-traversal cost on one step or the other. > > > > The second, ALTER ... SET DEFAULT, would only set default for newly > > inserted columns, not the ones which are missing due to tuples being > > created before the column existed. > > Hm. So you're saying there are only ever exactly two types of defaults. The > "initial" default that applies to all tuples that were created before the > column was added. And the "current" default that only ever applies to newly > created tuples. > > That does seem to cleanly close this hole. > > -- > greg > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sat, Oct 01, 2005 at 07:59:11PM +0100, Simon Riggs wrote: > On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote: > > To clarify, this is a hard-coded implementation of what I'm asking for: > > http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: > > > > CREATE TABLE log_other ( > > project_id smallint NOT NULL > > ... > > ) > > > > CREATE TABLE log_8 ( > > -- No project_id > > ... > > ) > > CREATE TABLE log_24, log_25, log_5... > > CREATE VIEW log AS > > SELECT * FROM log_other > > UNION ALL SELECT 8 AS project_id, * FROM log_8 > > ... > > > > So the end result is that for cases where project_id is 5, 8, 24, or 25, > > the data will be stored in tables that don't have the project_id. > > > > If I were to use this on the main table for > > http://stats.distributed.net, which has ~130M rows, I would be able to > > save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb > > will have many times that number of rows, so the savings will be even > > larger. > > > > Note that this technique wouldn't help at all for something like date > > partitioning, because you have to store the date in the partitioned > > table. > > Jim, > > Your idea was noted before and actually; I mentioned it to show that I > listen and take note of ideas from any source. > > For everybody, I would note that the current behaviour is exactly the > way that List Partitioning works on other systems. > > The cost of this technique is only paid if you choose to partition on > something that you would not otherwise have included in your table. In > many cases, you'll choose a column that would have been in the table if > you created one big table so the additional cost is zero. Well, the idea is to be more space efficient than if one big table was used. This is unique to this class of partitioning problems. > In your example, I would expect to see project_id in a superclass table > and so there would be no cost. Superclass table? > The idea is neat, but IMHO the potential saving of this idea is not big > enough for me to prioritise that very highly over other items at this > time. Certainly. I only chimed in with a specific example so people could better understand what the idea was. I know it's on the list and might be addressed at some point. In the mean time it's not too horrible to hard-code a solution. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461