Thread: automatic system info tool?
Someone at the conference mentioned a tool that would portably and reliably report system info such as architecture. If someone has details I would like to have them, as it would help solve the buildfarm personality problem properly. cheers andrew
Andrew, > Someone at the conference mentioned a tool that would portably and > reliably report system info such as architecture. If someone has details > I would like to have them, as it would help solve the buildfarm > personality problem properly. There are potentially two, actually. An SF start-up is going to be open-sourcing a tool soon, and Jim Nasby said that Distributed.net has code for this. -- Josh Berkus PostgreSQL @ Sun San Francisco
Andrew Dunstan wrote: > Someone at the conference mentioned a tool that would portably and > reliably report system info such as architecture. What's wrong with config.guess? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: >Andrew Dunstan wrote: > > >>Someone at the conference mentioned a tool that would portably and >>reliably report system info such as architecture. >> >> > >What's wrong with config.guess? > > > That will probably be OK for architecture. We also classify buildfarm machines by <os, os_version, compiler, compiler_version> and config.guess doesn't give us that, unfortunately. However, I will look at canonicalising the architecture to the config.guess values reported. cheers andrew
-----Original Message----- From: "Andrew Dunstan" <andrew@dunslane.net> To: "Peter Eisentraut" <peter_e@gmx.net> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org> Sent: 16/07/06 23:50 Subject: Re: [HACKERS] automatic system info tool? > We also classify buildfarm machines by <os, os_version, compiler, > compiler_version> and config.guess doesn't give us that, unfortunately. It also won't work on Win32 without msys (or SFU/Interix). /D
On Sun, Jul 16, 2006 at 06:49:26PM -0400, Andrew Dunstan wrote: > We also classify buildfarm machines by <os, os_version, compiler, > compiler_version> and config.guess doesn't give us that, unfortunately. It would seem to be a lot easier to use the values from perl itself, given you're already using it: # perl -MConfig -e 'print "os=$Config{osname}, osvers=$Config{osvers}, archname=$Config{archname}\n"' os=linux, osvers=2.6.15.4, archname=i486-linux-gnu-thread-multi If you look at perl -V it gives you a subset of useful values, probably a lot nicer than munging config.guess. It'll even tell you the size of the C datatypes if you're interested :) Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
I'm fairly familiar with it :-) The trouble is that it gives info set at the time perl was compiled, which doesn't help with the problem where a machine has been upgraded. For example, on this FC3 machine it reports a different kernel version from the one I have upgraded to, not surprisingly. So what I need if possible is a runtime tool to detect the info we need. cheers andrew Martijn van Oosterhout wrote: >On Sun, Jul 16, 2006 at 06:49:26PM -0400, Andrew Dunstan wrote: > > >>We also classify buildfarm machines by <os, os_version, compiler, >>compiler_version> and config.guess doesn't give us that, unfortunately. >> >> > >It would seem to be a lot easier to use the values from perl itself, >given you're already using it: > ># perl -MConfig -e 'print "os=$Config{osname}, osvers=$Config{osvers}, archname=$Config{archname}\n"' >os=linux, osvers=2.6.15.4, archname=i486-linux-gnu-thread-multi > >If you look at perl -V it give you a subset of useful values, probably >a lot nicer than munging config.guess. It'll even tell you the size of >of the C datatypes if you're interested :) > >Have a nice day, > >
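The distinction Andrew is drawing is compile-time versus runtime information. As a minimal cross-language illustration (the buildfarm client itself is Perl; this Python sketch just shows the runtime approach, and the field names are made up for the example), querying the live OS means a kernel upgrade shows up immediately:

```python
import platform

# These calls probe the *running* system, unlike perl's %Config, which is
# frozen at the time perl was built and goes stale after an OS upgrade.
info = {
    "os": platform.system(),           # e.g. "Linux", "Windows", "Darwin"
    "os_version": platform.release(),  # the currently running kernel/OS release
    "arch": platform.machine(),        # e.g. "x86_64", "i686"
}
print(", ".join(f"{k}={v}" for k, v in info.items()))
```

On the upgraded FC3 machine from Andrew's example, `os_version` here would report the new kernel, not the one perl was compiled under.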
On Mon, Jul 17, 2006 at 09:06:34AM -0400, Andrew Dunstan wrote: > > I'm fairly familiar with it :-) > > The trouble is that it gives info set at the time perl was compiled, > which doesn't help with the problem where a machine has been upgraded. > For example, on this FC3 machine it reports a different kernel version > from the one I have upgraded to, not surprisingly. Hmm. The osname and archname might still be useful, but the rest could be out of date. > So what I need if possible is a runtime tool to detect the info we need. On UNIX systems uname may work pretty well. But I guess each system may have slightly different options. What'll probably happen is that you end up with a big if() statement testing $Config{osname} with each case having specific code to determine the specifics. But for that you need information. How do you get the currently running release of Windows, for example? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
> > On UNIX systems uname may work pretty well. But I guess each > system may > have slightly different options. What'll probably happen is that you > end up with a big if() statement testing $Config{osname} wtih > each case > having specific code to determine the specifics. But for that you need > information. How do you get the currently running release of windows > for example? > If you can open a command shell you can get the OS version with the 'ver' command under Windows: C:\>ver Microsoft Windows XP [Version 5.1.2600] C:\> This should work on 2000 or later. (Windows 2000 is 5.0.something.) Regards, Paul
On Mon, Jul 17, 2006 at 11:06:50AM -0400, Bort, Paul wrote: > If you can open a command shell you can get the OS version with the > 'ver' command under Windows: > > C:\>ver > > Microsoft Windows XP [Version 5.1.2600] How do you do this from a program, though? Under UNIX uname() is a function call as well as a program. It returns the os name, version, hostname and system type. Mind you, maybe perl provides emulation for uname? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Jul 17, 2006, at 12:57 PM, Martijn van Oosterhout wrote: > On Mon, Jul 17, 2006 at 11:06:50AM -0400, Bort, Paul wrote: >> If you can open a command shell you can get the OS version with the >> 'ver' command under Windows: >> >> C:\>ver >> >> Microsoft Windows XP [Version 5.1.2600] > > How do you do this from a program though. Under UNIX uname() is a > function call as well as a program. It returns the os name, version, > hostname and system type. > GetVersionEx() will get you the windows version, service pack, etc IIRC. Cheers, Steve
> > How do you do this from a program, though? Under UNIX uname() is a > function call as well as a program. It returns the os name, version, > hostname and system type. > Multiple methods (TIMTOWTDI) depending on what you want: my $verstring = `cmd.exe /c ver`; # or use Win32; my ($string, $major, $minor, $build, $id) = Win32::GetOSVersion; The environment variables PROCESSOR_ARCHITECTURE and PROCESSOR_IDENTIFIER should provide the basic hardware information. > Mind you, maybe perl provides emulation for uname? Not that I know of.
On 7/18/06, Bort, Paul <pbort@tmwsystems.com> wrote: > > Mind you, maybe perl provides emulation for uname? > Not that I know of. Wouldn't $^O and $Config{archname} cover quite a few, though?
> >> If you can open a command shell you can get the OS version with the > >> 'ver' command under Windows: > >> > >> C:\>ver > >> > >> Microsoft Windows XP [Version 5.1.2600] > > > > How do you do this from a program, though? Under UNIX uname() is a > > function call as well as a program. It returns the os name, version, > > hostname and system type. > > > > GetVersionEx() will get you the windows version, service > pack, etc IIRC. in perl: use POSIX; print join(',',POSIX::uname()),"\n"; prints: Windows NT,hostname.domain.com,5.0,Build 2195 (Service Pack 4),x86 Works on all platforms. (more detail on Win with: use Win32; join(' ', Win32::GetOSVersion()), "\n";) Andreas
Hi, I have posted a patch to the CVS head for on-disk bitmap index to pgsql-patches. If this can get in 8.2, that would be great. Any comments and suggestions are welcome. I still need to add several items: (1) README file in src/backend/access/bitmap. (2) Bitmap index documentation. (3) Hiding the internal btree. Also, I have disabled the multi-column index support because there is a known problem. Assume that there is a bitmap index on a and b. When a query predicate has only a, the current code may generate a wrong result. That's because the current code assumes that b is null. The ultimate problem is because the search code only handles one bitmap vector now. I need a fix to support manipulating multiple bitmap vectors. If you find any other problems, please let me know. Thanks, Jie
Andrej Ricnik-Bay wrote: > On 7/18/06, Bort, Paul <pbort@tmwsystems.com> wrote: > >> > Mind you, maybe perl provides emulation for uname? >> Not that I know of. > > Wouldn't $^O and $Config{archname} cover quite a few, though? > No. As previously explained, these values reflect what was true when and where perl was compiled. In general, I need what is true at the time buildfarm is run. Anyway, I think we have some good possibilities. cheers andrew
On Mon, 17 Jul 2006, Jie Zhang wrote: > Hi, > > I have posted a patch to the CVS head for on-disk bitmap index to > pgsql-patches. If this can get in 8.2, that would be great. Any comments and > suggestions are welcome. > > I still need to add several items: > > (1) README file in src/backend/access/bitmap. > (2) Bitmap index documentation. > (3) Hiding the internal btree. > > Also, I have disabled the multi-column index support because there is a > known problem. Assume that there is a bitmap index on a and b. When a query > predicate has only a, the current code may generate a wrong result. That's > because the current code assumes that b is null. The ultimate problem is > because the search code only handles one bitmap vector now. I need a fix to > support manipulating multiple bitmap vectors. > > If you find any other problems, please let me know. Is anyone else looking at this patch? I am reviewing it with Jie, tidying it up and trying to solve some of the problems mentioned above. If anyone wants to take a look at it, please ask for an updated patch. Thanks, Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: > Is anyone else looking at this patch? It's on my list of things to look at, but so are a lot of other patches ;-) A couple of comments after a *very* fast scan through the patch: * The xlog routines need help; they seem to not be updated for recent changes in the API for xlog recovery code. * The hacks on vacuum.c (where it tests the AM name) are utterly unacceptable. If you need the main code to do something special for a particular index AM, define a bool flag column for it in pg_am. * The interface to the existing executor bitmap scan code is mighty messy --- seems like there's a lot of almost duplicate code, a lot of rather invasive hacking, etc. This needs to be rethought and refactored. * Why in the world is it introducing duplicate copies of a lot of btree comparison functions? Use the ones that are there. * The patch itself is a mess: it introduces .orig and .rej files, changes around $PostgreSQL$ lines, etc. However, the main problem I've got with this is that a new index AM is a pretty large burden, and no one's made the slightest effort to sell pghackers on taking this on. What's the use-case, what's the performance like, where is it really an improvement over what we've got? regards, tom lane
On Sun, 23 Jul 2006, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > Is anyone else looking at this patch? > > It's on my list of things to look at, but so are a lot of other patches ;-) > > A couple of comments after a *very* fast scan through the patch: > > * The xlog routines need help; they seem to not be updated for recent > changes in the API for xlog recovery code. Yep. The patch was actually against 8.1 and was hastily brought up to 8.2. I think Jie's intention was to simply let everyone know that this was going on. > > * The hacks on vacuum.c (where it tests the AM name) are utterly > unacceptable. If you need the main code to do something special for a > particular index AM, define a bool flag column for it in pg_am. Yes. > > * The interface to the existing executor bitmap scan code is mighty > messy --- seems like there's a lot of almost duplicate code, a lot > of rather invasive hacking, etc. This needs to be rethought and > refactored. I agree. > > * Why in the world is it introducing duplicate copies of a lot > of btree comparison functions? Use the ones that are there. Yes, I raised this with Jie and she has fixed it. One thought is, we may want to rename those comparison functions prefixed with 'bm' to make their naming less confusing. They'll be used by btree, gin and bitmap index methods. Anyway, a separate patch. > > * The patch itself is a mess: it introduces .orig and .rej files, > changes around $PostgreSQL$ lines, etc. > Right, not to mention patches to configure and a lot of style which needs to be knocked into shape. > However, the main problem I've got with this is that a new index AM is a > pretty large burden, and no one's made the slightest effort to sell > pghackers on taking this on. What's the use-case, what's the > performance like, where is it really an improvement over what we've got? For low cardinality sets, bitmaps greatly outperform btree.
Jie and others at Greenplum tested this implementation (and others) against very large, low cardinality sets. I wasn't involved in it. I urge Jie and/or Luke to present those results. Thanks, Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: > On Sun, 23 Jul 2006, Tom Lane wrote: >> However, the main problem I've got with this is that a new index AM is a >> pretty large burden, and no one's made the slightest effort to sell >> pghackers on taking this on. > For low cardinality sets, bitmaps greatly out perform btree. If the column is sufficiently low cardinality, you might as well just do a seqscan --- you'll be hitting most of the heap's pages anyway. I'm still waiting to be convinced that there's a sweet spot wide enough to justify supporting another index AM. (I'm also wondering whether this doesn't overlap the use-case for GIN.) regards, tom lane
Tom, On 7/23/06 5:25 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > If the column is sufficiently low cardinality, you might as well just do > a seqscan --- you'll be hitting most of the heap's pages anyway. I'm > still waiting to be convinced that there's a sweet spot wide enough to > justify supporting another index AM. (I'm also wondering whether this > doesn't overlap the use-case for GIN.) We presented them at the Postgres Anniversary summit talk (Bruce M. was there) and the reaction was let's get this into 8.2 because many people there (more than 5) really wanted to use it as a standard feature. A version of the slides with only the bitmap index results are located here: http://intranet.greenplum.com/bitmap-index-perf-ayush.ppt. - Luke
Thanks Tom and Gavin for your comments! Yes, this patch was generated against 8.2 in a short time. Once the code was working, I posted the patch to get some comments and help. >> >> * The xlog routines need help; they seem to not be updated for recent >> changes in the API for xlog recovery code. > > Yep. The patch was actually against 8.1 and was hastily brought up to 8.2. > I think Jie's intention was to simply let everyone know that this was > going on. Thanks for pointing this out. I didn't notice that these were changed in 8.2. > >> >> * The hacks on vacuum.c (where it tests the AM name) are utterly >> unacceptable. If you need the main code to do something special for a >> particular index AM, define a bool flag column for it in pg_am. > > Yes. Sounds good. > >> >> * The interface to the existing executor bitmap scan code is mighty >> messy --- seems like there's a lot of almost duplicate code, a lot >> of rather invasive hacking, etc. This needs to be rethought and >> refactored. > > I agree. I will think about this more. > >> >> * Why in the world is it introducing duplicate copies of a lot >> of btree comparison functions? Use the ones that are there. > > Yes, I raised this with Jie and she has fixed it. One thought is, we may > want to rename those comparison functions prefixed with 'bm' to make their > naming less confusing. They'll be used by btree, gin and bitmap index > methods. Anyway, a separate patch. Yeah, the main reason I hesitated to use btree's comparison functions was that their names start with 'bt'. Since Gavin told me that GIN is using those functions as well, I have changed my code to use them. Renaming them would be good. > >> >> * The patch itself is a mess: it introduces .orig and .rej files, >> changes around $PostgreSQL$ lines, etc. >> > > Right, not to mention patches to configure and a lot of style which needs > to be knocked into shape. > The way I generate a patch is kind of clumsy. I need to find a better way to do that.
I will start fixing these. Thanks, Jie
On Sun, 2006-07-23 at 20:25, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > On Sun, 23 Jul 2006, Tom Lane wrote: > >> However, the main problem I've got with this is that a new index AM is a > >> pretty large burden, and no one's made the slightest effort to sell > >> pghackers on taking this on. > > > For low cardinality sets, bitmaps greatly outperform btree. > > If the column is sufficiently low cardinality, you might as well just do > a seqscan --- you'll be hitting most of the heap's pages anyway. I'm > still waiting to be convinced that there's a sweet spot wide enough to > justify supporting another index AM. (I'm also wondering whether this > doesn't overlap the use-case for GIN.) IIRC they quoted the cardinality of 10000 as something that is still faster than btree for several use cases. And also for AND-s of several indexes, where indexes are BIG, your btree indexes may be almost as big as tables but the resulting set of pages is small. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
On 7/24/06 6:59 AM, "Hannu Krosing" <hannu@skype.net> wrote: > On Sun, 2006-07-23 at 20:25, Tom Lane wrote: >> Gavin Sherry <swm@linuxworld.com.au> writes: >>> On Sun, 23 Jul 2006, Tom Lane wrote: >>>> However, the main problem I've got with this is that a new index AM is a >>>> pretty large burden, and no one's made the slightest effort to sell >>>> pghackers on taking this on. >> >>> For low cardinality sets, bitmaps greatly outperform btree. >> >> If the column is sufficiently low cardinality, you might as well just do >> a seqscan --- you'll be hitting most of the heap's pages anyway. I'm >> still waiting to be convinced that there's a sweet spot wide enough to >> justify supporting another index AM. (I'm also wondering whether this >> doesn't overlap the use-case for GIN.) > > IIRC they quoted the cardinality of 10000 as something that is still > faster than btree for several use cases. > > And also for AND-s of several indexes, where indexes are BIG, your btree > indexes may be almost as big as tables but the resulting set of pages is > small. Yeah, Hannu points it out very well -- the bitmap index works very well when columns have low cardinalities and AND operations will produce a small number of results. Also, the bitmap index is very small in low cardinality cases, where the btree tends to take up at least 10 times more space.
Jie Zhang wrote: > > IIRC they quoted the cardinality of 10000 as something that is still > > faster than btree for several use cases. > > > > And also for AND-s of several indexes, where indexes are BIG, your btree > > indexes may be almost as big as tables but the resulting set of pages is > > small. > > Yeah, Hannu points it out very well -- the bitmap index works very well when > columns have low cardinalities and AND operations will produce a small number > of results. What operations on columns of low cardinality produce a small number of results? That seems contradictory. > Also, the bitmap index is very small in low cardinality cases, where the > btree tends to take up at least 10 times more space. Also, is adding/changing rows more expensive with bitmaps than with btrees? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 7/24/06 6:04 PM, "Bruce Momjian" <bruce@momjian.us> wrote: > Jie Zhang wrote: >>> IIRC they quoted the cardinality of 10000 as something that is still >>> faster than btree for several use cases. >>> >>> And also for AND-s of several indexes, where indexes are BIG, your btree >>> indexes may be almost as big as tables but the resulting set of pages is >>> small. >> >> Yeah, Hannu points it out very well -- the bitmap index works very well when >> columns have low cardinalities and AND operations will produce a small number >> of results. > > What operations on columns of low cardinality produce a small number of > results? That seems contradictory. Let's see an example. Table 'orders' includes two columns, 'priority' and 'status'. The column 'priority' has 100 distinct values, say p1-p100, and the column 'status' has 20 distinct values, say s1-s20. The query 'select * from orders where priority=p1 and status=s1' may produce a small number of results. Also, if these related rows are clustered together, that would be even better. > >> Also, the bitmap index is very small in low cardinality cases, where the >> btree tends to take up at least 10 times more space. > > Also, is adding/changing rows more expensive with bitmaps than > with btrees? Inserting a row will only affect the last word (at most the last several words) of a bitmap vector, so this should not be very expensive: 3-4 IOs. When a row is updated and the new row is inserted in the middle of the heap, currently the code will update the bit in place, where the bit should be. Searching for the page which includes the bit to be updated is not very efficient now, but this can be fixed. Currently, we have to scan the pages for a bitmap vector one by one until we hit the right page. Since the bitmap vector is compressed, updating a bit in the middle may cause its page to overflow. In this case, we create a new page to accommodate those extra bits, and insert this new page right after the original page.
Overall, inserting a row or updating a row can be done efficiently. But it is true that the bitmap index does not perform well if there are lots of inserts and updates, especially updates.
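The asymmetry Jie describes, cheap appends at the tail versus a scan-from-the-start for mid-vector updates, can be sketched with a toy run-length-encoded bitmap (this is an illustration of the general idea only, not the patch's actual word-aligned on-disk format):

```python
class RLEBitmap:
    """Toy run-length-encoded bitmap: a list of [bit_value, run_length] runs.

    Appending a bit for a newly inserted heap row only touches the last run
    (cheap). Reading or changing a bit in the middle requires walking runs
    from the start, the inefficiency described for mid-heap updates.
    """

    def __init__(self):
        self.runs = []   # each entry: [bit_value, run_length_in_bits]
        self.nbits = 0

    def append(self, bit):
        # O(1): extend the last run if it has the same value, else start one.
        if self.runs and self.runs[-1][0] == bit:
            self.runs[-1][1] += 1
        else:
            self.runs.append([bit, 1])
        self.nbits += 1

    def get(self, pos):
        # O(runs): scan from the front until the run containing pos is found.
        off = 0
        for value, length in self.runs:
            if pos < off + length:
                return value
            off += length
        raise IndexError(pos)

# bitmap for a hypothetical "a = 1" vector: only heap row 42 matches
bm = RLEBitmap()
for row in range(100):
    bm.append(1 if row == 42 else 0)
print(bm.runs)   # [[0, 42], [1, 1], [0, 57]] -- three runs cover 100 rows
```

Note how a compressed run in the middle growing (e.g. a 0 flipped to 1 inside the long first run) would split one run into three, which at the page level is the overflow-and-insert-a-new-page case Jie mentions.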
On Mon, 24 Jul 2006, Bruce Momjian wrote: > Jie Zhang wrote: > > > IIRC they quoted the cardinality of 10000 as something that is still > > > faster than btree for several usecases. > > > > > > And also for AND-s of several indexes, where indexes are BIG, your btree > > > indexes may be almost as big as tables but the resulting set of pages is > > > small. > > > > Yeah, Hannu points it out very well -- the bitmap index works very well when > > columns have low cardinalities and AND operations will produce small number > > of results. > > What operations on columns of low cardinality produce a small number of > results? That seems contradictory. WHERE a = 1 and b = 2 a = 1 may be 5% of the table and b = 2 may be 5% of the table but their intersection may be .001%. Luke: the URL you sent to the bitmap slides was internal to Greenplum. Would you be able to put them on a public site? Thanks, Gavin
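Gavin's 5%-AND-5% example can be demonstrated with a toy bitmap AND (a Python sketch using big integers as bit vectors; the selectivities are the hypothetical ones from his post, and independence of the two columns is assumed):

```python
import random

random.seed(1)
N = 100_000   # heap rows in the hypothetical table

def bitmap(selectivity):
    # One big int as a bit vector: bit r is set if heap row r matches.
    bits = 0
    for row in range(N):
        if random.random() < selectivity:
            bits |= 1 << row
    return bits

a_eq_1 = bitmap(0.05)    # WHERE a = 1 matches ~5% of rows
b_eq_2 = bitmap(0.05)    # WHERE b = 2 matches ~5% of rows
both = a_eq_1 & b_eq_2   # the AND is a plain bitwise operation over the words

density = bin(both).count("1") / N
print(density)   # roughly 0.05 * 0.05 = 0.0025 of the table
```

Each input bitmap is "low cardinality" in Bruce's sense, yet their intersection qualifies only a small fraction of heap pages, which is where the index pays off.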
On Mon, Jul 24, 2006 at 09:04:28PM -0400, Bruce Momjian wrote: > Jie Zhang wrote: > > > IIRC they quoted the cardinality of 10000 as something that is still > > > faster than btree for several use cases. > > > > > > And also for AND-s of several indexes, where indexes are BIG, your btree > > > indexes may be almost as big as tables but the resulting set of pages is > > > small. > > Yeah, Hannu points it out very well -- the bitmap index works very well > > when columns have low cardinalities and AND operations will produce a small > > number of results. > What operations on columns of low cardinality produce a small number of > results? That seems contradictory. Not necessarily. Cardinality of 2 means 1/2. 3 means 1/3. 4 means 1/4. Etc. Reading 1/4, for a larger table, has a good chance of being faster than reading 4/4 of the table. :-) No opinion on whether such tables exist in the real world - but the concept itself seems sound. > > Also, the bitmap index is very small in low cardinality cases, where the > > btree tends to take up at least 10 times more space. > Also, is adding/changing rows more expensive with bitmaps than > with btrees? Without looking at the code, but having read the Oracle docs, I would guess yes. Every vacuum/delete would need to clear all of the bits for the row. Every insert/update would need to allocate a bit in at least the bitmap tree for the row inserted/updated. Seems like more pages may need to be written. Although perhaps some clever optimization would limit this. :-) It seems interesting though. We won't really know until we see the benchmarks. I'm seeing it as a form of working directly with the intermediate form used by the bitmap index scanner. If the existing scanner, building its bitmaps on the fly, can outperform the regular index scanner, I see potential in a pre-built bitmap. Obviously, it isn't the answer to everything. The use I see for it is a few of my 1:N object attribute tables.
The table has an object identifier, and a column indicating the attribute type that the value is for. If I have 20 or more attribute type values, and most objects include rows for most attribute types, my regular index ends up repeating the attribute type for every row. If I want to scan the table for all rows that have a particular attribute type with a particular value, it's a seqscan right now. With the bitmap scanner, knowing which rows to skip to immediately is readily available. Will it be worth it or not? I won't know until I try it. :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Jie Zhang wrote: > > On 7/24/06 6:59 AM, "Hannu Krosing" <hannu@skype.net> wrote: > >> >> >> And also for AND-s of several indexes, where indexes are BIG, your btree >> indexes may be almost as big as tables but the resulting set of pages is >> small. > > Yeah, Hannu points it out very well -- the bitmap index works very well when > columns have low cardinalities and AND operations will produce a small number > of results. > > Also, the bitmap index is very small in low cardinality cases, where the > btree tends to take up at least 10 times more space. > > The smallness of the bitmap index also means that some queries will require much less work_mem to achieve good performance e.g. consider: TPCH dataset with scale factor 10 on my usual PIII HW, query - select count(*) from lineitem where l_linenumber=1; This executes in about 100 seconds with work_mem = 20M if there is a bitmap index on l_linenumber. It takes 3832 seconds (!) if there is a btree index on the same column. Obviously cranking up work_mem will even up the difference (200M gets the btree to about 110 seconds), but being able to get good performance with less memory is a good thing! Cheers Mark
mark@mark.mielke.cc writes: > Reading 1/4, for a larger table, has a good chance of being faster than > reading 4/4 of the table. :-) Really? If you have to hit one tuple out of four, it's pretty much guaranteed that you will need to fetch every heap page. So using an index provides zero I/O savings on the heap side, and any fetches needed to read the index are pure cost. Now you have to demonstrate that the CPU costs involved in processing the index are significantly cheaper than the cost of just testing the WHERE qual at every heap tuple --- not a bet that's likely to win at a one-in-four ratio. > Will it be worth it or not? I won't know until I try it. :-) Agreed. regards, tom lane
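Tom's "you will need to fetch every heap page" claim is easy to check arithmetically, under the uniform-spread assumption he is making (the assumption Ron challenges later in the thread); the page size of 100 tuples here is just an illustrative figure:

```python
# Probability that a heap page contains *no* matching tuple, with matches
# spread uniformly and independently: (1 - selectivity) ** tuples_per_page.
tuples_per_page = 100
selectivity = 0.25    # Tom's one-tuple-in-four case

p_page_skipped = (1 - selectivity) ** tuples_per_page
print(p_page_skipped)   # ~3e-13: effectively no heap page can be skipped
```

So at one-in-four selectivity the index saves essentially zero heap I/O, and the comparison reduces to index-processing CPU cost versus evaluating the WHERE qual per tuple, exactly Tom's point.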
On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote: > mark@mark.mielke.cc writes: > > Reading 1/4, for a larger table, has a good chance of being faster than > > reading 4/4 of the table. :-) > Really? > > If you have to hit one tuple out of four, it's pretty much guaranteed > that you will need to fetch every heap page. So using an index provides > zero I/O savings on the heap side, and any fetches needed to read the > index are pure cost. Now you have to demonstrate that the CPU costs > involved in processing the index are significantly cheaper than the cost > of just testing the WHERE qual at every heap tuple --- not a bet that's > likely to win at a one-in-four ratio. Haha. Of course - but that's assuming uniform spread of the values. Next I would try clustering the table on the bitmap index... :-) My databases aren't as large as many of yours. Most or all of them will fit in 1 Gbytes of RAM. The I/O cost isn't substantial for these, but the WHERE clause might be. But yeah - we don't know. Waste of code or performance boost. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Sun, Jul 23, 2006 at 05:35:37PM -0700, Luke Lonergan wrote: > We presented them at the Postgres Anniversary summit talk (Bruce M. was > there) and the reaction was let's get this into 8.2 because many people > there (more than 5) really wanted to use it as a standard feature. > > A version of the slides with only the bitmap index results are located here: > http://intranet.greenplum.com/bitmap-index-perf-ayush.ppt. 404 -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Tue, 2006-07-25 at 12:49, Jim C. Nasby wrote: > On Sun, Jul 23, 2006 at 05:35:37PM -0700, Luke Lonergan wrote: > > We presented them at the Postgres Anniversary summit talk (Bruce M. was > > there) and the reaction was let's get this into 8.2 because many people > > there (more than 5) really wanted to use it as a standard feature. > > > > A version of the slides with only the bitmap index results are located here: > > http://intranet.greenplum.com/bitmap-index-perf-ayush.ppt. > > 404 Strange. I can download it from both work and home. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Tom, > (I'm also wondering whether this > doesn't overlap the use-case for GIN.) It does not. GIN is strictly for multi-value fields. I can think of applications where either GIN or Bitmaps would be an option, but for the majority, they wouldn't. One particular compelling situation for on-disk bitmaps is for terabyte tables where a btree index would not fit into memory. Index performance for an index which is 10x or more the size of RAM really sucks ... I can come up with some test results if you doubt that. Also note that "low cardinality" is relative. For a 1 billion row table, a column with 10,000 values is "low-cardinality", having around 100,000 rows per value ... but that's still 0.01% of the table per value, making index use still applicable. --Josh
Josh Berkus <josh@agliodbs.com> writes: > One particular compelling situation for on-disk bitmaps is for terabyte > tables where a btree index would not fit into memory. Index > performance for an index which is 10x or more the size of RAM really > sucks ... I can come up with some test results if you doubt that. Sure... > Also note that "low cardinality" is relative. For a 1 billion row > table, a column with 10,000 values is "low-cardinality", having around > 100,000 rows per value ... but that's still 0.01% of the table per > value, making index use still applicable. And your point is? Assuming a reasonable datatype like int4, a btree index on this table would occupy say 20GB (16 bytes per entry plus fillfactor overhead). The bitmap version would require 10,000 bitmaps of 1G bits apiece, or 1250GB (not even counting overhead). You need some wildly optimistic assumptions about the compressibility of the bitmaps to get even within hailing distance of the btree, let alone fit in RAM. A realistic assumption for the numbers you mention is that the bitmaps have 1-bits about 100 bits apart, which means you could get maybe 3-to-1 compression using the runlength scheme that's in there ... leaving the bitmap a factor of 20 bigger than the btree. AFAICS "low cardinality" has to mean just that, a few dozen distinct values at most, for this scheme to have any hope. regards, tom lane
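Tom's raw-size comparison above is easy to reproduce; this is a minimal sketch of the arithmetic in his message (the 16-bytes-per-entry figure is his assumption, and he rounds the btree up to 20GB to allow for fillfactor overhead):

```python
# A sketch of the arithmetic in the message above (not PostgreSQL code):
# 1 billion rows, 10,000 distinct values, ~16 bytes per btree entry
# (Tom's assumption), one uncompressed bitmap of `rows` bits per value.
rows = 1_000_000_000
distinct = 10_000

btree_bytes = rows * 16                  # key + item pointer + slack
bitmap_bytes = distinct * rows // 8      # 10,000 bitmaps of 1G bits each

print(f"btree  ~ {btree_bytes / 1e9:.0f} GB")   # btree  ~ 16 GB
print(f"bitmap ~ {bitmap_bytes / 1e9:.0f} GB")  # bitmap ~ 1250 GB
```

The question the rest of the thread argues over is how far runlength compression can close that ~80x gap.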
Tom Lane wrote: > mark@mark.mielke.cc writes: >> Reading 1/4, for a larger table, has a good chance of being faster than >> reading 4/4 of the table. :-) > > Really? > > If you have to hit one tuple out of four, it's pretty much guaranteed > that you will need to fetch every heap page. I think it's not uncommon to have data that is more clustered than expected. In my case it shows up often in GIS data; where often a query that only accesses 1/4 of the rows in the table really will only access about 1/4 of the pages in the table -- for example when analyzing Texas or the Southwest US.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Tom Lane wrote: >> If you have to hit one tuple out of four, it's pretty much guaranteed >> that you will need to fetch every heap page. > I think it's not uncommon to have data that is more clustered than expected. Sure, if the data is fairly static ... and now that there's a fillfactor option for heaps, you can even tolerate (some low rate of) updates without losing the clusteredness. However, that benefit accrues to all index types not only bitmap. regards, tom lane
Tom Lane wrote: > > And your point is? Assuming a reasonable datatype like int4, a btree > index on this table would occupy say 20GB (16 bytes per entry plus > fillfactor overhead). The bitmap version would require 10,000 bitmaps > of 1G bits apiece, or 1250GB (not even counting overhead). You need > some wildly optimistic assumptions about the compressibility of the > bitmaps to get even within hailing distance of the btree, let alone > fit in RAM. A realistic assumption for the numbers you mention is > that the bitmaps have 1-bits about 100 bits apart, which means you > could get maybe 3-to-1 compression using the runlength scheme that's > in there ... leaving the bitmap a factor of 20 bigger than the btree. > > AFAICS "low cardinality" has to mean just that, a few dozen distinct > values at most, for this scheme to have any hope. > I did a little testing of this when Jie first submitted the patch - for a basically Zipfian distribution of int4's the bitmap is still clearly smaller even at 1000 distinct values (see below). It is twice as big at 10000, so the crossover point is somewhere in the 1000-10000 range (for this test - however the results seem to be reasonably typical). In DSS, distinct values < 1000 are very common (days of week, months of year, lineitems of order, sex of animal...) so the applicability of this range is perhaps larger than it seems at first sight.
Cheers Mark

------------------------------------------------------------------------

bitmap=# \d bitmaptest
     Table "public.bitmaptest"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 val0   | integer |
 val1   | integer |
 val2   | integer |
 val3   | integer |
 fil    | text    |

bitmap=# select count(distinct val0), count(distinct val1), count(distinct val2), count(distinct val3) from bitmaptest;
 count | count | count | count
-------+-------+-------+-------
    10 |   100 |  1000 | 10000

bitmap=# \i relsizes.sql (BTREE)
     relname     | relpages
-----------------+----------
 bitmaptest      |    93458
 bitmaptest_val0 |    21899
 bitmaptest_val1 |    21899
 bitmaptest_val2 |    21899
 bitmaptest_val3 |    21899

bitmap=# \i relsizes.sql (BITMAP)
     relname     | relpages
-----------------+----------
 bitmaptest      |    93458
 bitmaptest_val0 |     1286
 bitmaptest_val1 |     2313
 bitmaptest_val2 |     5726
 bitmaptest_val3 |    41292

For the interested, the data generator, schema, index files are here: http://homepages.paradise.net.nz/markir/download/bizgres/bitmaptest.tar.gz
I wrote: >> ... A realistic assumption for the numbers you mention is >> that the bitmaps have 1-bits about 100 bits apart, which means you >> could get maybe 3-to-1 compression using the runlength scheme that's >> in there ... leaving the bitmap a factor of 20 bigger than the btree. I'm surprised no one caught me making this bogus computation. I realized this morning it's wrong: if there are 10000 distinct values then on the average the 1-bits would be about 10000 bits apart, not 100. At that rate there *is* substantial compression. The representation used in the bitmap patch isn't amazingly efficient for this (I wonder whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs) but it still manages to get down to something like 11 bytes per 1-bit. Since each 1-bit corresponds to a btree entry, this means the bitmap does come out somewhat smaller than a btree (16 bytes per entry ignoring overhead) --- at least if overhead such as wasted space on a page doesn't kill it. Still, this isn't going to make the difference between fitting in RAM or not. For small numbers of distinct values, like less than 100, the bitmap representation looks more plausible. In that range you're down to a byte or two per entry and so there is (very roughly) a 10x storage savings over btree. I don't believe the 100x numbers that have been bandied around in this discussion, but 10x is plenty enough to be interesting. regards, tom lane
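Tom's corrected ~11-bytes-per-entry figure can be checked with a short sketch. This is a hypothetical model of the runlength encoding under discussion (using the fill-word capacity Jie describes later in the thread), not the patch's actual HRL code:

```python
import math

# Hypothetical model of the runlength ("HRL") scheme discussed here, not
# the actual patch code. A w-bit fill word spends 1 flag bit plus a
# (w-1)-bit run count, each count unit covering w bits, so one fill word
# encodes at most 2**(w-1) * w zero bits.
def bytes_per_set_bit(gap_bits, w=8):
    units = math.ceil(gap_bits / w)               # w-bit groups in the run
    fill_words = math.ceil(units / 2 ** (w - 1))  # fill words for the gap
    literal_words = 1                             # word holding the 1-bit
    return (fill_words + literal_words) * w // 8

# 10,000 distinct values over 1e9 rows -> 1-bits roughly 10,000 bits apart
print(bytes_per_set_bit(10_000, w=8))   # 11 (Tom's ~11 bytes per entry)
print(bytes_per_set_bit(10_000, w=16))  # 4  (one fill word + one literal)
```

Under this model, widening HRL_WORD to 16 bits would indeed cut the per-entry cost substantially at this cardinality, as Tom wonders above.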
Tom Lane wrote: > > I'm surprised no one caught me making this bogus computation. I > realized this morning it's wrong: if there are 10000 distinct values > then on the average the 1-bits would be about 10000 bits apart, not 100. Right - I didn't think 10000 was *that* bad, but was too sleepy to try working it out :-). > I don't believe the 100x numbers that have been > bandied around in this discussion, but 10x is plenty enough to be > interesting. Yep - I have not managed to get 100x in any of my tests. However, I do see about half that for the TPCH scale 10 dataset:

tpch=# \i relsizes.sql (BTREE)
        relname         | relpages
------------------------+----------
 customer               |    41019
 customer_c_custkey     |     3288
 customer_c_mktsegment  |     5779
 customer_c_nationkey   |     3288
 lineitem               |  1535724
 lineitem_l_linenumber  |   131347
 lineitem_l_orderkey    |   131347
 orders                 |   307567
 orders_o_custkey       |    32847
 orders_o_orderpriority |    65876
 orders_o_orderstatus   |    41131

tpch=# \i relsizes.sql (MAINLY BITMAP)
        relname         | relpages
------------------------+----------
 customer               |    41019
 customer_c_custkey     |     3288
 customer_c_mktsegment  |      157
 customer_c_nationkey   |      336
 lineitem               |  1535724
 lineitem_l_linenumber  |     7571
 lineitem_l_orderkey    |   131347
 orders                 |   307567
 orders_o_custkey       |    32847
 orders_o_orderpriority |     1427
 orders_o_orderstatus   |      717

The orders_o_orderpriority and orders_o_orderstatus bitmap indexes are 46 and 57 times smaller than their btree counterparts (hmm... might we see even better compression for larger scale factors?). An obvious deduction is that the TPCH dataset is much more amenable to run compression than my synthetic Zipfian data was. The interesting question is how well "real" datasets are run compressible; I suspect "better than my Zipfian data" is a safe assumption! Cheers Mark
Tom, On 7/26/06 7:26 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > I wonder > whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs We tried variations from 8-bit to 32-bit and came to the conclusion that 8-bit was a better match for the more general case. For the moment I forget exactly why (maybe Jie or Ayush will recall). It's a #define I believe, so you can play with it to find out. There's a lot more to be done with this - I think we've got some big gains ahead. BTW - lots of excitement here at OSCON about bitmap index, and there's a fellow who is using the current Bizgres version, but wants *higher* cardinality support, so I discussed Jie's thoughts about implementing binning on values, basically combining bit vectors into value bins as the cardinality increases. I am still puzzled why our index creation time increases so dramatically as we approach cardinality 10,000. I know that we found that index page traversal for append began to take a lot of time as we started to increase the number of bitvector pages - we had talked about aggregating the append operations into groups before they were implemented to sequentialize the access. - Luke
On 7/26/06 8:55 PM, "Luke Lonergan" <llonergan@greenplum.com> wrote: > Tom, > > On 7/26/06 7:26 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > >> I wonder >> whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs > > We tried variations from 8-bit to 32-bit and came to the conclusion that > 8-bit was a better match for the more general case. For the moment I forget > exactly why (maybe Jie or Ayush will recall). It's a #define I believe, so > you can play with it to find out. > > There's a lot more to be done with this - I think we've got some big gains > ahead. For low-cardinality cases, 8-bit word size is usually better than 16-bit and 32-bit. The reason is that in this case, the compression is better in 8-bit than in 16-bit or 32-bit. But for high-cardinality cases, like 10,000, 16-bit is better. That's because a compressed 8-bit word can only represent at most 2^7*8=1024 bits. So every 10,000 bits requires about 10 bytes. However, a compressed 16-bit word can represent 2^15*16=524288 bits. We only need 2 bytes. I have been thinking about this recently -- maybe we can support different word sizes for different cardinalities. > > BTW - lots of excitement here at OSCON about bitmap index, and there's a > fellow who is using the current Bizgres version, but wants *higher* > cardinality support, so I discussed Jie's thoughts about implementing > binning on values, basically combining bit vectors into value bins as the > cardinality increases. Yes, that's the basic idea. Essentially we want to limit the number of bins to an ideal value so that (1) the size of the bitmap index can be small; (2) this will still guarantee good query performance. The details still need to be worked out. > > I am still puzzled why our index creation time increases so dramatically as > we approach cardinality 10,000. 
> I know that we found that index page > traversal for append began to take a lot of time as we started to increase > the number of bitvector pages - we had talked about aggregating the append > operations into groups before they were implemented to sequentialize the > access. I believe that this is because of the 8-bit word size. We can try increasing it to 16-bit and see what the result looks like. Thanks, Jie
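Jie's word-size figures follow from a one-line model (a sketch under the assumptions stated above: 1 flag bit, w-1 count bits, each count unit covering w bits; not the actual HRL_WORD code):

```python
# A sketch of Jie's capacity arithmetic (hypothetical model, not the patch
# code): a compressed w-bit fill word has 1 flag bit and a (w-1)-bit count,
# and each count unit covers w bits.
def max_bits_per_fill_word(w):
    return 2 ** (w - 1) * w

print(max_bits_per_fill_word(8))   # 1024   (2^7 * 8)
print(max_bits_per_fill_word(16))  # 524288 (2^15 * 16)
```

This is why a ~10,000-bit run costs about ten 8-bit fill words but only a single 16-bit one.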
Mark Kirkwood <markir@paradise.net.nz> writes: > An obvious deduction is that the TPCH dataset is much more amenable to > run compression than my synthetic Zipfian data was. The interesting > question is how well "real" datasets are run compressable, Yeah --- the back-of-the-envelope calculations I was making presupposed uniform random distribution, and we know that's often not realistic for real datasets. A nonuniform distribution would probably mean that some of the bitmaps compress better-than-expected and others worse. I have no idea how to model that and guess what the overall result is ... regards, tom lane
On 7/26/06 10:14 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Mark Kirkwood <markir@paradise.net.nz> writes: >> An obvious deduction is that the TPCH dataset is much more amenable to >> run compression than my synthetic Zipfian data was. The interesting >> question is how well "real" datasets are run compressable, > > Yeah --- the back-of-the-envelope calculations I was making presupposed > uniform random distribution, and we know that's often not realistic for > real datasets. A nonuniform distribution would probably mean that some > of the bitmaps compress better-than-expected and others worse. I have > no idea how to model that and guess what the overall result is ... > The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng Wu et al gave an approximate answer for this question. Assume that there are c distinct values, the i-th value has a probability of p_i, the number of rows is N, and the word size is w. Then the total size of the compressed bitmap index is about (N/(w-1))(c - \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in both \sum's i runs from 1 to c. The constraint for this equation is \sum(p_i)=1. Therefore, when all p_i are equal, i.e. the attribute has uniformly distributed values, the size of the bitmap index is the largest.
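The estimate Jie cites can be sketched directly from the formula as given in the message (a hypothetical helper, not code from the paper or the patch; symbols follow the message, with N rows, word size w, and probabilities p_i over the c distinct values):

```python
# Sketch of the Wu et al. size estimate as stated above (hypothetical
# implementation): expected compressed index size in words for N rows,
# word size w, and value probabilities p summing to 1.
def wu_index_size(N, w, p):
    assert abs(sum(p) - 1.0) < 1e-9          # probabilities must sum to 1
    c = len(p)
    return (N / (w - 1)) * (c
                            - sum((1 - pi) ** (2 * w - 2) for pi in p)
                            - sum(pi ** (2 * w - 2) for pi in p))

N, w, c = 1_000_000, 8, 10
uniform = [1 / c] * c
skewed = [0.5] + [0.5 / (c - 1)] * (c - 1)

# Uniform probabilities maximize the estimate, as the message concludes
# (the per-value term is concave in p_i, so equal p_i maximize the sum).
print(wu_index_size(N, w, uniform) > wu_index_size(N, w, skewed))  # True
```

This also matches Mark's observation upthread that the skewed TPCH columns compress far better than his uniform-ish synthetic data.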
"Jie Zhang" <jzhang@greenplum.com> writes: > On 7/26/06 10:14 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> ... A nonuniform distribution would probably mean that some >> of the bitmaps compress better-than-expected and others worse. I have >> no idea how to model that and guess what the overall result is ... > The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng > Wu et al gave an approximate answer for this question. Assume that there are > c distinct values. Let the i-th value has a probability of p_i, the number > of rows r, and the word size w. then the total size of the compressed bitmap > index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in > both \sum's, i is from 1 to c. Hm, but that's still begging the question no? It's still assuming that any one value is uniformly distributed. ISTM the cases that would break my simplistic calculation involve clustering of particular values, such that some areas of the bitmap are all-zero while other areas have lots of ones. regards, tom lane
On 7/26/06 11:50 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > "Jie Zhang" <jzhang@greenplum.com> writes: >> On 7/26/06 10:14 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >>> ... A nonuniform distribution would probably mean that some >>> of the bitmaps compress better-than-expected and others worse. I have >>> no idea how to model that and guess what the overall result is ... > >> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng >> Wu et al gave an approximate answer for this question. Assume that there are >> c distinct values. Let the i-th value has a probability of p_i, the number >> of rows r, and the word size w. then the total size of the compressed bitmap >> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in >> both \sum's, i is from 1 to c. > > Hm, but that's still begging the question no? It's still assuming that > any one value is uniformly distributed. ISTM the cases that would break > my simplistic calculation involve clustering of particular values, such > that some areas of the bitmap are all-zero while other areas have lots > of ones. Yes, you are right -- each value is still uniformly distributed. But this will be the worst case in terms of the size of a bitmap vector. As for how to model the size of a bitmap vector for a non-uniformly distributed value, that's a good question. I don't really know. But we do know the best case and the worst case.
On Thu, Jul 27, 2006 at 09:13:21AM -0700, Jie Zhang wrote: > On 7/26/06 11:50 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > > "Jie Zhang" <jzhang@greenplum.com> writes: > >> On 7/26/06 10:14 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > >>> ... A nonuniform distribution would probably mean that some > >>> of the bitmaps compress better-than-expected and others worse. I have > >>> no idea how to model that and guess what the overall result is ... > > > >> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng > >> Wu et al gave an approximate answer for this question. Assume that there are > >> c distinct values. Let the i-th value has a probability of p_i, the number > >> of rows r, and the word size w. then the total size of the compressed bitmap > >> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in > >> both \sum's, i is from 1 to c. > > > > Hm, but that's still begging the question no? It's still assuming that > > any one value is uniformly distributed. ISTM the cases that would break > > my simplistic calculation involve clustering of particular values, such > > that some areas of the bitmap are all-zero while other areas have lots > > of ones. > > Yes, you are right -- each value is still uniformly distributed. But this > will be the worst case in terms of the size of a bitmap vector. As for how > to model the size of a bitmap vector for an non-uniformly distributed value, > that's a good question. I don't really know. But we do know the best case > and the worse case. If the usefulness of bitmap indexes is still in doubt, could someone at Greenplum provide data from actual data warehouses from actual customers? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim, On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > If the usefulness of bitmap indexes is still in doubt, could someone at > Greenplum provide data from actual data warehouses from actual > customers? First, is anyone in doubt? - Luke
Luke Lonergan wrote: > Jim, > > On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > > > If the usefulness of bitmap indexes is still in doubt, could someone at > > Greenplum provide data from actual data warehouses from actual > > customers? > > First, is anyone in doubt? Sure. I think we are going to have to see the final patch and have users test it with their workload to find the useful range. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +