Thread: Stats Collector Error 7.4beta1 and 7.4beta2
I was attempting to get pg_autovacuum to work on my database and after much hammering at it I discovered that the stats systemwas not working. I tried it with both 7.4beta1 and 7.4beta2 in both cases the number of tuples inserted, deleted andupdated remained at 0 no matter what database activity occured. Matthew T. O'Connor looked at my system recompiled postgres, and checked my config files and was unable to solve my problem. I am using a default install of RedHat 9.0 on a VIA Samuel 2 processor. Anyone have any ideas what could be causing my problem? I have had hardware problems in the past but this is a new machine that I ran memtest86 and badblocks with the destructivetests on before I installed so I'm pretty sure thats not the issue. Additionally I made a copy of the hard driveand placed it in another server and saw the same thing happen. I did not have this problem with 7.3.3 so I think it has something to do with 7.4. --- Adam Kavan--- akavan@cox.net --- Adam Kavan --- American Amuesments --- akavan@cox.net --- 402-499-5145
Adam Kavan <akavan@cox.net> writes: > I was attempting to get pg_autovacuum to work on my database and after > much hammering at it I discovered that the stats system was not > working. Does 'ps' show that the stats collector and stats buffer postmaster child processes are alive? Are there any suggestive complaints in the postmaster's log? regards, tom lane
On Wed, 2003-09-03 at 23:50, Tom Lane wrote: > Does 'ps' show that the stats collector and stats buffer postmaster > child processes are alive? Are there any suggestive complaints in > the postmaster's log? As Adam mentioned, I took a look at his system since the initial report was about a problem with pg_autovacuum. Anyway, Yes ps shows the two stats collector related processes running, and no the log files don't show anything helpful, however I didn't try to change any logging settings. Initially I saw an error in the logs about an IPv6 address error but after I recompiled everthing with a simple ./configure --prefix=/home/user/somethingelse/ I didn't get the IPv6 error in the logs anymore. Short answer is I have no idea why this is happening, and I didn't see any obvious configuration problems that might cause this (make check passed all tests). Tom, Adam was able to give me a login to his machine, maybe he would do the same for you. Anyway, that is all I was able to see, hence the call for more help on hackers :-) Matthew
"Matthew T. O'Connor" <matthew@zeut.net> writes: > ... Initially I saw an error in the logs about an IPv6 address > error but after I recompiled everthing with a simple ./configure > --prefix=/home/user/somethingelse/ I didn't get the IPv6 error in the > logs anymore. Hm. Could it be an IPv6 issue --- that is, the stats collector is alive and faithfully listening on some UDP port, but it's not the same port the backends try to send to? Given the discussion over the past couple of days about bizarre interpretations of loopback addresses in pg_hba.conf, I could sure believe there's some similar kind of issue for the stats collector. regards, tom lane
On Thu, 2003-09-04 at 01:23, Tom Lane wrote: > Hm. Could it be an IPv6 issue --- that is, the stats collector is alive > and faithfully listening on some UDP port, but it's not the same port > the backends try to send to? Given the discussion over the past couple > of days about bizarre interpretations of loopback addresses in > pg_hba.conf, I could sure believe there's some similar kind of issue for > the stats collector. I had a similar thought, but I have no idea how I would verify this. The thing is, when I recompiled postgresql myself, I left pg_hba.conf at default settings, and it's running on RH9, which I am running and have not had a problem with...
Tom Lane wrote: > "Matthew T. O'Connor" <matthew@zeut.net> writes: > > ... Initially I saw an error in the logs about an IPv6 address > > error but after I recompiled everthing with a simple ./configure > > --prefix=/home/user/somethingelse/ I didn't get the IPv6 error in the > > logs anymore. > > Hm. Could it be an IPv6 issue --- that is, the stats collector is alive > and faithfully listening on some UDP port, but it's not the same port > the backends try to send to? Given the discussion over the past couple > of days about bizarre interpretations of loopback addresses in > pg_hba.conf, I could sure believe there's some similar kind of issue for > the stats collector. Doesn't the stats collector use unix domain sockets, not IP? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Doesn't the stats collector use unix domain sockets, not IP? No. IIRC, we deliberately chose IP/UDP because it had buffering behavior we liked. There are pipes involved in the stats stuff too, but the weak link in my mind is the backend-to-stats-buffer-process hop, which is UDP. regards, tom lane
On Thu, 4 Sep 2003, Bruce Momjian wrote: > Tom Lane wrote: > > "Matthew T. O'Connor" <matthew@zeut.net> writes: > > > ... Initially I saw an error in the logs about an IPv6 address > > > error but after I recompiled everthing with a simple ./configure > > > --prefix=/home/user/somethingelse/ I didn't get the IPv6 error in the > > > logs anymore. > > > > Hm. Could it be an IPv6 issue --- that is, the stats collector is alive > > and faithfully listening on some UDP port, but it's not the same port > > the backends try to send to? Given the discussion over the past couple > > of days about bizarre interpretations of loopback addresses in > > pg_hba.conf, I could sure believe there's some similar kind of issue for > > the stats collector. > > Doesn't the stats collector use unix domain sockets, not IP? Nup. for (addr = addrs; addr; addr = addr->ai_next) { #ifdef HAVE_UNIX_SOCKETS /* Ignore AF_UNIX sockets, if any are returned. */ if (addr->ai_family == AF_UNIX) continue; #endif if ((pgStatSock = socket(addr->ai_family, SOCK_DGRAM, 0)) >= 0) break; } I thing I haven't seen asked: is there a packet filter blocking local<->local UDP traffic by any chance? Thanks, Gavin
> >I thing I haven't seen asked: is there a packet filter blocking >local<->local UDP traffic by any chance? Iptables is set to accept everything. If it would help I can give you all log in information to poke around yourselves. I appreciate your help. --- Adam Kavan --- akavan@cox.net
On Thu, Sep 04, 2003 at 01:39:04AM -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Doesn't the stats collector use unix domain sockets, not IP? > > No. IIRC, we deliberately chose IP/UDP because it had buffering > behavior we liked. Once you said it was because not all platforms have unix domain sockets. I asked why we weren't using something like socketpair(). Kurt
Kurt Roeckx wrote: > On Thu, Sep 04, 2003 at 01:39:04AM -0400, Tom Lane wrote: >> Bruce Momjian <pgman@candle.pha.pa.us> writes: >> > Doesn't the stats collector use unix domain sockets, not IP? >> >> No. IIRC, we deliberately chose IP/UDP because it had buffering >> behavior we liked. > > Once you said it was because not all platforms have unix domain > sockets. I asked why we weren't using something like > socketpair(). > The reason to use INET UDP is that this is the only connection type that simply drops packets if the stupid collector daemon isn't able to keep up with the traffic. Think of a 64 processor SMP machine where 60 backends utilize their own CPU and the poor little collector get's burried in packets, you don't want it to slow down the whole system, do you? And I agree with Tom that it is very likely that the IPV4/IPV6 stuff is the reason. IIRC the postmaster creates the socket and noone ever does bind(2) on it - so it uses it's dynamically assigned port number. Both, the collector and the backends inherit that socket via fork(2). The backends use this socket with it's own sockname to send the stats out, and the collector reads it with recvfrom(2) and verifies that the from address is identical to it's sockname ... that way noone can inject faked stat packets. Now this is a lot of sockname usage that could lead to either the packets not arriving in the collector, or being thrown away by the collector because of failing to see them coming from itself. Jan > > Kurt > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Kurt Roeckx wrote: > It could be useful to have a warning at the following line: > > if (memcmp(&fromaddr, &pgStatAddr, fromlen)) > continue; > > That way you can rule out that that is a problem. > > Anyway, I still didn't see the error message he got in the first > place. Maybe we're looking at the wrong thing? I think it's more this piece of code in postmaster/pgstat.c /* * The source address of the packet must be our own socket. * This ensures that only real hackersor our own backends * tell us something. (This should be redundant with a * kernel-level check dueto having used connect(), but let's * do it anyway.) */ if (memcmp(&fromaddr, &pgStatAddr, fromlen)) continue; Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Thu, Sep 04, 2003 at 05:01:54PM -0400, Jan Wieck wrote: > > > Kurt Roeckx wrote: > > > >It could be useful to have a warning at the following line: > > > > if (memcmp(&fromaddr, &pgStatAddr, fromlen)) > > continue; > > > >That way you can rule out that that is a problem. > > > >Anyway, I still didn't see the error message he got in the first > >place. Maybe we're looking at the wrong thing? > > I think it's more this piece of code in postmaster/pgstat.c And what do you think I pasted? Kurt
On Thu, Sep 04, 2003 at 04:04:38PM -0400, Jan Wieck wrote: > > And I agree with Tom that it is very likely that the IPV4/IPV6 stuff is > the reason. IIRC the postmaster creates the socket and noone ever does > bind(2) on it - so it uses it's dynamically assigned port number. Both, > the collector and the backends inherit that socket via fork(2). Actually, it does a bind (to localhost), but send the port to 0, so it gets the random port. Then it connects to itself. I don't get the logic behind that howver. It does: pgStatSock = socket(...); bind(pgStatSock, ...); getsockname(pgStatSock, ...); connect(pgStatSock, ...); So it creates a socket, binds to it, asks what address/port it's bound to, and connects to that port. I don't see the logic behind that connect(), how it can work, and how it would block anybody from sending to it, but it seems to work. > The > backends use this socket with it's own sockname to send the stats out, > and the collector reads it with recvfrom(2) and verifies that the from > address is identical to it's sockname ... that way noone can inject > faked stat packets. Now this is a lot of sockname usage that could lead > to either the packets not arriving in the collector, or being thrown > away by the collector because of failing to see them coming from itself. I'm trying to think about some kernel bug that sends packets using the wrong source address ..., but I think that was connecting to a local address it always showed the loopback address. It could be useful to have a warning at the following line: if (memcmp(&fromaddr, &pgStatAddr, fromlen)) continue; That way you can rule out that that is a problem. Anyway, I still didn't see the error message he got in the first place. Maybe we're looking at the wrong thing? Kurt
> >It could be useful to have a warning at the following line: > > if (memcmp(&fromaddr, &pgStatAddr, fromlen)) > continue; > >That way you can rule out that that is a problem. > >Anyway, I still didn't see the error message he got in the first >place. Maybe we're looking at the wrong thing? > > >Kurt This is the very line that is giving me problems. I commented it out and recompiled and now the stats system works. Of course I have to assume that its bad to go around with out that check... --- Adam Kavan --- akavan@cox.net
Adam Kavan <akavan@cox.net> writes: >> if (memcmp(&fromaddr, &pgStatAddr, fromlen)) >> continue; > This is the very line that is giving me problems. I commented it out and > recompiled and now the stats system works. Of course I have to assume that > its bad to go around with out that check... Hmm. Could you look and see what the actual values are in each address? regards, tom lane
At 06:49 PM 9/4/03 -0400, Tom Lane wrote: >Hmm. Could you look and see what the actual values are in each address? > > regards, tom lane I don't really know the layout of these structures so I dumped them to a file and attached them. The first 16 bytes is from fromaddr and the second is from pgStatAddr. --- Adam Kavan --- akavan@cox.net
Attachment
Kurt Roeckx wrote: > On Thu, Sep 04, 2003 at 05:01:54PM -0400, Jan Wieck wrote: >> >> >> Kurt Roeckx wrote: >> >> >> >It could be useful to have a warning at the following line: >> > >> > if (memcmp(&fromaddr, &pgStatAddr, fromlen)) >> > continue; >> > >> >That way you can rule out that that is a problem. >> > >> >Anyway, I still didn't see the error message he got in the first >> >place. Maybe we're looking at the wrong thing? >> >> I think it's more this piece of code in postmaster/pgstat.c > > And what do you think I pasted? Hmmm ... good question ... How can I know what I think before I read what I write? Jan :-) > > > Kurt -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
They are both structures of type sockaddr_in (sin_family 2 is AF_INET whereas sin_family 10 would've been AF_INET6), and all relevant fields of the structure look the same to me. The problem lies in the padding bytes that make sockaddr_in the same size as sockaddr. Since the static structure pgStatAddr is supposed to be initialized to nul bytes by the compiler and now does not contain those in the padding area, my guess would be that getsockaddr() is actually writing garbage into that padding area. This is a nasty change, as one cannot compare two addresses for equalness with memcmp() any more just because of sloppy programming in the IP stack. Well, the correct fix would be to compare only the relevant parts of the addresses, depending on the address family type. I personally wouldn't worry too much about removing the check entirely. If you got a hacker wasting his time and bandwidth with screwing up your statistic collector daemon by sending faked UDP packets to some guessed port number (it's only visible in the netstat output on your local machine), I think he's done with all the rest of his TODO for the day and you'll soon face other problems than that. Jan Adam Kavan wrote: > At 06:49 PM 9/4/03 -0400, Tom Lane wrote: >>Hmm. Could you look and see what the actual values are in each address? >> >> regards, tom lane > > I don't really know the layout of these structures so I dumped them to a > file and attached them. The first 16 bytes is from fromaddr and the second > is from pgStatAddr. > > --- Adam Kavan > --- akavan@cox.net -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Adam Kavan <akavan@cox.net> writes: > I don't really know the layout of these structures so I dumped them to a > file and attached them. The first 16 bytes is from fromaddr and the second > is from pgStatAddr. More legibly: 0000000 0200 8016 7f00 0001 0000 0000 0000 0000 0000010 0200 8016 7f00 0001 0000 0000 f001 0000 The 7f000001 is the IP loopback address, sure enough. I wonder what the f001 (or it might be little-endian 01f0) is. regards, tom lane
Kurt Roeckx <Q@ping.be> writes: > Then it connects to itself. I don't get the logic behind that > howver. At least on HPUX, the connect(2) man page saith If the socket is of type SOCK_DGRAM, connect() specifies the peer address to which messages are to be sent, and thecall returns immediately. Furthermore, this socket can only receive messages sent from this address. The "furthermore" is what we are after. regards, tom lane
Redhat 7.1 says The file descriptor sockfd must refer to a socket. If the socket is of type SOCK_DGRAM then the serv_addr addressis the address to which datagrams are sent by default, and the only address from which datagramsare received. If Looks like the test is obsolete. Any objections to remove it? Do people agree that it's a bugfix that can go into 7.4? Jan Tom Lane wrote: > Kurt Roeckx <Q@ping.be> writes: >> Then it connects to itself. I don't get the logic behind that >> howver. > > At least on HPUX, the connect(2) man page saith > > If the socket is of type SOCK_DGRAM, connect() specifies the peer > address to which messages are to be sent, and the call returns > immediately. Furthermore, this socket can only receive messages sent > from this address. > > The "furthermore" is what we are after. > > regards, tom lane -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck wrote: > Redhat 7.1 says > > The file descriptor sockfd must refer to a socket. If the > socket is of type SOCK_DGRAM then the serv_addr address is > the address to which datagrams are sent by default, and > the only address from which datagrams are received. If > > Looks like the test is obsolete. Any objections to remove it? Do people > agree that it's a bugfix that can go into 7.4? Oh, it definitely has to be in 7.4. We can't ship it in its broken state. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Jan Wieck <JanWieck@Yahoo.com> writes: > Redhat 7.1 says > The file descriptor sockfd must refer to a socket. If the > socket is of type SOCK_DGRAM then the serv_addr address is > the address to which datagrams are sent by default, and > the only address from which datagrams are received. If > Looks like the test is obsolete. Any objections to remove it? Do people > agree that it's a bugfix that can go into 7.4? The test is not "obsolete"; we understood when we wrote it that it should theoretically be redundant with the kernel-level restriction. But there may be systems out there that fail to make the check promised by the HPUX and Linux manpages. So I'd prefer to leave it in there if we can. I don't want to rip it out simply because we don't understand why it's failing on Adam's setup. What I'm wondering about is whether we are comparing the right number of bytes ... have both address structs been reported to have the same length? Maybe we need a min(). regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: >> Redhat 7.1 says >> The file descriptor sockfd must refer to a socket. If the >> socket is of type SOCK_DGRAM then the serv_addr address is >> the address to which datagrams are sent by default, and >> the only address from which datagrams are received. If > >> Looks like the test is obsolete. Any objections to remove it? Do people >> agree that it's a bugfix that can go into 7.4? > > The test is not "obsolete"; we understood when we wrote it that it > should theoretically be redundant with the kernel-level restriction. > But there may be systems out there that fail to make the check promised > by the HPUX and Linux manpages. So I'd prefer to leave it in there if > we can. I don't want to rip it out simply because we don't understand > why it's failing on Adam's setup. > > What I'm wondering about is whether we are comparing the right number of > bytes ... have both address structs been reported to have the same > length? Maybe we need a min(). I disagree. If getsockname(), getpeername() or recvfrom() return different address length's, it'd be more an indicator that the addresses ARE different anyway. Just because an IPV4 address doesn't have that feature is no reason to assume that addresses of different length are the same if they match over min() bytes. If you want to continue to check the addresses and feel that HPUX and Linux manpage claims about kernel behaviour aren't enough, then we have to make it a specific check per "supported" AF. For now, that'd just be AF_INET and AF_INET6, and the default case bailing out with an error. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Fri, Sep 05, 2003 at 09:35:11AM -0400, Jan Wieck wrote: > Redhat 7.1 says > > The file descriptor sockfd must refer to a socket. If the > socket is of type SOCK_DGRAM then the serv_addr address is > the address to which datagrams are sent by default, and > the only address from which datagrams are received. If > > Looks like the test is obsolete. Any objections to remove it? Do people > agree that it's a bugfix that can go into 7.4? Reading SUS v2 and v3, both say: If the initiating socket is not connection-mode, then connect() sets the socket's peer address, but no connectionis made. For SOCK_DGRAM sockets, the peer address identifies where all datagrams are sent on subsequent send()calls, and limits the remote sender for subsequent recv() calls. So it looks good to me. I do wonder why nobody had a problem with this before however. Kurt
Jan Wieck <JanWieck@Yahoo.com> writes: > Tom Lane wrote: >> What I'm wondering about is whether we are comparing the right number of >> bytes ... have both address structs been reported to have the same >> length? Maybe we need a min(). > I disagree. If getsockname(), getpeername() or recvfrom() return > different address length's, it'd be more an indicator that the addresses > ARE different anyway. Hm, good point. But I still feel that we are jumping to a conclusion without understanding what's going on. I'd like to know *why* the addresses are different on Adam's machine, before we conclude that we mustn't try to check that they are the same. regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: > > Tom Lane wrote: > >> What I'm wondering about is whether we are comparing the right number of > >> bytes ... have both address structs been reported to have the same > >> length? Maybe we need a min(). > > > I disagree. If getsockname(), getpeername() or recvfrom() return > > different address length's, it'd be more an indicator that the addresses > > ARE different anyway. > > Hm, good point. But I still feel that we are jumping to a conclusion > without understanding what's going on. I'd like to know *why* the > addresses are different on Adam's machine, before we conclude that we > mustn't try to check that they are the same. Agreed. We should know exactly what is happening. My only point earlier is that this has to be fixed for 7.4. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Jan Wieck <JanWieck@Yahoo.com> writes: > Tom Lane wrote: >> I was about to say "I give up, let's just take out the comparison". > Which then get's us back to your concern about assuming that HPUX and > Linux manpages can be taken as "every platform will" and hope all > kernels will limit the sender for recv() to the connected address. Well, I'd not have cared to trust just those couple of manpages, but if it's in the Single Unix Spec then it's more likely that everyone follows it. Also, I checked my yellowing first edition of Stevens, and it says the same thing: "only datagrams from this address will be received by the socket". So I'm thinking that this behavior has been passed down from the original Berkeley sockets code. > Since all involved processes are children of the postmaster, we can add > some other, random number based security signature into the message > itself. Noone outside will know what that is, it's really hard to guess > and can be checked with a few int32 compares, not even a function call > required. We could do that if we're feeling paranoid, but I'm now leaning to the view that it's not worth the trouble. regards, tom lane
Jan Wieck <JanWieck@Yahoo.com> writes: > I still believe that this is just garbage in the padding bytes after the > IPV4 address. Yes. I have been looking at the behavior on my own Linux box, on which it turns out stats are broken too in CVS tip. It is very clear that getsockname() is returning garbage --- it says that the address length is 16 bytes, but only the first 8 are actually used, and the values put into the other 8 bytes vary from run to run. recvfrom() also returns 16 bytes, but it seems to be careful to zero the extra space. Ugh. The reason 7.4 breaks where 7.3 worked is that 7.3 compared the addresses like this: if (fromaddr.sin_addr.s_addr != pgStatAddr.sin_addr.s_addr) continue; if (fromaddr.sin_port!= pgStatAddr.sin_port) continue; where 7.4 has if (memcmp(&fromaddr, &pgStatAddr, fromlen)) continue; > The code currently bind()'s and connect()'s explicitly to > an AF_INET address. Not in 7.4. Kurt rewrote that stuff so it would still work on a machine with only IPV6 addressing. It should work for either AF_INET or AF_INET6 sockets. (I'm unconvinced that assuming "localhost" can be looked up is an improvement over assuming "127.0.0.1" can be used, but that's not our immediate problem.) > After reading Kurt's quoting of the SUS manpage I have to agree with Tom > in that we cannot skip the check entirely. It says it limits for recv() > but we are using recvfrom() ... there might be a little difference on > that platform ... I was about to say "I give up, let's just take out the comparison". Your point is interesting but easily avoided; if we aren't going to check fromaddr anymore then there's no need to use recvfrom(), it could as well be recv() and save the kernel a few cycles. regards, tom lane
On a second thought, I still believe that this is just garbage in the padding bytes after the IPV4 address. The code currently bind()'s and connect()'s explicitly to an AF_INET address. So all we ever should see is something from and AF_INET address. Everything else in the sin_family has to be discarded. I do not think it is allowed to bind() and connect() to an IPV4 address and then get anything other than an IPV4 address back from the system. If that is the case, the whole idea is broken. An AF_INET address now has only two relevant fields, the sin_port and sin_addr. If they are the same, everything is fine. So the correct check would be that 1. fromlen > sizeof(sin_family), 2. sin_family == AF_INET, 3. sin_port and sin_addr identical. After reading Kurt's quoting of the SUS manpage I have to agree with Tom in that we cannot skip the check entirely. It says it limits for recv() but we are using recvfrom() ... there might be a little difference on that platform ... Jan Wieck wrote: > > Tom Lane wrote: > >> Jan Wieck <JanWieck@Yahoo.com> writes: >>> Redhat 7.1 says >>> The file descriptor sockfd must refer to a socket. If the >>> socket is of type SOCK_DGRAM then the serv_addr address is >>> the address to which datagrams are sent by default, and >>> the only address from which datagrams are received. If >> >>> Looks like the test is obsolete. Any objections to remove it? Do people >>> agree that it's a bugfix that can go into 7.4? >> >> The test is not "obsolete"; we understood when we wrote it that it >> should theoretically be redundant with the kernel-level restriction. >> But there may be systems out there that fail to make the check promised >> by the HPUX and Linux manpages. So I'd prefer to leave it in there if >> we can. I don't want to rip it out simply because we don't understand >> why it's failing on Adam's setup. >> >> What I'm wondering about is whether we are comparing the right number of >> bytes ... have both address structs been reported to have the same >> length? Maybe we need a min(). > > I disagree. If getsockname(), getpeername() or recvfrom() return > different address length's, it'd be more an indicator that the addresses > ARE different anyway. Just because an IPV4 address doesn't have that > feature is no reason to assume that addresses of different length are > the same if they match over min() bytes. > > If you want to continue to check the addresses and feel that HPUX and > Linux manpage claims about kernel behaviour aren't enough, then we have > to make it a specific check per "supported" AF. For now, that'd just be > AF_INET and AF_INET6, and the default case bailing out with an error. > > > Jan > -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Tom Lane wrote: > I was about to say "I give up, let's just take out the comparison". > Your point is interesting but easily avoided; if we aren't going to check > fromaddr anymore then there's no need to use recvfrom(), it could as > well be recv() and save the kernel a few cycles. Which then get's us back to your concern about assuming that HPUX and Linux manpages can be taken as "every platform will" and hope all kernels will limit the sender for recv() to the connected address. Since all involved processes are children of the postmaster, we can add some other, random number based security signature into the message itself. Noone outside will know what that is, it's really hard to guess and can be checked with a few int32 compares, not even a function call required. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
This analysis makes sense - I think using memcmp is clearly wrong here. cheers andrew Jan Wieck said: > On a second thought, > > I still believe that this is just garbage in the padding bytes after > the IPV4 address. The code currently bind()'s and connect()'s > explicitly to an AF_INET address. So all we ever should see is > something from and AF_INET address. Everything else in the sin_family > has to be discarded. I do not think it is allowed to bind() and > connect() to an IPV4 address and then get anything other than an IPV4 > address back from the system. If that is the case, the whole idea is > broken. > > An AF_INET address now has only two relevant fields, the sin_port and > sin_addr. If they are the same, everything is fine. So the correct > check would be that 1. fromlen > sizeof(sin_family), 2. sin_family == > AF_INET, 3. sin_port and sin_addr identical. > > After reading Kurt's quoting of the SUS manpage I have to agree with > Tom in that we cannot skip the check entirely. It says it limits for > recv() but we are using recvfrom() ... there might be a little > difference on that platform ... > >
<andrew@dunslane.net> writes: > This analysis makes sense - I think using memcmp is clearly wrong here. Yeah, now that I think about it, we're betting on the kernel to faithfully zero all unused bits in addrinfo structures. In an ideal world, all kernels would do that, but in the real world it seems like a losing bet. I could go for Jan's idea of putting a random key into the messages, if anyone feels that we should not trust to the kernel to enforce the packet source address restriction. But the memcmp() test seems a clear loser given today's discussions. regards, tom lane
Tom Lane wrote: > <andrew@dunslane.net> writes: > > This analysis makes sense - I think using memcmp is clearly wrong here. > > Yeah, now that I think about it, we're betting on the kernel to > faithfully zero all unused bits in addrinfo structures. In an ideal > world, all kernels would do that, but in the real world it seems like > a losing bet. Yeah, I've always been under the impression that it's a bad idea in general to memcmp() structs, if only because in doing so you make a lot of implicit assumptions about the structs in question that aren't necessarily true, especially when dealing with multiple architectures. Makes me wonder if there are other parts of the code where we're vulnerable to the same sort of issue... > I could go for Jan's idea of putting a random key into the messages, > if anyone feels that we should not trust to the kernel to enforce the > packet source address restriction. But the memcmp() test seems a clear > loser given today's discussions. The test in the 7.3.x code looked reasonable to me, especially if it's possible to make it work with IPV6 (if it doesn't already). It's doing basically the right thing, at any rate: directly comparing the actual fields that are relevant. Does this test represent a significant performance hit? -- Kevin Brown kevin@sysexperts.com
On Tue, Sep 09, 2003 at 02:10:20AM -0700, Kevin Brown wrote: > > I could go for Jan's idea of putting a random key into the messages, > > if anyone feels that we should not trust to the kernel to enforce the > > packet source address restriction. But the memcmp() test seems a clear > > loser given today's discussions. > > The test in the 7.3.x code looked reasonable to me, especially if it's > possible to make it work with IPV6 (if it doesn't already). It's doing > basically the right thing, at any rate: directly comparing the actual > fields that are relevant. Does this test represent a significant > performance hit? The reason I used a memcmp() instead of dealing with the structure members themself is because it was easier. Checking that they're the same address family is easy, and if they're different the kernel is really broken. For the addresses and port, in case of IPv4, you have to cast it to sockaddr_in *, and compare the sin_addr and sin_port like before. For IPv6 you could do it with a memcmp on the sin6_addr part, and put it inside an #ifdef HAVE_IPV6. If you want to write code to compare 2 addresses, please make it a general function and place it in ip.c. Anyway, I'm happy with the current use of recv(). Kurt
Kurt Roeckx wrote: > On Tue, Sep 09, 2003 at 02:10:20AM -0700, Kevin Brown wrote: >> > I could go for Jan's idea of putting a random key into the messages, >> > if anyone feels that we should not trust to the kernel to enforce the >> > packet source address restriction. But the memcmp() test seems a clear >> > loser given today's discussions. >> >> The test in the 7.3.x code looked reasonable to me, especially if it's >> possible to make it work with IPV6 (if it doesn't already). It's doing >> basically the right thing, at any rate: directly comparing the actual >> fields that are relevant. Does this test represent a significant >> performance hit? > > The reason I used a memcmp() instead of dealing with the > structure members themself is because it was easier. :-/ > > Checking that they're the same address family is easy, and if > they're different the kernel is really broken. Agreed. > > For the addresses and port, in case of IPv4, you have to cast it > to sockaddr_in *, and compare the sin_addr and sin_port like > before. Using a decent C compiler (and who compiles PostgreSQL without) this should reduce to some sort of a 32-bit and another 16-bit comparisions ... no? > For IPv6 you could do it with a memcmp on the sin6_addr part, and > put it inside an #ifdef HAVE_IPV6. > > If you want to write code to compare 2 addresses, please make it > a general function and place it in ip.c. > > Anyway, I'm happy with the current use of recv(). I disagree here. The clean and secure way is still to do *some* check (because we carefully don't assume that all OS's behave like some manpages happen to agree to). The performant way is to do it with information that is available without any extra system call. So either we do the random signature thing, which I would favor as a one time be all, end all solution - or you do the actual from-address based implementation by restoring the old IPV4 behaviour and adding correct IPV6 behaviour. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes: > So either we do the random signature thing, which I would favor as a one > time be all, end all solution - or you do the actual from-address based > implementation by restoring the old IPV4 behaviour and adding correct > IPV6 behaviour. My feeling at this point is that it's not worth spending any effort on. But if someone wants to expend effort, let's go with Jan's random-signature idea. That is simple, unquestionably portable, and AFAICS it defends against more than the source-address check would defend against, even after we got it right. (Consider spoofed packet source addresses.) regards, tom lane
Tom Lane said: > Jan Wieck <JanWieck@Yahoo.com> writes: >> So either we do the random signature thing, which I would favor as a >> one time be all, end all solution - or you do the actual from-address >> based implementation by restoring the old IPV4 behaviour and adding >> correct IPV6 behaviour. > > My feeling at this point is that it's not worth spending any effort on. > But if someone wants to expend effort, let's go with Jan's > random-signature idea. That is simple, unquestionably portable, and > AFAICS it defends against more than the source-address check would > defend against, even after we got it right. (Consider spoofed packet > source addresses.) > I see that currently the check has been removed rather than fixed. If someone can spoof the packet address isn't there also a possibility that they can read your packets and see your random signature? I'm not clear what would be gained by an attacker being able to insert such spoofed packets into the stream, though. IOW, how big is the security threat? cheers andrew
On Wed, Sep 10, 2003 at 07:27:02AM -0400, Andrew Dunstan wrote: > If someone can spoof the packet address isn't there also a possibility > that they can read your packets and see your random signature? Spoofing the packet source address is not quite the same as sniffing a connection, which should be encrypted if you do not trust your environment AFAIU. > I'm not clear what would be gained by an attacker being able to insert > such spoofed packets into the stream, though. IOW, how big is the security > threat? DoS caused by the autovacuum daemon running too much? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "The Gord often wonders why people threaten never to come back after they've been told never to return" (www.actsofgord.com)
Jan Wieck wrote: > > For IPv6 you could do it with a memcmp on the sin6_addr part, and > > put it inside an #ifdef HAVE_IPV6. > > > > If you want to write code to compare 2 addresses, please make it > > a general function and place it in ip.c. > > > > Anyway, I'm happy with the current use of recv(). > > I disagree here. The clean and secure way is still to do *some* check > (because we carefully don't assume that all OS's behave like some > manpages happen to agree to). The performant way is to do it with If the OS behavior doesn't match the man page, can't we just call that an OS bug? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Wed, Sep 10, 2003 at 07:27:02AM -0400, Andrew Dunstan wrote: >> If someone can spoof the packet address isn't there also a possibility >> that they can read your packets and see your random signature? > Spoofing the packet source address is not quite the same as sniffing a > connection, which should be encrypted if you do not trust your > environment AFAIU. Remember this is a local-loopback connection; the packets will never leave your own kernel. If the attacker can sniff the packets then he is already into your kernel, in which case game over. But depending on how careful your kernel is, it's possible that an attacker who doesn't yet own your machine could inject forged packets with a local source address. So I think that indeed there are scenarios where a random-signature check would be more secure than a source-address check. The question is whether any of this is worth worrying about in PG. ISTM the correct solution to such a risk is to tighten your kernel's packet filtering, not harden one piece of one application. regards, tom lane
On Wed, Sep 10, 2003 at 12:49:31 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The question is whether any of this is worth worrying about in PG. > ISTM the correct solution to such a risk is to tighten your kernel's > packet filtering, not harden one piece of one application. On linux at least, it is pretty easy to make sure packets claiming to be from loopback are dropped if they don't come in on the loopback interface.