Thread: human validation on post comments
Hi,

my name is Travis, and I am new to pgsql-www.

I have been integrating a component that will ask the user to enter the word in a dynamic image before their comments can be submitted. I resumed this work in progress from Gevik.

The current setup is the development site in my sandbox: http://travis.pgadmin.org

I am looking for feedback and guidance on the following things:

- Is the input validator being invoked in all the spots where it should be? Currently I know about the comments feedback for the documentation. Validation is invoked from /system/page/form.php
- Currently the invalid human feedback page is a simple "validation failed" message, outside of the portal look and feel.

And technical details (whether they follow our best practices):

- There is a single table, session_capture, for managing the remote IP, the session, and the word used in the generated image.
- The validation scripts that manipulate the dynamic image and the session table are in a top-level folder, /validation
- The gettext macro is used for the messages displayed.
- Modify system/page.php to add a validate case to the action handler list.
- Modify system/page/form.php to send to our /validate handler instead of /system/handleform.php
- Modify the .htaccess to pass /validationimage through to /validation/validation_image.php

Then, if things are looking OK, what is the process for integrating the enhancements back into the site?

Looking forward to constructive comments,
Travis

Travis,

> I have been integrating a component that will ask the user to enter the
> word in a dynamic image before their comments can be submitted.

Terrific! I'm sure the people who clear the comments will have nice things to say.

The image is generated dynamically? That's good -- the spammers are already working on systems that harvest static images from sites and match them against a database. Grrrr.

--
Josh Berkus
Aglio Database Solutions
San Francisco

> my name is Travis, I am new to pgsql-www.
>
> I have been integrating a component that will ask the user to
> enter the word in a dynamic image before their comments can
> be submitted.
>
> I had resumed this work in progress from Gevik.

Great!

> - is the input validator being invoked on all the spots where
> it should be?
> currently the comments feedback for the documentation, i know about.
> validation is invoked from /system/page/form.php

I believe all form submissions go through there ATM. But it would probably be good if there was something "API-like" so you could call it from elsewhere if required, unless that's too much work?

Will the validation image automatically be added to all forms, or is it something we need to set for each form that needs it? And if it's automatic, is there a way to turn it off for a form through form.php?

> - currently the invalid human feedback page is a simple
> validation failed message, outside of portal look and feel.

It would be nice if that could be fixed to be a page inside the portal. IIRC I have code around to fix that for the generic forms, which will go in any day now :-) It might help you to build off that.

> - the validation script that manipulates dynamic image and
> session table are in a top level folder /validation

Any reason why this is not in /system/? I like the way things are now, where all the code goes in one subdir. If it's many files, it could go in /system/validation?

> - modify system/page/form.php
> to send to our /validate handler, instead of
> /system/handleform.php

I assume the validate handler then passes control back into handleform, or does form handling move completely into the validator?

The rest looks very good!

> Then if things are looking ok, what and how is the process
> for integrating the enhancements back to the site.

Send a patch through to the list for even more comments, I guess.

//Magnus

> -----Original Message-----
> From: pgsql-www-owner@postgresql.org [mailto:pgsql-www-owner@postgresql.org] On Behalf Of Travis Hein
> Sent: 18 March 2006 15:57
> To: pgsql-www@postgresql.org
> Subject: [pgsql-www] human validation on post comments
>
> Hi,
>
> my name is Travis, I am new to pgsql-www.

Hi Travis,

> I have been integrating a component that will ask the user to
> enter the word in a dynamic image before their comments can be submitted.
>
> I had resumed this work in progress from Gevik.
>
> The current setup is the development site in my sandbox
> http://travis.pgadmin.org
>
> I was looking for feedback, and guidance for the following things:
>
> - is the input validator being invoked on all the spots where
> it should be?
> currently the comments feedback for the documentation, i know about.
> validation is invoked from /system/page/form.php

Most likely, if it's from there. News, events, professional services, comments & bug reports all seem to be covered.

> - currently the invalid human feedback page is a simple validation failed
> message, outside of portal look and feel.

Yes, this needs to be fixed. Also, during my testing I hit the limit for uncompleted submissions. It gave me the message inside the image itself, but cropped so it couldn't be fully read, and it still asked me to enter the word and hit submit! Can the message be moved out and into a proper page please?

Also, we could probably do with a couple of extra words above the image just to clarify why the user must enter the word.

> - the validation script that manipulates dynamic image and session table are
> in a top level folder /validation

Yup - as Magnus said, these should be under /system somewhere. The URL exposed to the user (which will be rewritten in the .htaccess file) can be in the root directory though.

> Then if things are looking ok, what and how is the process
> for integrating the enhancements back to the site.

First off, resolve mine and any other issues raised (unless they result in any discussion/objections, in which case wait for the outcome of that first).

Secondly, cvs update your code and merge with the current. Magnus committed a bunch of changes over the weekend, so you'll need to make sure everything lives happily together.

Then, post a patch in diff -c format for review.

Cheers, Dave.

On Sat, Mar 18, 2006 at 10:15:12AM -0800, Josh Berkus wrote:
> Travis,
>
> > I have been integrating a component that will ask the user to
> > enter the word in a dynamic image before their comments can be
> > submitted.
>
> Terrific! I'm sure the people who clear the comments will have nice
> things to say.
>
> The image is generated dynamically? That's good -- the spammers
> are already working on systems that harvest static images from sites
> and match them against a database. Grrrr.

Actually, they've already got one, and here's how it works:

1. Put up a free porn site.
2. Present somebody else's captcha image as an entry.
3. Let the person see the porn if they've correctly cracked the captcha.
4. Spam site.

The sad part of this one is that they don't have to crack any single captcha system. Instead, they've cracked the entire captcha process.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter

Remember to vote!

> -----Original Message-----
> From: pgsql-www-owner@postgresql.org [mailto:pgsql-www-owner@postgresql.org] On Behalf Of David Fetter
> Sent: 21 March 2006 05:43
> To: PostgreSQL WWW
> Subject: Re: [pgsql-www] human validation on post comments
>
> Actually, they've already got one, and here's how it works:
>
> 1. Put up a free porn site.
> 2. Present somebody else's captcha image as an entry.
> 3. Let the person see the porn if they've correctly cracked the captcha.
> 4. Spam site.
>
> The sad part of this one is that they don't have to crack any single
> captcha system. Instead, they've cracked the entire captcha process.

Grrr, where's my baseball bat?

Actually though, that shouldn't be too much of a problem as long as the images time out after a few minutes - and we still have all the normal moderation in place.

Regards, Dave.

Dave Page wrote:
> > -----Original Message-----
> > From: pgsql-www-owner@postgresql.org [mailto:pgsql-www-owner@postgresql.org] On Behalf Of David Fetter
> ...
> > The sad part of this one is that they don't have to crack any single
> > captcha system. Instead, they've cracked the entire captcha process.
>
> Grrr, where's my baseball bat?
>
> Actually though that shouldn't be too much of a problem as long as the
> images timeout after a few minutes- and we still have all the normal
> moderation in place.

I should point out that the whole captcha thing isn't WAI compliant:
http://www.w3.org/WAI/

So we had better not follow that path if we don't want to scare away disabled people...

--Tino

> -----Original Message-----
> From: Tino Wildenhain [mailto:tino@wildenhain.de]
> Sent: 21 March 2006 08:26
> To: Dave Page
> Cc: David Fetter; PostgreSQL WWW
> Subject: Re: [pgsql-www] human validation on post comments
>
> Dave Page wrote:
> > > -----Original Message-----
> > > From: pgsql-www-owner@postgresql.org [mailto:pgsql-www-owner@postgresql.org] On Behalf Of David Fetter
> > ...
> > > The sad part of this one is that they don't have to crack any single
> > > captcha system. Instead, they've cracked the entire captcha process.
> >
> > Grrr, where's my baseball bat?
> >
> > Actually though that shouldn't be too much of a problem as long as the
> > images timeout after a few minutes- and we still have all the normal
> > moderation in place.
>
> I should point out, the whole captcha thing isn't WAI compliant:
> http://www.w3.org/WAI/
>
> so better not follow that path if we don't want to scare away
> disabled people...

Hmm, that's a good point (one I should have thought of, considering the BSI's new guidance notes on building accessible websites just landed on my desk!).

I think we would be covered if we offered an audio equivalent as well though, wouldn't we? Perhaps there is some text-to-speech code that we can use from within PHP?

Regards, Dave.

> > > I have been integrating a component that will ask the user to enter
> > > the word in a dynamic image before their comments can be submitted.
> >
> > Terrific! I'm sure the people who clear the comments will have nice
> > things to say.
> >
> > The image is generated dynamically? That's good -- the spammers
> > are already working on systems that harvest static images from sites
> > and match them against a database. Grrrr.
>
> Actually, they've already got one, and here's how it works:
>
> 1. Put up a free porn site.
> 2. Present somebody else's captcha image as an entry.
> 3. Let the person see the porn if they've correctly cracked the captcha.
> 4. Spam site.
>
> The sad part of this one is that they don't have to crack any
> single captcha system. Instead, they've cracked the entire
> captcha process.

I don't know how this particular system is set up, but how can they defeat something like:

* Fill in form data. Submit.
* Generate a verification page containing an image. Along with the code, store the hash of the form data.
* Validate the image against the hash of the data.

This means you need to put all your data into the form beforehand, so you have to tailor one page to each set of contents.

Or am I thinking completely wrong here :-)

//Magnus

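Magnus's three steps could be sketched roughly as follows. This is only an illustration in Python, not the site's PHP code: the function names, the in-memory `_pending` store, and the use of SHA-256 over a JSON dump of the fields are all assumptions made for the example.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical in-memory store: challenge id -> (expected word, form-data hash).
# A real deployment would keep this in the session table instead.
_pending = {}


def form_digest(form):
    """Hash the submitted fields in a stable (sorted-key) order."""
    canonical = json.dumps(form, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue_challenge(form, word):
    """Step 2: after the form is submitted, store the word with the form hash."""
    challenge_id = secrets.token_hex(16)
    _pending[challenge_id] = (word.lower(), form_digest(form))
    return challenge_id


def verify_submission(challenge_id, answer, form):
    """Step 3: accept only if the word matches AND the form data is unchanged."""
    entry = _pending.pop(challenge_id, None)  # each challenge is single use
    if entry is None:
        return False
    word, digest = entry
    word_ok = hmac.compare_digest(
        answer.strip().lower().encode("utf-8"), word.encode("utf-8")
    )
    form_ok = hmac.compare_digest(
        form_digest(form).encode("ascii"), digest.encode("ascii")
    )
    return word_ok and form_ok
```

With this binding, an answer obtained for one set of form contents cannot be reused to post different contents. It does not by itself stop a relay that submits the real spam first and only then shows the resulting image to a human.
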
On Tue, Mar 21, 2006 at 08:12:05AM -0000, Dave Page wrote:
> > -----Original Message-----
> > From: pgsql-www-owner@postgresql.org [mailto:pgsql-www-owner@postgresql.org] On Behalf Of David Fetter
> > Sent: 21 March 2006 05:43
> > To: PostgreSQL WWW
> > Subject: Re: [pgsql-www] human validation on post comments
> >
> > Actually, they've already got one, and here's how it works:
> >
> > 1. Put up a free porn site.
> > 2. Present somebody else's captcha image as an entry.
> > 3. Let the person see the porn if they've correctly cracked the captcha.
> > 4. Spam site.
> >
> > The sad part of this one is that they don't have to crack any
> > single captcha system. Instead, they've cracked the entire captcha
> > process.
>
> Grrr, where's my baseball bat?
>
> Actually though that shouldn't be too much of a problem as long as
> the images timeout after a few minutes- and we still have all the
> normal moderation in place.

The porn thing works just fine no matter what the timeout is, as the spam is queued up already and the captcha gets presented as soon as it's generated. The porn surfer will generally not dally when presented with the captcha.

But apart from its ineffectiveness on spammers, as others have mentioned, captcha excludes blind people. :(

Cheers,
D
--
David Fetter <david@fetter.org>

> -----Original Message-----
> From: David Fetter [mailto:david@fetter.org]
> Sent: 21 March 2006 16:45
> To: Dave Page
> Cc: PostgreSQL WWW
> Subject: Re: [pgsql-www] human validation on post comments
>
> The porn thing works just fine no matter what the timeout is, as the
> spam is queued up already and the captcha gets presented as soon as
> it's generated. The porn surfer will generally not dally when
> presented with the captcha.

Generating enough real traffic to a dummy site to ensure that there is always a user ready to read a single captcha within a few minutes of it being generated, just to post a single piece of spam, seems like a pretty mean feat. I would think they could generate more revenue from bunging a few ads on the site than from hoping that the spam they manage to get onto a completely unrelated site might actually generate a customer. Still, I'm only speculating, so I may be completely wrong.

> But apart from its ineffectiveness on spammers, as others have
> mentioned, captcha excludes blind people. :(

Yes - it's a shame none of us thought about it when Gevik was originally working on it.

There is the audio option I suggested, which PayPal uses IIRC. Alternatively we could use some sort of puzzle, such as 'enter the third, second from last and 2nd character from this string'.

Regards, Dave.

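The character puzzle Dave mentions could look something like the sketch below. It is a hypothetical illustration in Python (the site itself is PHP); the function names and the prompt wording are invented for the example. Positions are 1-based, and negative numbers count from the end of the string.

```python
import random


def describe(position):
    """Turn 3 into 'character 3' and -2 into 'character 2 from the end'."""
    if position > 0:
        return f"character {position}"
    return f"character {-position} from the end"


def pick(text, position):
    """Return the character at a 1-based position; negative counts from the end."""
    return text[position - 1] if position > 0 else text[position]


def make_puzzle(text, count=3, rng=None):
    """Choose `count` distinct positions and return (prompt, expected answer)."""
    rng = rng or random.Random()
    candidates = list(range(1, len(text) + 1)) + list(range(-len(text), 0))
    positions = rng.sample(candidates, count)
    prompt = "Enter " + ", ".join(describe(p) for p in positions) + f" of: {text}"
    answer = "".join(pick(text, p) for p in positions)
    return prompt, answer
```

For the string "shovel", the positions from Dave's example (third, second from last, second) select "o", "e" and "h", so the expected answer would be "oeh". Because the prompt is plain text, it can be read by a screen reader, which is the accessibility point raised earlier in the thread.
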
On Tue, Mar 21, 2006 at 04:54:24PM -0000, Dave Page wrote:
> > -----Original Message-----
> > From: David Fetter [mailto:david@fetter.org]
> > Sent: 21 March 2006 16:45
> > To: Dave Page
> > Cc: PostgreSQL WWW
> > Subject: Re: [pgsql-www] human validation on post comments
> >
> > The porn thing works just fine no matter what the timeout is, as
> > the spam is queued up already and the captcha gets presented as
> > soon as it's generated. The porn surfer will generally not dally
> > when presented with the captcha.
>
> Generating enough real traffic to a dummy site to ensure that there
> is always user ready to read a single captcha within a few minutes of
> it being generated just to post a single piece of spam seems like a
> pretty mean feat.

I see I didn't explain it well enough. Here's the flow:

1. Spammer generates spam and queues it up for sites.
2. A person arrives at the porn site.
3. The spam system generates a request, including the spam, to the target site. Clock starts ticking.
4. The spam system presents the resulting captcha to the porn surfer. Less than a second has elapsed.
5. Porn surfer types in the string as asked. Time elapsed is probably still under 5 seconds.
6. Spam system sends the string to the target site. Time elapsed is under 10 seconds for >90% of cases.

> I would think they could generate more revenue from bunging a few
> ads on the site than hoping that the spam they manage to get on a
> completely unrelated site might actually generate a customer. Still,
> I'm only speculating so may be completely wrong.

It's very cheap to set up such a system, and spammers routinely expect--and profit from--"hit rates" that are less than one in a million.

> > But apart from its ineffectiveness on spammers, as others have
> > mentioned, captcha excludes blind people. :(
>
> Yes - it's a shame none of us thought about it when Gevik was
> originally working on it.
>
> There is the audio option I suggested which Paypal use IIRC -
> alternatively we could use some sort of puzzle - such as 'enter the
> third, second from last and 2nd character from this string'.

That lends itself to exactly the same attack I sketched out above.

Cheers,
D
--
David Fetter <david@fetter.org>

> -----Original Message-----
> From: David Fetter [mailto:david@fetter.org]
> Sent: 21 March 2006 17:16
> To: Dave Page
> Cc: PostgreSQL WWW
> Subject: Re: [pgsql-www] human validation on post comments
>
> I see I didn't explain it well enough. Here's the flow:
>
> 1. Spammer generates spam and queues it up for sites.
> 2. A person arrives at the porn site.
> 3. The spam system generates a request including the spam to the
> target site. Clock starts ticking.
> 4. The spam system presents the resulting captcha to the porn surfer.
> Less than a second has elapsed.
> 5. Porn surfer types in the string as asked. Time elapsed is
> probably still under 5 seconds.
> 6. Spam system sends the string to the target site. Time elapsed is
> under 10 seconds for >90% of cases.

Ahh, gotcha.

> > > But apart from its ineffectiveness on spammers, as others have
> > > mentioned, captcha excludes blind people. :(
> >
> > Yes - it's a shame none of us thought about it when Gevik was
> > originally working on it.
> >
> > There is the audio option I suggested which Paypal use IIRC -
> > alternatively we could use some sort of puzzle - such as 'enter the
> > third, second from last and 2nd character from this string'.
>
> That lends itself to exactly the same attack I sketched out above.

Undoubtedly, but only if they write something specifically to work with our site, which is a lot of effort... And all we do then is fall back to how things are now, until we've broken whatever they were doing by modifying the regexps in the auto-reject code or re-jigging the puzzles.

Of course, in doing any of this we mustn't make it too difficult for the user to submit things.

Regards, Dave.

Pardon if this has already been brought up, as I am late to the thread, but why can't we just use an "unseen by default" policy in which all posts are not made viewable until approved by a moderator?

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200603211227
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

> Pardon if this has already been brought up as I am late to
> the thread, but why can't we just use a "unseen by default"
> policy in which all posts are not made viewable until
> approved by a moderator?

We already do this, and have done for a while (before that, posts were shown by default and could be rejected afterwards, which missed a lot). This is AFAIK about making life easier for the moderators :-)

//Magnus

On Tuesday 21 March 2006 12:49, Magnus Hagander wrote:
> > Pardon if this has already been brought up as I am late to
> > the thread, but why can't we just use a "unseen by default"
> > policy in which all posts are not made viewable until
> > approved by a moderator?
>
> We already do this, and have done for a while (before that they used to
> be show-by-default, but can be rejected, which missed a lot). This is
> AFAIK about making life for the moderators easier :-)

Honestly, I don't find the amount of spam as annoying as the amount of pointless posts and/or support questions... but if you really want such a system, why not have one that switches randomly between captcha, math problems, and anagrams, and also verifies the referer URL? This wouldn't be foolproof, but the point here is only to make it more of a hassle than the next guy's site.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

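Robert's idea of rotating challenge types could be sketched along these lines. This is a hypothetical Python illustration, not existing site code: the generator names, the sample word list, and the expected host are made up, and the image captcha itself is left out.

```python
import random
from urllib.parse import urlparse


def math_challenge(rng):
    """A small addition problem; returns (prompt, expected answer)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)


def anagram_challenge(rng, words=("shovel", "table", "index")):
    """Shuffle the letters of a randomly chosen word (sample list is illustrative)."""
    word = rng.choice(words)
    letters = list(word)
    rng.shuffle(letters)
    return f"Unscramble: {''.join(letters)}", word


# The image captcha would be a third entry here; it is omitted from this sketch.
CHALLENGES = (math_challenge, anagram_challenge)


def random_challenge(rng=None):
    """Pick one challenge type at random and generate an instance of it."""
    rng = rng or random.Random()
    return rng.choice(CHALLENGES)(rng)


def referer_ok(referer, expected_host):
    """True if the Referer header points at the expected host."""
    return urlparse(referer or "").hostname == expected_host
```

Two caveats on the sketch: the Referer header is supplied by the client, so the check only raises the bar, in line with the "not foolproof" remark above; and some words have more than one valid anagram ("shovel" and "hovels", for example), which a real checker would have to allow for.
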
Hi all,

Sorry, it has been a bit since the last one; I have been out pondering this.

OK, so far, from the way the captcha works: it uses a text file as a dictionary of words and randomly picks a word from this file. Then it uses the PEAR PHP modules to dynamically render the word into an image at a random placement and orientation, with two random sets of concentric circles. I am thinking that this would deter the bots, or anyone else, from using a matching image lookup, character recognition, or saved previous images. (And if someone is creative enough to get through, they still have to deal with the moderators :) )

There is a simple database table that stores the word when the image is first generated and links it to your session. You have 3 tries to enter the right word; then the image becomes unusable. Once you do enter the correct word, this image/session binding is marked as used and cannot be used again. There is also a timeout period after which the image expires in the session captcha, assuming you keep your session active long enough.

This was all Gevik's work; it seems pretty complete on these parts.

On March 21, 2006 03:43 am, Dave Page wrote:
> I should point out, the whole captcha thing isn't WAI compliant:
> http://www.w3.org/WAI/
>
> so better not follow that path if we don't want to scare away
> disabled people...

> I think we would be covered if we offered an audio equivalent as well
> though wouldn't we? Perhaps there is some text-to-speech code that we
> can use from within PHP?

> But apart from its ineffectiveness on spammers, as others have
> mentioned, captcha excludes blind people. :(

So, what I am thinking is that, for us to be WAI compliant, we need an option like "click here to receive an audible version of the word in the image that you must type in".

I have not been able to find a good, universal, works-in-all-places text-to-speech converter. So how about if we (someone with a good voice) pre-record each and every word in the captcha word dictionary? (Eventually we could also do it for all the languages we support.) I was thinking that if the word was "shovel", the voice version of the word could be "s" "h" "o" "v" "e" "l". You know, spell out the letters, so that it is more language independent. (Currently, the dictionary of captcha words is English only, too.)

I would consider using a small database table to manage all the words for the dictionary, as opposed to cluttering up a folder in the docroot with the audio files. The table would have a column for the word as text, and a column for the word as a .wav (or whatever the standard acceptable audio format is) recording of the word spelled out letter by letter.

Then, on the captcha image screen, the link "click here to get the audio version of the word shown in the image" would be the facility to retrieve the word as a downloadable/playable attachment for the user to play and type the letters back in. From there, the existing session captcha features would be used.

I recognise that the challenge is to get someone to vocalize the dictionary of captcha words. I have not done this yet, but wanted to get general feedback on whether this is a good idea.

--
Tue Mar 28 19:48:12 EST 2006

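The session handling Travis describes (three tries, single use, expiry) could be modelled roughly as below. This is a sketch in Python rather than the PHP code under discussion: the `Challenge` class is hypothetical, the state is held in memory instead of the session table, and the 300-second timeout is an assumed value, since the thread only speaks of "a few minutes".

```python
import time
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3       # "You have 3 tries to enter the right word"
TIMEOUT_SECONDS = 300  # assumed value; the thread does not give a number


@dataclass
class Challenge:
    """One generated image, bound to one session."""
    word: str
    created: float = field(default_factory=time.time)
    attempts: int = 0
    used: bool = False

    def check(self, answer, now=None):
        """Return True only for a correct, timely answer on an unused challenge."""
        now = time.time() if now is None else now
        if self.used or self.attempts >= MAX_ATTEMPTS:
            return False  # already solved once, or out of tries
        if now - self.created > TIMEOUT_SECONDS:
            return False  # image has expired
        self.attempts += 1
        if answer.strip().lower() == self.word.lower():
            self.used = True  # a solved challenge cannot be replayed
            return True
        return False
```

A caller would create one `Challenge` when the image is generated and call `check` on each submission; once it returns False because of exhaustion or expiry, a new image has to be issued.
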
> -----Original Message-----
> From: pgsql-www-owner@postgresql.org [mailto:pgsql-www-owner@postgresql.org] On Behalf Of Travis Hein
> Sent: 29 March 2006 02:10
> To: PostgreSQL WWW
> Subject: Re: [pgsql-www] human validation on post comments

Hi Travis,

> > But apart from its ineffectiveness on spammers, as others have
> > mentioned, captcha excludes blind people. :(
>
> So, what I am thinking, is for us to be WAI compliant, we need an option to
> "click here to receive an audible version of this word in the image that you
> must type in"
>
> I have not been able to find a good, universal, works in all places text to
> speech converter,

Well, we could always use a 'works in one place' one and record the output.

> so how about, if we (someone with a good voice) pre-records each and every
> word in the captcha word dictionary, (eventually we could also do it for all
> languages we support too, but i was thinking if the word was "shovel" that
> the voice version of the word could be "s" "h" "o" "v" "e" "l" . you know,
> spell out the letters, so that it is more language independent. (currently,
> the dictionary for captcha words is english only now too.)

Yes, that seems the easiest way. Simpler yet, perhaps we should make it a numeric code.

> I would consider using a small database table, for the dictionary, to manage
> all the words for the dictionary, as opposed to cluttering up a folder in the
> docroot with the audio files?
> The table would have a column for the word, as text, and a column for the
> word, as a .wav (or the standard acceptable audio format) recorded, spelled
> out by the letters version of this word.

It might be better to keep it all in the filesystem to be honest, otherwise it'll add a fair bit of load to wwwmaster. In the fs, at least it will all come from one of the frontend servers.

> then on the captcha image screen, the link "click here to get the audio
> version of the word shown in the image button" would be the facility to
> retrieve the word as a downloadable / playable attachment for the user, to
> play, and type the letters back in. From there, the existing session captcha
> features would be used.

Yes.

> I recognise that the challenge is to get someone to vocalize the dictionary of
> captcha words.
> I have not done this yet, but wanted to get the general feedback to if this
> was a good idea.

It's doable, but is it worth the effort? I'm beginning to think not, if we have to prerecord everything rather than being able to generate it on the fly. Currently I'm only seeing a few spams per day - it looks like the latest additions to the regexp reject list are working pretty well.

Regards, Dave

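One way to avoid recording every dictionary word, given that the proposal is to spell the word out anyway, is to record each letter (or digit, for Dave's numeric code) once and join the clips on request. The sketch below uses Python's standard `wave` module purely as an illustration; the one-clip-per-character file layout (`s.wav`, `h.wav`, ...) and the function names are assumptions, not anything that exists on the site.

```python
import wave
from pathlib import Path


def spell_clips(word, clip_dir):
    """Map each character of the word to its pre-recorded clip, e.g. 's.wav'."""
    return [Path(clip_dir) / f"{ch}.wav" for ch in word.lower()]


def concat_wav(clips, out_path):
    """Join clips that share the same channel count, sample width, and rate."""
    with wave.open(str(out_path), "wb") as out:
        for i, clip in enumerate(clips):
            with wave.open(str(clip), "rb") as src:
                if i == 0:
                    # Copy the audio format from the first clip.
                    out.setparams(src.getparams())
                out.writeframes(src.readframes(src.getnframes()))
```

With 26 letter clips and 10 digit clips kept in the filesystem, any word or numeric code could be assembled without further recording, which may ease the "is it worth the effort" concern. Whether the joined audio is clear enough for users would still need to be tried out.
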