<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>bhiv.com &#187; Security</title>
	<atom:link href="http://bhiv.com/category/software/security/feed/" rel="self" type="application/rss+xml" />
	<link>http://bhiv.com</link>
	<description></description>
	<lastBuildDate>Tue, 15 Jul 2008 22:14:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>22 Million Reasons This is a Stupid Anti-Spam Measure</title>
		<link>http://bhiv.com/22-million-reasons-this-is-a-stupid-anti-spam-measure/</link>
		<comments>http://bhiv.com/22-million-reasons-this-is-a-stupid-anti-spam-measure/#comments</comments>
		<pubDate>Mon, 09 Apr 2007 18:50:38 +0000</pubDate>
		<dc:creator>Brent</dc:creator>
				<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://bhiv.com/22-million-reasons-this-is-a-stupid-anti-spam-measure/</guid>
		<description><![CDATA[Ever been reading a site and they show an email address as: &#8220;name AT example DOT com&#8221; to try and fool email address harvesters? Well I have 22 million reasons why this is a stupid method. Anyone want to buy an email list?]]></description>
			<content:encoded><![CDATA[<p>Ever been reading a site and they show an email address as: &#8220;name AT example DOT com&#8221; to try and fool <a href="http://en.wikipedia.org/wiki/E-mail_address_harvesting">email address harvesters</a>?</p>
<p>Well I have <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;q=%22*+AT+*+dot+com%22&#038;btnG=Search"><strong>22 million reasons</strong></a> why this is a stupid method. Anyone want to buy an email list?</p>
]]></content:encoded>
			<wfw:commentRss>http://bhiv.com/22-million-reasons-this-is-a-stupid-anti-spam-measure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Digg’s Evolving CAPTCHA</title>
		<link>http://bhiv.com/diggs-evolving-captcha/</link>
		<comments>http://bhiv.com/diggs-evolving-captcha/#comments</comments>
		<pubDate>Sun, 02 Oct 2005 15:31:18 +0000</pubDate>
		<dc:creator>Brent</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://bhiv.com/?p=8</guid>
		<description><![CDATA[Approximately 30 hours after I posted, Digg changed its CAPTCHA by altering the foreground and background colors, alternating normal and bold face, and mixing upper and lowercase letters. Even with a slight modification to my program (no longer making all letters lowercase), the change reduced my program’s accuracy to 11% (out of 200 [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">Approximately 30 hours after I <a target="_self" href="http://bhiv.com/2005/09/30/defeating-diggs-captcha/">posted</a>, Digg changed its CAPTCHA by altering the foreground and background colors, alternating normal and bold face, and mixing upper and lowercase letters. Even with a slight modification to my program (no longer making all letters lowercase), the change reduced my program’s accuracy to 11% (out of 200 samples). But there was another problem: approximately 15% of the images were not human solvable.<br />
<span id="more-8"></span>
</p>
<p class="MsoNormal">Examples:<br />
<img width="140" height="30" border="0" src="http://bhiv.com/wp-content/1128135.215216.jpg" /><br />
<img width="140" height="30" border="0" src="http://bhiv.com/wp-content/1128134.611966.jpg" /><br />
<img width="140" height="30" border="0" src="http://bhiv.com/wp-content/1128134.347671.jpg" />
</p>
<p class="MsoNormal">I’m sure they received complaints, and they changed it again, this time altering the thickness of the background lines and dropping the dictionary words. This new version also seems to produce only ~1% human-unsolvable images.</p>
<p class="MsoNormal">More Examples:<br />
<img width="140" height="30" border="0" src="http://bhiv.com/wp-content/1128265.jpg" /><br />
<img width="140" height="30" border="0" src="http://bhiv.com/wp-content/1128265_080.jpg" /><br />
<img width="140" height="30" border="0" src="http://bhiv.com/wp-content/1128265_334.jpg" />
</p>
<p class="MsoNormal">My past technique would no longer work. While it is definitely better, it still has the weakness of a consistent font and placement. If I wanted to break it again, I could take samples of colors from the areas I know would not be letters, remove them, and train the OCR to the font they use. I won’t be demonstrating how because ultimately I want to continue to use digg and they are obviously aware of the problem now.</p>
<p class="MsoNormal">Now onto another CAPTCHA that will be 100% breakable…</p>
]]></content:encoded>
			<wfw:commentRss>http://bhiv.com/diggs-evolving-captcha/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Defeating Digg&#8217;s CAPTCHA</title>
		<link>http://bhiv.com/defeating-diggs-captcha/</link>
		<comments>http://bhiv.com/defeating-diggs-captcha/#comments</comments>
		<pubDate>Thu, 29 Sep 2005 20:19:04 +0000</pubDate>
		<dc:creator>Brent</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://bhiv.com/?p=7</guid>
		<description><![CDATA[While using digg.com, I was surprised to see such an obviously weak CAPTCHA challenge. I was able to create a script that defeats it with 88% accuracy within a couple of hours using nothing but free software. (If you are looking for code, forget it. This is almost too much information) Digg’s CAPTCHA Weaknesses: Dictionary [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">While using <a target="_self" href="http://digg.com">digg.com</a>, I was surprised to see such an obviously weak <a target="_self" title="Wikipedia's CAPTCHA entry" href="http://en.wikipedia.org/wiki/Captcha">CAPTCHA</a> challenge. I was able to create a script that defeats it with 88% accuracy within a couple of hours using nothing but free software. <em><font color="#666666">(If you are looking for code, forget it. This is almost too much information)<span id="more-7"></span></font></em></p>
<p><strong>Digg’s CAPTCHA Weaknesses:<br />
</strong></p>
<ol>
<li>Dictionary words</li>
<li>Same background</li>
<li>Same font</li>
<li>No deformations</li>
<li>All lowercase letters</li>
<li>Constant colors</li>
</ol>
<p><strong>Tools<br />
</strong></p>
<ul>
<li><a target="_self" href="http://jocr.sourceforge.net/">gocr</a> &#8211; a GPL Optical Character Recognition program</li>
<li><a target="_self" href="http://www.imagemagick.org/script/index.php">ImageMagick</a> &#8211; for command-line image editing</li>
<li>Perl &#8211; to tie everything together</li>
</ul>
<p><strong>Sample Size</strong><br />
100 images with 95 different words with an average word length of 5.3 letters.</p>
<p><strong>First Test</strong><br />
Just dumping all the images through gocr yielded 26% correct responses. Not too shabby. It yields some easily manipulated results:</p>
<ul>
<li><img width="140" height="30" border="0" src="http://bhiv.com/wp-content/digg-captcha-groups.jpg" /> = groUDS</li>
<li><img width="140" height="30" border="0" src="http://bhiv.com/wp-content/digg-captcha-single.jpg" /> = single</li>
<li><img width="140" height="30" border="0" src="http://bhiv.com/wp-content/digg-captcha-police.jpg" /> = t&#8217; o.l,i.c,e &#8216;. . .&#8217;.&#8217;. &#8221;,</li>
<li><img width="140" height="30" border="0" src="http://bhiv.com/wp-content/digg-captcha-because.jpg" /> = be.cause.</li>
</ul>
<p>Looking at these results, I’m sure we could improve them with a little string manipulation.</p>
<p><strong>Tweaking output</strong><br />
We can mess with the output to yield better results:</p>
<ul>
<li>Convert all output to lowercase</li>
<li>Remove non-letter characters</li>
<li>Run a spell checker</li>
</ul>
<p>The first two yield <strong>53%</strong> correct responses; with just these simple tweaks we get more correct guesses than incorrect. Taking the first guess of a spell checker bumps the accuracy to <strong>67%</strong>.</p>
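<p>A minimal sketch of the output tweaks listed above, written in Python rather than the Perl I used. This is not my actual script: the helper names <code>clean_ocr_output</code> and <code>first_guess</code> and the tiny word list are made up for illustration, and <code>difflib</code> only stands in for a real spell checker.</p>

```python
import difflib
import re

# Hypothetical stand-in for a spell checker dictionary.
WORDS = ["groups", "single", "police", "because"]


def clean_ocr_output(raw):
    """Lowercase the OCR text and drop everything that is not a letter a-z."""
    return re.sub(r"[^a-z]", "", raw.lower())


def first_guess(word, words=WORDS):
    """Return the closest dictionary word, or the input if nothing is close."""
    matches = difflib.get_close_matches(word, words, n=1)
    return matches[0] if matches else word


# Raw strings taken from the gocr results shown in the first test.
for raw in ["groUDS", "single", "be.cause."]:
    print(raw, "becomes", first_guess(clean_ocr_output(raw)))
```

<p>With this toy word list, "groUDS" is cleaned to "grouds" and then corrected to "groups", while "be.cause." needs only the cleanup step.</p>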
<p><strong>Tweaking input</strong></p>
<ul>
<li>Removing border</li>
<li>Adjusting contrast and brightness</li>
<li>Using edge detection</li>
</ul>
<p>So <img width="140" height="30" border="0" src="http://bhiv.com/wp-content/digg-captcha-groups.jpg" /> becomes <img width="138" height="28" border="0" src="http://bhiv.com/wp-content/digg-captcha-groups2.jpg" /></p>
<p>Since we are already over two-thirds accurate, we don’t need to adjust the input of every image, just the ones whose results aren’t dictionary words. Part of the problem is that while one adjustment will improve results for one image, it will degrade the results for another. My solution was to try 10 variations, run them through the OCR, and then spell check. I then had the program pick the solution with duplicate results; in the case of a tie or no duplicate, I had the program pick the one with the fewest variations. This method resulted in the <strong>final accuracy of 88%</strong>.</p>
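<p>The selection step can be sketched roughly like this, again in Python and again not my actual script. The function name <code>pick_answer</code> is hypothetical, and the sketch assumes the candidates arrive ordered from the least adjusted image to the most adjusted one, so that "earliest" serves as the tie-breaker.</p>

```python
from collections import Counter


def pick_answer(candidates):
    """Pick the most repeated candidate; on a tie or no repeats, take the earliest."""
    counts = Counter(candidates)
    best = max(counts.values())
    # The list comprehension keeps input order, so index 0 is the earliest
    # candidate among those that share the highest count.
    winners = [word for word in candidates if counts[word] == best]
    return winners[0]


# Made-up spell-checked OCR results for several variations of one image.
print(pick_answer(["groups", "grasps", "groups", "group", "groups"]))
```

<p>When no word repeats, every candidate has a count of one, so the function simply returns the first candidate.</p>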
<p><strong>Problems with this technique</strong><br />
While these quick results have come close to being usable, they are still a far cry from 100% accuracy. Since digg uses a consistent font, I could train gocr for problematic letters (such as p). Also, given that in 100 images I received 5 sets of duplicate words, I would estimate their dictionary is only a couple thousand words, so I could hand tweak the results.</p>
<p><strong>Other resources</strong></p>
<ul>
<li><a target="_self" href="http://sam.zoy.org/pwntcha/">PWNtcha</a> &#8211; a project to build a captcha decoder</li>
<li><a target="_self" href="http://www.cs.sfu.ca/%7Emori/research/gimpy/">Breaking a Visual CAPTCHA</a> &#8211; the breaking of EZ-Gimpy CAPTCHAs</li>
</ul>
<p><strong>Disclaimer</strong><br />
I did contact digg last week to let them know I would be publishing this and offered them the opportunity to have it delayed while they upgraded their CAPTCHA. I haven’t heard from them as of now. I still offer them the opportunity to contact me and I will temporarily remove this article.</p>
<p><font color="#990000"><strong>Update:</strong></font><br />
~30 hours after I posted this, they changed the nature of their CAPTCHA. I will be writing a follow-up soon &#8230; <a title="My response" target="_self" href="http://bhiv.com/2005/10/02/diggs-evolving-captcha/">here is my response</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bhiv.com/defeating-diggs-captcha/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
	</channel>
</rss>

