<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Tips on Writing a Scraper</title>
	<atom:link href="http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/</link>
	<description></description>
	<lastBuildDate>Sat, 29 May 2010 14:08:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Joseph</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-1/#comment-403</link>
		<dc:creator>Joseph</dc:creator>
		<pubDate>Mon, 28 Jan 2008 16:45:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-403</guid>
		<description>Do you know of any example PHP scripts that could help me learn? I want to learn how to grab links, images, and content. Thanks.</description>
		<content:encoded><![CDATA[<p>Do you know of any example PHP scripts that could help me learn? I want to learn how to grab links, images, and content. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rob</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-1/#comment-407</link>
		<dc:creator>rob</dc:creator>
		<pubDate>Thu, 24 Jan 2008 11:54:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-407</guid>
		<description>Nice post :) Would you keep [a] tags, style tags, etc. in place in the content, or would you strip them out? Or by taking &#039;everything you can&#039;, do you mean scraping everything in the [body] tags?

I&#039;ve tried using Perl&#039;s HTML::TreeBuilder module to strip out content from [p] tags, but that massacres line breaks etc. a bit too much.</description>
		<content:encoded><![CDATA[<p>Nice post <img src='http://www.dellanave.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Would you keep [a] tags, style tags, etc. in place in the content, or would you strip them out? Or by taking &#8216;everything you can&#8217;, do you mean scraping everything in the [body] tags?</p>
<p>I&#8217;ve tried using Perl&#8217;s HTML::TreeBuilder module to strip out content from [p] tags, but that massacres line breaks etc. a bit too much.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brent</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-1/#comment-401</link>
		<dc:creator>Brent</dc:creator>
		<pubDate>Wed, 16 Jan 2008 00:28:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-401</guid>
		<description>Good stuff. I&#039;d say, though, that Google isn&#039;t even that good at rooting out scraper sites, especially programming-related ones. There are a ton of forum-type programming sites in their index. Maybe the code syntax throws off their ability to detect scraped content.</description>
		<content:encoded><![CDATA[<p>Good stuff. I&#8217;d say, though, that Google isn&#8217;t even that good at rooting out scraper sites, especially programming-related ones. There are a ton of forum-type programming sites in their index. Maybe the code syntax throws off their ability to detect scraped content.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: &#187; Von Oldschool Spam und Lamazüchtern &#124; seoFM - der erste deutsche PodCast für SEOs und Online-Marketer</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-1/#comment-402</link>
		<dc:creator>&#187; Von Oldschool Spam und Lamazüchtern &#124; seoFM - der erste deutsche PodCast für SEOs und Online-Marketer</dc:creator>
		<pubDate>Tue, 15 Jan 2008 19:55:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-402</guid>
		<description>[...] Tips for building scrapers - good tips from Dave! [...]</description>
		<content:encoded><![CDATA[<p>[...] Tips for building scrapers &#8211; good tips from Dave! [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-/#comment-404</link>
		<dc:creator>david</dc:creator>
		<pubDate>Sat, 12 Jan 2008 00:16:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-404</guid>
		<description>1) Extremely rare.

2) Most of the time it&#039;s easier.  The Java applet or Flash app is running on YOUR computer.  That means it&#039;s talking to a server and exchanging data.  Sniff that traffic, figure out how to mimic it, and then connect to the server with your own application that harvests the data.</description>
		<content:encoded><![CDATA[<p>1) Extremely rare.</p>
<p>2) Most of the time it&#8217;s easier.  The Java applet or Flash app is running on YOUR computer.  That means it&#8217;s talking to a server and exchanging data.  Sniff that traffic, figure out how to mimic it, and then connect to the server with your own application that harvests the data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kristian Schmidt</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-1/#comment-405</link>
		<dc:creator>Kristian Schmidt</dc:creator>
		<pubDate>Sat, 12 Jan 2008 00:11:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-405</guid>
		<description>Regarding the last line, what about Flash files? Or, god forbid, Java applets?

How would you go about scraping those?</description>
		<content:encoded><![CDATA[<p>Regarding the last line, what about Flash files? Or, god forbid, Java applets?</p>
<p>How would you go about scraping those?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andre</title>
		<link>http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/comment-page-1/#comment-406</link>
		<dc:creator>Andre</dc:creator>
		<pubDate>Fri, 11 Jan 2008 20:20:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.dellanave.com/blog/2008/01/11/tips-on-writing-a-scraper/#comment-406</guid>
		<description>Good info. I&#039;ve recently written several in PHP. Good point about grabbing all of the data you can; why waste time going back?</description>
		<content:encoded><![CDATA[<p>Good info. I&#8217;ve recently written several in PHP. Good point about grabbing all of the data you can; why waste time going back?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
