How about JavaScript?
Raw HTML web spiders are a thing of the past...
You need a full-blown browser API at your fingertips.
How about pages that modify the DOM after the page has already been loaded?
AJAX.
For example, Safari Books does this, just to make it hard for the end user to simply strip out the HTML...
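To make the problem concrete, here is a minimal sketch in Python of what a raw HTML spider sees. The page, the `content` id, and the helper names are all made up for illustration; the point is only that text inserted by a script never shows up in the markup the server sent.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container,
# and a script fills it in after the page has loaded.
RAW_HTML = """
<html><body>
<div id="content"></div>
<script>
document.getElementById("content").textContent = "Chapter 1 text";
</script>
</body></html>
"""


class ContentExtractor(HTMLParser):
    """Collects the text found inside <div id="content"> (no nested divs)."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "content":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        # Only keep text while we are inside the target div.
        if self.inside:
            self.text.append(data)


def extract_content(html):
    """Return the static text of the content div, as a raw spider would see it."""
    parser = ContentExtractor()
    parser.feed(html)
    return "".join(parser.text).strip()
```

Calling `extract_content(RAW_HTML)` gives back an empty string, because the parser never runs the script. Only something that actually executes the JavaScript (i.e. a browser) would see the filled-in text.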
See if you can use the WebClient class (System.Net).
You can use C# with Visual Studio Express to test it out.
If not, you need to build Mozilla or hijack IE through its DLLs. The goal is to work with a browser programmatically. See the Mono project for their Mozilla client API.
It's on my to-do list...