#1
29th October 2007, 12:13
shinyjoy (Junior Member)
Web Spiders

Hi all,

Can anyone help me create web spiders? I have been able to do it for one specific site using cURL, but I need to integrate several sites. Can anyone help me out?


Regards,

Shiny
#2
29th October 2007, 19:45
edge (Moderator)

Maybe this is of some use: http://www.sphider.eu/
__________________
Never execute code written on a Friday or a Monday.
#3
30th October 2007, 04:40
shinyjoy (Junior Member)
Thank you but...

Quote:
Originally Posted by edge
Maybe this is of some use: http://www.sphider.eu/

But you see, that spider only performs searches. What I want is to log in to sites that require authorization, using a user name and password, and retrieve data from them. It's possible using cURL; I did it for one site. But I have to cover several sites, so I want to build a generalized spider driven by a database.



Regards,

Shiny
#4
30th October 2007, 17:36
falko (Super Moderator)

Should be possible with cURL. wget and Snoopy ( http://sourceforge.net/projects/snoopy/ ) might be other options.
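
A generalized version of the single-site cURL approach could keep each site's login details in a database table and loop over the rows. The sketch below is in Python rather than PHP/cURL, and everything named in it is an assumption for illustration: the `sites` table and its columns, the form field names, and the example.com URLs. It also assumes each site accepts a plain form POST and tracks the session with cookies; sites that use hidden tokens or script-driven logins would need more work.

```python
import sqlite3
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical schema: one row per site that the spider should log in to.
COLUMNS = ("name", "login_url", "user_field", "pass_field",
           "username", "password", "data_url")


def load_sites(conn):
    """Return every row of the (assumed) `sites` table as a dict."""
    rows = conn.execute("SELECT %s FROM sites" % ", ".join(COLUMNS))
    return [dict(zip(COLUMNS, row)) for row in rows]


def build_login_request(site):
    """Build the POST request that submits the site's login form."""
    form = {site["user_field"]: site["username"],
            site["pass_field"]: site["password"]}
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(site["login_url"], data=body, method="POST")


def fetch_protected_page(site, timeout=30):
    """Log in, then fetch data_url using the cookies set by the login."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    opener.open(build_login_request(site), timeout=timeout).close()
    with opener.open(site["data_url"], timeout=timeout) as response:
        return response.read()


def demo_database():
    """Create an in-memory database holding one made-up site definition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (%s)" % ", ".join(COLUMNS))
    conn.execute(
        "INSERT INTO sites VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("demo", "https://example.com/login", "user", "pass",
         "alice", "secret", "https://example.com/members/report"))
    return conn
```

Calling `fetch_protected_page(site)` for each row returned by `load_sites` would then perform the actual login and download; adding a site becomes a matter of inserting a row rather than writing new code.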
__________________
Falko
#5
15th November 2007, 20:07
leblanc (Junior Member)
How about JavaScript?

Raw HTML web spiders are a thing of the past.

You need a full-blown browser API at your fingertips.

How about pages that modify the DOM after the page has already been loaded, for example with Ajax?

For example, Safari Books does this, just to make it difficult for the end user to simply strip the HTML.
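
To illustrate the point about script-generated content, here is a small Python sketch. The page in `PAGE` is made up for the example: its text is written into the DOM by a script, so a spider that only parses the raw HTML sees the heading but never the chapter text. Only a real browser engine that runs the script would expose it.

```python
from html.parser import HTMLParser

# Made-up page: the <div> is empty in the raw HTML and is only filled in
# when a browser executes the script.
PAGE = """<html><body>
<h1>Title</h1>
<div id="content"></div>
<script>
document.getElementById("content").textContent = "Chapter text";
</script>
</body></html>"""


class VisibleTextParser(HTMLParser):
    """Collect text outside <script> and <style>, as a raw-HTML spider would."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script source arrives here too, so it is skipped explicitly.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a static parser can see, without running any script."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```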

See if you can use the WebClient class (System.Net). You can use C# and Visual C# Express to test it out.

If not, you need to build Mozilla or hijack IE using DLLs; the goal is to work with a browser programmatically. See the Mono project for their Mozilla client API.

It's on my to-do list.
#6
21st February 2008, 23:03
petter5 (Junior Member)
OmniFind

You can download OmniFind for free from

http://omnifind.ibm.yahoo.net/register/form.php

It will meet your requirements.

It's based on Nutch:

http://lucene.apache.org/nutch/

OmniFind is easy to use and can be installed by absolute beginners.

It runs on:

* 32-bit Red Hat Enterprise Linux Version 5
* 32-bit SUSE Linux Enterprise 10
* 32-bit Windows XP SP2
* 32-bit Windows Server 2003 SP2

/ Petter
#7
23rd December 2009, 06:21
jonepain (Junior Member)

There are a few ways to spider. The first, which I'll call general spidering, simply grabs a page and searches it for whatever you're looking for, for instance a search phrase. The second, specific spidering, grabs only a certain portion of a page. This is useful in cases where you might want to grab news headlines from another site. If you want to get fancy, you can build in functionality to ignore links that are within the same site.

You have used an ASP page. There are a few drawbacks, however. Normally, you can get around this issue by not allowing the ITC to use default values; specify the values every time. Another, more serious, problem involves licensing. ASP pages do not have the ability to invoke the license manager. The license manager checks the key in the actual component and compares it to the one in the Windows registry. If they're not the same, the component won't work.
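
The two spidering styles described above, plus the filter that ignores same-site links, can be sketched roughly as follows. This is a Python illustration rather than ASP, and it rests on assumptions that are not in the post: headlines are taken to be `<h2>` elements, the function names are made up, and `SAMPLE` is an invented page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collect page text, headline text and link targets in one pass."""

    def __init__(self, headline_tag="h2"):
        super().__init__()
        self.headline_tag = headline_tag  # assumption: headlines use <h2>
        self.text = []
        self.headlines = []
        self.links = []
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        if tag == self.headline_tag:
            self._in_headline = True
            self.headlines.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self.headline_tag:
            self._in_headline = False

    def handle_data(self, data):
        self.text.append(data)
        if self._in_headline:
            self.headlines[-1] += data


def parse(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def contains_phrase(html, phrase):
    """General spidering: search the whole page for a phrase."""
    text = " ".join("".join(parse(html).text).split())
    return phrase.lower() in text.lower()


def extract_headlines(html):
    """Specific spidering: keep only one portion of the page."""
    return [h.strip() for h in parse(html).headlines if h.strip()]


def external_links(html, base_url):
    """Return http(s) links that leave the site, ignoring same-site links."""
    base_host = urlparse(base_url).netloc
    result = []
    for href in parse(html).links:
        target = urlparse(urljoin(base_url, href))
        if target.scheme in ("http", "https") and target.netloc != base_host:
            result.append(target.geturl())
    return result


SAMPLE = """<html><body>
<h2>Spiders in the news</h2>
<p>A short story about web crawling.</p>
<a href="/archive">Archive</a>
<a href="http://other.example.org/page">Elsewhere</a>
</body></html>"""
```

With the sample page and a base URL of `http://news.example.com/`, `extract_headlines` keeps only the headline and `external_links` drops the `/archive` link because it resolves to the same host.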

Last edited by jonepain; 24th December 2009 at 05:36.
All times are GMT +2.