
[English] 50 Commonly Used Open Source Web Crawling Tools




A web crawler (also known as an ant, automatic indexer, bot, web spider, web robot, or web scutter) is an automated program or script that methodically scans, or "crawls", through web pages to build an index of the data it is set to look for. This process is called web crawling or spidering.
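To make the definition concrete, here is a minimal sketch of the crawl-and-index loop, using only the Python standard library. It is not taken from any of the tools listed in this article. The names `crawl`, `LinkAndTextParser`, `fetch`, and the `example.test` pages are hypothetical, and the "web" is an in-memory dictionary so the sketch runs without network access. A real crawler would also need to respect robots.txt, rate limits, and error handling, all of which are left out here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkAndTextParser(HTMLParser):
    """Collects the href targets of <a> tags and the words in page text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch is a caller-supplied function (an assumption of this sketch):
    it takes a URL and returns the page's HTML, or None if unavailable.
    Returns an index mapping each word to the set of URLs containing it.
    """
    index = {}
    seen = {start_url}           # URLs already queued, to avoid loops
    queue = deque([start_url])   # frontier of URLs still to visit
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages_fetched += 1
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


# Hypothetical in-memory pages standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> home page',
    "http://example.test/a": '<a href="/">back</a> alpha page',
    "http://example.test/b": '<a href="/a#top">to a</a> beta page',
}

index = crawl("http://example.test/", PAGES.get)
print(sorted(index["page"]))
```

The `seen` set is what keeps the crawl "methodical": each URL is queued at most once, so pages that link back to each other do not cause an endless loop.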

Web crawlers have many uses, but essentially a crawler collects or mines data from the Internet. Most search engines use crawlers to keep their data up to date and to find what's new on the Internet. Analytics companies and market researchers use web crawlers to determine customer and market trends in a given geography. In this article, we present the top 50 open source web crawlers available on the web for data mining.

Name                          Language             Platform

Heritrix                      Java                 Linux
Nutch                         Java                 Cross-platform
Scrapy                        Python               Cross-platform
DataparkSearch                C++                  Cross-platform
GNU Wget                      C                    Linux
GRUB                          C#, C, Python, Perl  Cross-platform
ht://Dig                      C++                  Unix
HTTrack                       C/C++                Cross-platform
ICDL Crawler                  C++                  Cross-platform
mnoGoSearch                   C                    Windows
Norconex HTTP Collector       Java                 Cross-platform
Open Search Server            C/C++, Java, PHP     Cross-platform
PHP-Crawler                   PHP                  Cross-platform
YaCy                          Java                 Cross-platform
WebSPHINX                     Java                 Cross-platform
WebLech                       Java                 Cross-platform
Arale                         Java                 Cross-platform
JSpider                       Java                 Cross-platform
HyperSpider                   Java                 Cross-platform
Arachnid                      Java                 Cross-platform
Spindle                       Java                 Cross-platform
Spider                        Java                 Cross-platform
LARM                          Java                 Cross-platform
Metis                         Java                 Cross-platform
SimpleSpider                  Java                 Cross-platform
Grunk                         Java                 Cross-platform
CAPEK                         Java                 Cross-platform
Aperture                      Java                 Cross-platform
Smart and Simple Web Crawler  Java                 Cross-platform
Web Harvest                   Java                 Cross-platform
Aspseek                       C++                  Linux
Bixo                          Java                 Cross-platform
crawler4j                     Java                 Cross-platform
Ebot                          Erlang               Linux
Hounder                       Java                 Cross-platform
Hyper Estraier                C/C++                Cross-platform
OpenWebSpider                 C#, PHP              Cross-platform
Pavuk                         C                    Linux
Sphider                       PHP                  Cross-platform
Xapian                        C++                  Cross-platform
Arachnode.net                 C#                   Windows
Crawwwler                     C++                  Java
Distributed Web Crawler       C, Java, Python      Cross-platform
iCrawler                      Java                 Cross-platform
pycreep                       Java                 Cross-platform
Opese                         C++                  Linux
Andjing                       Java
Ccrawler                      C#                   Windows
WebEater                      Java                 Cross-platform
JoBo                          Java                 Cross-platform

