
Entry: Anansi
Definition

Concept

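Anansi is a research project that uses network-connected computers to explore the resources of the World Wide Web. In principle, we want to evaluate a distributed web crawler in terms of accuracy and performance; after consideration, BOINC was our final choice. In such a system, properties including accuracy, stability, adaptability, and performance will be measured.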

Operation

In Anansi, clients return only the URIs that have been crawled, each associated with the HTTP status code that indicates its availability. Only URIs with the http scheme that can be reached by the public are crawled. No e-mail addresses, text content, usernames, or passwords are collected. It is a non-CPU-intensive project that tries to reduce CPU load on the client. Associated information such as robots exclusion rules and some page content is collected and used on BOINC volunteers' machines during crawling, but none of it is returned to the Anansi server.
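
The client-side rules above can be illustrated with a minimal sketch. It is not the project's actual client code: the names `CrawlReport`, `is_crawlable`, and `build_report` are hypothetical, and the checks used to approximate "reachable by the public" (rejecting `localhost`, non-global IP literals, and URIs with embedded credentials) are assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CrawlReport:
    """Hypothetical record: the only data sent back is a URI and its status."""
    uri: str
    status: int


def is_crawlable(uri: str) -> bool:
    """Accept only http URIs whose host is not obviously private (assumed rule)."""
    try:
        parts = urlsplit(uri)
    except ValueError:
        return False  # malformed URI
    if parts.scheme != "http" or not parts.hostname:
        return False
    if parts.username or parts.password:
        return False  # assumption: skip URIs that embed credentials
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        addr = ip_address(host)
    except ValueError:
        return True  # a DNS name; treated as public in this sketch
    return addr.is_global


def build_report(uri: str, status: int) -> Optional[CrawlReport]:
    """Return a report for an eligible URI, or None if it must not be crawled."""
    if not is_crawlable(uri):
        return None
    return CrawlReport(uri=uri, status=status)
```

Because the report type has only two fields, page text, e-mail addresses, and robots exclusion data have no way to travel back to the server in this sketch, which mirrors the constraint described above.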

The data (URIs) collected by Anansi is used by a MapReduce engine that calculates a priority for each URI. The priority is based on in-degree, out-degree, and a timestamp created by the Anansi server. The Anansi server takes this priority into account when planning revisits, which keeps the system working continuously.
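
As a rough illustration of the priority step, the sketch below computes in-degree and out-degree with a map phase and a reduce phase, then combines them with the age of the server timestamp. The source does not give the actual formula or data layout, so the edge list input, the function names, and the weights `w_in`, `w_out`, and `w_age` are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Edge = Tuple[str, str]      # (source URI, target URI); assumed input format
Degrees = Tuple[int, int]   # (in_degree, out_degree)


def map_edge(edge: Edge) -> Iterator[Tuple[str, Degrees]]:
    """Map step: each link adds one out-degree to its source, one in-degree to its target."""
    source, target = edge
    yield source, (0, 1)
    yield target, (1, 0)


def reduce_degrees(pairs: Iterable[Tuple[str, Degrees]]) -> Dict[str, Degrees]:
    """Reduce step: sum the contributions emitted for each URI."""
    totals = defaultdict(lambda: [0, 0])
    for uri, (ins, outs) in pairs:
        totals[uri][0] += ins
        totals[uri][1] += outs
    return {uri: (t[0], t[1]) for uri, t in totals.items()}


def priority(in_degree: int, out_degree: int, last_seen: float, now: float,
             w_in: float = 1.0, w_out: float = 0.5, w_age: float = 0.01) -> float:
    """Hypothetical score: well-linked and long-unvisited URIs rank higher."""
    age = max(0.0, now - last_seen)  # seconds since the server timestamp
    return w_in * in_degree + w_out * out_degree + w_age * age


edges = [("http://a/", "http://b/"), ("http://a/", "http://c/"), ("http://b/", "http://c/")]
degrees = reduce_degrees(pair for edge in edges for pair in map_edge(edge))
```

A revisit planner could then sort URIs by this score in descending order and hand the top entries to clients first; the linear weighting is only one of many possible choices.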


Copyright © 2004-2023 Cnenc.net All Rights Reserved
Last updated: 2024/12/24 4:15:52