SEO Learning Site</title><link>http://www.mhhacj.live/</link><description>SEO study materials</description><generator>RainbowSoft Studio Z-Blog 1.8 Walle Build 100427</generator><language>zh-CN</language><copyright></copyright><pubDate>Mon, 16 May 2016 16:46:13 +0800</pubDate><item><title>How Search Engines Work: Overview of the Search Engine Crawling System (Part 3)</title><author>8943459@qq.com (Recollection)</author><link>http://www.mhhacj.live/seo-rumen/346/</link><pubDate>Fri, 16 Oct 2015 21:49:03 +0800</pubDate><guid>http://www.mhhacj.live/seo-rumen/346/</guid><description><![CDATA[<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Editor's note: Webmasters, from now on we will regularly share content here about how search engines work and how to run a website. Today we give a brief introduction to three parts of the search engine crawling system: the basic framework of the crawling system, the network protocols involved in crawling, and the basic crawling process.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Information on the internet is growing explosively, and acquiring and using it effectively is the first step in a search engine's work. The data crawling system sits upstream of the entire search system. It is mainly responsible for collecting, storing, and updating internet information. Because it crawls back and forth across the network like a spider, it is usually called a "spider". For example, the spiders of several common general-purpose search engines are called Baiduspider, Googlebot, and Sogou Web Spider.</span></p><p 
style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The spider crawling system is the key source of a search engine's data. If the web is viewed as a directed graph, the spider's work can be seen as a traversal of that graph. Starting from a set of important seed URLs, the spider follows the hyperlinks on each page to keep discovering and crawling new URLs, fetching as many valuable pages as possible. Pages can be modified or deleted, and new hyperlinks can appear, at any moment. A large spider system such as Baidu's must therefore also keep the pages it crawled in the past up to date, maintaining a URL database and a page database.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>1. Basic framework of the spider crawling system</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: 
break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The figure below shows the basic framework of the spider crawling system. It consists of a link storage system, a link selection system, a DNS resolution service, a crawl scheduling system, a page analysis system, a link extraction system, a link analysis system, and a page storage system.</span><img style="border-bottom: 0px; border-left: 0px; max-width: 675px; border-top: 0px; border-right: 0px" src="http://www.mhhacj.live/upload/201510162149378716.JPG" alt="" /></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span><br /></span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>2. Network protocols involved in spider crawling</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Search engines and resource providers depend on each other. The search engine needs webmasters to provide resources, or it cannot satisfy users' search needs. Webmasters need the search engine to promote their content and reach a larger audience. The spider crawling system directly affects the interests of internet resource providers. For search engines and webmasters to both benefit, each side must follow certain conventions during crawling so that data can be processed and exchanged smoothly. These conventions are what we usually call network protocols. A brief list follows:</span></p>
<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>HTTP protocol: </span></strong><span>The Hypertext Transfer Protocol is the most widely used network protocol on the internet and the standard for requests and responses between a client and a server. The client is usually the end user, and the server is the website. End users send HTTP requests to a specified port on the server through a browser, a spider, or similar software. Each HTTP request receives the corresponding HTTP header information in return. That information shows, among other things, whether the request succeeded, the server type, and when the page was last modified.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>HTTPS protocol: </span></strong><span>Essentially an encrypted version of HTTP, and a more secure protocol for transferring data.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); 
text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>UA attribute: </span></strong><span>UA stands for user-agent, a request header field in the HTTP protocol that identifies the client. It tells the server who the client is and what it is there for, so the server can respond differently to different clients.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>robots protocol: </span></strong><span>robots.txt is the first file a search engine requests when it visits a website. It tells the crawler which content may be crawled and which may not. robots.txt must be placed in the site's root directory, and the file name must be lowercase. For detailed robots.txt syntax, see http://www.robotstxt.org. Baidu strictly follows the robots protocol. Baidu also supports a meta tag named robots in the page content, with directives such as index, follow, and nofollow.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px 
arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>3. Basic spider crawling process</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The basic crawling process of a spider can be understood through the following flowchart:</span><img style="border-bottom: 0px; border-left: 0px; max-width: 675px; border-top: 0px; border-right: 0px" src="http://bs.baidu.com/zhanzhang/files/006791413183862.JPG" alt="" /></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>If you have other questions about search engine crawling, you can share your views in the [Academy Classmates' Hub] discussion thread </span><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://bbs.zhanzhang.baidu.com/thread-21436-1-1.html">[Study Discussion] [Overview of the Search Engine Crawling System (Part 2)]</a><span>. Our staff will follow the thread and discuss it with you.</span></p><p>Copyright © 2008</p><p><a href="http://www.mhhacj.live/seo-rumen/346/" target="_blank">Continue reading the full text of "How Search Engines Work: Overview of the Search Engine Crawling System (Part 3)"...</a></p><p>Category: <a href="http://www.mhhacj.live/seo-rumen/">SEO Basics</a> | Tags: <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E">search engine</a>  <a 
href="http://www.mhhacj.live/catalog.asp?tags=%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86">how it works</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%8A%93%E5%8F%96%E7%B3%BB%E7%BB%9F">crawling system</a>   | <a href="http://www.mhhacj.live/seo-rumen/346/#comment" target="_blank">Add comment</a>(0)</p><h3>Related articles:</h3><ul><li><a href="http://www.mhhacj.live/seo-rumen/344/">How Search Engines Work: Overview of the Search Engine Crawling System (Part 3)</a> (2015-10-16 21:45:35) </li><li><a href="http://www.mhhacj.live/seo-rumen/345/">How Search Engines Work: Overview of the Search Engine Crawling System (Part 4)</a> (2015-10-16 21:45:35) </li><li><a href="http://www.mhhacj.live/seo-rumen/342/">How Search Engines Work: Overview of the Search Engine Retrieval System (Part 2)</a> (2015-10-16 17:29:44) </li><li><a href="http://www.mhhacj.live/seo-rumen/341/">How Search Engines Work: Overview of the Search Engine Retrieval System (Part 1)</a> (2015-10-16 17:26:48) </li><li><a href="http://www.mhhacj.live/seo-rumen/338/">Baidu Search Engine Basics: Crawling, Filtering, Indexing, and Returning Results</a> (2015-10-16 16:17:36) </li></ul>]]></description><category>SEO Basics</category><comments>http://www.mhhacj.live/seo-rumen/346/#comment</comments><wfw:comment>http://www.mhhacj.live/</wfw:comment><wfw:commentRss>http://www.mhhacj.live/feed.asp?cmt=346</wfw:commentRss><trackback:ping>http://www.mhhacj.live/cmd.asp?act=tb&amp;id=346&amp;key=24448d7c</trackback:ping></item><item><title>How Search Engines Work: Overview of the Search Engine Crawling System (Part 4)</title><author>8943459@qq.com (Recollection)</author><link>http://www.mhhacj.live/seo-rumen/345/</link><pubDate>Fri, 16 Oct 2015 21:45:35 +0800</pubDate><guid>http://www.mhhacj.live/seo-rumen/345/</guid><description><![CDATA[<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>Editor's note: Previously we shared the basic framework of the crawling system, the network protocols involved in crawling, and the basic crawling process. Today we share the second part of the search engine crawling system: the strategies a spider uses while crawling.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>A spider faces a complex network environment while crawling. The crawling system uses a variety of sophisticated strategies to meet three goals. It should crawl as many valuable resources as possible. It should keep the pages in its system consistent with the pages on the live web. It should not put strain on the websites it visits. The main types of strategy involved in crawling are briefly introduced below:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1. Crawl friendliness: allocating crawl pressure to reduce the load on websites</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2. Common crawl return codes</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; 
word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3. Recognizing the various kinds of URL redirects</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4. Allocating crawl priority</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>5. Filtering duplicate URLs</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>6. Acquiring deep web data</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>7. Anti-spam during crawling</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; 
background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>8. Improving crawl efficiency and making efficient use of bandwidth</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>1. Crawl friendliness</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Internet resources exist on an enormous scale. A crawling system must therefore use bandwidth as efficiently as possible and fetch as many valuable resources as it can with limited hardware and bandwidth. This creates another problem: crawling consumes the bandwidth of the sites being crawled and puts load on them. If the load is too heavy, it directly disrupts normal visits by those sites' users. The crawler therefore has to control its crawl pressure, fetching as many valuable resources as possible without affecting normal user access to the site.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The most basic form is usually IP-based pressure control. Control based on domain names is unreliable because one domain may map to several IPs (common for large sites), and several domains may map to the same IP (small sites sharing an IP). In practice, pressure is usually allocated and controlled using a combination of IP and domain conditions. The Baidu Webmaster Platform also provides a crawl pressure feedback tool that lets webmasters manually adjust the crawl pressure on their own sites. In that case, Baiduspider gives priority to the webmaster's settings when controlling crawl pressure.</span></p>
<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Crawl speed for a single site is generally controlled in two ways: by crawl frequency over a period of time, and by crawl traffic over a period of time. The crawl speed for the same site also varies with the time of day; for example, crawling may be faster in the dead of night, depending on the type of site. The main idea is to keep adjusting so that crawling avoids the peak hours of normal user visits. Different sites also need different crawl speeds.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>2. Common crawl return codes</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Here is a brief introduction to several return codes that Baidu supports:</span></p><p 
style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1) 404, the most common code, means "NOT FOUND". The page is considered dead and is usually deleted from the database. If the spider finds this URL again in the short term, it will not crawl it.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2) 503 means "Service Unavailable". The page is considered temporarily inaccessible, which typically happens when a site is temporarily closed or its bandwidth is limited. When a page returns a 503 status code, Baiduspider does not delete the URL right away. Instead, it revisits the URL several times in the short term. If the page has recovered, it is crawled normally. If it keeps returning 503, the URL is treated as a dead link and deleted from the database.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; 
word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3) 403 means "Forbidden": access to the page is currently prohibited. For a new URL, the spider does not crawl it for the time being and revisits it several times in the short term. For an already indexed URL, the spider does not delete it right away and likewise revisits it several times in the short term. If the page becomes accessible, it is crawled normally. If access is still forbidden, the URL is treated as a dead link and deleted from the database.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4) 301 means "Moved Permanently": the page is considered redirected to a new URL. When migrating a site, changing domain names, or redesigning a site, we recommend returning 301 and also using the Webmaster Platform's site revision tool. This reduces the traffic lost because of the change.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>3. Recognizing the various kinds of URL redirects</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 
14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>For various reasons, some pages on the internet are in a redirected state. To crawl these resources correctly, the spider must recognize and evaluate URL redirects while also guarding against spam. Redirects fall into three types: HTTP 30x redirects, meta refresh redirects, and JavaScript redirects. Baidu also supports the canonical tag, which in effect acts as an indirect redirect.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>4. Allocating crawl priority</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Internet resources are vast and change quickly. It is nearly impossible for a search engine to crawl all of them and keep them all consistently up to date, so a crawling system needs a sound strategy for allocating crawl priority. The main strategies include depth-first traversal, breadth-first traversal, PageRank (PR) priority, backlink-based priority, and social-sharing-guided priority. Each strategy has its own strengths and weaknesses. In practice, several strategies are usually combined to achieve the best crawling results.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px 
arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>5. Filtering duplicate URLs</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>While crawling, a spider needs to determine whether a page has already been crawled. If it has not, the spider crawls the page and adds its URL to the set of crawled URLs. The core of this check is fast lookup and comparison. The check also involves URL normalization. For example, a URL that carries many meaningless parameters but actually points to the same page is treated as the same URL.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>6. Acquiring deep web data</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>The internet holds a large amount of data that search engines cannot yet crawl, known as deep web data. On one hand, much of many websites' data lives in online databases, and a spider can hardly obtain the complete content by crawling pages. On the other hand, problems such as the network environment, sites that do not follow standards, and isolated "island" pages that nothing links to also prevent search engines from crawling. At present, the main way to acquire deep web data is still data submission through open platforms, such as the Baidu Webmaster Platform and the Baidu Open Platform.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>7. Anti-spam during crawling</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>While crawling, a spider often runs into so-called crawl black holes or faces large numbers of low-quality pages. The crawling system therefore also needs a thorough anti-spam subsystem. Such a subsystem might analyze URL features, analyze page size and content, and check whether the crawl volume matches the size of the site.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 
20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px">If you have other questions about search engine crawling, you can share your views in the [Academy Classmates' Hub] discussion thread <a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://bbs.zhanzhang.baidu.com/thread-21436-1-1.html">[Study Discussion] [Overview of the Search Engine Crawling System (Part 2)]</a>. Our staff will follow the thread and discuss it with you.</p><p>Copyright © 2008</p><p><a href="http://www.mhhacj.live/seo-rumen/345/" target="_blank">Continue reading the full text of "How Search Engines Work: Overview of the Search Engine Crawling System (Part 4)"...</a></p><p>Category: <a href="http://www.mhhacj.live/seo-rumen/">SEO Basics</a> | Tags: <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E">search engine</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86">how it works</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%8A%93%E5%8F%96%E7%B3%BB%E7%BB%9F">crawling system</a>   | <a href="http://www.mhhacj.live/seo-rumen/345/#comment" target="_blank">Add comment</a>(0)</p><h3>Related articles:</h3><ul><li><a href="http://www.mhhacj.live/seo-rumen/346/">How Search Engines Work: Overview of the Search Engine Crawling System (Part 3)</a> (2015-10-16 21:49:03) </li><li><a href="http://www.mhhacj.live/seo-rumen/344/">How Search Engines Work: Overview of the Search Engine Crawling System (Part 3)</a> (2015-10-16 21:45:35) </li><li><a href="http://www.mhhacj.live/seo-rumen/342/">How Search Engines Work: Overview of the Search Engine Retrieval System (Part 2)</a> (2015-10-16 17:29:44) </li><li><a href="http://www.mhhacj.live/seo-rumen/341/">How Search Engines Work: Overview of the Search Engine Retrieval System (Part 1)</a> (2015-10-16 17:26:48) </li><li><a href="http://www.mhhacj.live/seo-rumen/338/">Baidu Search Engine Basics: Crawling, Filtering, Indexing, and Returning Results</a> (2015-10-16 16:17:36) 
</li></ul>]]></description><category>SEO Basics</category><comments>http://www.mhhacj.live/seo-rumen/345/#comment</comments><wfw:comment>http://www.mhhacj.live/</wfw:comment><wfw:commentRss>http://www.mhhacj.live/feed.asp?cmt=345</wfw:commentRss><trackback:ping>http://www.mhhacj.live/cmd.asp?act=tb&id=345&key=2cb772ff</trackback:ping></item><item><title>How Search Engines Work: Search Engine Crawling System Overview (Part 3)</title><author>8943459@qq.com (Recollection)</author><link>http://www.mhhacj.live/seo-rumen/344/</link><pubDate>Fri, 16 Oct 2015 21:45:35 +0800</pubDate><guid>http://www.mhhacj.live/seo-rumen/344/</guid><description><![CDATA[<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Editor's note: Previously we shared the basic framework of a search engine's crawling system, the network protocols involved in crawling, and the basic crawling process. Today we share the second part of the crawling system series: the strategies a spider uses while crawling.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>A spider crawls in a complex network environment. To crawl as many valuable resources as possible, keep the pages in its system consistent with the live pages, and avoid putting pressure on the sites it visits, the crawling system uses a variety of sophisticated crawl strategies. The main types of strategy are briefly introduced below:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); 
text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1. Crawl friendliness: allocating crawl load to reduce access pressure on sites</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2. Common status codes returned during crawling</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3. Recognizing different kinds of URL redirects</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4. Allocating crawl priority</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; 
padding-top: 0px; -webkit-text-stroke-width: 0px"><span>5. Filtering duplicate URLs</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>6. Acquiring deep web data</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>7. Anti-spam in crawling</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>8. Improving crawl efficiency and using bandwidth efficiently</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>1. Crawl friendliness</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: 
normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Because Internet resources exist on an enormous scale, the crawling system must use bandwidth as efficiently as possible and fetch as many valuable resources as it can with limited hardware and bandwidth. This creates another problem: crawling consumes the crawled site's bandwidth and puts load on it, and if the load is too heavy it directly disrupts access by the site's regular users. Crawl load therefore has to be controlled, so that the crawler fetches as many valuable resources as possible without affecting normal user access to the site.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The most basic form is usually IP-based load control. Control by domain name alone is problematic because one domain may map to several IPs (as with many large sites), or several domains may share one IP (as with small sites on shared hosting). In practice, load is usually allocated using several conditions on both IP and domain. In addition, the Webmaster Platform offers a crawl-pressure feedback tool that lets webmasters manually adjust the crawl load on their own sites; Baidu's spider then gives priority to the webmaster's settings when controlling crawl load.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>Crawl-speed control for a single site generally takes two forms: limiting the crawl frequency over a period of time, and limiting the crawl traffic over a period of time. The crawl speed for the same site also varies with the time of day. Late at night, for example, crawling may go faster, though this also depends on the type of site. The main idea is to avoid regular users' peak hours and keep adjusting. Different sites likewise need different crawl speeds.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>2. Common status codes returned during crawling</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Here is a brief introduction to several status codes that Baidu supports:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1) The most common, 404, means "NOT FOUND". The page is considered dead and is usually removed from the index, and if the spider finds this URL again within a short period, it will not crawl it;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); 
text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2) 503 means "Service Unavailable". The page is considered temporarily inaccessible, which typically happens when a site is temporarily closed or its bandwidth is limited. When a page returns 503, Baidu's spider does not delete the URL right away and instead revisits it several times over a short period. If the page has recovered, it is crawled normally; if it keeps returning 503, the URL is eventually treated as a dead link and removed from the index.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3) 403 means "Forbidden". The page is considered off-limits for now. For a new URL, the spider does not crawl it yet but revisits it several times over a short period. For an already indexed URL, the spider does not delete it right away and likewise revisits it several times. If the page becomes accessible, it is crawled normally; if access is still forbidden, the URL is also treated as a dead link and removed from the index.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; 
color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4) 301 means "Moved Permanently". The page is considered redirected to a new URL. For site migrations, domain changes, and site redesigns, we recommend returning 301 and also using the Webmaster Platform's site redesign tool, to reduce the traffic lost during the change.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>3. Recognizing different kinds of URL redirects</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>For various reasons, some pages on the Internet are reached through URL redirects. To crawl these resources properly, the spider must recognize and evaluate the redirects while guarding against spam. Redirects fall into three categories: HTTP 30x redirects, meta refresh redirects, and JS redirects. Baidu also supports the canonical tag, which in effect can be regarded as an indirect redirect.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; 
font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>4. Allocating crawl priority</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Internet resources are enormous and change rapidly, so it is nearly impossible for a search engine to crawl everything and keep it reasonably up to date. The crawling system therefore needs a sensible strategy for allocating crawl priority. Common strategies include depth-first traversal, breadth-first traversal, PageRank-first, backlink-based, and social-sharing-guided strategies. Each has its own strengths and weaknesses, and in practice several are usually combined to get the best crawl results.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>5. Filtering duplicate URLs</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>While crawling, the spider must determine whether a page has already been crawled. Only if it has not is the page fetched, and its URL is then added to the set of crawled URLs. The core of this check is fast lookup and comparison. It also involves URL normalization: for example, URLs that carry many meaningless parameters but actually point to the same page are treated as a single URL.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>6. Acquiring deep web data</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The Internet holds a large amount of data that search engines cannot yet crawl, known as deep web data. On the one hand, many sites keep much of their data in online databases, which a spider can hardly retrieve in full by crawling web pages. On the other hand, network conditions, sites that do not follow standards, isolated "island" pages, and similar problems also prevent search engines from crawling. At present, the main way to obtain deep web data is still data submission through open platforms, such as the Baidu Webmaster Platform and the Baidu Open Platform.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><strong><span>7. Anti-spam in crawling</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>While crawling, a spider often runs into so-called crawler traps (black holes) or is swamped by large numbers of low-quality pages. The crawling system therefore also needs a well-designed anti-spam component, for example one that analyzes URL features, analyzes page size and content, and checks whether the crawl volume matches the size of the site.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px">If you have other questions about search engine crawling, you are welcome to share your views in the [Academy Classmates] board's discussion thread <a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://bbs.zhanzhang.baidu.com/thread-21436-1-1.html">[Study Discussion] "Search Engine Crawling System Overview (Part 2)"</a>; our staff will follow the thread and discuss it with you.</p><p>Copyright © 2008</p><p><a href="http://www.mhhacj.live/seo-rumen/344/" target="_blank">Continue reading the full text of "How Search Engines Work: Search Engine Crawling System Overview (Part 3)"...</a></p><p>Category: <a href="http://www.mhhacj.live/seo-rumen/">SEO Basics</a> | Tags: <a 
href="http://www.mhhacj.live/catalog.asp?tags=%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E">Search Engine</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86">Working Principles</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%8A%93%E5%8F%96%E7%B3%BB%E7%BB%9F">Crawling System</a>   | <a href="http://www.mhhacj.live/seo-rumen/344/#comment" target="_blank">Add comment</a>(0)</p><h3>Related articles:</h3><ul><li><a href="http://www.mhhacj.live/seo-rumen/346/">How Search Engines Work: Search Engine Crawling System Overview (Part 3)</a> (2015-10-16 21:49:3) </li><li><a href="http://www.mhhacj.live/seo-rumen/345/">How Search Engines Work: Search Engine Crawling System Overview (Part 4)</a> (2015-10-16 21:45:35) </li><li><a href="http://www.mhhacj.live/seo-rumen/342/">How Search Engines Work: Search Engine Retrieval System Overview (Part 2)</a> (2015-10-16 17:29:44) </li><li><a href="http://www.mhhacj.live/seo-rumen/341/">How Search Engines Work: Search Engine Retrieval System Overview (Part 1)</a> (2015-10-16 17:26:48) </li><li><a href="http://www.mhhacj.live/seo-rumen/338/">Baidu Search Engine Basics: Crawling, Filtering, Indexing, and Serving Results</a> (2015-10-16 16:17:36) </li></ul>]]></description><category>SEO Basics</category><comments>http://www.mhhacj.live/seo-rumen/344/#comment</comments><wfw:comment>http://www.mhhacj.live/</wfw:comment><wfw:commentRss>http://www.mhhacj.live/feed.asp?cmt=344</wfw:commentRss><trackback:ping>http://www.mhhacj.live/cmd.asp?act=tb&id=344&key=2a796cf2</trackback:ping></item><item><title>How to Build a Website That Suits Search Engine Crawling Systems</title><author>8943459@qq.com (Recollection)</author><link>http://www.mhhacj.live/seo-rumen/343/</link><pubDate>Fri, 16 Oct 2015 21:26:22 +0800</pubDate><guid>http://www.mhhacj.live/seo-rumen/343/</guid><description><![CDATA[<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; 
-webkit-text-stroke-width: 0px"><strong><span>Editor's note:</span></strong><span> Over the past two weeks we briefly introduced how the search crawling system works. Building on those principles, today we briefly explain how to build a website that suits the habits of search engine crawling systems.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>1. A simple, clear site structure</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Spider crawling amounts to traversing the directed graph of the web, so a site with a simple, clear, well-layered structure is naturally what it prefers. Keep the site as readable to the spider as possible.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(1) Tree structure: the optimal structure is "home page → channel page → detail page";</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, 
sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(2) Flatness: keep the number of levels between the home page and detail pages as small as possible. This is crawl-friendly and also passes link weight well.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(3) Mesh: make sure every page is the target of at least one text link, so that the site is crawled and indexed as completely as possible. Internal linking also has a positive effect on ranking.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(4) Navigation: add a navigation trail to every page so users know where they are on the site.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(5) Choosing between subdomains and directories: many webmasters are unsure about this. In our view, when there is relatively little content and it is closely related to the main site, use a directory, which helps the content inherit and consolidate weight. When there is a lot of content that is less related to the main site, use a subdomain.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: 
rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>2. Concise, readable URL rules</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(1) Uniqueness: each content page on the site should correspond to exactly one URL. Multiple URL forms split the page's weight, and the target URL risks being filtered out as a duplicate;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(2) Conciseness: use as few dynamic parameters as possible and keep URLs short;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>(3) Readability: users and machines should be able to tell the gist of a page's content from its URL.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>We recommend URLs of the following form. Keep URLs short and easy to read so users grasp them quickly, for example by using pinyin for directory names. Generate exactly one URL for each piece of content and drop meaningless parameters. If URL uniqueness cannot be guaranteed, 301-redirect the other URL forms to the target URL. Also 301-redirect backup domains (registered to catch users' typos) to the main domain.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>3. Other points to note</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>(1) Don't overlook the pesky robots file. By default, the robots file in some systems blocks search engine crawling, so check it and write an appropriate robots file as soon as the site is set up, and check it regularly during routine maintenance;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(2) Create a sitemap file and a dead-link file for the site, and submit them promptly through the Baidu Webmaster Platform;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(3) Some e-commerce sites redirect visitors by region. We recommend serving a single page whether or not an item is in stock and simply indicating availability on that page, rather than returning an invalid page when the item is out of stock in a given region. Because the spider crawls from a limited number of egress locations, such behavior can prevent normal pages from being indexed.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(4) Make good use of the Webmaster Platform's tools, such as robots, sitemap, index volume, crawl pressure, dead-link submission, and site redesign.</span></p><p style="padding-bottom: 0px; widows: 1; 
text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px">If you have other questions about search crawling, you are welcome to share your views in the [Academy Classmates] board's discussion thread <a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://bbs.zhanzhang.baidu.com/thread-21438-1-1.html">[Study Discussion] "Building a Website That Suits Search Crawling Habits"</a>; our staff will follow the thread and discuss it with you.</p><p>Copyright © 2008</p><p><a href="http://www.mhhacj.live/seo-rumen/343/" target="_blank">Continue reading the full text of "How to Build a Website That Suits Search Engine Crawling Systems"...</a></p><p>Category: <a href="http://www.mhhacj.live/seo-rumen/">SEO Basics</a> | Tags: <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%90%9C%E7%B4%A2%E6%8A%93%E5%8F%96">Search Crawling</a>   | <a href="http://www.mhhacj.live/seo-rumen/343/#comment" target="_blank">Add comment</a>(0)</p><p><a href="http://www.mhhacj.live/seo-rumen/343/#comment" target="_blank">No related articles yet. Care to say a few words?</a></p>]]></description><category>SEO Basics</category><comments>http://www.mhhacj.live/seo-rumen/343/#comment</comments><wfw:comment>http://www.mhhacj.live/</wfw:comment><wfw:commentRss>http://www.mhhacj.live/feed.asp?cmt=343</wfw:commentRss><trackback:ping>http://www.mhhacj.live/cmd.asp?act=tb&id=343&key=fb89ec4d</trackback:ping></item><item><title>How Search Engines Work: Search Engine Retrieval System Overview (Part 2) 8943459@qq.com 
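(Recollection) http://www.mhhacj.live/seo-rumen/342/ Fri, 16 Oct 2015 17:29:44 +0800 http://www.mhhacj.live/seo-rumen/342/

As is well known, a search engine's work consists of several main stages: crawling, storage, page analysis, indexing, and retrieval. Over the past few weeks we outlined the crawling stage; today we briefly introduce the indexing system. Finding particular keywords in a page store that counts its pages in the hundreds of millions is like looking for a needle in the ocean. The lookup might finish eventually, but users cannot wait: from a user-experience standpoint we must return satisfying results within milliseconds, or users will simply leave. How can this requirement be met?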

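If we know which pages each of the user's keywords (the terms obtained by segmenting the query) appears in, then processing a search can be pictured as intersecting the sets of pages that contain each segmented part of the query, and retrieval reduces to comparing and intersecting page identifiers. This is what makes retrieval over hundreds of millions of pages possible within milliseconds. It is what is commonly called the inverted index and intersection-based retrieval. The basic process of building an inverted index is as follows: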

(1) Page analysis identifies and labels the different parts of the original page, for example: title, keywords, content, link, anchor, comments, and other non-essential regions;

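(2) Word segmentation actually covers splitting text into words, synonym conversion, synonym replacement, and so on. Taking the segmentation of a page's title as an example, the result is data such as: term text, termid, word class, part of speech, and so on;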

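(3) With this preparation done, the next step is building the inverted index, a mapping of the form {term → doc}. Why term->doc rather than simply using doc->term? Because a query arrives as terms: a term → doc mapping lets the engine jump straight to the documents that contain each term, whereas a doc → term mapping would have to be scanned document by document.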
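A minimal sketch of step (3), assuming the documents have already been segmented into terms. The function names and toy data are hypothetical and this is not Baidu's implementation; it builds the term -> doc mapping and contrasts a lookup in it with the scan that a doc -> term layout would require:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build a term -> sorted doc-ID list mapping from {doc_id: [terms]}."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def docs_with_term_by_scan(docs, term):
    """What a doc -> term layout forces: visit every document to answer one term."""
    return sorted(doc_id for doc_id, terms in docs.items() if term in terms)


def docs_with_term_by_index(index, term):
    """With the inverted index, answering one term is a single dictionary lookup."""
    return index.get(term, [])


if __name__ == "__main__":
    docs = {1: ["subway", "line", "10"], 2: ["subway", "breakdown"], 3: ["weather"]}
    index = build_inverted_index(docs)
    print(docs_with_term_by_index(index, "subway"))  # [1, 2]
    print(docs_with_term_by_scan(docs, "subway"))    # [1, 2]
```

Both functions give the same answer; the difference is that the scan grows with the number of documents, which is exactly the cost the inverted index avoids at the scale of hundreds of millions of pages.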

The above is the inverted-index process in the indexing system, a critical step that lets a search engine answer queries within milliseconds.
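To illustrate the intersection-based retrieval described above, here is a sketch of an AND query over ascending posting lists using a linear two-pointer merge. This is a common textbook way to intersect postings, assumed here for illustration; it is not a description of Baidu's actual code, and caching is left out. The function names are hypothetical. With the sample lists 1 2 3 4 7 9 and 2 5 8 9 10 11 it returns documents 2 and 9:

```python
from functools import reduce


def intersect_sorted(a, b):
    """Intersect two ascending doc-ID lists with a linear two-pointer merge."""
    i, j, common = 0, 0, []
    while len(a) > i and len(b) > j:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] > b[j]:
            j += 1  # b holds the smaller ID: advance b
        else:
            i += 1  # a holds the smaller ID: advance a
    return common


def and_query(index, query_terms):
    """Docs containing every query term; intersect the shortest lists first."""
    lists = sorted((index.get(term, []) for term in query_terms), key=len)
    if not lists:
        return []
    return reduce(intersect_sorted, lists)


if __name__ == "__main__":
    print(intersect_sorted([1, 2, 3, 4, 7, 9], [2, 5, 8, 9, 10, 11]))  # [2, 9]
```

Starting with the shortest list keeps every intermediate result small, so a rare term prunes the candidate set early.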

Continue reading the full text of “How Search Engines Work: Search Engine Retrieval System Overview (Part 2)”...

Category: SEO Basics | Tags: search engine  retrieval system  how it works   | Add comment (0)

Related articles:

SEO Basics http://www.mhhacj.live/seo-rumen/342/#comment http://www.mhhacj.live/ http://www.mhhacj.live/feed.asp?cmt=342 http://www.mhhacj.live/cmd.asp?act=tb&id=342&key=ba8f4df0
How Search Engines Work: Search Engine Retrieval System Overview (Part 1)</title><author>8943459@qq.com (Recollection)</author><link>http://www.mhhacj.live/seo-rumen/341/</link><pubDate>Fri, 16 Oct 2015 17:26:48 +0800</pubDate><guid>http://www.mhhacj.live/seo-rumen/341/</guid><description><![CDATA[<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Earlier we briefly introduced the search engine's indexing system. In fact, the last stage of building the inverted index also involves writing the index to storage, and to improve efficiency this step saves all </span><span>term</span><span> entries and their offsets in the file header and compresses the data; the details are too technical to go into here. Today we give a brief introduction to the retrieval system that comes after indexing.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>The retrieval system consists of five main parts, as shown in the figure below:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><img style="border-bottom: 
0px; border-left: 0px; max-width: 675px; border-top: 0px; border-right: 0px" title="indexing&retrieval.jpg" border="0" hspace="0" alt="" src="http://www.mhhacj.live/upload/201510161727517228.jpg" /></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(1) Query</span><span> string segmentation: the user's query is split into words to prepare for the lookup that follows. Taking “</span><span>10</span><span>号线地铁故障” (“Line 10 subway breakdown”) as an example, a possible segmentation, with each term followed by its term ID, is as follows (synonyms are set aside for now):</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>10 0x123abc</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; 
-webkit-text-stroke-width: 0px"><span>号</span><span><span class="Apple-converted-space"> </span>0x13445d</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>线</span><span><span class="Apple-converted-space"> </span>0x234d</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>地铁</span><span><span class="Apple-converted-space"> </span>0x145cf</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>故障</span><span><span class="Apple-converted-space"> </span>0x354df</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>(2) Look up the set of documents that contain each </span><span>term</span><span>, i.e. find the candidate sets (term ID followed by document IDs), as follows:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>0x123abc 1 2 3 4 7 9 …</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>0x13445d 2 5 8 9 10 11 …</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>……</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>……</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; 
padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(3) Intersection: intersecting the sets above shows that documents </span><span>2</span><span> and </span><span>9</span><span> may be the ones we are looking for. The intersection step largely determines the performance of the whole system, and it is optimized with techniques such as caching;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(4) Filtering of various kinds, which may include removing dead links, duplicate data, pornography, spam results, and, well, you know what else;</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>(5) Final ranking: the results that best meet the user's needs are placed first. Useful signals may include the site's overall rating, page quality, content quality, resource quality, degree of match, diversity, freshness, and so on; these will be covered in detail later.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; 
margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px">If you have other questions about search engine retrieval, you are welcome to share your views in the [Academy Community] <a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://bbs.zhanzhang.baidu.com/thread-21440-1-1.html">[Learning Discussion] “Overview of the Search Engine Retrieval System”</a> discussion thread; our staff will follow it and discuss it with you.</p><p><a href="http://www.mhhacj.live/seo-rumen/341/" target="_blank">Continue reading the full text of “How Search Engines Work: Search Engine Retrieval System Overview (Part 1)”...</a></p><p>Category: <a href="http://www.mhhacj.live/seo-rumen/">SEO Basics</a> | Tags: <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E">search engine</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E6%A3%80%E7%B4%A2%E7%B3%BB%E7%BB%9F">retrieval system</a>  <a href="http://www.mhhacj.live/catalog.asp?tags=%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86">how it works</a>   | <a href="http://www.mhhacj.live/seo-rumen/341/#comment" target="_blank">Add comment</a>(0)</p><h3>Related articles:</h3><ul><li><a href="http://www.mhhacj.live/seo-rumen/346/">How Search Engines Work: Search Engine Crawling System Overview (Part 3)</a> (2015-10-16 21:49:3) </li><li><a href="http://www.mhhacj.live/seo-rumen/345/">How Search Engines Work: Search Engine Crawling System Overview (Part 4)</a> (2015-10-16 21:45:35) </li><li><a href="http://www.mhhacj.live/seo-rumen/344/">How Search Engines Work: Search Engine Crawling System Overview (Part 3)</a> (2015-10-16 21:45:35) </li><li><a 
href="http://www.mhhacj.live/seo-rumen/342/">How Search Engines Work: Search Engine Retrieval System Overview (Part 2)</a> (2015-10-16 17:29:44) </li><li><a href="http://www.mhhacj.live/seo-rumen/338/">Baidu Search Engine Basics: Crawling, Filtering, Indexing, and Output</a> (2015-10-16 16:17:36) </li></ul>]]></description><category>SEO Basics</category><comments>http://www.mhhacj.live/seo-rumen/341/#comment</comments><wfw:comment>http://www.mhhacj.live/</wfw:comment><wfw:commentRss>http://www.mhhacj.live/feed.asp?cmt=341</wfw:commentRss><trackback:ping>http://www.mhhacj.live/cmd.asp?act=tb&id=341&key=0f3347fc</trackback:ping></item><item><title>Baidu Spider Identification: How to Identify Baiduspider IP Addresses 8943459@qq.com (Recollection) http://www.mhhacj.live/seo-rumen/340/ Fri, 16 Oct 2015 17:22:48 +0800 http://www.mhhacj.live/seo-rumen/340/

Last week the Baidu Webmaster Platform received a request for help from a webmaster who had blocked Baiduspider's IP by mistake. They asked whether there was a way to obtain all of Baiduspider's IPs so they could be whitelisted and protected from being blocked by mistake again. We want to make clear to all webmasters that Baiduspider's IP pool changes constantly, so we cannot provide a complete list of its IPs.

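In addition, a webmaster previously complained that Baiduspider was visiting too often, beyond what the server could handle. When the Baidu Webmaster Platform investigated, it found nothing abnormal about Baiduspider's crawling of that site, so that “spider” was very likely an impostor.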

So how can a webmaster use the IP address to tell whether a spider really comes from the Baidu search engine?

This can be solved with a reverse DNS lookup. The verification method differs by platform; the methods for Linux, Windows, and Mac OS are as follows:

1. On Linux, you can reverse-resolve the IP with the command host ip to determine whether the crawl comes from Baiduspider. Baiduspider hostnames follow the pattern *.baidu.com or *.baidu.jp; a hostname that is not *.baidu.com or *.baidu.jp indicates an impostor.

2. On Windows or IBM OS/2, you can reverse-resolve the IP with the command nslookup ip. Open a command prompt and enter nslookup xxx.xxx.xxx.xxx (the IP address) to resolve the IP and determine whether the crawl comes from Baiduspider. Baiduspider hostnames follow the pattern *.baidu.com or *.baidu.jp; a hostname that is not *.baidu.com or *.baidu.jp indicates an impostor.

3. On Mac OS, you can reverse-resolve the IP with the dig command. Open a terminal and enter dig -x xxx.xxx.xxx.xxx (the IP address); the -x option is what requests the reverse lookup. Then check whether the returned hostname follows the pattern *.baidu.com or *.baidu.jp; a hostname that is not *.baidu.com or *.baidu.jp indicates an impostor.
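The same check can be scripted. The sketch below uses Python's standard socket module to reverse-resolve an IP and apply the *.baidu.com / *.baidu.jp rule from the steps above. The forward-confirmation step (resolving the hostname again and checking that it maps back to the same IP) is an extra safeguard assumed here, not something the steps above prescribe; it protects against someone who controls the reverse DNS of their own IP range. The function names are hypothetical, and the two lookups are injectable so the logic can be tested without network access:

```python
import socket

ALLOWED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname):
    """True if the hostname matches *.baidu.com or *.baidu.jp (case-insensitive)."""
    name = hostname.lower().rstrip(".")
    return name.endswith(ALLOWED_SUFFIXES)


def verify_baiduspider(ip,
                       reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the suffix, then forward-confirm (assumed extra step)."""
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not is_baidu_hostname(hostname):
        return False
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses  # the hostname must resolve back to the same IP
```

For example, calling verify_baiduspider on the client IP from an access-log line returns True only when both lookups agree on a Baidu hostname. Real lookups need network access and can be slow, so caching results per IP may be worthwhile.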

If you have other questions about identifying Baiduspider, you are welcome to share your views in the [Academy Community] [Learning Discussion] “How to Identify Baiduspider” discussion thread; our staff will follow it and discuss it with you.

Continue reading the full text of “Baidu Spider Identification: How to Identify Baiduspider IP Addresses”...

Category: SEO Basics | Tags: Baidu spider  Baiduspider   | Add comment (0)

No related articles yet. Care to leave a comment?

SEO Basics http://www.mhhacj.live/seo-rumen/340/#comment http://www.mhhacj.live/ http://www.mhhacj.live/feed.asp?cmt=340 http://www.mhhacj.live/cmd.asp?act=tb&id=340&key=365ca4fb
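Multi-Domain Optimization Solutions: Common Questions About Multiple Domains with the Same Content 8943459@qq.com (Recollection) http://www.mhhacj.live/seo-ziliao/339/ Fri, 16 Oct 2015 16:19:05 +0800 http://www.mhhacj.live/seo-ziliao/339/

[Q] Does pointing multiple domains at one domain count as cheating?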

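[A] If a company registers several domains for brand protection or long-term development and 301-redirects them to one primary domain, that in itself is not cheating. However, if the redirected domains have themselves engaged in cheating, the target domain may be implicated.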

[Q] Do multiple domains with the same content help improve rankings?

[A] This is very bad for rankings: multiple domains split the external links that would otherwise go to a single domain, which weakens that domain's weight and makes it harder for it to rank well.

[Q] If we promote a test domain first and switch to the official domain once things are on track, will that have any effect?
 

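[A] If the two domains have exactly the same content, indexing of the official domain may be affected. The search engine will treat the two sites as duplicates and, having already indexed the first, will limit indexing of the second. This has really happened: a personal-finance site first tested the waters with a temporary domain, and when its real domain went live, it stayed unindexed for a long time.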

[Q] What should a company do if it already has several domains with the same content?

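[A] First choose an easy-to-remember, easy-to-understand domain as the “single domain” and focus your promotion on it. Do not leave the other domains unmanaged either: especially when a test domain already has good indexing and rankings, set up a 301 redirect pointing to the single domain. At the same time, log in to the Baidu Webmaster Platform, verify both the old and new sites, and then perform the corresponding operation in the site revision tool so that the “single domain” inherits the weight the original domain has earned.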

¾|‘站改版工具地址åQšhttp://zhanzhang.baidu.com/rewrite/index

Help documentation: http://zhanzhang.baidu.com/wiki/106
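The 301 redirect recommended above is usually set up in the web server configuration or hosting panel. As a minimal sketch of the intended behavior, assuming a hypothetical canonical host www.example.com served over plain HTTP, the Python handler below permanently redirects requests for any other host name to the same path on the canonical domain:

```python
from http.server import BaseHTTPRequestHandler

CANONICAL_HOST = "www.example.com"  # hypothetical "single domain"


def redirect_target(host, path):
    """Return the 301 Location for a non-canonical Host header, or None if no redirect is needed."""
    if not host:
        return None
    hostname = host.split(":")[0].lower()  # drop any port number
    if hostname == CANONICAL_HOST:
        return None
    # Keep the path so each old URL maps to its counterpart on the canonical domain.
    return "http://" + CANONICAL_HOST + path


class CanonicalRedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests: 301 for secondary domains, a placeholder page otherwise."""

    def do_GET(self):
        target = redirect_target(self.headers.get("Host"), self.path)
        if target is not None:
            self.send_response(301)  # Moved Permanently
            self.send_header("Location", target)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"canonical site\n")
```

To try it locally, one could run http.server.HTTPServer(("", 8080), CanonicalRedirectHandler).serve_forever() and request the server under a different host name. Preserving the path, rather than sending every old URL to the home page, is what lets each old page map to its counterpart on the single domain.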

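If you have other questions about multiple domains with the same content, you are welcome to share your views in the [Academy Community] [Learning Discussion] “Common Questions About Multiple Domains with the Same Content” discussion thread; our staff will follow it and discuss it with you.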

Continue reading the full text of “Multi-Domain Optimization Solutions: Common Questions About Multiple Domains with the Same Content”...

Category: SEO Resources | Tags: multiple domains   | Add comment (0)

No related articles yet. Care to leave a comment?

SEO Resources http://www.mhhacj.live/seo-ziliao/339/#comment http://www.mhhacj.live/ http://www.mhhacj.live/feed.asp?cmt=339 http://www.mhhacj.live/cmd.asp?act=tb&id=339&key=7d9cf466
Baidu Search Engine Basics: Crawling, Filtering, Indexing, and Output 8943459@qq.com (Recollection) http://www.mhhacj.live/seo-rumen/338/ Fri, 16 Oct 2015 16:17:36 +0800 http://www.mhhacj.live/seo-rumen/338/

Going from typing a keyword to Baidu returning search results often takes only a few milliseconds. How does Baidu present your site's content to users so quickly out of the vast sea of internet resources? What workflow and computational logic lie behind it? In fact, the work of the Baidu search engine is far less simple than the search box on its home page suggests.

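Every search result a search engine shows users corresponds to a page on the internet. From its creation to being shown to the user, each result goes through four processes: crawling, filtering, indexing, and output of results.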

Crawling

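Baiduspider, also called the Baidu spider, relies on the search engine system's calculations to decide which sites to crawl, as well as what content to crawl and how often. These calculations take your site's historical performance into account, such as whether its content is of sufficient quality, whether it has user-unfriendly settings, and whether it engages in excessive search engine optimization.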

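When your site publishes new content, Baiduspider visits and crawls it through some link on the internet that points to the page. If no link anywhere points to the new content, Baiduspider cannot crawl it. For content that has already been crawled, the search engine keeps a record of the crawled pages and schedules re-crawls at different frequencies according to how important those pages are to users.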
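A toy model of the link-based discovery just described: starting from known seed URLs, a breadth-first traversal only ever reaches pages that some already-discovered page links to. The in-memory link map stands in for fetching and parsing real pages and is purely hypothetical, as is the function name; importance-based re-crawl scheduling is not modeled:

```python
from collections import deque


def discover(seed_urls, links):
    """Breadth-first discovery over a {url: [linked urls]} map; returns URLs in visit order."""
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    visited = []
    while frontier:
        url = frontier.popleft()
        visited.append(url)
        for target in links.get(url, []):
            if target not in seen:  # queue each URL only once
                seen.add(target)
                frontier.append(target)
    return visited


if __name__ == "__main__":
    site = {"/": ["/news"], "/news": ["/news/1"], "/new-page": []}
    print(discover(["/"], site))  # ['/', '/news', '/news/1']; "/new-page" has no inbound link
```

The page "/new-page" exists in the map but is never reached, which is the situation the paragraph above warns about when new content has no link pointing to it.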

Note that some crawling software disguises itself as Baiduspider to crawl your site for various purposes. Such crawling may be uncontrolled and, in severe cases, can disrupt the site's normal operation, so it is worth verifying whether a visitor claiming to be Baiduspider is genuine.

Filtering

Not every page on the internet is meaningful to users: examples include pages that obviously deceive users, dead links, and blank pages. Such pages hold little value for users, webmasters, or Baidu, so Baidu automatically filters them out to avoid causing unnecessary trouble for users and for your site.

Indexing

Baidu labels and identifies the crawled content item by item and stores these labels as structured data, such as the page's title tag, meta description, outbound links and their descriptions, and crawl records. It also identifies and stores the keyword information in the page so that it can be matched against what users search for.
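As a rough illustration of turning a crawled page into structured data, the sketch below uses Python's standard html.parser module to pull out the title, the meta description, and outbound links with their anchor text. The class name, function name, and record layout are hypothetical examples rather than Baidu's internal format, and keyword identification is left out:

```python
from html.parser import HTMLParser


class PageMetaParser(HTMLParser):
    """Collects the title, the meta description and the outbound links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.links = []      # list of (href, anchor text) pairs
        self._in_title = False
        self._href = None    # href of the link currently open, if any
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")
        elif tag == "a" and attributes.get("href"):
            self._href = attributes["href"]
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._anchor.append(data)


def extract_page_record(html_text):
    """Parse one page into a small structured record (hypothetical field names)."""
    parser = PageMetaParser()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta_description": parser.description,
        "links": parser.links,
    }
```

Pages without a title or meta description simply yield an empty title and None, so downstream code can tell missing metadata apart from empty metadata.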

Output of results

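Baidu runs a series of complex analyses on the keywords a user enters, uses the conclusions to find the pages in the index that best match them, scores those pages according to how strongly the keywords express a need and how good each page is, and presents them to the user ranked by final score.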
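To make the filter, score, and sort idea concrete, here is a deliberately simplified sketch. The two signals (query match and page quality), their 0-to-1 scale, the 0.7/0.3 weights, and all names are assumptions for illustration; the text above says only that pages are scored on need strength and page quality, and the real signals and formula are not described here:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    match: float      # how well the page meets the query's need, 0.0 to 1.0 (assumed scale)
    quality: float    # page quality estimate, 0.0 to 1.0 (assumed scale)
    dead_link: bool = False
    empty: bool = False


def rank(candidates, match_weight=0.7, quality_weight=0.3):
    """Drop filtered pages, score the rest with an assumed linear formula, sort best first."""
    kept = [c for c in candidates if not (c.dead_link or c.empty)]
    scored = [(match_weight * c.match + quality_weight * c.quality, c.url) for c in kept]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]
```

Changing the weights changes the order: with the defaults a strong match beats a high-quality but weaker match, and giving quality the larger weight reverses that, which is why the balance between signals matters as much as the signals themselves.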

In short, if you want search to give your users a better experience, you need to build your site's content rigorously so that it better meets their browsing needs. Keep in mind one question that content building must always answer: is this valuable to users?

If you have other questions about “Baidu Search Engine Basics”, you are welcome to share your views in the [Academy Community] [Learning Discussion] “Baidu Search Engine Basics” discussion thread; our staff will follow it and discuss it with you.

Continue reading the full text of “Baidu Search Engine Basics: Crawling, Filtering, Indexing, and Output”...

Category: SEO Basics | Tags: search engine  Baidu  basics   | Add comment (0)

Related articles:

SEO Basics http://www.mhhacj.live/seo-rumen/338/#comment http://www.mhhacj.live/ http://www.mhhacj.live/feed.asp?cmt=338 http://www.mhhacj.live/cmd.asp?act=tb&id=338&key=0ff3316c
Web2.0 Anti-Spam in Detail: Examples of Content Baidu Judges to Be Spam</title><author>8943459@qq.com (Recollection)</author><link>http://www.mhhacj.live/seo-ziliao/337/</link><pubDate>Fri, 16 Oct 2015 16:15:47 +0800</pubDate><guid>http://www.mhhacj.live/seo-ziliao/337/</guid><description><![CDATA[<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>I. </span></strong><strong><span>Web2.0</span></strong><strong><span> sites and spam content</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Because most </span><span>web2.0</span><span> site-building systems have vulnerabilities that are cheap to exploit, and bulk-posting software is inexpensive, these sites are easily abused by spammers. Recently we have found a large number of </span><span>web2.0</span><span> sites plagued by bulk-posted spam. This spam gets in everywhere: besides traditional </span><span>web2.0</span><span> sites such as forums and blogs, it has spread to microblogs, </span><span>SNS</span><span>, </span><span>B2B</span><span> business listing pages, company yellow pages, classified ads, video sites, cloud storage, and more, and even emerging sharing communities have been affected. It has expanded from forum posts and blog entries to supply-and-demand pages, video pages, and user profile pages: any place where users fill in and generate content can be discovered and exploited by </span><span>spammers</span><span>, producing large numbers of </span><span>web2.0</span><span>-style spam pages.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 
0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span><br /></span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Once a search engine discovers </span><span>web2.0</span><span>-style spam pages it will respond, but it is hard to strike effectively at the </span><span>spammers</span><span> who actually run the bulk posting. So </span><span>spammers</span><span> readily exploit the very low cost and personal safety that </span><span>web2.0</span><span> sites offer them, doing ever more harm to websites, users, and search engines. A site with lax management and weak controls can easily become a breeding ground for spam; some sites ignore spam for the sake of short-term traffic, which amounts to drinking poison to quench thirst. A site should be not merely a platform provider but also a manager of its content, and actively maintaining its own quality is very important. If a site lets spam grow unchecked, it not only harms the user experience but also damages the site's reputation and brand, drives away genuine users, and in serious cases leads the search engine to lower its assessment of the site.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>For </span><span>spammers</span><span>, the purpose of posting spam on </span><span>web2.0</span><span> sites is to get it indexed by search engines. Unless the spam pages disappear from both the site and the search engine, they will keep producing more spam. The Baidu Webmaster Platform hopes to work with webmasters to fight spam pages, help sites develop healthily, and jointly maintain the internet ecosystem.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; 
background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong><span>II. What content Baidu judges to be spam</span></strong></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Any content that is meaningless to users and can harm them is spam. We have summarized several typical cases below for illustration:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1</span><span>. Content that does not match the topic of the site </span><span>or</span><span> forum section</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>Bulk posters usually blast content across large numbers of sites and mostly ignore the topic of the site or section. On video sites we sometimes see posts like </span><span>“</span><span>XXX</span><span> Hospital treats vitiligo with great results</span><span>”</span><span>; on cosmetics forums we find fake airline phone numbers; on music sites we find product promotions (and not for </span><span>CD</span><span>s, of course). For sites or forums with a clear topic, cleaning up spam is not only about protecting the site experience: for the site's own development, it also maintains user loyalty and strengthens its core competitiveness. Examples:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/79ff52406a9358986d115dc8.html" target="_blank">http://cang.baidu.com/cases99/snap/79ff52406a9358986d115dc8.html<span><span class="Apple-converted-space"> </span></span></a><span>The site's topic is cosmetics, yet ads for objectionable content such as </span><span>“</span><span>escort services</span><span>”</span><span> appear</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/f84bec4e99508525a9e67fce.html" target="_blank">http://cang.baidu.com/cases99/snap/f84bec4e99508525a9e67fce.html</a><span> The site's topic is video, yet medical information of an obviously commercial, advertising nature appears</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; 
margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2</span><span>. Content that deceives search engine users</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1</span><span>) To stand out among many search results and catch users' attention, spam usually uses enticing titles or stuffs the content with keywords, unlike real users, who write their posts in natural language. Example:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/c2c0b07346650b4d292e0368.html" target="_blank">http://cang.baidu.com/cases99/snap/c2c0b07346650b4d292e0368.html</a><span> </span><span>“</span><span>优酷土豆</span><span>%</span><span>守望的天空</span><span>29</span><span>集</span><span>” (roughly “Youku Tudou % 守望的天空 episode 29”): </span><span>a title that runs counter to how ordinary users post.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: 
break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2</span><span>) Some posts consist of meaningless text, or of an article scraped at random with popular keywords inserted throughout. Example:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/c17615311d6d4531bb4b33cc.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/c17615311d6d4531bb4b33cc.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/1baad31c3d640eeceb11823d.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/1baad31c3d640eeceb11823d.html</span></a></p>
<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3</span><span>) Some articles have titles suggesting they are about A, while the main body talks about B, which has nothing to do with A. Example:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/ce87d21d625937ebd9eee4c2.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/ce87d21d625937ebd9eee4c2.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/c17615311d6d4531bb4b33cc.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/c17615311d6d4531bb4b33cc.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4</span><span>) On video and audio sites, any video or audio file that does not meet users' needs or does not match its title should be removed, whether or not the uploader acted maliciously. Example:</span></p><p style="padding-bottom: 0px; widows: 
1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/c8ea73b9a98c51205104b3c1.html" target="_blank">http://cang.baidu.com/cases99/snap/c8ea73b9a98c51205104b3c1.html</a><span>: the actual videos average less than 1 minute.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/1e7b322fb94512c064e0fec0.html" target="_blank">http://cang.baidu.com/cases99/snap/1e7b322fb94512c064e0fec0.html</a><span> Contact details are embedded in the video. It claims to teach martial arts but is actually promoting something else, turning the video site into a free advertising platform.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; 
-webkit-text-stroke-width: 0px"><span>3</span><span>. Content that defrauds sites of revenue-sharing advertising income</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>To encourage users to upload content, some web2.0 sites run a cash-reward scheme. Video sites, for example, calculate a user's earnings from the number of impressions of the ad shown before the video. A minority of revenue-sharing users use improper means to trick search engines into sending traffic and to defraud the site of its revenue share. A typical tactic is to mass-upload very short videos and pile enticing keywords onto the video pages.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4</span><span>. Content that maliciously uses web2.0 sites for self-promotion or profit. Example:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/16107c3e4e885c024d29ed38.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/16107c3e4e885c024d29ed38.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; 
background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/1e7b322fb94512c064e0fec0.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/1e7b322fb94512c064e0fec0.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>Contact details are embedded in the video. It claims to teach martial arts but is actually promoting something else, turning the video site into a free advertising platform.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>5</span><span>. Harmful information that violates laws and regulations, such as contact details for fake prize-winning scams, fake contact phone numbers, and other objectionable content. Example:</span><br /> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/79ff52406a9358986d115dc8.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/79ff52406a9358986d115dc8.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/30c36a2b013ae249aacfbc3e.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/30c36a2b013ae249aacfbc3e.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/cases99/snap/af71c5ec8b83e2eed1cb783d.html" target="_blank"><span>http://cang.baidu.com/cases99/snap/af71c5ec8b83e2eed1cb783d.html</span></a></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span><a style="color: rgb(26,137,237); cursor: pointer; 
text-decoration: none" href="http://cang.baidu.com/cases99/snap/f4633d781c76393f9b11343d.html" target="_blank">http://cang.baidu.com/cases99/snap/f4633d781c76393f9b11343d.html</a></span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><strong>III. How should webmasters deal with spam content?</strong><br /> </p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>We consider the kinds of content described above highly inappropriate on web2.0 sites or in forum boards. Cleaning them up supports the site's own growth, helps search engines deliver fairer results, protects the Internet ecosystem, and gives users a better browsing experience. Webmasters should make cleaning up spam a priority and can take the following measures:</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1</span><span>. Delete the spam content and make those pages return 404. Then promptly submit a dead-link list through the dead-link tool on <a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://zhanzhang.baidu.com/" target="_blank">Baidu Webmaster Platform</a></span><span>. This lets Baidu respond promptly to the site's own cleanup, and it also makes it easier for the site to control how its content is presented in the search engine.</span></p>
<p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2</span><span>. Raise the bar for registration and restrict automated sign-ups</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1</span><span>) Mass-posting software usually relies on automated programs that probe for a forum's default registration and posting file names. Administrators can change these file names from time to time and use images for the register and post buttons. Departing from the software's defaults helps keep these pages from being found by automated programs.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2</span><span>) Posting bots usually register automatically and follow a single, repetitive behavior pattern. Administrators can add steps that require human interaction, which helps restrict automated registration. For example: use CAPTCHAs; limit the number of IDs that can be registered with one email address and require email verification; use more sophisticated verification mechanisms; change the registration questions frequently.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 
2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3</span><span>) Besides setting a threshold at registration, you can also control new users' permissions. For example: unlock posting only after a user completes manual steps such as uploading an avatar and filling in profile details; restrict posting by new users for a set period; prevent new users from posting links until they reach a certain level.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3</span><span>. Strictly control automated posting, for example with CAPTCHAs and limits on consecutive posts within a short period.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4</span><span>. Set up a blacklist. Add words common in mass-posted spam, advertising phone numbers, URLs and the like to it, and restrict or remove posts that contain blacklisted content. Maintain the blacklist continuously so that it catches variants of known spam terms as well as newly emerging ones.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 
0px"><span>5</span><span>. Monitor the site for anomalies. When registrations, post counts or even site traffic suddenly surge, notice it promptly and track down the cause.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>6</span><span>. Monitor user behavior on the site</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>1</span><span>) Some abnormal users' IDs are structured differently from those of ordinary users. Some are meaningless strings of letters and digits or random combinations of single Chinese characters, such as gtu4gn6dy1 and 蝶淑琴. Others use commercial terms as IDs, such as …承天地7 and hangkongfuwu123.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>2</span><span>) The interval between posts is too short</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: 
rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>3</span><span>) Most of the posted content is very similar</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>4</span><span>) Most posts share the same distinctive features, such as a particular URL, phone number, QQ number or other contact details</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>7</span><span>. Do not allow content containing executable code to be posted. This prevents pop-ups, redirects and other behavior that seriously harms the user experience.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>8</span><span>. Mark links in certain web2.0 locations with “nofollow”, such as links in BBS signatures and the self-supplied links attached to BLOG commenters' IDs: </span><a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://cang.baidu.com/spamcase/snap/a3103920926c494f0e3030ad.html" 
target="_blank">http://cang.baidu.com/spamcase/snap/a3103920926c494f0e3030ad.html</a><span> </span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>9</span><span>. For advertising boards and off-topic chatter boards in a forum, we recommend adding permission restrictions or blocking search engines from indexing them.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"><span>10</span><span>. Keep up with security updates for your site software and install patches promptly. Protect user accounts so that legitimate users' accounts, or long-dormant ones, cannot be hijacked to post spam.</span></p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px">If you have other questions about the “Web2.0 Anti-Spam Guide”, you are welcome to share your views in the <a style="color: rgb(26,137,237); cursor: pointer; text-decoration: none" href="http://bbs.zhanzhang.baidu.com/thread-21409-1-1.html">[Study Discussion] “Web2.0 Anti-Spam Guide”</a> thread on the [Academy Classmates] board. Our staff will follow that thread and discuss it with you there.</p><p style="padding-bottom: 0px; widows: 1; text-transform: none; background-color: rgb(255,255,255); text-indent: 2em; margin: 0px 0px 20px; padding-left: 0px; letter-spacing: normal; padding-right: 0px; font: 
14px/25px arial, sans-serif; word-wrap: break-word; white-space: normal; color: rgb(102,102,102); word-break: break-all; word-spacing: 0px; padding-top: 0px; -webkit-text-stroke-width: 0px"> </p><p>Copyright © 2008</p><p><a href="http://www.mhhacj.live/seo-ziliao/337/" target="_blank">Continue reading the full text of “Web2.0 Anti-Spam Guide: Cases Baidu Identifies as Spam”...</a></p><p>Category: <a href="http://www.mhhacj.live/seo-ziliao/">SEO Materials</a> | Tags: <a href="http://www.mhhacj.live/catalog.asp?tags=Web2%2E0">Web2.0</a></p>]]></description><category>SEO资料</category><comments>http://www.mhhacj.live/seo-ziliao/337/#comment</comments><wfw:comment>http://www.mhhacj.live/</wfw:comment><wfw:commentRss>http://www.mhhacj.live/feed.asp?cmt=337</wfw:commentRss><trackback:ping>http://www.mhhacj.live/cmd.asp?act=tb&id=337&key=072b346c</trackback:ping></item></channel></rss> <script>(function(){ var src = (document.location.protocol == "http:") ? "http://js.passport.qihucdn.com/11.0.1.js?9ed1f3a8f9c3ff069b7b95c01474c743":"https://jspassport.ssl.qhimg.com/11.0.1.js?9ed1f3a8f9c3ff069b7b95c01474c743"; document.write('<script src="' + src + '" id="sozz"><\/script>'); })(); </script> <script> (function(){ var bp = document.createElement('script'); var curProtocol = window.location.protocol.split(':')[0]; if (curProtocol === 'https') { bp.src = 'https://zz.bdstatic.com/linksubmit/push.js'; } else { bp.src = 'http://push.zhanzhang.baidu.com/push.js'; } var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(bp, s); })(); </script> </body>
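The anti-spam measures in the Web2.0 guide above can be combined into one screening step that runs before a post is published. These include throttling rapid consecutive posting, keeping a maintained blacklist, and watching for repeated contact details such as phone and QQ numbers. What follows is a minimal Python sketch under stated assumptions. It is not Baidu's method or any forum package's implementation. The blacklist entries, the thresholds (`min_interval`, `max_posts`, `window`, `max_links`) and the phone/QQ regular expressions are all hypothetical placeholders.

```python
import re
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical blacklist entries; a real site would maintain this list continuously.
BLACKLIST_TERMS = {"代开发票", "casino bonus"}
# Assumed, rough patterns for contact details that recur across spam posts.
PHONE_RE = re.compile(r"1[3-9]\d{9}")  # shape of an 11-digit mainland-China mobile number
QQ_RE = re.compile(r"qq\s*[:：]?\s*\d{5,11}", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")


@dataclass
class PostScreener:
    min_interval: float = 30.0  # assumed minimum seconds between one user's posts
    max_posts: int = 5          # assumed cap on posts per sliding window
    window: float = 600.0       # assumed sliding-window length in seconds
    max_links: int = 2          # assumed cap on links in a single post
    history: dict = field(default_factory=lambda: defaultdict(deque))

    def check(self, user_id: str, text: str, now: Optional[float] = None) -> list:
        """Return reasons to hold the post for review; an empty list means accept."""
        now = time.time() if now is None else now
        reasons = []
        lowered = text.lower()
        if any(term.lower() in lowered for term in BLACKLIST_TERMS):
            reasons.append("blacklisted term")
        if PHONE_RE.search(text) or QQ_RE.search(text):
            reasons.append("contact details")
        if len(URL_RE.findall(text)) > self.max_links:
            reasons.append("too many links")
        times = self.history[user_id]
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        if times and now - times[-1] < self.min_interval:
            reasons.append("posting too fast")
        if len(times) >= self.max_posts:
            reasons.append("too many posts in window")
        # Every attempt is recorded, so rejected retries still count toward the limits.
        times.append(now)
        return reasons
```

In this sketch a non-empty result would send the post to manual review rather than delete it outright. Whether to block, queue or hide such posts is a site policy choice that the guide does not prescribe.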