{"id":1421,"date":"2015-01-19T09:47:43","date_gmt":"2015-01-19T06:17:43","guid":{"rendered":"http:\/\/vua.nadiran.com\/?p=1421"},"modified":"2015-01-19T10:18:28","modified_gmt":"2015-01-19T06:48:28","slug":"%db%8c%da%a9%d9%be%d8%a7%d8%b1%da%86%d9%87-%d8%b3%d8%a7%d8%b2%db%8c-nutch-1-7-%d8%a8%d8%a7-elasticsearch","status":"publish","type":"post","link":"https:\/\/vua.nadiran.com\/?p=1421","title":{"rendered":"Integrating Nutch 1.7 with ElasticSearch"},"content":{"rendered":"<p>Integrating Nutch 1.7 with ElasticSearch<\/p>\n<p>Nutch 1.7 can now be integrated with ElasticSearch.<br \/>\nSetting up this integration is extremely worthwhile.<\/p>\n<p>This guide is intended as a practical set of instructions for readers who have already worked with Nutch and ElasticSearch.<\/p>\n<p>Nutch does an excellent job of crawling, fetching, and parsing content for indexing, but it is
not integrated with ElasticSearch out of the box.<\/p>\n<p>The change we make is to edit the nutch-site.xml file in the conf directory of the Nutch installation.<br \/>\nFirst, we need to enable the indexer plugin, which is done with the following property:<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;plugin.includes&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;Regular expression naming plugin directory names to\u00a0include. Any plugin not matching this expression is excluded.<\/p>\n<p style=\"direction: ltr; text-align: left;\">In any case you need at least include the nutch-extensionpoints plugin. By\u00a0default Nutch includes crawling just HTML and plain text via HTTP,<\/p>\n<p style=\"direction: ltr; text-align: left;\">and basic indexing and search plugins. 
In order to use HTTPS please enable protocol-httpclient, but be aware of possible intermittent problems with the<\/p>\n<p style=\"direction: ltr; text-align: left;\">underlying commons-httpclient library.<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr;\">&lt;\/property&gt;<\/p>\n<p>The entries added here are for the Elastic indexer.<br \/>\nNext, we need to set the following properties in nutch-site.xml:<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;!-- Elasticsearch properties --&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;elastic.host&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;localhost&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;The hostname to send documents to using TransportClient. 
Either host<\/p>\n<p style=\"direction: ltr; text-align: left;\">and port must be defined or cluster.&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;elastic.port&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;9300&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;elastic.cluster&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;elasticsearch&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;The cluster name to discover. 
Either host and port must be defined<\/p>\n<p style=\"direction: ltr; text-align: left;\">or cluster.&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;elastic.index&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;nutch&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;Default index to send documents to.&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;elastic.max.bulk.docs&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;250&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;Maximum size of the bulk in number of documents.&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;property&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;name&gt;elastic.max.bulk.size&lt;\/name&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;value&gt;2500500&lt;\/value&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;description&gt;Maximum size of the bulk in bytes.&lt;\/description&gt;<\/p>\n<p style=\"direction: ltr; text-align: left;\">&lt;\/property&gt;<\/p>\n<p>In this setup I installed ElasticSearch on the same machine, which is
why elastic.host is set to localhost.<\/p>\n<p>Another important setting is the elastic.cluster name. If you do not know it, you can find it in the elasticsearch.yml file in the configuration directory of your ElasticSearch installation.<\/p>\n<p>The default elastic.port is 9300, which is the transport port (port 9200 is the HTTP interface and is not used for the Nutch integration).<br \/>\nFinally, make sure the index named in elastic.index exists in ElasticSearch.<\/p>\n<p>There is no need to change conf\/elasticsearch.conf or to upgrade to Nutch 2.x.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: right;\">Translation: \u0646\u0627\u062f\u06cc \u0633\u0646\u062c\u0627\u0646\u06cc<\/p>\n<p 
style=\"direction: rtl; text-align: left;\">Source: <a href=\"https:\/\/www.mind-it.info\/integrating-nutch-1-7-elasticsearch\/\">https:\/\/www.mind-it.info\/integrating-nutch-1-7-elasticsearch<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Integrating Nutch 1.7 with ElasticSearch Nutch 1.7 can now be integrated with ElasticSearch. Setting up this integration is extremely worthwhile. This guide is intended as a practical set of instructions for readers who have already worked with Nutch and ElasticSearch. Nutch does an excellent job of crawling, fetching, and parsing content for indexing <a href='https:\/\/vua.nadiran.com\/?p=1421' 
class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[34],"tags":[],"class_list":["post-1421","post","type-post","status-publish","format-standard","hentry","category-34","category-34-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"_links":{"self":[{"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=\/wp\/v2\/posts\/1421"}],"collection":[{"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1421"}],"version-history":[{"count":4,"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=\/wp\/v2\/posts\/1421\/revisions"}],"predecessor-version":[{"id":1424,"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=\/wp\/v2\/posts\/1421\/revisions\/1424"}],"wp:attachment":[{"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1421"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1421"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vua.nadiran.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1421"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}