153 messages

org.apache.lucene.nutch-dev

2013 July

Page 1 of 7 (Messages 1 to 25)

Jenkins build is back to normal : Nutch-trunk #2263 - Apache Jenkins Server
Adding nutch stage - Ahmet Emre Aladağ
[jira] [Created] (NUTCH-1597) HeadingsParseFilter to trim and remove exess whitespace - Markus Jelsma (JIRA)
[jira] [Created] (NUTCH-1599) Obtain consensus on new description of Nutch - Lewis John McGibbney (JIRA)
Re: Inlinks not being saved in the database - brian4
[jira] [Updated] (NUTCH-1595) Upgrade to Tika 1.4 - Markus Jelsma (JIRA)
Jenkins build is back to normal : Nutch-trunk #2268 - Apache Jenkins Server
[jira] [Commented] (NUTCH-1595) Upgrade to Tika 1.4 - Hudson (JIRA)
[jira] [Created] (NUTCH-1608) SolrDeleteDuplicates bug: choosing preferred page when duplicates does not work - Brian (JIRA)
[jira] [Updated] (NUTCH-1608) SolrDeleteDuplicates bug: choosing preferred page when duplicates does not work - Lewis John McGibbney (JIRA)
[jira] [Created] (NUTCH-1609) java.net.MalformedURLException when running nutch crawl with apache-nutch-2.1.jar with hadoop - vishal toshniwal (JIRA)
[jira] [Resolved] (NUTCH-1609) java.net.MalformedURLException when running nutch crawl with apache-nutch-2.1.jar with hadoop - Lewis John McGibbney (JIRA)
[jira] [Updated] (NUTCH-1611) Elastic Search Indexer Creates field in elastic search "boost" as a string value, so cannot be used in custom boost queries - Markus Jelsma (JIRA)
Jenkins build is back to normal : Nutch-trunk #2286 - Apache Jenkins Server
[jira] [Created] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with >2 threads and added cookie strings for both http protocols - Brian (JIRA)
[jira] [Commented] (NUTCH-1614) Plugin to exclude URLs matching regex list from indexing - to enable crawl but do not index - Markus Jelsma (JIRA)
[jira] [Closed] (NUTCH-1612) Getting URl Malformed exception with Nutch 2.2 and Hadoop 1.0.3 - Lewis John McGibbney (JIRA)
[jira] [Updated] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with >2 threads and added cookie strings for both http protocols - Lewis John McGibbney (JIRA)
[jira] [Commented] (NUTCH-1616) SegmentMerger missing proper crawl_fetch datum - Sebastian Nagel (JIRA)
[jira] [Updated] (NUTCH-1457) Nutch2 Refactor the update process so that fetched items are only processed once - Riyaz Shaik (JIRA)
[jira] [Updated] (NUTCH-1618) Fetches some websites multiple times for long lasting queues - Talat UYARER (JIRA)
[jira] [Commented] (NUTCH-1124) JUnit test for scoring-opic - Lewis John McGibbney (JIRA)
[jira] [Updated] (NUTCH-1616) SegmentMerger missing proper crawl_fetch datum - Markus Jelsma (JIRA)
[jira] [Commented] (NUTCH-1616) SegmentMerger missing proper crawl_fetch datum - Markus Jelsma (JIRA)
[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation. - Claudiu Chis (JIRA)
