162 messages

org.apache.lucene.nutch-dev

2018 July

Page 1 (Messages 1 to 25): 1 2 3 4 5 6 7

[jira] [Commented] (NUTCH-2510) Crawl script modification. HostDb : generate, optional usage and description - Hudson (JIRA)
[jira] [Updated] (NUTCH-2612) Support for sitemap processing by hostname - Markus Jelsma (JIRA)
[jira] [Comment Edited] (NUTCH-2614) NPE in CrawlDbReader - Markus Jelsma (JIRA)
[jira] [Commented] (NUTCH-2612) Support for sitemap processing by hostname - Markus Jelsma (JIRA)
[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV - ASF GitHub Bot (JIRA)
[jira] [Resolved] (NUTCH-2617) Disable Exchange component by default - Sebastian Nagel (JIRA)
[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV - ASF GitHub Bot (JIRA)
[jira] [Created] (NUTCH-2618) protocol-okhttp not to use http.timeout for max duration to fetch document - Sebastian Nagel (JIRA)
[jira] [Commented] (NUTCH-2618) protocol-okhttp not to use http.timeout for max duration to fetch document - ASF GitHub Bot (JIRA)
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers - Roannel Fernández Hernández (JIRA)
[jira] [Updated] (NUTCH-2620) urlfilter-validator incorrectly assumes that top-level domains are not longer than 4 characters - Sebastian Nagel (JIRA)
Jenkins build is back to normal : Nutch-trunk #3545 - Apache Jenkins Server
[jira] [Commented] (NUTCH-1106) Options to skip url's based on length - ASF GitHub Bot (JIRA)
[jira] [Commented] (NUTCH-2616) Review routing of deletions by Exchange component - ASF GitHub Bot (JIRA)
[jira] [Commented] (NUTCH-2353) Create seed file with metadata using the REST API - Sebastian Nagel (JIRA)
[jira] [Commented] (NUTCH-2152) CommonCrawl dump via Service endpoint - ASF GitHub Bot (JIRA)
[jira] [Commented] (NUTCH-2618) protocol-okhttp not to use http.timeout for max duration to fetch document - ASF GitHub Bot (JIRA)
[jira] [Resolved] (NUTCH-1993) Nutch does not use backup parsers - Sebastian Nagel (JIRA)
[jira] [Commented] (NUTCH-2619) protocol-okhttp: allow to keep partially fetched docs as truncated - Hudson (JIRA)
[jira] [Resolved] (NUTCH-2095) WARC exporter for the CommonCrawlDataDumper - Sebastian Nagel (JIRA)
[jira] [Created] (NUTCH-2625) ProtocolFactory.getProtocol(url) may create multiple plugin instances - Sebastian Nagel (JIRA)
[jira] [Resolved] (NUTCH-2624) protocol-okhttp resource leak - Sebastian Nagel (JIRA)
[jira] [Commented] (NUTCH-2623) Fetcher to guarantee delay for same host/domain/ip independent of http/https protocol - ASF GitHub Bot (JIRA)
[jira] [Commented] (NUTCH-2623) Fetcher to guarantee delay for same host/domain/ip independent of http/https protocol - Sebastian Nagel (JIRA)
[jira] [Created] (NUTCH-2629) Documentation for CSV Index Writer - Roannel Fernández Hernández (JIRA)
