- CACHE - Static variable in class org.apache.nutch.protocol.RobotRulesParser
-
- CACHING_FORBIDDEN_ALL - Static variable in interface org.apache.nutch.metadata.Nutch
-
Don't show either original forbidden content or summaries.
- CACHING_FORBIDDEN_CONTENT - Static variable in interface org.apache.nutch.metadata.Nutch
-
Don't show original forbidden content, but show summaries.
- CACHING_FORBIDDEN_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
Sites may request that search engines don't provide access to cached
documents.
- CACHING_FORBIDDEN_KEY_UTF8 - Static variable in interface org.apache.nutch.metadata.Nutch
-
- CACHING_FORBIDDEN_NONE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Show both original forbidden content and summaries (default).
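The three policy constants above can be read as a small decision table. The following is a minimal sketch of that table, not Nutch's own code: the class name, method names, and the string values "none", "content", and "all" are assumptions made for illustration and are not confirmed by this index.

```java
// Hypothetical sketch of the caching-policy decision table described above.
// The string values are assumed stand-ins for the CACHING_FORBIDDEN_* constants.
final class CachingPolicySketch {
    static final String NONE = "none";       // assumed: show content and summaries (default)
    static final String CONTENT = "content"; // assumed: hide content, show summaries
    static final String ALL = "all";         // assumed: hide both

    // Cached content is shown only when nothing is forbidden (or no policy is set).
    static boolean showCachedContent(String policy) {
        return policy == null || NONE.equals(policy);
    }

    // Summaries are shown unless everything is forbidden.
    static boolean showSummary(String policy) {
        return !ALL.equals(policy);
    }

    public static void main(String[] args) {
        for (String p : new String[] {NONE, CONTENT, ALL}) {
            System.out.println(p + ": content=" + showCachedContent(p)
                    + ", summary=" + showSummary(p));
        }
    }
}
```

A missing policy is treated as the default here, which matches the "(default)" note on CACHING_FORBIDDEN_NONE.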
- calculate(WebPage) - Method in class org.apache.nutch.crawl.MD5Signature
-
- calculate(WebPage) - Method in class org.apache.nutch.crawl.Signature
-
- calculate(WebPage) - Method in class org.apache.nutch.crawl.TextMD5Signature
-
- calculate(WebPage) - Method in class org.apache.nutch.crawl.TextProfileSignature
-
- calculateLastFetchTime(WebPage) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method returns the last fetch time of the CrawlDatum.
- calculateLastFetchTime(WebPage) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
Calculates last fetch time of the given CrawlDatum.
- call() - Method in class org.apache.nutch.webui.client.impl.RemoteCommandExecutor.JobStateChecker
-
- canStop(boolean) - Method in class org.apache.nutch.api.NutchServer
-
Safety and convenience method to determine whether or not it is safe to
shut down the server.
- CCIndexingFilter - Class in org.creativecommons.nutch
-
Adds basic searchable fields to a document.
- CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
-
- CCParseFilter - Class in org.creativecommons.nutch
-
Adds metadata identifying the Creative Commons license used, if any.
- CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
-
- CCParseFilter.Walker - Class in org.creativecommons.nutch
-
Walks DOM tree, looking for RDF in comments and licenses in anchors.
- cdata(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of cdata.
- CHAR_ENCODING_FOR_CONVERSION - Static variable in interface org.apache.nutch.metadata.Nutch
-
- characters(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of character data.
- charactersRaw(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
If available, when the disable-output-escaping attribute is used, output
raw text without escaping.
- CHECK_BLOCKING - Static variable in interface org.apache.nutch.protocol.Protocol
-
Property name.
- CHECK_ROBOTS - Static variable in interface org.apache.nutch.protocol.Protocol
-
Property name.
- checkClientTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- checkMark(WebPage) - Method in enum org.apache.nutch.storage.Mark
-
- checkOutputSpecs(JobContext) - Method in class org.apache.nutch.indexer.IndexerOutputFormat
-
- checkServerTrusted(X509Certificate[], String) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- childLen - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
-
- children - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- childrenList - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- chooseRepr(String, String, boolean) - Static method in class org.apache.nutch.util.URLUtil
-
Given two URLs, the source and the destination of a redirect, returns the
representative URL.
- CircularDependencyException - Exception in org.apache.nutch.plugin
-
CircularDependencyException will be thrown if a circular dependency is detected.
- CircularDependencyException(Throwable) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
-
- CircularDependencyException(String) - Constructor for exception org.apache.nutch.plugin.CircularDependencyException
-
- cleanField(String) - Static method in class org.apache.nutch.util.StringUtil
-
Takes in a String value and cleans out any offending "�"
- CleaningJob - Class in org.apache.nutch.indexer
-
- CleaningJob() - Constructor for class org.apache.nutch.indexer.CleaningJob
-
- CleaningJob.CleanMapper - Class in org.apache.nutch.indexer
-
- CleaningJob.CleanReducer - Class in org.apache.nutch.indexer
-
- CleanMapper() - Constructor for class org.apache.nutch.indexer.CleaningJob.CleanMapper
-
- cleanMimeType(String) - Static method in class org.apache.nutch.util.MimeUtil
-
Cleans a MimeType name by extracting the actual MimeType from a string of the form:
- CleanReducer() - Constructor for class org.apache.nutch.indexer.CleaningJob.CleanReducer
-
- cleanup(Reducer<UrlWithScore, NutchWritable, String, WebPage>.Context) - Method in class org.apache.nutch.crawl.DbUpdateReducer
-
- cleanup(Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatCombiner
-
- cleanup(Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatReducer
-
- cleanup(Reducer<String, WebPage, NullWritable, NullWritable>.Context) - Method in class org.apache.nutch.indexer.CleaningJob.CleanReducer
-
- cleanup(Mapper<String, WebPage, String, NutchDocument>.Context) - Method in class org.apache.nutch.indexer.IndexingJob.IndexerMapper
-
- cleanup(Reducer<Text, SolrDeleteDuplicates.SolrRecord, Text, SolrDeleteDuplicates.SolrRecord>.Context) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- clear() - Method in class org.apache.nutch.metadata.Metadata
-
Remove all mappings from metadata.
- clearArgs() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Clears the value of the 'args' field
- clearArgs() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Clears the value of the 'args' field
- clearBaseUrl() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'baseUrl' field
- clearBatchId() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'batchId' field
- clearClues() - Method in class org.apache.nutch.util.EncodingDetector
-
Clears all clues.
- clearCode() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Clears the value of the 'code' field
- clearContent() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'content' field
- clearContentType() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'contentType' field
- clearFetchInterval() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'fetchInterval' field
- clearFetchTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'fetchTime' field
- clearHeaders() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'headers' field
- clearInlinks() - Method in class org.apache.nutch.storage.Host.Builder
-
Clears the value of the 'inlinks' field
- clearInlinks() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'inlinks' field
- clearLastModified() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Clears the value of the 'lastModified' field
- clearMajorCode() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Clears the value of the 'majorCode' field
- clearMarkers() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'markers' field
- clearMetadata() - Method in class org.apache.nutch.storage.Host.Builder
-
Clears the value of the 'metadata' field
- clearMetadata() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'metadata' field
- clearMinorCode() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Clears the value of the 'minorCode' field
- clearModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'modifiedTime' field
- clearOutlinks() - Method in class org.apache.nutch.storage.Host.Builder
-
Clears the value of the 'outlinks' field
- clearOutlinks() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'outlinks' field
- clearParseStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'parseStatus' field
- clearPrevFetchTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'prevFetchTime' field
- clearPrevModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'prevModifiedTime' field
- clearPrevSignature() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'prevSignature' field
- clearProtocolStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'protocolStatus' field
- clearReprUrl() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'reprUrl' field
- clearRetriesSinceFetch() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'retriesSinceFetch' field
- clearScore() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'score' field
- clearSignature() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'signature' field
- clearSitemaps() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'sitemaps' field
- clearStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'status' field
- clearStmPriority() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'stmPriority' field
- clearText() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'text' field
- clearTitle() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Clears the value of the 'title' field
- Client - Class in org.apache.nutch.protocol.ftp
-
Client.java encapsulates the functionality Nutch needs to get directory listings
and retrieve files from an FTP server.
- Client() - Constructor for class org.apache.nutch.protocol.ftp.Client
-
- close() - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatMapper
-
- close() - Method in class org.apache.nutch.host.HostDb
-
- close() - Method in interface org.apache.nutch.indexer.IndexWriter
-
- close() - Method in class org.apache.nutch.indexer.IndexWriters
-
- close() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
-
- close() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- close() - Method in class org.apache.nutch.indexwriter.hbase.HBaseIndexWriter
-
- close() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- close() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Closes the record reader resources.
- close() - Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsMapper
-
- closeReaders(SequenceFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils
-
Closes a group of SequenceFile readers.
- closeReaders(MapFile.Reader[]) - Static method in class org.apache.nutch.util.FSUtils
-
Closes a group of MapFile readers.
- CLUSTER - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- CollectionManager - Class in org.apache.nutch.collection
-
- CollectionManager(Configuration) - Constructor for class org.apache.nutch.collection.CollectionManager
-
- CollectionManager() - Constructor for class org.apache.nutch.collection.CollectionManager
-
Used for testing
- ColorEnumLabel<E extends java.lang.Enum<E>> - Class in org.apache.nutch.webui.pages.components
-
Label which renders connection status as a Bootstrap label.
- ColorEnumLabelBuilder<E extends java.lang.Enum<E>> - Class in org.apache.nutch.webui.pages.components
-
- ColorEnumLabelBuilder(String) - Constructor for class org.apache.nutch.webui.pages.components.ColorEnumLabelBuilder
-
- commandExecuted(Crawl, RemoteCommand, int) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- commandExecuted(Crawl, RemoteCommand, int) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- CommandRunner - Class in org.apache.nutch.util
-
- CommandRunner() - Constructor for class org.apache.nutch.util.CommandRunner
-
- comment(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report an XML comment anywhere in the document.
- commit() - Method in interface org.apache.nutch.indexer.IndexWriter
-
- commit() - Method in class org.apache.nutch.indexer.IndexWriters
-
- commit() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- commit() - Method in class org.apache.nutch.indexwriter.hbase.HBaseIndexWriter
-
- commit() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- COMMIT_INDEX - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- COMMIT_INDEX - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- COMMIT_SIZE - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- COMMIT_SIZE - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- compare(byte[], byte[]) - Static method in class org.apache.nutch.crawl.SignatureComparator
-
- compare(ByteBuffer, ByteBuffer) - Static method in class org.apache.nutch.crawl.SignatureComparator
-
- compare(UrlWithScore, UrlWithScore) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator
-
- compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator
-
- compare(UrlWithScore, UrlWithScore) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator.UrlOnlyComparator
-
- compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlScoreComparator.UrlOnlyComparator
-
- compare(byte[], byte[]) - Method in class org.apache.nutch.util.Bytes.ByteArrayComparator
-
- compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.util.Bytes.ByteArrayComparator
-
- compareTo(GeneratorJob.SelectorEntry) - Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
-
- compareTo(UrlWithScore) - Method in class org.apache.nutch.crawl.UrlWithScore
-
- compareTo(byte[], byte[]) - Static method in class org.apache.nutch.util.Bytes
-
- compareTo(byte[], int, int, byte[], int, int) - Static method in class org.apache.nutch.util.Bytes
-
Lexicographically compare two arrays.
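As an illustration of what a lexicographic comparison over two byte ranges involves, here is a minimal, self-contained sketch. It is not the Bytes implementation itself: the class name is hypothetical, and treating bytes as unsigned values is an assumption of this sketch.

```java
// Hypothetical sketch of lexicographic comparison over two byte ranges.
// Assumption: bytes are compared as unsigned values (0..255).
final class LexCompareSketch {
    static int compareTo(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int common = Math.min(aLen, bLen);
        for (int i = 0; i < common; i++) {
            int x = a[aOff + i] & 0xff; // mask to get the unsigned value
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y; // first differing byte decides the order
            }
        }
        // All shared positions are equal, so the shorter range sorts first.
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] left = {1, 2, 3};
        byte[] right = {1, 2, 4};
        System.out.println(compareTo(left, 0, left.length, right, 0, right.length) < 0);
    }
}
```

A negative result means the first range sorts before the second, zero means the ranges are equal, and a positive result means it sorts after.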
- compareTo(TrieStringMatcher.TrieNode) - Method in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- conf - Variable in class org.apache.nutch.plugin.Plugin
-
- conf - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- configManager - Variable in class org.apache.nutch.api.resources.AbstractResource
-
- ConfigResource - Class in org.apache.nutch.api.resources
-
- ConfigResource() - Constructor for class org.apache.nutch.api.resources.ConfigResource
-
- ConfManager - Interface in org.apache.nutch.api
-
- ConnectionStatus - Enum in org.apache.nutch.webui.client.model
-
- constructRealm(JaxRsApplication, ConfManager, String) - Static method in class org.apache.nutch.api.security.SecurityUtils
-
Constructs realm
- contains(String) - Method in class org.apache.nutch.storage.Host
-
- Content - Class in org.apache.nutch.protocol
-
- Content() - Constructor for class org.apache.nutch.protocol.Content
-
- Content(String, String, byte[], String, Metadata, Configuration) - Constructor for class org.apache.nutch.protocol.Content
-
- Content(String, String, byte[], String, Metadata, MimeUtil) - Constructor for class org.apache.nutch.protocol.Content
-
- CONTENT_DISPOSITION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_ENCODING - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_LANGUAGE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_LENGTH - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_LOCATION - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_MD5 - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_TYPE - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- CONTENT_TYPE_UTF8 - Static variable in class org.apache.nutch.util.EncodingDetector
-
- CONTRIBUTOR - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An entity responsible for making contributions to the content of the
resource.
- convertPage(WebPage, Set<String>) - Static method in class org.apache.nutch.api.impl.db.DbPageConverter
-
- COVERAGE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
The extent or scope of the content of the resource.
- CpmIteratorAdapter<T> - Class in org.apache.nutch.webui.pages.components
-
An iterator adapter that wraps iterable items with
CompoundPropertyModel.
- CpmIteratorAdapter(Iterable<T>) - Constructor for class org.apache.nutch.webui.pages.components.CpmIteratorAdapter
-
- Crawl - Class in org.apache.nutch.webui.client.model
-
- Crawl() - Constructor for class org.apache.nutch.webui.client.model.Crawl
-
- Crawl.CrawlStatus - Enum in org.apache.nutch.webui.client.model
-
- CRAWL_ID_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- CRAWLDB_ADDITIONS_ALLOWED - Static variable in class org.apache.nutch.crawl.DbUpdateReducer
-
- CrawlingCycle - Class in org.apache.nutch.webui.client.impl
-
This class implements the crawl cycle as in the crawl script.
- CrawlingCycle(CrawlingCycleListener, RemoteCommandExecutor, Crawl, List<RemoteCommand>) - Constructor for class org.apache.nutch.webui.client.impl.CrawlingCycle
-
- CrawlingCycleListener - Interface in org.apache.nutch.webui.client.impl
-
- crawlingFinished(Crawl) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- crawlingFinished(Crawl) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- crawlingStarted(Crawl) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- crawlingStarted(Crawl) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- CrawlPanel - Class in org.apache.nutch.webui.pages.crawls
-
- CrawlPanel(String) - Constructor for class org.apache.nutch.webui.pages.crawls.CrawlPanel
-
- CrawlService - Interface in org.apache.nutch.webui.service
-
- CrawlServiceImpl - Class in org.apache.nutch.webui.service.impl
-
- CrawlServiceImpl() - Constructor for class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- CrawlsPage - Class in org.apache.nutch.webui.pages.crawls
-
This page is for crawl management.
- CrawlsPage() - Constructor for class org.apache.nutch.webui.pages.crawls.CrawlsPage
-
- CrawlStatus - Class in org.apache.nutch.crawl
-
- CrawlStatus() - Constructor for class org.apache.nutch.crawl.CrawlStatus
-
- create(NutchConfig) - Method in interface org.apache.nutch.api.ConfManager
-
- create(NutchConfig) - Method in class org.apache.nutch.api.impl.RAMConfManager
-
Creates a Hadoop configuration for the given Nutch configuration.
- create(JobConfig) - Method in class org.apache.nutch.api.impl.RAMJobManager
-
- create(JobConfig) - Method in interface org.apache.nutch.api.JobManager
-
- create(JobConfig) - Method in class org.apache.nutch.api.resources.JobResource
-
- create() - Static method in class org.apache.nutch.util.NutchConfiguration
-
- create(boolean, Properties) - Static method in class org.apache.nutch.util.NutchConfiguration
-
- createClient() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- createCommands(Crawl) - Method in class org.apache.nutch.webui.client.impl.RemoteCommandsBatchFactory
-
- createConfig(NutchConfig) - Method in class org.apache.nutch.api.resources.ConfigResource
-
- createCrawlDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createDao(Class<T>) - Method in class org.apache.nutch.webui.config.CustomDaoFactory
-
- createKey() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Creates a new instance of the Text object for the key.
- createLockFile(FileSystem, Path, boolean) - Static method in class org.apache.nutch.util.LockUtil
-
Create a lock file.
- createNutchDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
-
- createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
- createRule(boolean, String) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
- createSeed(SeedList) - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- createSeed(SeedList) - Method in interface org.apache.nutch.webui.client.NutchClient
-
Create seed list and return seed directory location
- createSeedFile(SeedList) - Method in class org.apache.nutch.api.resources.SeedResource
-
- createSeedListDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createSeedUrlDao() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createSocket(String, int, InetAddress, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
Attempts to get a new socket connection to the given host within the given
time limit.
- createSocket(String, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- createSocket(Socket, String, int, boolean) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- createSubCollection(String, String) - Method in class org.apache.nutch.collection.CollectionManager
-
Create a new subcollection.
- createTableCreator() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- createToolByClassName(String, Configuration) - Method in class org.apache.nutch.api.impl.JobFactory
-
- createToolByType(JobManager.JobType, Configuration) - Method in class org.apache.nutch.api.impl.JobFactory
-
- createValue() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Creates a new instance of the BytesWritable object for the value.
- createWebStore(Configuration, Class<K>, Class<V>) - Static method in class org.apache.nutch.storage.StorageUtils
-
Creates a store for the given persistentClass.
- CreativeCommons - Interface in org.apache.nutch.metadata
-
A collection of Creative Commons properties names.
- CREATOR - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An entity primarily responsible for making the content of the resource.
- currentInstance - Variable in class org.apache.nutch.webui.pages.AbstractBasePage
-
- currentJob - Variable in class org.apache.nutch.util.NutchTool
-
- currentJobNum - Variable in class org.apache.nutch.util.NutchTool
-
- CustomDaoFactory - Class in org.apache.nutch.webui.config
-
- CustomDaoFactory(ConnectionSource) - Constructor for class org.apache.nutch.webui.config.CustomDaoFactory
-
- CustomTableCreator - Class in org.apache.nutch.webui.config
-
- CustomTableCreator(ConnectionSource, List<Dao<?, ?>>) - Constructor for class org.apache.nutch.webui.config.CustomTableCreator
-
- generate(long, long, boolean, boolean, boolean) - Method in class org.apache.nutch.crawl.GeneratorJob
-
Mark URLs ready for fetching.
- GENERATE_COUNT - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATE_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- GENERATE_UPDATE_CRAWLDB - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_COUNT_MODE - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_COUNT_VALUE_DOMAIN - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_COUNT_VALUE_HOST - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_COUNT_VALUE_IP - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_CUR_TIME - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_DELAY - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_FILTER - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_MAX_COUNT - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_MIN_SCORE - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_NORMALISE - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_RANDOM_SEED - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_SITEMAP - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GENERATOR_TOP_N - Static variable in class org.apache.nutch.crawl.GeneratorJob
-
- GeneratorJob - Class in org.apache.nutch.crawl
-
- GeneratorJob() - Constructor for class org.apache.nutch.crawl.GeneratorJob
-
- GeneratorJob(Configuration) - Constructor for class org.apache.nutch.crawl.GeneratorJob
-
- GeneratorJob.SelectorEntry - Class in org.apache.nutch.crawl
-
- GeneratorJob.SelectorEntryComparator - Class in org.apache.nutch.crawl
-
- GeneratorMapper - Class in org.apache.nutch.crawl
-
- GeneratorMapper() - Constructor for class org.apache.nutch.crawl.GeneratorMapper
-
- GeneratorReducer - Class in org.apache.nutch.crawl
-
Reduce class for generate.
The #reduce() method writes a random integer to all generated URLs.
- GeneratorReducer() - Constructor for class org.apache.nutch.crawl.GeneratorReducer
-
- generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- generatorSortValue(String, WebPage, float) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
This method prepares a sort value for the purpose of sorting and selecting
top N scoring pages during fetchlist generation.
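To make "sorting and selecting top N scoring pages" concrete, the sketch below keeps the N highest sort values with a bounded min-heap. It is only an illustration of top-N selection under assumed inputs (a map from URL to sort value); the class and method names are hypothetical, and the way Nutch actually performs this selection during fetchlist generation is not shown in this index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: pick the n URLs with the highest sort values.
final class TopNSketch {
    static List<String> topN(Map<String, Float> sortValues, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by sort value: the smallest kept value is at the head.
        Comparator<Map.Entry<String, Float>> byValue =
                (x, y) -> Float.compare(x.getValue(), y.getValue());
        PriorityQueue<Map.Entry<String, Float>> heap = new PriorityQueue<>(byValue);
        for (Map.Entry<String, Float> entry : sortValues.entrySet()) {
            heap.add(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest value
            }
        }
        // Drain in ascending order, then reverse to get highest first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> values = new HashMap<>();
        values.put("http://example.org/a", 0.5f);
        values.put("http://example.org/b", 2.0f);
        values.put("http://example.org/c", 1.0f);
        System.out.println(topN(values, 2));
    }
}
```

The bounded heap keeps memory proportional to N rather than to the number of candidate pages, which is the usual reason to prefer it over a full sort.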
- generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.ScoringFilters
-
Calculate a sort value for Generate.
- generatorSortValue(String, WebPage, float) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- GenericWritableConfigurable - Class in org.apache.nutch.util
-
A generic Writable wrapper that can inject Configuration to Configurables.
- GenericWritableConfigurable() - Constructor for class org.apache.nutch.util.GenericWritableConfigurable
-
- get(String) - Method in interface org.apache.nutch.api.ConfManager
-
- get(String) - Method in class org.apache.nutch.api.impl.RAMConfManager
-
Returns the configuration map for the given configuration id.
- get(String, String) - Method in class org.apache.nutch.api.impl.RAMJobManager
-
- get(String, String) - Method in interface org.apache.nutch.api.JobManager
-
- get(String) - Method in class org.apache.nutch.host.HostDb
-
- get(String) - Method in class org.apache.nutch.metadata.Metadata
-
Get the value associated to a metadata name.
- get(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- get(Configuration) - Static method in class org.apache.nutch.plugin.PluginRepository
-
- get(int) - Method in class org.apache.nutch.storage.Host
-
- get(int) - Method in class org.apache.nutch.storage.ParseStatus
-
- get(int) - Method in class org.apache.nutch.storage.ProtocolStatus
-
- get(int) - Method in class org.apache.nutch.storage.WebPage
-
- get(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes
-
Return the DomainSuffix object for the extension; if the extension is a
top level domain, the returned object will be an instance of TopLevelDomain.
- get(Configuration) - Static method in class org.apache.nutch.util.ObjectCache
-
- getAccept() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getAcceptCharset() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getAcceptedIssuers() - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- getAcceptLanguage() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
Value of "Accept-Language" request header sent by Nutch.
- getActiveConfId() - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Gets active configuration id
- getActiveConfId() - Method in class org.apache.nutch.api.NutchServer
-
Get id of active configuration.
- getActiveConfId() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getAliases() - Method in class org.apache.nutch.parse.ParsePluginList
-
- getAll() - Method in class org.apache.nutch.collection.CollectionManager
-
Returns all collections
- getAllJobs() - Method in class org.apache.nutch.api.impl.NutchServerPoolExecutor
-
- getAnchor() - Method in class org.apache.nutch.parse.Outlink
-
- getAnchor() - Method in class org.apache.nutch.scoring.ScoreDatum
-
- getArg(ParseStatus, int) - Static method in class org.apache.nutch.parse.ParseStatusUtils
-
- getArgs() - Method in class org.apache.nutch.api.model.request.JobConfig
-
- getArgs() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getArgs() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Gets the value of the 'args' field
- getArgs() - Method in class org.apache.nutch.storage.ParseStatus
-
Gets the value of the 'args' field.
- getArgs() - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Gets the value of the 'args' field.
- getArgs() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Gets the value of the 'args' field
- getArgs() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Gets the value of the 'args' field.
- getArgs() - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Gets the value of the 'args' field.
- getArgs() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getArgs() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getAsMap(String) - Method in interface org.apache.nutch.api.ConfManager
-
- getAsMap(String) - Method in class org.apache.nutch.api.impl.RAMConfManager
-
Returns the configuration map for the given configuration id.
- getAsyncExecutor() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- getAttribute() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- getAttribute(String) - Method in class org.apache.nutch.plugin.Extension
-
Returns an attribute value that is set up in the manifest file and is
defined by the extension point XML schema.
- getAuthentication(String, Configuration) - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
This method is responsible for providing Basic authentication information.
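For readers unfamiliar with HTTP Basic authentication, the credential format itself is simple: the user name and password are joined with a colon and Base64-encoded. The sketch below shows only that encoding step with the standard java.util.Base64 API. It is not the HttpBasicAuthentication implementation, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of building an HTTP Basic "Authorization" header value.
final class BasicAuthSketch {
    static String basicHeaderValue(String user, String password) {
        // Credentials are "user:password", Base64-encoded (UTF-8 assumed here).
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        System.out.println("Authorization: " + basicHeaderValue("Aladdin", "open sesame"));
    }
}
```

Base64 is an encoding, not encryption, so a header built this way should only be sent over a protected connection.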
- getBase(Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
If Node contains a BASE tag then its HREF is returned.
- getBaseHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getBaseUrl() - Method in class org.apache.nutch.protocol.Content
-
The base url for relative links contained in the content.
- getBaseUrl() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'baseUrl' field
- getBaseUrl() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'baseUrl' field.
- getBaseUrl() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'baseUrl' field.
- getBasicPattern() - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Provides a pattern which can be used by an outside resource to determine if
this class can provide credentials based on simple header information.
- getBatchId() - Method in class org.apache.nutch.api.model.request.DbFilter
-
- getBatchId() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'batchId' field
- getBatchId() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'batchId' field.
- getBatchId() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'batchId' field.
- getBlackListString() - Method in class org.apache.nutch.collection.Subcollection
-
Returns blacklist String
- getBoost() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- getBoost() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- getBuilder(String) - Static method in class org.apache.nutch.webui.pages.components.ColorEnumLabel
-
- getByHostName(String) - Method in class org.apache.nutch.host.HostDb
-
- getCachedClass(PluginDescriptor, String) - Method in class org.apache.nutch.plugin.PluginRepository
-
- getCacheKey(URL) - Static method in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
Composes a unique key to store and access robot rules in the cache for the given URL.
- getClasses() - Method in class org.apache.nutch.api.NutchServer
-
Get a set of root resource and provider classes.
- getClassLoader() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns a cached classloader for a plugin.
- getClazz() - Method in class org.apache.nutch.plugin.Extension
-
Returns the full class name of the extension point implementation
- getClient(NutchInstance) - Method in class org.apache.nutch.webui.client.NutchClientFactory
-
- getCode() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the response code.
- getCode(int) - Method in exception org.apache.nutch.protocol.file.FileError
-
- getCode() - Method in class org.apache.nutch.protocol.file.FileResponse
-
Returns the response code.
- getCode(int) - Method in exception org.apache.nutch.protocol.ftp.FtpError
-
- getCode() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
Returns the response code.
- getCode() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getCode() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getCode() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Gets the value of the 'code' field
- getCode() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Gets the value of the 'code' field.
- getCode() - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Gets the value of the 'code' field.
- getCollectionManager(Configuration) - Static method in class org.apache.nutch.collection.CollectionManager
-
- getCommand() - Method in class org.apache.nutch.util.CommandRunner
-
- getConf() - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
- getConf() - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
- getConf() - Method in class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
-
- getConf() - Method in class org.apache.nutch.crawl.URLPartitioner
-
- getConf() - Method in class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
-
- getConf() - Method in class org.apache.nutch.host.HostDbUpdateJob
-
- getConf() - Method in class org.apache.nutch.host.HostInjectorJob
-
- getConf() - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.CleaningJob
-
- getConf() - Method in class org.apache.nutch.indexer.html.HtmlIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- getConf() - Method in class org.apache.nutch.indexer.jsoup.extractor.JsoupIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- getConf() - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- getConf() - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- getConf() - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- getConf() - Method in class org.apache.nutch.indexwriter.hbase.HBaseIndexWriter
-
- getConf() - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
- getConf() - Method in class org.apache.nutch.microformats.reltag.RelTagParser
-
- getConf() - Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
-
- getConf() - Method in class org.apache.nutch.parse.html.HtmlParser
-
- getConf() - Method in class org.apache.nutch.parse.js.JSParseFilter
-
- getConf() - Method in class org.apache.nutch.parse.jsoup.extractor.JsoupHtmlParser
-
- getConf() - Method in class org.apache.nutch.parse.metatags.MetaTagsParser
-
- getConf() - Method in class org.apache.nutch.parse.NutchSitemapParser
-
- getConf() - Method in class org.apache.nutch.parse.ParserChecker
-
- getConf() - Method in class org.apache.nutch.parse.ParserJob
-
- getConf() - Method in class org.apache.nutch.parse.ParseUtil
-
- getConf() - Method in class org.apache.nutch.parse.tika.TikaParser
-
- getConf() - Method in class org.apache.nutch.protocol.file.File
-
- getConf() - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- getConf() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
-
- getConf() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
- getConf() - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- getConf() - Method in class org.apache.nutch.protocol.sftp.Sftp
-
- getConf() - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- getConf() - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- getConf() - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- getConf() - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
-
- getConf() - Method in class org.apache.nutch.util.domain.DomainStatistics
-
- getConf() - Method in class org.apache.nutch.util.GenericWritableConfigurable
-
- getConf() - Method in class org.creativecommons.nutch.CCIndexingFilter
-
- getConf() - Method in class org.creativecommons.nutch.CCParseFilter
-
- getConfId() - Method in class org.apache.nutch.api.model.request.JobConfig
-
- getConfId() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getConfId() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getConfId() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getConfig(String) - Method in class org.apache.nutch.api.resources.ConfigResource
-
- getConfigId() - Method in class org.apache.nutch.api.model.request.NutchConfig
-
- getConfigs() - Method in class org.apache.nutch.api.resources.ConfigResource
-
- getConfiguration() - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Gets configuration ids
- getConfiguration() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getConfMgr() - Method in class org.apache.nutch.api.NutchServer
-
Get configuration manager.
- getConnectionSource() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- getConnectionStatus() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getConnectionStatus() - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getConnectionStatus() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getConnectionStatus(Long) - Method in class org.apache.nutch.webui.service.impl.NutchServiceImpl
-
- getConnectionStatus(Long) - Method in interface org.apache.nutch.webui.service.NutchService
-
- getContent() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the full content of the response.
- getContent() - Method in class org.apache.nutch.protocol.Content
-
The binary content retrieved.
- getContent() - Method in class org.apache.nutch.protocol.file.FileResponse
-
- getContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getContent() - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- getContent() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'content' field
- getContent() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'content' field.
- getContent() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'content' field.
- getContentType() - Method in exception org.apache.nutch.parse.ParserNotFound
-
- getContentType() - Method in class org.apache.nutch.protocol.Content
-
The media type of the retrieved content.
- getContentType() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'contentType' field
- getContentType() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'contentType' field.
- getContentType() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'contentType' field.
- getCopyMap() - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getCount(E) - Method in class org.apache.nutch.util.Histogram
-
- getCountryName() - Method in class org.apache.nutch.util.domain.TopLevelDomain
-
Returns the country name if TLD is Country Code TLD
- getCrawlDelay() - Method in interface org.apache.nutch.protocol.RobotRules
-
Get Crawl-Delay, in milliseconds.
- getCrawlId() - Method in class org.apache.nutch.api.model.request.JobConfig
-
- getCrawlId() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getCrawlId() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getCrawlId() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getCrawlId() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getCrawlName() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getCrawls() - Method in interface org.apache.nutch.webui.service.CrawlService
-
- getCrawls() - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- getCreatedDaos() - Method in class org.apache.nutch.webui.config.CustomDaoFactory
-
- getCredentials() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication
-
Gets the credentials generated by the HttpAuthentication object.
- getCredentials() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Gets the Basic credentials generated by this HttpBasicAuthentication object
- getCssSelector() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- getCurrentInstance() - Method in class org.apache.nutch.webui.pages.AbstractBasePage
-
- getCurrentKey() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
-
- getCurrentNode() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Get the node currently being processed.
- getCurrentNode() - Method in class org.apache.nutch.util.NodeWalker
-
Return the current node.
- getCurrentValue() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
-
- getDaoFactory() - Method in class org.apache.nutch.webui.config.SpringConfiguration
-
- getDataStoreClass(Configuration) - Static method in class org.apache.nutch.storage.StorageUtils
-
Return the Persistent Gora class used to persist Nutch Web data.
- getDatum() - Method in class org.apache.nutch.crawl.URLWebPage
-
- getDefaultValue() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- getDependencies() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of plugin ids.
- getDescriptor() - Method in class org.apache.nutch.plugin.Extension
-
Returns the plugin descriptor.
- getDescriptor() - Method in class org.apache.nutch.plugin.Plugin
-
Returns the plugin descriptor
- getDistance() - Method in class org.apache.nutch.scoring.ScoreDatum
-
- getDocBegin() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- getDocumentFields() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument
-
- getDocumentMeta() - Method in class org.apache.nutch.indexer.NutchDocument
-
- getDocuments() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocumentReader
-
- getDom(InputStream) - Static method in class org.apache.nutch.util.DomUtil
-
Returns the parsed DOM tree, or null if an error occurs.
- getDomain() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- getDomainName(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the domain name of the url.
- getDomainName(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the domain name of the url.
- getDomainSuffix(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the DomainSuffix corresponding to the last public part of
the hostname.
- getDomainSuffix(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the DomainSuffix corresponding to the last public part of
the hostname.
- getEmptyParse(Exception, Configuration) - Static method in class org.apache.nutch.parse.ParseStatusUtils
-
- getEmptyParse(int, String, Configuration) - Static method in class org.apache.nutch.parse.ParseStatusUtils
-
- getEndKey() - Method in class org.apache.nutch.api.model.request.DbFilter
-
- getException() - Method in class org.apache.nutch.api.model.response.ErrorResponse
-
- getExitValue() - Method in class org.apache.nutch.util.CommandRunner
-
- getExpireTime() - Method in interface org.apache.nutch.protocol.RobotRules
-
Get expire time
- getExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of exported libraries as URLs.
- getExtensionInstance() - Method in class org.apache.nutch.plugin.Extension
-
Return an instance of the extension implementation.
- getExtensionPoint(String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns an extension point identified by an extension point id.
- getExtensions(String) - Method in class org.apache.nutch.parse.ParserFactory
-
Finds the best-suited parse plugin for a given contentType.
- getExtensions() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns an array of extensions that listen to this extension point.
- getExtensions() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of extensions.
- getExtenstionPoints() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of extension points.
- getFamily(String) - Method in class org.apache.nutch.indexwriter.hbase.HBaseMappingReader
-
- getFetchInterval() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'fetchInterval' field
- getFetchInterval() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'fetchInterval' field.
- getFetchInterval() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'fetchInterval' field.
- getFetchSchedule(Configuration) - Static method in class org.apache.nutch.crawl.FetchScheduleFactory
-
Return the FetchSchedule implementation.
- getFetchTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'fetchTime' field
- getFetchTime() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'fetchTime' field.
- getFetchTime() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'fetchTime' field.
- getFieldNames() - Method in class org.apache.nutch.indexer.NutchDocument
-
- getFields() - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
- getFields() - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
- getFields() - Method in class org.apache.nutch.api.model.request.DbFilter
-
- getFields() - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
- getFields() - Method in interface org.apache.nutch.crawl.FetchSchedule
-
- getFields(Job) - Method in class org.apache.nutch.crawl.GeneratorJob
-
- getFields() - Method in class org.apache.nutch.crawl.MD5Signature
-
- getFields() - Method in class org.apache.nutch.crawl.Signature
-
- getFields(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory
-
- getFields() - Method in class org.apache.nutch.crawl.TextMD5Signature
-
- getFields() - Method in class org.apache.nutch.crawl.TextProfileSignature
-
- getFields(Job) - Method in class org.apache.nutch.fetcher.FetcherJob
-
- getFields() - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
Gets all the fields for a given WebPage. Many datastores need to
set up the MapReduce job by specifying the fields needed.
- getFields() - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
-
Gets all the fields for a given WebPage. Many datastores need to
set up the MapReduce job by specifying the fields needed.
- getFields(Job) - Method in class org.apache.nutch.indexer.CleaningJob
-
- getFields() - Method in class org.apache.nutch.indexer.html.HtmlIndexingFilter
-
- getFields() - Method in class org.apache.nutch.indexer.IndexCleaningFilters
-
- getFields() - Method in class org.apache.nutch.indexer.IndexingFilters
-
Gets all the fields for a given WebPage. Many datastores need to
set up the MapReduce job by specifying the fields needed.
- getFields() - Method in class org.apache.nutch.indexer.jsoup.extractor.JsoupIndexingFilter
-
- getFields() - Method in class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- getFields() - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- getFields() - Method in class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- getFields() - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- getFields() - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
Gets all the fields for a given WebPage. Many datastores need to
set up the MapReduce job by specifying the fields needed.
- getFields() - Method in class org.apache.nutch.microformats.reltag.RelTagParser
-
Gets all the fields for a given WebPage. Many datastores need to
set up the MapReduce job by specifying the fields needed.
- getFields() - Method in class org.apache.nutch.parse.html.HtmlParser
-
- getFields() - Method in class org.apache.nutch.parse.js.JSParseFilter
-
Gets all the fields for a given WebPage. Many datastores need to
set up the MapReduce job by specifying the fields needed.
- getFields() - Method in class org.apache.nutch.parse.jsoup.extractor.JsoupHtmlParser
-
- getFields() - Method in class org.apache.nutch.parse.metatags.MetaTagsParser
-
- getFields() - Method in class org.apache.nutch.parse.NutchSitemapParser
-
- getFields() - Method in class org.apache.nutch.parse.ParseFilters
-
- getFields() - Method in class org.apache.nutch.parse.ParserFactory
-
- getFields(Job) - Method in class org.apache.nutch.parse.ParserJob
-
- getFields() - Method in class org.apache.nutch.parse.tika.TikaParser
-
- getFields() - Method in interface org.apache.nutch.plugin.FieldPluggable
-
- getFields() - Method in class org.apache.nutch.protocol.file.File
-
- getFields() - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- getFields() - Method in class org.apache.nutch.protocol.http.Http
-
- getFields() - Method in class org.apache.nutch.protocol.httpclient.Http
-
- getFields() - Method in class org.apache.nutch.protocol.ProtocolFactory
-
- getFields() - Method in class org.apache.nutch.protocol.sftp.Sftp
-
- getFields() - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- getFields() - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- getFields() - Method in class org.apache.nutch.scoring.ScoringFilters
-
- getFields() - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- getFields() - Method in class org.creativecommons.nutch.CCIndexingFilter
-
- getFields() - Method in class org.creativecommons.nutch.CCParseFilter
-
- getFieldsCount() - Method in class org.apache.nutch.storage.Host
-
Gets the total field count.
- getFieldsCount() - Method in class org.apache.nutch.storage.ParseStatus
-
Gets the total field count.
- getFieldsCount() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Gets the total field count.
- getFieldsCount() - Method in class org.apache.nutch.storage.WebPage
-
Gets the total field count.
- getFieldValue(String) - Method in class org.apache.nutch.indexer.NutchDocument
-
- getFieldValues(String) - Method in class org.apache.nutch.indexer.NutchDocument
-
- getFirst() - Method in class org.apache.nutch.util.Pair
-
- getFParsePluginsFile() - Method in class org.apache.nutch.parse.ParsePluginsReader
-
- getGeneralTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Returns all collected values of the general meta tags.
- getHeader(String) - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the value of a named header.
- getHeader(String) - Method in class org.apache.nutch.protocol.file.FileResponse
-
Returns the value of a named header.
- getHeader(String) - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
Returns the value of a named header.
- getHeader(String) - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getHeader(String) - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getHeaders() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns all the headers.
- getHeaders() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getHeaders() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getHeaders() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'headers' field
- getHeaders() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'headers' field.
- getHeaders() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'headers' field.
- getHomePage() - Method in class org.apache.nutch.webui.NutchUiApplication
-
- getHost(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the lowercased hostname for the url or null if the url is not well
formed.
- getHost() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getHostBatches(URL) - Static method in class org.apache.nutch.util.URLUtil
-
Partitions the hostname of the url by "."
- getHostBatches(String) - Static method in class org.apache.nutch.util.URLUtil
-
Partitions the hostname of the url by "."
- getHttpEquivTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Returns all collected values of the "http-equiv" meta tags.
- getHttpSolrServer(Configuration) - Static method in class org.apache.nutch.indexer.solr.SolrUtils
-
- getHttpSolrServer(Configuration) - Static method in class org.apache.nutch.indexwriter.solr.SolrUtils
-
- getId() - Method in class org.apache.nutch.api.model.request.SeedList
-
- getId() - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- getId() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getId() - Method in class org.apache.nutch.collection.Subcollection
-
- getId() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- getId() - Method in class org.apache.nutch.plugin.Extension
-
Return the unique id of the extension.
- getId() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns the unique id of the extension point.
- getId() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getId() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getId() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getId() - Method in class org.apache.nutch.webui.model.SeedList
-
- getId() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- getIndex() - Method in enum org.apache.nutch.storage.Host.Field
-
Gets field's index.
- getIndex() - Method in enum org.apache.nutch.storage.ParseStatus.Field
-
Gets field's index.
- getIndex() - Method in enum org.apache.nutch.storage.ProtocolStatus.Field
-
Gets field's index.
- getIndex() - Method in enum org.apache.nutch.storage.WebPage.Field
-
Gets field's index.
- getInfo() - Method in class org.apache.nutch.api.impl.JobWorker
-
- getInfo(String) - Method in class org.apache.nutch.api.impl.NutchServerPoolExecutor
-
- getInfo(String, String) - Method in class org.apache.nutch.api.resources.JobResource
-
- getInlinks() - Method in class org.apache.nutch.storage.Host.Builder
-
Gets the value of the 'inlinks' field
- getInlinks() - Method in class org.apache.nutch.storage.Host
-
Gets the value of the 'inlinks' field.
- getInlinks() - Method in class org.apache.nutch.storage.Host.Tombstone
-
Gets the value of the 'inlinks' field.
- getInlinks() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'inlinks' field
- getInlinks() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'inlinks' field.
- getInlinks() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'inlinks' field.
- getInstance(Configuration) - Static method in class org.apache.nutch.core.jsoup.extractor.JsoupDocumentReader
-
- getInstance(Configuration) - Static method in class org.apache.nutch.indexwriter.hbase.HBaseMappingReader
-
- getInstance(Configuration) - Static method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getInstance() - Static method in class org.apache.nutch.util.domain.DomainSuffixes
-
Singleton instance, lazy instantiation
- getInstance(Configuration) - Static method in class org.apache.nutch.util.NutchJob
-
- getInstance(Configuration, String) - Static method in class org.apache.nutch.util.NutchJob
-
- getInstance(Long) - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- getInstance(Long) - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- getInstances() - Method in class org.apache.nutch.webui.config.NutchGuiConfiguration
-
- getInstances() - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- getInstances() - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- getInt(String, int) - Method in class org.apache.nutch.storage.Host
-
- getJobClassName() - Method in class org.apache.nutch.api.model.request.JobConfig
-
- getJobClassName() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getJobConfig() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- getJobHistory() - Method in class org.apache.nutch.api.impl.NutchServerPoolExecutor
-
- getJobInfo(String) - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getJobInfo() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- getJobInfo(String) - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getJobMgr() - Method in class org.apache.nutch.api.NutchServer
-
Get job manager.
- getJobRunning() - Method in class org.apache.nutch.api.impl.NutchServerPoolExecutor
-
- getJobs() - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Gets jobs
- getJobs(String) - Method in class org.apache.nutch.api.resources.JobResource
-
- getJobs() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getKey() - Method in class org.apache.nutch.fetcher.FetchEntry
-
- getKeyMap() - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getKeys() - Method in class org.apache.nutch.util.Histogram
-
- getLastModified() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Gets the value of the 'lastModified' field
- getLastModified() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Gets the value of the 'lastModified' field.
- getLastModified() - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Gets the value of the 'lastModified' field.
- getLength() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- getLocations() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- getLong(String, long) - Method in class org.apache.nutch.storage.Host
-
- getMajorCode() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Gets the value of the 'majorCode' field
- getMajorCode() - Method in class org.apache.nutch.storage.ParseStatus
-
Gets the value of the 'majorCode' field.
- getMajorCode() - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Gets the value of the 'majorCode' field.
- getMappedKey(String) - Method in class org.apache.nutch.indexwriter.hbase.HBaseMappingReader
-
- getMarkers() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'markers' field
- getMarkers() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'markers' field.
- getMarkers() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'markers' field.
- getMaxContent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getMessage() - Method in class org.apache.nutch.api.model.response.ErrorResponse
-
- getMessage(ParseStatus) - Static method in class org.apache.nutch.parse.ParseStatusUtils
-
A convenience method.
- getMessage(ProtocolStatus) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- getMeta(String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get metadata.
- getMeta(String) - Method in class org.apache.nutch.scoring.ScoreDatum
-
- getMetaData() - Method in class org.apache.nutch.metadata.Metadata
-
Get the metadata list
- getMetadata() - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get all metadata.
- getMetadata() - Method in class org.apache.nutch.protocol.Content
-
Other protocol-specific data.
- getMetadata() - Method in class org.apache.nutch.storage.Host.Builder
-
Gets the value of the 'metadata' field
- getMetadata() - Method in class org.apache.nutch.storage.Host
-
Gets the value of the 'metadata' field.
- getMetadata() - Method in class org.apache.nutch.storage.Host.Tombstone
-
Gets the value of the 'metadata' field.
- getMetadata() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'metadata' field
- getMetadata() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'metadata' field.
- getMetadata() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'metadata' field.
- getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.html.HTMLMetaProcessor
-
Sets the indicators in robotsMeta to appropriate values, based
on any META tags found under the given node.
- getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.tika.HTMLMetaProcessor
-
Sets the indicators in robotsMeta to appropriate values, based on any META
tags found under the given node.
- getMetaValues(String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get multiple metadata.
- getMimeType(String) - Method in class org.apache.nutch.util.MimeUtil
-
Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
- getMimeType(File) - Method in class org.apache.nutch.util.MimeUtil
-
Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.
- getMinorCode() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Gets the value of the 'minorCode' field
- getMinorCode() - Method in class org.apache.nutch.storage.ParseStatus
-
Gets the value of the 'minorCode' field.
- getMinorCode() - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Gets the value of the 'minorCode' field.
- getModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'modifiedTime' field
- getModifiedTime() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'modifiedTime' field.
- getModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'modifiedTime' field.
- getMsg() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getMsg() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getName() - Method in class org.apache.nutch.api.model.request.SeedList
-
- getName() - Method in class org.apache.nutch.collection.Subcollection
-
- getName() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- getName(byte) - Static method in class org.apache.nutch.crawl.CrawlStatus
-
- getName() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns the name of the extension point.
- getName() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the name of the plugin.
- getName(int) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- getName() - Method in enum org.apache.nutch.storage.Host.Field
-
Gets field's name.
- getName() - Method in enum org.apache.nutch.storage.Mark
-
- getName() - Method in enum org.apache.nutch.storage.ParseStatus.Field
-
Gets field's name.
- getName() - Method in enum org.apache.nutch.storage.ProtocolStatus.Field
-
Gets field's name.
- getName() - Method in enum org.apache.nutch.storage.WebPage.Field
-
Gets field's name.
- getName() - Method in class org.apache.nutch.webui.model.NutchConfig
-
- getName() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getName() - Method in class org.apache.nutch.webui.model.SeedList
-
- getNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNormalizedName(String) - Static method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
Get the normalized name of metadata attribute name.
- getNormalizer() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- getNotExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of libraries as URLs that are not exported by the plugin.
- getNumberOfRounds() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getNutchConfig(String) - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getNutchConfig(String) - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getNutchConfig(Long) - Method in class org.apache.nutch.webui.service.impl.NutchServiceImpl
-
- getNutchConfig(Long) - Method in interface org.apache.nutch.webui.service.NutchService
-
- getNutchInstance() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getNutchInstance() - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getNutchStatus(HttpHeaders) - Method in class org.apache.nutch.api.resources.AdminResource
-
- getNutchStatus() - Method in class org.apache.nutch.webui.client.impl.NutchClientImpl
-
- getNutchStatus() - Method in interface org.apache.nutch.webui.client.NutchClient
-
- getNutchStatus(Long) - Method in class org.apache.nutch.webui.service.impl.NutchServiceImpl
-
- getNutchStatus(Long) - Method in interface org.apache.nutch.webui.service.NutchService
-
- getObject(String) - Method in class org.apache.nutch.util.ObjectCache
-
- getOutlinkMap() - Method in class org.apache.nutch.parse.NutchSitemapParse
-
- getOutlinks(URL, ArrayList<Outlink>, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
This method finds all anchors below the supplied DOM node, and creates
appropriate Outlink records for each (relative to the supplied base URL), and
adds them to the outlinks ArrayList.
- getOutlinks(String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor
-
Extracts Outlink from given plain text.
- getOutlinks(String, String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor
-
Extracts Outlink from given plain text and adds anchor to the extracted
Outlinks.
- getOutlinks() - Method in class org.apache.nutch.parse.Parse
-
- getOutlinks(URL, ArrayList<Outlink>, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
This method finds all anchors below the supplied DOM node, and creates
appropriate Outlink records for each (relative to the supplied base URL), and
adds them to the outlinks ArrayList.
- getOutlinks() - Method in class org.apache.nutch.storage.Host.Builder
-
Gets the value of the 'outlinks' field
- getOutlinks() - Method in class org.apache.nutch.storage.Host
-
Gets the value of the 'outlinks' field.
- getOutlinks() - Method in class org.apache.nutch.storage.Host.Tombstone
-
Gets the value of the 'outlinks' field.
- getOutlinks() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'outlinks' field
- getOutlinks() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'outlinks' field.
- getOutlinks() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'outlinks' field.
- getOutputCommitter(TaskAttemptContext) - Method in class org.apache.nutch.indexer.IndexerOutputFormat
-
- getPage(String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns the page for the url.
- getParams() - Method in class org.apache.nutch.api.model.request.NutchConfig
-
- getParse(String, WebPage) - Method in class org.apache.nutch.parse.html.HtmlParser
-
- getParse(String, WebPage) - Method in class org.apache.nutch.parse.js.JSParseFilter
-
Parse a JavaScript file and extract outlinks
- getParse(String, WebPage) - Method in class org.apache.nutch.parse.NutchSitemapParser
-
- getParse(String, WebPage) - Method in interface org.apache.nutch.parse.Parser
-
This method parses content in WebPage instance
- getParse(String, WebPage) - Method in class org.apache.nutch.parse.tika.TikaParser
-
- getParserById(String) - Method in class org.apache.nutch.parse.ParserFactory
-
Function returns a Parser instance with the specified extId, representing its
extension ID.
- getParsers(String, String) - Method in class org.apache.nutch.parse.ParserFactory
-
Function returns an array of Parsers for a given content type.
- getParseStatus() - Method in class org.apache.nutch.parse.NutchSitemapParse
-
- getParseStatus() - Method in class org.apache.nutch.parse.Parse
-
- getParseStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'parseStatus' field
- getParseStatus() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'parseStatus' field.
- getParseStatus() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'parseStatus' field.
- getPartition(IntWritable, FetchEntry, int) - Method in class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
-
- getPartition(String, int) - Method in class org.apache.nutch.crawl.URLPartitioner
-
- getPartition(GeneratorJob.SelectorEntry, WebPage, int) - Method in class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
-
- getPartition(UrlWithScore, NutchWritable, int) - Method in class org.apache.nutch.crawl.UrlWithScore.UrlOnlyPartitioner
-
- getPassAllFilter() - Static method in class org.apache.nutch.util.HadoopFSUtil
-
Returns PathFilter that passes all paths through.
- getPassDirectoriesFilter(FileSystem) - Static method in class org.apache.nutch.util.HadoopFSUtil
-
Returns PathFilter that passes directories through.
- getPassword() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getPassword() - Method in class org.apache.nutch.webui.pages.auth.User
-
- getPaths(FileStatus[]) - Static method in class org.apache.nutch.util.HadoopFSUtil
-
Turns an array of FileStatus into an array of Paths.
- getPluginClass() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the fully qualified name of the class which implements the abstract
Plugin class.
- getPluginDescriptor(String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns the descriptor of one plugin identified by a plugin id.
- getPluginDescriptors() - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns all registered plugin descriptors.
- getPluginFolder(String) - Method in class org.apache.nutch.plugin.PluginManifestParser
-
Return the named plugin folder.
- getPluginId() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the unique identifier of the plug-in or null.
- getPluginInstance(PluginDescriptor) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns an instance of a plugin.
- getPluginList(String) - Method in class org.apache.nutch.parse.ParsePluginList
-
- getPluginPath() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the directory path of the plugin.
- getPort() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getPos() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Returns the current position in the file.
- getPrevFetchTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'prevFetchTime' field
- getPrevFetchTime() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'prevFetchTime' field.
- getPrevFetchTime() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'prevFetchTime' field.
- getPrevModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'prevModifiedTime' field
- getPrevModifiedTime() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'prevModifiedTime' field.
- getPrevModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'prevModifiedTime' field.
- getPrevSignature() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'prevSignature' field
- getPrevSignature() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'prevSignature' field.
- getPrevSignature() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'prevSignature' field.
- getProgress() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
-
- getProgress() - Method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Returns the percentage of progress in processing the file.
- getProgress() - Method in class org.apache.nutch.util.NutchTool
-
Returns relative progress of the tool, a float in range [0,1]
- getProgress() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getProperty(String, String) - Method in class org.apache.nutch.api.resources.ConfigResource
-
- getProtocol(String) - Method in class org.apache.nutch.protocol.ProtocolFactory
-
Returns the appropriate Protocol implementation for a url.
- getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.file.File
-
- getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getProtocolOutput(String, WebPage) - Method in interface org.apache.nutch.protocol.Protocol
-
- getProtocolOutput(String, WebPage) - Method in class org.apache.nutch.protocol.sftp.Sftp
-
- getProtocolStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'protocolStatus' field
- getProtocolStatus() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'protocolStatus' field.
- getProtocolStatus() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'protocolStatus' field.
- getProviderName() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
- getProxyHost() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getProxyPort() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getRealm() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication
-
Gets the realm used by the HttpAuthentication object during creation.
- getRealm() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Gets the realm attribute of the HttpBasicAuthentication object.
- getRecordReader(InputSplit, JobConf, Reporter) - Method in class org.apache.nutch.tools.arc.ArcInputFormat
-
Returns the RecordReader for reading the arc file.
- getRecordWriter(TaskAttemptContext) - Method in class org.apache.nutch.indexer.IndexerOutputFormat
-
- getRefresh() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRefreshHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRefreshTime() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRepresentation(Status, Request, Response) - Method in class org.apache.nutch.api.misc.ErrorStatusService
-
- getReprUrl() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'reprUrl' field
- getReprUrl() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'reprUrl' field.
- getReprUrl() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'reprUrl' field.
- getResource(String) - Method in class org.apache.nutch.plugin.PluginClassLoader
-
- getResourceAsStream(String) - Method in class org.apache.nutch.plugin.PluginClassLoader
-
- getResources(String) - Method in class org.apache.nutch.plugin.PluginClassLoader
-
- getResourceString(String, Locale) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an I18N'd resource string.
- getResponse(URL, WebPage, boolean) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getResponse(URL, WebPage, boolean) - Method in class org.apache.nutch.protocol.http.Http
-
- getResponse(URL, WebPage, boolean) - Method in class org.apache.nutch.protocol.httpclient.Http
-
Fetches the url with a configured HTTP client and gets the response.
- getResult() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getResult() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getRetriesSinceFetch() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'retriesSinceFetch' field
- getRetriesSinceFetch() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'retriesSinceFetch' field.
- getRetriesSinceFetch() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'retriesSinceFetch' field.
- getReversedHost(String) - Static method in class org.apache.nutch.util.TableUtil
-
Given a reversed url, returns the reversed host, e.g.
"com.foo.bar:http:8983/to/index.html?a=b" -> "com.foo.bar".
- getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.file.File
-
No robots parsing is done for file protocol.
- getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Get the robots rules for a given url
- getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getRobotRules(String, WebPage) - Method in interface org.apache.nutch.protocol.Protocol
-
Retrieve robot rules applicable for this url.
- getRobotRules(String, WebPage) - Method in class org.apache.nutch.protocol.sftp.Sftp
-
- getRobotRulesSet(Protocol, URL) - Method in class org.apache.nutch.protocol.ftp.FtpRobotRulesParser
-
For hosts whose robots rules are not yet cached, sends an FTP request to the
host corresponding to the URL passed, gets the robots file, parses the rules
and caches the rules object to avoid re-work in future.
- getRobotRulesSet(Protocol, URL) - Method in class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
Gets the rules from robots.txt that apply to the given url.
- getRobotRulesSet(Protocol, String) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- getRobotRulesSet(Protocol, URL) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- getRoles(JaxRsApplication) - Static method in class org.apache.nutch.api.security.SecurityUtils
-
- getRoles() - Method in class org.apache.nutch.webui.pages.auth.SignInSession
-
- getRootNode() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Get the root node of the DOM being created.
- getRowKey() - Method in class org.apache.nutch.indexwriter.hbase.HBaseMappingReader
-
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
Returns the name of the file of rules to use for a particular
implementation.
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
-
Rules specified as a config property will override rules specified as a
config file.
- getRulesReader(Configuration) - Method in class org.apache.nutch.urlfilter.regex.RegexURLFilter
-
Rules specified as a config property will override rules specified as a
config file.
- getRunningJobs() - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Gets running jobs
- getRunningJobs() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getRuns() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
-
- getSchema() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns a path to the xml schema of an extension point.
- getSchema() - Method in class org.apache.nutch.storage.Host
-
- getSchema() - Method in class org.apache.nutch.storage.ParseStatus
-
- getSchema() - Method in class org.apache.nutch.storage.ProtocolStatus
-
- getSchema() - Method in class org.apache.nutch.storage.WebPage
-
- getScopedRules() - Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
- getScore() - Method in class org.apache.nutch.crawl.UrlWithScore
-
- getScore() - Method in class org.apache.nutch.indexer.NutchDocument
-
- getScore() - Method in class org.apache.nutch.scoring.ScoreDatum
-
- getScore() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'score' field
- getScore() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'score' field.
- getScore() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'score' field.
- getSecond() - Method in class org.apache.nutch.util.Pair
-
- getSeedDirectory() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getSeedList() - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- getSeedList() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getSeedList() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- getSeedList(Long) - Method in class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- getSeedList(Long) - Method in interface org.apache.nutch.webui.service.SeedListService
-
- getSeedUrls() - Method in class org.apache.nutch.api.model.request.SeedList
-
- getSeedUrls() - Method in class org.apache.nutch.webui.model.SeedList
-
- getSeedUrlsCount() - Method in class org.apache.nutch.webui.model.SeedList
-
- getSignature(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory
-
- getSignature() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'signature' field
- getSignature() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'signature' field.
- getSignature() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'signature' field.
- getSitemaps() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'sitemaps' field
- getSitemaps() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'sitemaps' field.
- getSitemaps() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'sitemaps' field.
- getSplits(JobContext) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
-
- getStackTrace() - Method in class org.apache.nutch.api.model.response.ErrorResponse
-
- getStages() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
-
- getStartDate() - Method in class org.apache.nutch.api.model.response.NutchStatus
-
- getStartDate() - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- getStarted() - Method in class org.apache.nutch.api.NutchServer
-
- getStartKey() - Method in class org.apache.nutch.api.model.request.DbFilter
-
- getState() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getState() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getStatus(Throwable, Request, Response) - Method in class org.apache.nutch.api.misc.ErrorStatusService
-
- getStatus() - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- getStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'status' field
- getStatus() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'status' field.
- getStatus() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'status' field.
- getStatus() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- getStatus() - Method in class org.apache.nutch.util.NutchTool
-
Returns current status of the running tool
- getStatus() - Method in class org.apache.nutch.webui.client.model.Crawl
-
- getStmPriority() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'stmPriority' field
- getStmPriority() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'stmPriority' field.
- getStmPriority() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'stmPriority' field.
- getSubColection(String) - Method in class org.apache.nutch.collection.CollectionManager
-
Returns named subcollection
- getSubCollections(String) - Method in class org.apache.nutch.collection.CollectionManager
-
Returns the names of the collections that the url is part of.
- getSystemName() - Method in class org.apache.nutch.protocol.ftp.Client
-
Fetches the system type name from the server and returns the string.
- getTableName() - Method in class org.apache.nutch.indexwriter.hbase.HBaseMappingReader
-
- getTargetPoint() - Method in class org.apache.nutch.plugin.Extension
-
Returns the Id of the extension point, that is implemented by this
extension.
- getText(StringBuilder, Node, boolean) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
This method takes a StringBuilder and a DOM Node, and will append all the
content text found beneath the DOM node to the StringBuilder.
- getText(StringBuilder, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
- getText() - Method in class org.apache.nutch.parse.Parse
-
- getText(StringBuffer, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
- getText() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'text' field
- getText() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'text' field.
- getText() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'text' field.
- getThrownError() - Method in class org.apache.nutch.util.CommandRunner
-
- getTikaConfig() - Method in class org.apache.nutch.parse.tika.TikaParser
-
- getTimeout() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getTimeout() - Method in class org.apache.nutch.util.CommandRunner
-
- getTimeout() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- getTitle(StringBuilder, Node) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
This method takes a StringBuilder and a DOM Node, and will append the content
text found beneath the first title node to the StringBuilder.
- getTitle() - Method in class org.apache.nutch.parse.Parse
-
- getTitle(StringBuffer, Node) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
This method takes a StringBuffer and a DOM Node, and will append the content
text found beneath the first title node to the StringBuffer.
- getTitle() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Gets the value of the 'title' field
- getTitle() - Method in class org.apache.nutch.storage.WebPage
-
Gets the value of the 'title' field.
- getTitle() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Gets the value of the 'title' field.
- getTlsPreferredCipherSuites() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getTlsPreferredProtocols() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getTombstone() - Method in class org.apache.nutch.storage.Host
-
- getTombstone() - Method in class org.apache.nutch.storage.ParseStatus
-
- getTombstone() - Method in class org.apache.nutch.storage.ProtocolStatus
-
- getTombstone() - Method in class org.apache.nutch.storage.WebPage
-
- getToUrl() - Method in class org.apache.nutch.parse.Outlink
-
- getTstamp() - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- getType() - Method in class org.apache.nutch.api.model.request.JobConfig
-
- getType() - Method in class org.apache.nutch.api.model.response.JobInfo
-
- getType() - Method in class org.apache.nutch.util.domain.TopLevelDomain
-
- getType() - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- getType() - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- getTypes() - Method in class org.apache.nutch.crawl.NutchWritable
-
- getTypeString() - Method in enum org.apache.nutch.crawl.InjectType
-
- getUniqueKey() - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- getUrl() - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- getUrl() - Method in class org.apache.nutch.crawl.URLWebPage
-
- getUrl() - Method in class org.apache.nutch.crawl.UrlWithScore
-
- getUrl() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the URL used to retrieve this response.
- getUrl() - Method in exception org.apache.nutch.parse.ParserNotFound
-
- getUrl() - Method in class org.apache.nutch.protocol.Content
-
The url fetched.
- getUrl() - Method in class org.apache.nutch.protocol.http.HttpResponse
-
- getUrl() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
-
- getUrl() - Method in exception org.apache.nutch.protocol.ProtocolNotFound
-
- getUrl() - Method in class org.apache.nutch.scoring.ScoreDatum
-
- getUrl() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getUrl() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- getUrlPattern() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument
-
- getUseHttp11() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getUser(String, String) - Method in class org.apache.nutch.webui.NutchUiApplication
-
- getUser() - Method in class org.apache.nutch.webui.pages.auth.SignInPage
-
- getUser() - Method in class org.apache.nutch.webui.pages.auth.SignInSession
-
- getUserAgent() - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- getUsername() - Method in class org.apache.nutch.webui.model.NutchInstance
-
- getUsername() - Method in class org.apache.nutch.webui.pages.auth.User
-
- getUUID(Configuration) - Static method in class org.apache.nutch.util.NutchConfiguration
-
Retrieve a Nutch UUID of this configuration object, or null if the
configuration was created elsewhere.
- getValue(String, String) - Method in class org.apache.nutch.storage.Host
-
- getValue(E) - Method in class org.apache.nutch.util.Histogram
-
- getValue() - Method in class org.apache.nutch.webui.model.NutchConfig
-
- getValues() - Method in class org.apache.nutch.api.model.response.DbQueryResult
-
- getValues(String) - Method in class org.apache.nutch.metadata.Metadata
-
Get the values associated to a metadata name.
- getValues(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- getVersion() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
- getWaitForExit() - Method in class org.apache.nutch.util.CommandRunner
-
- getWebPage() - Method in class org.apache.nutch.fetcher.FetchEntry
-
- getWebPage() - Method in class org.apache.nutch.util.WebPageWritable
-
- getWhiteList() - Method in class org.apache.nutch.collection.Subcollection
-
Returns whitelist
- getWhiteListString() - Method in class org.apache.nutch.collection.Subcollection
-
Returns whitelist String
- getWriter() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Return null since there is no Writer for this class.
- GONE - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
-
Resource is gone.
- guessEncoding(WebPage, String) - Method in class org.apache.nutch.util.EncodingDetector
-
Guess the encoding with the previously specified list of clues.
- GZIPUtils - Class in org.apache.nutch.util
-
A collection of utility methods for working on GZIPed data.
- GZIPUtils() - Constructor for class org.apache.nutch.util.GZIPUtils
-
- HadoopFSUtil - Class in org.apache.nutch.util
-
- HadoopFSUtil() - Constructor for class org.apache.nutch.util.HadoopFSUtil
-
- handle(String, HttpServletRequest, HttpServletResponse, int) - Method in class org.apache.nutch.tools.proxy.AbstractTestbedHandler
-
- handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.AbstractTestbedHandler
-
- handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.DelayHandler
-
- handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.FakeHandler
-
- handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.LogDebugHandler
-
- handle(Request, HttpServletResponse, String, int) - Method in class org.apache.nutch.tools.proxy.NotFoundHandler
-
- hasArgs() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Checks whether the 'args' field has been set
- hasArgs() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Checks whether the 'args' field has been set
- hasBaseUrl() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'baseUrl' field has been set
- hasBatchId() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'batchId' field has been set
- hasCode() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Checks whether the 'code' field has been set
- hasContent() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'content' field has been set
- hasContentType() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'contentType' field has been set
- hasCopy(String) - Method in class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- hasFetchInterval() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'fetchInterval' field has been set
- hasFetchTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'fetchTime' field has been set
- hashCode() - Method in class org.apache.nutch.api.model.request.SeedList
-
- hashCode() - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- hashCode() - Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
-
- hashCode() - Method in class org.apache.nutch.plugin.PluginClassLoader
-
- hashCode() - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
-
- hashCode(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
- hashCode(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
- hashCode() - Method in class org.apache.nutch.webui.model.SeedList
-
- hashCode() - Method in class org.apache.nutch.webui.model.SeedUrl
-
- hasHeaders() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'headers' field has been set
- hasInlinks() - Method in class org.apache.nutch.storage.Host.Builder
-
Checks whether the 'inlinks' field has been set
- hasInlinks() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'inlinks' field has been set
- hasLastModified() - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Checks whether the 'lastModified' field has been set
- hasMajorCode() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Checks whether the 'majorCode' field has been set
- hasMarkers() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'markers' field has been set
- hasMetadata() - Method in class org.apache.nutch.storage.Host.Builder
-
Checks whether the 'metadata' field has been set
- hasMetadata() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'metadata' field has been set
- hasMinorCode() - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Checks whether the 'minorCode' field has been set
- hasModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'modifiedTime' field has been set
- hasNext() - Method in class org.apache.nutch.api.impl.db.DbIterator
-
- hasNext() - Method in class org.apache.nutch.util.NodeWalker
-
Returns true if there are more nodes on the current stack.
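The stack-based traversal that this entry hints at can be illustrated as follows. This is a minimal sketch rather than the Nutch implementation: the class name `NodeWalkerSketch` and the helper `preorderNames` are hypothetical, and it assumes a pre-order walk over a standard `org.w3c.dom` tree.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical illustration of an iterative, stack-based DOM walker. */
final class NodeWalkerSketch {
    private final Deque<Node> stack = new ArrayDeque<>();

    NodeWalkerSketch(Node root) {
        stack.push(root);
    }

    /** True while there are nodes left on the stack. */
    boolean hasNext() {
        return !stack.isEmpty();
    }

    /** Pops the next node and pushes its children so they are visited in document order. */
    Node nextNode() {
        Node current = stack.pop();
        NodeList children = current.getChildNodes();
        // Push in reverse so that the first child ends up on top of the stack.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            stack.push(children.item(i));
        }
        return current;
    }

    /** Convenience helper for the example: node names in pre-order. */
    static List<String> preorderNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeWalkerSketch walker = new NodeWalkerSketch(doc.getDocumentElement());
            List<String> names = new ArrayList<>();
            while (walker.hasNext()) {
                names.add(walker.nextNode().getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Using an explicit stack instead of recursion keeps the walk safe on very deep documents.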
- hasOutlinks() - Method in class org.apache.nutch.storage.Host.Builder
-
Checks whether the 'outlinks' field has been set
- hasOutlinks() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'outlinks' field has been set
- hasParseStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'parseStatus' field has been set
- hasPrevFetchTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'prevFetchTime' field has been set
- hasPrevModifiedTime() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'prevModifiedTime' field has been set
- hasPrevSignature() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'prevSignature' field has been set
- hasProtocolStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'protocolStatus' field has been set
- hasReprUrl() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'reprUrl' field has been set
- hasRetriesSinceFetch() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'retriesSinceFetch' field has been set
- hasScore() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'score' field has been set
- hasSignature() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'signature' field has been set
- hasSitemaps() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'sitemaps' field has been set
- hasStatus() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'status' field has been set
- hasStmPriority() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'stmPriority' field has been set
- hasText() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'text' field has been set
- hasTitle() - Method in class org.apache.nutch.storage.WebPage.Builder
-
Checks whether the 'title' field has been set
- HBaseIndexWriter - Class in org.apache.nutch.indexwriter.hbase
-
- HBaseIndexWriter() - Constructor for class org.apache.nutch.indexwriter.hbase.HBaseIndexWriter
-
- HBaseMappingReader - Class in org.apache.nutch.indexwriter.hbase
-
- HBaseMappingReader(Configuration) - Constructor for class org.apache.nutch.indexwriter.hbase.HBaseMappingReader
-
- head(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
- Histogram<E> - Class in org.apache.nutch.util
-
- Histogram() - Constructor for class org.apache.nutch.util.Histogram
-
- HOST - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- Host - Class in org.apache.nutch.storage
-
Host represents a store of webpages or other data that resides on a server or other computer so that it can be accessed over the Internet.
- Host() - Constructor for class org.apache.nutch.storage.Host
-
- Host.Builder - Class in org.apache.nutch.storage
-
RecordBuilder for Host instances.
- Host.Field - Enum in org.apache.nutch.storage
-
Enum containing all data bean's fields.
- Host.Tombstone - Class in org.apache.nutch.storage
-
- HostDb - Class in org.apache.nutch.host
-
A caching wrapper for the host datastore.
- HostDb(Configuration) - Constructor for class org.apache.nutch.host.HostDb
-
- HOSTDB_CONCURRENCY_LEVEL - Static variable in class org.apache.nutch.host.HostDb
-
- HOSTDB_LRU_SIZE - Static variable in class org.apache.nutch.host.HostDb
-
- HostDbReader - Class in org.apache.nutch.host
-
Display entries from the hostDB.
- HostDbReader() - Constructor for class org.apache.nutch.host.HostDbReader
-
- HostDbUpdateJob - Class in org.apache.nutch.host
-
Scans the web table and creates host entries for each unique host.
- HostDbUpdateJob() - Constructor for class org.apache.nutch.host.HostDbUpdateJob
-
- HostDbUpdateJob(Configuration) - Constructor for class org.apache.nutch.host.HostDbUpdateJob
-
- HostDbUpdateJob.Mapper - Class in org.apache.nutch.host
-
Maps each WebPage to a host key.
- HostDbUpdateReducer - Class in org.apache.nutch.host
-
Combines all WebPages with the same host key to create a Host object, with
some statistics.
- HostDbUpdateReducer() - Constructor for class org.apache.nutch.host.HostDbUpdateReducer
-
- HostInjectorJob - Class in org.apache.nutch.host
-
Creates or updates an existing host table from a text file.
The files contain one host name per line, optionally followed by custom
metadata separated by tabs, with each metadata key separated from the
corresponding value by '='.
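The line format described above can be sketched as a small parser. This is an illustration only, not the code used by HostInjectorJob: the class `HostLineSketch` is hypothetical, and silently skipping fields that lack a key is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: parses one line of a host file. */
final class HostLineSketch {
    final String host;
    final Map<String, String> metadata = new LinkedHashMap<>();

    HostLineSketch(String line) {
        // Fields are tab-separated; the first field is the host name.
        String[] fields = line.split("\t");
        host = fields[0].trim();
        for (int i = 1; i < fields.length; i++) {
            // Every remaining field is expected to look like key=value.
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                continue; // assumed behaviour: ignore fields without a key
            }
            metadata.put(fields[i].substring(0, eq).trim(),
                         fields[i].substring(eq + 1).trim());
        }
    }

    public static void main(String[] args) {
        HostLineSketch h = new HostLineSketch("example.org\tcountry=nl\tpriority=high");
        System.out.println(h.host + " " + h.metadata);
    }
}
```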
- HostInjectorJob() - Constructor for class org.apache.nutch.host.HostInjectorJob
-
- HostInjectorJob(Configuration) - Constructor for class org.apache.nutch.host.HostInjectorJob
-
- HostInjectorJob.UrlMapper - Class in org.apache.nutch.host
-
- HtmlIndexingFilter - Class in org.apache.nutch.indexer.html
-
Add raw HTML content of a document to the index.
- HtmlIndexingFilter() - Constructor for class org.apache.nutch.indexer.html.HtmlIndexingFilter
-
- HTMLLanguageParser - Class in org.apache.nutch.analysis.lang
-
Adds metadata identifying the language of a document, if found. We could also
run statistical analysis here, but we'd miss all other formats.
- HTMLLanguageParser() - Constructor for class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
- HTMLMetaProcessor - Class in org.apache.nutch.parse.html
-
Class for parsing META Directives from DOM trees.
- HTMLMetaProcessor() - Constructor for class org.apache.nutch.parse.html.HTMLMetaProcessor
-
- HTMLMetaProcessor - Class in org.apache.nutch.parse.tika
-
Class for parsing META Directives from DOM trees.
- HTMLMetaProcessor() - Constructor for class org.apache.nutch.parse.tika.HTMLMetaProcessor
-
- HTMLMetaTags - Class in org.apache.nutch.parse
-
This class holds the information about HTML "meta" tags extracted from a
page.
- HTMLMetaTags() - Constructor for class org.apache.nutch.parse.HTMLMetaTags
-
- HTMLPARSEFILTER_ORDER - Static variable in class org.apache.nutch.parse.ParseFilters
-
- HtmlParser - Class in org.apache.nutch.parse.html
-
- HtmlParser() - Constructor for class org.apache.nutch.parse.html.HtmlParser
-
- Http - Class in org.apache.nutch.protocol.http
-
- Http() - Constructor for class org.apache.nutch.protocol.http.Http
-
- Http - Class in org.apache.nutch.protocol.httpclient
-
This class is a protocol plugin that configures an HTTP client for Basic,
Digest and NTLM authentication schemes for web server as well as proxy
server.
- Http() - Constructor for class org.apache.nutch.protocol.httpclient.Http
-
Constructs this plugin.
- HttpAuthentication - Interface in org.apache.nutch.protocol.httpclient
-
The base level of services required for Http Authentication
- HttpAuthenticationException - Exception in org.apache.nutch.protocol.httpclient
-
Can be used to identify problems during creation of Authentication objects.
- HttpAuthenticationException() - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException
-
Constructs a new exception with null as its detail message.
- HttpAuthenticationException(String) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException
-
Constructs a new exception with the specified detail message.
- HttpAuthenticationException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException
-
Constructs a new exception with the specified message and cause.
- HttpAuthenticationException(Throwable) - Constructor for exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException
-
Constructs a new exception with the specified cause and the detail message from
the given cause if it is not null.
- HttpAuthenticationFactory - Class in org.apache.nutch.protocol.httpclient
-
Provides the Http protocol implementation with the ability to authenticate
when prompted.
- HttpAuthenticationFactory(Configuration) - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
-
- HttpBase - Class in org.apache.nutch.protocol.http.api
-
- HttpBase() - Constructor for class org.apache.nutch.protocol.http.api.HttpBase
-
Creates a new instance of HttpBase
- HttpBase(Logger) - Constructor for class org.apache.nutch.protocol.http.api.HttpBase
-
Creates a new instance of HttpBase
- HttpBasicAuthentication - Class in org.apache.nutch.protocol.httpclient
-
Implementation of RFC 2617 Basic Authentication.
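For orientation, the Basic scheme of RFC 2617 sends the Base64 encoding of `user:password` in the `Authorization` header. The sketch below shows only that encoding step; the class and method names are hypothetical, it does not reproduce HttpBasicAuthentication, and the choice of UTF-8 for the credentials is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration of the RFC 2617 Basic credentials encoding. */
final class BasicAuthSketch {
    /** Builds the value of the Authorization header for the Basic scheme. */
    static String authorizationHeader(String user, String password) {
        String credentials = user + ":" + password;
        // Charset is an assumption; ASCII credentials encode identically either way.
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Example credentials taken from RFC 2617, section 2.
        System.out.println(authorizationHeader("Aladdin", "open sesame"));
    }
}
```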
- HttpBasicAuthentication(String, Configuration) - Constructor for class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
Construct an HttpBasicAuthentication for the given challenge parameters.
- HttpDateFormat - Class in org.apache.nutch.net.protocols
-
Class to handle HTTP dates.
- HttpDateFormat() - Constructor for class org.apache.nutch.net.protocols.HttpDateFormat
-
- HttpException - Exception in org.apache.nutch.protocol.http.api
-
- HttpException() - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
-
- HttpException(String) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
-
- HttpException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
-
- HttpException(Throwable) - Constructor for exception org.apache.nutch.protocol.http.api.HttpException
-
- HttpHeaders - Interface in org.apache.nutch.metadata
-
A collection of HTTP header names.
- HttpResponse - Class in org.apache.nutch.protocol.http
-
An HTTP response.
- HttpResponse(HttpBase, URL, WebPage) - Constructor for class org.apache.nutch.protocol.http.HttpResponse
-
- HttpResponse - Class in org.apache.nutch.protocol.httpclient
-
An HTTP response.
- HttpResponse.Scheme - Enum in org.apache.nutch.protocol.http
-
- HttpRobotRulesParser - Class in org.apache.nutch.protocol.http.api
-
This class is used for parsing robots.txt rules for URLs that use the HTTP protocol.
- HttpRobotRulesParser(Configuration) - Constructor for class org.apache.nutch.protocol.http.api.HttpRobotRulesParser
-
- ID_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- ID_FIELD - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- IDENTIFIER - Static variable in interface org.apache.nutch.metadata.DublinCore
-
Recommended best practice is to identify the resource by means of a string
or number conforming to a formal identification system.
- IdentityPageReducer - Class in org.apache.nutch.util
-
- IdentityPageReducer() - Constructor for class org.apache.nutch.util.IdentityPageReducer
-
- ignorableWhitespace(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of ignorable whitespace in element content.
- in - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- incrementBytes(byte[], long) - Static method in class org.apache.nutch.util.Bytes
-
Bytewise binary increment/decrement of the long contained in a byte array by
the given amount.
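The effect of such an operation can be shown with a short sketch. It is not the library's algorithm: it assumes an 8-byte big-endian array and goes through a `long` rather than working byte by byte, and the class `IncrementBytesSketch` is hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration: add an amount to a long stored in a byte array. */
final class IncrementBytesSketch {
    /** Returns a new 8-byte big-endian array holding (value + amount). */
    static byte[] increment(byte[] value, long amount) {
        if (value.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + value.length);
        }
        // ByteBuffer uses big-endian order by default.
        long current = ByteBuffer.wrap(value).getLong();
        return ByteBuffer.allocate(Long.BYTES).putLong(current + amount).array();
    }
}
```

A negative `amount` decrements the stored value; the input array is left unchanged.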
- index(String, WebPage) - Method in class org.apache.nutch.indexer.IndexUtil
-
Index a WebPage. Here we add the following fields:
id: default uniqueKey for the NutchDocument.
digest: used to identify pages (like a unique ID) and to remove duplicates
during the dedup procedure.
- INDEX - Static variable in interface org.apache.nutch.indexwriter.elastic.ElasticConstants
-
- IndexCleaningFilter - Interface in org.apache.nutch.indexer
-
Extension point for indexing.
- IndexCleaningFilter_ORDER - Static variable in class org.apache.nutch.indexer.IndexCleaningFilters
-
- IndexCleaningFilters - Class in org.apache.nutch.indexer
-
- IndexCleaningFilters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexCleaningFilters
-
- IndexerMapper() - Constructor for class org.apache.nutch.indexer.IndexingJob.IndexerMapper
-
- IndexerOutputFormat - Class in org.apache.nutch.indexer
-
- IndexerOutputFormat() - Constructor for class org.apache.nutch.indexer.IndexerOutputFormat
-
- indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Dampen the boost value by scorePower.
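One plausible reading of this dampening is sketched below. It assumes that the page score is raised to the power `scorePower` and then multiplied by the incoming boost; the class, method and parameter names are hypothetical and the formula is not confirmed by this index.

```java
/** Hypothetical illustration of dampening a boost with an exponent. */
final class OpicBoostSketch {
    /**
     * Assumed formula: pageScore^scorePower * initialBoost.
     * With 0 < scorePower < 1, large scores are compressed towards 1.
     */
    static float dampen(float pageScore, float scorePower, float initialBoost) {
        return (float) Math.pow(pageScore, scorePower) * initialBoost;
    }
}
```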
- indexerScore(String, NutchDocument, WebPage, float) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
This method calculates a Lucene document boost.
- indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.ScoringFilters
-
- indexerScore(String, NutchDocument, WebPage, float) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- IndexingException - Exception in org.apache.nutch.indexer
-
- IndexingException() - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingException(String) - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingException(String, Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingException(Throwable) - Constructor for exception org.apache.nutch.indexer.IndexingException
-
- IndexingFilter - Interface in org.apache.nutch.indexer
-
Extension point for indexing.
- INDEXINGFILTER_ORDER - Static variable in class org.apache.nutch.indexer.IndexingFilters
-
- IndexingFilters - Class in org.apache.nutch.indexer
-
- IndexingFilters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexingFilters
-
- IndexingFiltersChecker - Class in org.apache.nutch.indexer
-
Reads and parses a URL and runs the indexers on it.
- IndexingFiltersChecker() - Constructor for class org.apache.nutch.indexer.IndexingFiltersChecker
-
- IndexingJob - Class in org.apache.nutch.indexer
-
- IndexingJob() - Constructor for class org.apache.nutch.indexer.IndexingJob
-
- IndexingJob.IndexerMapper - Class in org.apache.nutch.indexer
-
- indexUtil - Variable in class org.apache.nutch.indexer.IndexingJob.IndexerMapper
-
- IndexUtil - Class in org.apache.nutch.indexer
-
Utility to create an indexed document from a webpage.
- IndexUtil(Configuration) - Constructor for class org.apache.nutch.indexer.IndexUtil
-
- IndexWriter - Interface in org.apache.nutch.indexer
-
- IndexWriters - Class in org.apache.nutch.indexer
-
- IndexWriters(Configuration) - Constructor for class org.apache.nutch.indexer.IndexWriters
-
- inflate(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns an inflated copy of the input array.
- inflateBestEffort(byte[]) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns an inflated copy of the input array.
- inflateBestEffort(byte[], int) - Static method in class org.apache.nutch.util.DeflateUtils
-
Returns an inflated copy of the input array, truncated to sizeLimit bytes, if
necessary.
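A best-effort inflate with a size limit might look like the sketch below. It is an illustration built on `java.util.zip`, not the DeflateUtils code: the class `InflateSketch` is hypothetical, zlib-wrapped input is assumed, and the `deflate` helper exists only to produce test data.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration of size-limited, best-effort inflation. */
final class InflateSketch {
    /** Inflates as much as possible, stopping at sizeLimit bytes or on bad data. */
    static byte[] inflateBestEffort(byte[] in, int sizeLimit) {
        Inflater inflater = new Inflater(); // assumes a zlib header is present
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int written = 0;
        try {
            while (!inflater.finished() && written < sizeLimit) {
                int n = inflater.inflate(buf, 0, Math.min(buf.length, sizeLimit - written));
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: nothing more can be decoded
                }
                out.write(buf, 0, n);
                written += n;
            }
        } catch (DataFormatException e) {
            // Best effort: keep whatever was decoded before the corruption.
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Helper for the example only: zlib-compresses the input. */
    static byte[] deflate(byte[] in) {
        Deflater deflater = new Deflater();
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Capping the output protects a crawler from compressed responses that expand to an unreasonable size.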
- init() - Method in class org.apache.nutch.collection.CollectionManager
-
- init(FilterConfig) - Method in class org.apache.nutch.tools.proxy.LogDebugHandler
-
- init() - Method in class org.apache.nutch.webui.NutchUiApplication
-
- initialize(Element) - Method in class org.apache.nutch.collection.Subcollection
-
Initialize Subcollection from a DOM element.
- initialize(InputSplit, TaskAttemptContext) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
-
- initializeSchedule(String, WebPage) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
Initialize fetch schedule related data.
- initializeSchedule(String, WebPage) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
Initialize fetch schedule related data.
- initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
Set to 0.0f (unknown value) - inlink contributions will bring it to a
correct level.
- initialScore(String, WebPage) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
Set an initial score for newly discovered pages.
- initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.ScoringFilters
-
Calculate a new initial score, used when adding newly discovered pages.
- initialScore(String, WebPage) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>) - Static method in class org.apache.nutch.storage.StorageUtils
-
- initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, Class<? extends Partitioner<K, V>>) - Static method in class org.apache.nutch.storage.StorageUtils
-
- initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, Class<? extends Partitioner<K, V>>, boolean) - Static method in class org.apache.nutch.storage.StorageUtils
-
- initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, Class<? extends Partitioner<K, V>>, Filter<String, WebPage>, boolean) - Static method in class org.apache.nutch.storage.StorageUtils
-
- initMapperJob(Job, Collection<WebPage.Field>, Class<K>, Class<V>, Class<? extends GoraMapper<String, WebPage, K, V>>, Filter<String, WebPage>) - Static method in class org.apache.nutch.storage.StorageUtils
-
- initPage(IModel<SeedList>) - Method in class org.apache.nutch.webui.pages.seed.SeedPage
-
- initReducerJob(Job, Class<? extends GoraReducer<K, V, String, WebPage>>) - Static method in class org.apache.nutch.storage.StorageUtils
-
- inject(Path) - Method in class org.apache.nutch.crawl.InjectorJob
-
- inject(Path) - Method in class org.apache.nutch.host.HostInjectorJob
-
- injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- injectedScore(String, WebPage) - Method in interface org.apache.nutch.scoring.ScoringFilter
-
Set an initial score for newly injected pages.
- injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.ScoringFilters
-
Calculate a new initial score, used when injecting new pages.
- injectedScore(String, WebPage) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- InjectorJob - Class in org.apache.nutch.crawl
-
This class takes a flat file of URLs and adds them to the set of pages to be
crawled.
- InjectorJob() - Constructor for class org.apache.nutch.crawl.InjectorJob
-
- InjectorJob(Configuration) - Constructor for class org.apache.nutch.crawl.InjectorJob
-
- InjectorJob.UrlMapper - Class in org.apache.nutch.crawl
-
- InjectType - Enum in org.apache.nutch.crawl
-
- instance(JobInfo.JobType) - Static method in class org.apache.nutch.webui.client.impl.RemoteCommandBuilder
-
- instance() - Static method in class org.apache.nutch.webui.pages.assets.NutchUiCssReference
-
- InstancePanel - Class in org.apache.nutch.webui.pages.instances
-
- InstancePanel(String) - Constructor for class org.apache.nutch.webui.pages.instances.InstancePanel
-
- InstancesPage - Class in org.apache.nutch.webui.pages.instances
-
- InstancesPage() - Constructor for class org.apache.nutch.webui.pages.instances.InstancesPage
-
- isActionAuthorized(Component, Action) - Method in class org.apache.nutch.webui.pages.auth.AuthorizationStrategy
-
- isAllowed(URL) - Method in interface org.apache.nutch.protocol.RobotRules
-
Returns false if the robots.txt file prohibits us from accessing the given
url, or true otherwise.
- isArgsDirty() - Method in class org.apache.nutch.storage.ParseStatus
-
Checks the dirty status of the 'args' field.
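The idea behind the many "dirty status" accessors in this index can be sketched as follows: a bean remembers which fields were modified since it was last persisted, so that only those fields need to be written back. The class below is hypothetical and much simpler than the generated storage classes; the field names and the `BitSet` representation are assumptions.

```java
import java.util.BitSet;

/** Hypothetical illustration of per-field dirty tracking in a data bean. */
final class DirtyTrackingSketch {
    private static final int TITLE = 0;
    private static final int TEXT = 1;

    private final BitSet dirty = new BitSet(2);
    private String title;
    private String text;

    void setTitle(String value) {
        title = value;
        dirty.set(TITLE); // remember that this field changed
    }

    void setText(String value) {
        text = value;
        dirty.set(TEXT);
    }

    String getTitle() { return title; }

    String getText() { return text; }

    boolean isTitleDirty() { return dirty.get(TITLE); }

    boolean isTextDirty() { return dirty.get(TEXT); }

    /** Called after the bean has been written to the store. */
    void clearDirty() { dirty.clear(); }
}
```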
- isArgsDirty() - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Checks the dirty status of the 'args' field.
- isArgsDirty() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Checks the dirty status of the 'args' field.
- isArgsDirty() - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Checks the dirty status of the 'args' field.
- isBaseUrlDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'baseUrl' field.
- isBaseUrlDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'baseUrl' field.
- isBatchIdDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'batchId' field.
- isBatchIdDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'batchId' field.
- isClientTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- isCodeDirty() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Checks the dirty status of the 'code' field.
- isCodeDirty() - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Checks the dirty status of the 'code' field.
- isContentDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'content' field.
- isContentDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'content' field.
- isContentTypeDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'contentType' field.
- isContentTypeDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'contentType' field.
- isDomainSuffix(String) - Method in class org.apache.nutch.util.domain.DomainSuffixes
-
Returns whether the extension is a registered domain entry.
- isEmpty(String) - Static method in class org.apache.nutch.util.StringUtil
-
Checks if a string is empty (i.e. is null or the empty string).
- isFetchIntervalDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'fetchInterval' field.
- isFetchIntervalDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'fetchInterval' field.
- isFetchTimeDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'fetchTime' field.
- isFetchTimeDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'fetchTime' field.
- isForce() - Method in class org.apache.nutch.api.model.request.NutchConfig
-
- isHeadersDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'headers' field.
- isHeadersDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'headers' field.
- isIgnoreCase() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- isInlinksDirty() - Method in class org.apache.nutch.storage.Host
-
Checks the dirty status of the 'inlinks' field.
- isInlinksDirty() - Method in class org.apache.nutch.storage.Host.Tombstone
-
Checks the dirty status of the 'inlinks' field.
- isInlinksDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'inlinks' field.
- isInlinksDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'inlinks' field.
- isInstantiationAuthorized(Class) - Method in class org.apache.nutch.webui.pages.auth.AuthorizationStrategy
-
- isKeysReversed() - Method in class org.apache.nutch.api.model.request.DbFilter
-
- isLastModifiedDirty() - Method in class org.apache.nutch.storage.ProtocolStatus
-
Checks the dirty status of the 'lastModified' field.
- isLastModifiedDirty() - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Checks the dirty status of the 'lastModified' field.
- isMagic(byte[]) - Static method in class org.apache.nutch.tools.arc.ArcRecordReader
-
Returns true if the byte array passed matches the gzip header magic number.
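A gzip stream starts with the two magic bytes 0x1F 0x8B, so a check of this kind can be sketched as below. The class and method names are hypothetical, and the real method may compare more header bytes than this sketch does.

```java
/** Hypothetical illustration of a gzip magic-number check. */
final class GzipMagicSketch {
    /** True if the array starts with the gzip magic bytes 0x1F 0x8B. */
    static boolean isGzipMagic(byte[] header) {
        return header != null
                && header.length >= 2
                && (header[0] & 0xFF) == 0x1F  // mask to compare as unsigned
                && (header[1] & 0xFF) == 0x8B;
    }
}
```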
- isMajorCodeDirty() - Method in class org.apache.nutch.storage.ParseStatus
-
Checks the dirty status of the 'majorCode' field.
- isMajorCodeDirty() - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Checks the dirty status of the 'majorCode' field.
- isMarkersDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'markers' field.
- isMarkersDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'markers' field.
- isMetadataDirty() - Method in class org.apache.nutch.storage.Host
-
Checks the dirty status of the 'metadata' field.
- isMetadataDirty() - Method in class org.apache.nutch.storage.Host.Tombstone
-
Checks the dirty status of the 'metadata' field.
- isMetadataDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'metadata' field.
- isMetadataDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'metadata' field.
- isMinorCodeDirty() - Method in class org.apache.nutch.storage.ParseStatus
-
Checks the dirty status of the 'minorCode' field.
- isMinorCodeDirty() - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Checks the dirty status of the 'minorCode' field.
- isModeAccept() - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- isModifiedTimeDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'modifiedTime' field.
- isModifiedTimeDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'modifiedTime' field.
- isMultiValued(String) - Method in class org.apache.nutch.metadata.Metadata
-
Returns true if named value is multivalued.
- isOutlinksDirty() - Method in class org.apache.nutch.storage.Host
-
Checks the dirty status of the 'outlinks' field.
- isOutlinksDirty() - Method in class org.apache.nutch.storage.Host.Tombstone
-
Checks the dirty status of the 'outlinks' field.
- isOutlinksDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'outlinks' field.
- isOutlinksDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'outlinks' field.
- isParseStatusDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'parseStatus' field.
- isParseStatusDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'parseStatus' field.
- isPrevFetchTimeDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'prevFetchTime' field.
- isPrevFetchTimeDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'prevFetchTime' field.
- isPrevModifiedTimeDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'prevModifiedTime' field.
- isPrevModifiedTimeDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'prevModifiedTime' field.
- isPrevSignatureDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'prevSignature' field.
- isPrevSignatureDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'prevSignature' field.
- isProtocolStatusDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'protocolStatus' field.
- isProtocolStatusDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'protocolStatus' field.
- isRemoteVerificationEnabled() - Method in class org.apache.nutch.protocol.ftp.Client
-
Return whether or not verification of the remote host participating in data
connections is enabled.
- isReprUrlDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'reprUrl' field.
- isReprUrlDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'reprUrl' field.
- isRetriesSinceFetchDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'retriesSinceFetch' field.
- isRetriesSinceFetchDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'retriesSinceFetch' field.
- isRunning() - Method in class org.apache.nutch.api.NutchServer
-
Convenience method to determine whether a Nutch server is running.
- isSameDomainName(URL, URL) - Static method in class org.apache.nutch.util.URLUtil
-
Returns whether the given urls have the same domain name.
- isSameDomainName(String, String) - Static method in class org.apache.nutch.util.URLUtil
-
Returns whether the given urls have the same domain name.
- isScoreDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'score' field.
- isScoreDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'score' field.
- isServerTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
-
- isSignatureDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'signature' field.
- isSignatureDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'signature' field.
- isSitemap(WebPage) - Static method in class org.apache.nutch.net.URLFilters
-
Returns true if the page is a sitemap.
- isSitemapsDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'sitemaps' field.
- isSitemapsDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'sitemaps' field.
- isStatusDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'status' field.
- isStatusDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'status' field.
- isStmPriorityDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'stmPriority' field.
- isStmPriorityDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'stmPriority' field.
- isSuccess(ParseStatus) - Static method in class org.apache.nutch.parse.ParseStatusUtils
-
- isSuccess() - Method in class org.apache.nutch.storage.ProtocolStatus
-
- isTextDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'text' field.
- isTextDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'text' field.
- isTitleDirty() - Method in class org.apache.nutch.storage.WebPage
-
Checks the dirty status of the 'title' field.
- isTitleDirty() - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Checks the dirty status of the 'title' field.
- isTruncated(String, WebPage) - Static method in class org.apache.nutch.parse.ParserJob
-
Checks if the page's content is truncated.
- isWhiteSpace(char) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Returns whether the specified ch conforms to the XML 1.0
definition of whitespace.
- isWhiteSpace(char[], int, int) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Tell if the string is whitespace.
- isWhiteSpace(StringBuffer) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Tell if the string is whitespace.
- isWhiteSpace(String) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
-
Tell if the string is whitespace.
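
The XML 1.0 definition of whitespace referred to above is narrower than Java's `Character.isWhitespace`: it admits only space, tab, carriage return and line feed. The following is a minimal sketch of such a check, not the Nutch source; the class name `XmlWhitespaceSketch` and the use of `CharSequence` to cover both `String` and `StringBuffer` are assumptions made for illustration.

```java
// Hypothetical sketch; this is not the XMLCharacterRecognizer implementation.
final class XmlWhitespaceSketch {
    private XmlWhitespaceSketch() {}

    /** XML 1.0 production S: only #x20, #x9, #xD and #xA count as whitespace. */
    static boolean isWhiteSpace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0D || ch == 0x0A;
    }

    /** True if every character in ch[start .. start+length-1] is XML whitespace. */
    static boolean isWhiteSpace(char[] ch, int start, int length) {
        int end = start + length;
        for (int i = start; i < end; i++) {
            if (!isWhiteSpace(ch[i])) {
                return false;
            }
        }
        return true;
    }

    /** True if every character of s is XML whitespace (an empty sequence passes). */
    static boolean isWhiteSpace(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isWhiteSpace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace(" \t\r\n"));  // all four XML whitespace characters
        System.out.println(isWhiteSpace('\f'));       // form feed is not XML whitespace
    }
}
```

Form feed is a useful probe: `Character.isWhitespace('\f')` is true in Java, but the character is outside the XML 1.0 whitespace set.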
- iterateOnSplits(byte[], byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
Iterate over keys within the passed inclusive range.
- iterator() - Method in class org.apache.nutch.indexer.NutchDocument
-
Iterate over all fields.
- ObjectCache - Class in org.apache.nutch.util
-
- onCrawlError(Crawl, String) - Method in interface org.apache.nutch.webui.client.impl.CrawlingCycleListener
-
- onCrawlError(Crawl, String) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- onInitialize() - Method in class org.apache.nutch.webui.pages.components.ColorEnumLabel
-
- open(Configuration) - Method in interface org.apache.nutch.indexer.IndexWriter
-
- open(Configuration) - Method in class org.apache.nutch.indexer.IndexWriters
-
- open(Configuration) - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- open(Configuration) - Method in class org.apache.nutch.indexwriter.hbase.HBaseIndexWriter
-
- open(Configuration) - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- OPICScoringFilter - Class in org.apache.nutch.scoring.opic
-
- OPICScoringFilter() - Constructor for class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- org.apache.nutch.analysis.lang - package org.apache.nutch.analysis.lang
-
Text document language identifier.
- org.apache.nutch.api - package org.apache.nutch.api
-
REST API to run and control crawl jobs.
- org.apache.nutch.api.impl - package org.apache.nutch.api.impl
-
Implementations of REST API interfaces.
- org.apache.nutch.api.impl.db - package org.apache.nutch.api.impl.db
-
- org.apache.nutch.api.misc - package org.apache.nutch.api.misc
-
- org.apache.nutch.api.model.request - package org.apache.nutch.api.model.request
-
- org.apache.nutch.api.model.response - package org.apache.nutch.api.model.response
-
- org.apache.nutch.api.resources - package org.apache.nutch.api.resources
-
- org.apache.nutch.api.security - package org.apache.nutch.api.security
-
- org.apache.nutch.collection - package org.apache.nutch.collection
-
Subcollection is a subset of an index.
- org.apache.nutch.core.jsoup.extractor - package org.apache.nutch.core.jsoup.extractor
-
Core package of jsoup-extractor containing XML configuration parser, document structure
- org.apache.nutch.core.jsoup.extractor.normalizer - package org.apache.nutch.core.jsoup.extractor.normalizer
-
Normalizers for jsoup-extractor
- org.apache.nutch.crawl - package org.apache.nutch.crawl
-
Crawl control code and tools to run the crawler.
- org.apache.nutch.fetcher - package org.apache.nutch.fetcher
-
The Nutch robot.
- org.apache.nutch.host - package org.apache.nutch.host
-
Host database to store metadata per host.
- org.apache.nutch.indexer - package org.apache.nutch.indexer
-
Index content, configure and run indexing and cleaning jobs to
add, update, and delete documents from an index.
- org.apache.nutch.indexer.anchor - package org.apache.nutch.indexer.anchor
-
An indexing plugin for inbound anchor text.
- org.apache.nutch.indexer.basic - package org.apache.nutch.indexer.basic
-
A basic indexing plugin, adds basic fields: url, host, title, content, etc.
- org.apache.nutch.indexer.html - package org.apache.nutch.indexer.html
-
Index raw HTML content.
- org.apache.nutch.indexer.jsoup.extractor - package org.apache.nutch.indexer.jsoup.extractor
-
Indexing filter for jsoup-extractor plugin
- org.apache.nutch.indexer.metadata - package org.apache.nutch.indexer.metadata
-
Indexing filter to add document metadata to the index.
- org.apache.nutch.indexer.more - package org.apache.nutch.indexer.more
-
A more indexing plugin, adds "more" index fields:
last modified date, MIME type, content length.
- org.apache.nutch.indexer.solr - package org.apache.nutch.indexer.solr
-
- org.apache.nutch.indexer.subcollection - package org.apache.nutch.indexer.subcollection
-
Indexing filter to assign documents to subcollections.
- org.apache.nutch.indexer.tld - package org.apache.nutch.indexer.tld
-
Top Level Domain Indexing plugin.
- org.apache.nutch.indexwriter.elastic - package org.apache.nutch.indexwriter.elastic
-
- org.apache.nutch.indexwriter.hbase - package org.apache.nutch.indexwriter.hbase
-
- org.apache.nutch.indexwriter.solr - package org.apache.nutch.indexwriter.solr
-
- org.apache.nutch.metadata - package org.apache.nutch.metadata
-
A Multi-valued Metadata container, and set
of constant fields for Nutch Metadata.
- org.apache.nutch.microformats.reltag - package org.apache.nutch.microformats.reltag
-
A microformats Rel-Tag Parser/Indexer/Querier plugin.
- org.apache.nutch.net - package org.apache.nutch.net
-
- org.apache.nutch.net.protocols - package org.apache.nutch.net.protocols
-
- org.apache.nutch.net.urlnormalizer.basic - package org.apache.nutch.net.urlnormalizer.basic
-
URL normalizer performing basic normalizations: remove default ports
and dot segments in path.
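
The two normalizations named above (dropping default ports and resolving dot segments) can be illustrated with `java.net.URI` alone. This is a rough sketch under stated assumptions, not the plugin's code: the class name `BasicNormalizeSketch`, the method names, and the small table of default ports are hypothetical.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch; the Nutch basic URL normalizer may differ in detail.
final class BasicNormalizeSketch {
    private BasicNormalizeSketch() {}

    /** Assumed default ports for a few common schemes; -1 means "unknown". */
    static int defaultPort(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1;
        }
    }

    /** Removes "." and ".." path segments and drops the port if it is the scheme default. */
    static String normalize(String url) {
        // URI.normalize() resolves dot segments in the path.
        URI u = URI.create(url).normalize();
        String scheme = u.getScheme();
        if (scheme == null || u.getHost() == null) {
            return u.toString(); // relative or opaque URI: nothing more to do here
        }
        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(u.getHost());
        int port = u.getPort();
        if (port != -1 && port != defaultPort(scheme)) {
            sb.append(':').append(port); // keep only non-default ports
        }
        sb.append(u.getRawPath());
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com:80/a/./b/../c"));
    }
}
```

The raw (still percent-encoded) components are reassembled by hand so that existing escapes in the path and query are left untouched.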
- org.apache.nutch.net.urlnormalizer.pass - package org.apache.nutch.net.urlnormalizer.pass
-
URL normalizer dummy which does not change URLs.
- org.apache.nutch.net.urlnormalizer.regex - package org.apache.nutch.net.urlnormalizer.regex
-
URL normalizer with configurable rules based on regular expressions
(Pattern).
- org.apache.nutch.parse - package org.apache.nutch.parse
-
The Parse interface and related classes.
- org.apache.nutch.parse.html - package org.apache.nutch.parse.html
-
An HTML document parsing plugin.
- org.apache.nutch.parse.js - package org.apache.nutch.parse.js
-
Parser and parse filter plugin to extract all (possible) links
from JavaScript files and embedded JavaScript code snippets.
- org.apache.nutch.parse.jsoup.extractor - package org.apache.nutch.parse.jsoup.extractor
-
Parse filter based on Jsoup
- org.apache.nutch.parse.metatags - package org.apache.nutch.parse.metatags
-
Parse filter to extract meta tags: keywords, description, etc.
- org.apache.nutch.parse.tika - package org.apache.nutch.parse.tika
-
Parse various document formats with the help of
Apache Tika.
- org.apache.nutch.plugin - package org.apache.nutch.plugin
-
- org.apache.nutch.protocol - package org.apache.nutch.protocol
-
- org.apache.nutch.protocol.file - package org.apache.nutch.protocol.file
-
Protocol plugin which supports retrieving local file resources.
- org.apache.nutch.protocol.ftp - package org.apache.nutch.protocol.ftp
-
Protocol plugin which supports retrieving documents via the ftp protocol.
- org.apache.nutch.protocol.http - package org.apache.nutch.protocol.http
-
Protocol plugin which supports retrieving documents via the http protocol.
- org.apache.nutch.protocol.http.api - package org.apache.nutch.protocol.http.api
-
- org.apache.nutch.protocol.httpclient - package org.apache.nutch.protocol.httpclient
-
Protocol plugin which supports retrieving documents via the HTTP and
HTTPS protocols, optionally with Basic, Digest and NTLM authentication
schemes for web server as well as proxy server.
- org.apache.nutch.protocol.sftp - package org.apache.nutch.protocol.sftp
-
Protocol plugin which supports retrieving documents via the sftp protocol.
- org.apache.nutch.scoring - package org.apache.nutch.scoring
-
- org.apache.nutch.scoring.link - package org.apache.nutch.scoring.link
-
Scoring filter
- org.apache.nutch.scoring.opic - package org.apache.nutch.scoring.opic
-
Scoring filter implementing a variant of the Online Page Importance Computation
(OPIC) algorithm.
- org.apache.nutch.scoring.tld - package org.apache.nutch.scoring.tld
-
Top Level Domain Scoring plugin.
- org.apache.nutch.storage - package org.apache.nutch.storage
-
- org.apache.nutch.tools - package org.apache.nutch.tools
-
Miscellaneous tools.
- org.apache.nutch.tools.arc - package org.apache.nutch.tools.arc
-
- org.apache.nutch.tools.proxy - package org.apache.nutch.tools.proxy
-
- org.apache.nutch.urlfilter.api - package org.apache.nutch.urlfilter.api
-
Generic URL filter library,
abstracting away from regular expression implementations.
- org.apache.nutch.urlfilter.automaton - package org.apache.nutch.urlfilter.automaton
-
- org.apache.nutch.urlfilter.domain - package org.apache.nutch.urlfilter.domain
-
URL filter plugin to include only URLs which match an element in a given list of
domain suffixes, domain names, and/or host names.
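
A filter of the kind described above can be sketched as a lookup of the URL's host, and each of its parent suffixes, in a set of allowed entries. The class name `DomainFilterSketch` is hypothetical, and the convention of returning the URL when accepted and `null` when rejected is an assumption of this sketch rather than something stated in this index.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the DomainURLFilter implementation.
final class DomainFilterSketch {
    private final Set<String> allowed;

    /** allowed holds lower-case host names, domain names or suffixes such as "org". */
    DomainFilterSketch(Set<String> allowed) {
        this.allowed = allowed;
    }

    /** Returns the URL if its host matches an allowed entry, otherwise null (assumed convention). */
    String filter(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null; // unparsable URL: reject
        }
        if (host == null) {
            return null;
        }
        // Check "www.example.org", then "example.org", then "org".
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (allowed.contains(candidate)) {
                return url;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        DomainFilterSketch f =
            new DomainFilterSketch(new HashSet<>(Arrays.asList("example.org", "net")));
        System.out.println(f.filter("http://www.example.org/page"));
        System.out.println(f.filter("http://example.com/"));
    }
}
```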
- org.apache.nutch.urlfilter.prefix - package org.apache.nutch.urlfilter.prefix
-
URL filter plugin to include only URLs which match one of a given list of URL prefixes.
- org.apache.nutch.urlfilter.regex - package org.apache.nutch.urlfilter.regex
-
URL filter plugin to include and/or exclude URLs matching Java regular expressions.
- org.apache.nutch.urlfilter.suffix - package org.apache.nutch.urlfilter.suffix
-
URL filter plugin to either exclude or include only URLs which match
one of the given (path) suffixes.
- org.apache.nutch.urlfilter.validator - package org.apache.nutch.urlfilter.validator
-
URL filter plugin that validates given URLs.
- org.apache.nutch.util - package org.apache.nutch.util
-
Miscellaneous utility classes.
- org.apache.nutch.util.domain - package org.apache.nutch.util.domain
-
Classes for domain name analysis.
- org.apache.nutch.webui - package org.apache.nutch.webui
-
Provides classes and interfaces for Web UI
- org.apache.nutch.webui.client - package org.apache.nutch.webui.client
-
Provides client classes and interfaces for Web UI
- org.apache.nutch.webui.client.impl - package org.apache.nutch.webui.client.impl
-
Contains implementation of client classes and interfaces for Web UI
- org.apache.nutch.webui.client.model - package org.apache.nutch.webui.client.model
-
Contains model classes of client for Web UI
- org.apache.nutch.webui.config - package org.apache.nutch.webui.config
-
Contains config classes for Web UI
- org.apache.nutch.webui.model - package org.apache.nutch.webui.model
-
Contains model classes for Web UI
- org.apache.nutch.webui.pages - package org.apache.nutch.webui.pages
-
Provides classes and interfaces of pages for Web UI
- org.apache.nutch.webui.pages.assets - package org.apache.nutch.webui.pages.assets
-
Contains asset classes for Web UI
- org.apache.nutch.webui.pages.auth - package org.apache.nutch.webui.pages.auth
-
Contains authorization classes for Web UI
- org.apache.nutch.webui.pages.components - package org.apache.nutch.webui.pages.components
-
Contains component classes for Web UI
- org.apache.nutch.webui.pages.crawls - package org.apache.nutch.webui.pages.crawls
-
Contains crawl page classes for Web UI
- org.apache.nutch.webui.pages.instances - package org.apache.nutch.webui.pages.instances
-
Contains instance page classes for Web UI
- org.apache.nutch.webui.pages.menu - package org.apache.nutch.webui.pages.menu
-
Contains menu page classes for Web UI
- org.apache.nutch.webui.pages.seed - package org.apache.nutch.webui.pages.seed
-
Contains seed pages' classes for Web UI
- org.apache.nutch.webui.pages.settings - package org.apache.nutch.webui.pages.settings
-
Contains settings page classes for Web UI
- org.apache.nutch.webui.service - package org.apache.nutch.webui.service
-
Provides service classes and interfaces for Web UI
- org.apache.nutch.webui.service.impl - package org.apache.nutch.webui.service.impl
-
Contains service implementation classes for Web UI
- org.creativecommons.nutch - package org.creativecommons.nutch
-
Sample plugins that parse and index Creative Commons metadata.
- ORIGINAL_CHAR_ENCODING - Static variable in interface org.apache.nutch.metadata.Nutch
-
- Outlink - Class in org.apache.nutch.parse
-
- Outlink() - Constructor for class org.apache.nutch.parse.Outlink
-
- Outlink(String, String) - Constructor for class org.apache.nutch.parse.Outlink
-
- OutlinkExtractor - Class in org.apache.nutch.parse
-
Extractor to extract Outlinks / URLs from
plain text using regular expressions.
- OutlinkExtractor() - Constructor for class org.apache.nutch.parse.OutlinkExtractor
-
- save() - Method in class org.apache.nutch.collection.CollectionManager
-
Saves collections to a file.
- save(SeedList) - Method in class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- save(SeedList) - Method in interface org.apache.nutch.webui.service.SeedListService
-
- saveCrawl(Crawl) - Method in interface org.apache.nutch.webui.service.CrawlService
-
- saveCrawl(Crawl) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- saveDom(OutputStream, Element) - Static method in class org.apache.nutch.util.DomUtil
-
Saves the DOM to an output stream.
- saveInstance(NutchInstance) - Method in class org.apache.nutch.webui.service.impl.NutchInstanceServiceImpl
-
- saveInstance(NutchInstance) - Method in interface org.apache.nutch.webui.service.NutchInstanceService
-
- SchedulingPage - Class in org.apache.nutch.webui.pages
-
- SchedulingPage() - Constructor for class org.apache.nutch.webui.pages.SchedulingPage
-
- SCHEMA$ - Static variable in class org.apache.nutch.storage.Host
-
- SCHEMA$ - Static variable in class org.apache.nutch.storage.ParseStatus
-
- SCHEMA$ - Static variable in class org.apache.nutch.storage.ProtocolStatus
-
- SCHEMA$ - Static variable in class org.apache.nutch.storage.WebPage
-
- SCOPE_CRAWLDB - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when updating the CrawlDb with new URLs.
- SCOPE_DEFAULT - Static variable in class org.apache.nutch.net.URLNormalizers
-
Default scope.
- SCOPE_FETCHER - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used by FetcherJob when processing redirect URLs.
- SCOPE_GENERATE_HOST_COUNT - Static variable in class org.apache.nutch.net.URLNormalizers
-
- SCOPE_INJECT - Static variable in class org.apache.nutch.net.URLNormalizers
-
- SCOPE_LINKDB - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when updating the LinkDb with new URLs.
- SCOPE_OUTLINK - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when constructing new Outlink instances.
- SCOPE_PARTITION - Static variable in class org.apache.nutch.net.URLNormalizers
-
- SCORE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- ScoreDatum - Class in org.apache.nutch.scoring
-
- ScoreDatum() - Constructor for class org.apache.nutch.scoring.ScoreDatum
-
- ScoreDatum(float, String, String, int) - Constructor for class org.apache.nutch.scoring.ScoreDatum
-
- ScoringFilter - Interface in org.apache.nutch.scoring
-
A contract defining behavior of scoring plugins.
- ScoringFilterException - Exception in org.apache.nutch.scoring
-
Specialized exception for errors during scoring.
- ScoringFilterException() - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilterException(String) - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilterException(String, Throwable) - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilterException(Throwable) - Constructor for exception org.apache.nutch.scoring.ScoringFilterException
-
- ScoringFilters - Class in org.apache.nutch.scoring
-
- ScoringFilters(Configuration) - Constructor for class org.apache.nutch.scoring.ScoringFilters
-
- SearchPage - Class in org.apache.nutch.webui.pages
-
- SearchPage() - Constructor for class org.apache.nutch.webui.pages.SearchPage
-
- SECONDS_PER_DAY - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
- SecurityUtils - Class in org.apache.nutch.api.security
-
Utility class for security operations for NutchServer REST API.
- SeedList - Class in org.apache.nutch.api.model.request
-
- SeedList() - Constructor for class org.apache.nutch.api.model.request.SeedList
-
- SeedList - Class in org.apache.nutch.webui.model
-
- SeedList() - Constructor for class org.apache.nutch.webui.model.SeedList
-
- SeedListService - Interface in org.apache.nutch.webui.service
-
- SeedListServiceImpl - Class in org.apache.nutch.webui.service.impl
-
- SeedListServiceImpl() - Constructor for class org.apache.nutch.webui.service.impl.SeedListServiceImpl
-
- SeedListsPage - Class in org.apache.nutch.webui.pages.seed
-
This page is for seed list management.
- SeedListsPage() - Constructor for class org.apache.nutch.webui.pages.seed.SeedListsPage
-
- SeedPage - Class in org.apache.nutch.webui.pages.seed
-
This page is for seed URL management.
- SeedPage() - Constructor for class org.apache.nutch.webui.pages.seed.SeedPage
-
- SeedPage(PageParameters) - Constructor for class org.apache.nutch.webui.pages.seed.SeedPage
-
- SeedResource - Class in org.apache.nutch.api.resources
-
- SeedResource() - Constructor for class org.apache.nutch.api.resources.SeedResource
-
- SeedUrl - Class in org.apache.nutch.api.model.request
-
- SeedUrl() - Constructor for class org.apache.nutch.api.model.request.SeedUrl
-
- SeedUrl - Class in org.apache.nutch.webui.model
-
- SeedUrl() - Constructor for class org.apache.nutch.webui.model.SeedUrl
-
- SelectorEntry() - Constructor for class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
-
- SelectorEntry(String, float) - Constructor for class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
-
- SelectorEntryComparator() - Constructor for class org.apache.nutch.crawl.GeneratorJob.SelectorEntryComparator
-
- SelectorEntryPartitioner() - Constructor for class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
-
- sendNoOp() - Method in class org.apache.nutch.protocol.ftp.Client
-
Sends a NOOP command to the FTP server.
- server - Variable in class org.apache.nutch.api.resources.AbstractResource
-
- SERVER_URL - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- SERVER_URL - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- set(String, float) - Method in class org.apache.nutch.crawl.GeneratorJob.SelectorEntry
-
Sets url with score on this writable.
- set(String, String) - Method in class org.apache.nutch.metadata.Metadata
-
Set metadata name/value.
- set(String, String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- setActiveConfId(String) - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Sets active configuration id
- setActiveConfId(String) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setAll(Properties) - Method in class org.apache.nutch.metadata.Metadata
-
Copies all key-value pairs from properties.
- setApplicationContext(ApplicationContext) - Method in class org.apache.nutch.webui.NutchUiApplication
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.api.model.request.JobConfig
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setArgs(List<CharSequence>) - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Sets the value of the 'args' field
- setArgs(List<CharSequence>) - Method in class org.apache.nutch.storage.ParseStatus
-
Sets the value of the 'args' field.
- setArgs(List<CharSequence>) - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Sets the value of the 'args' field.
- setArgs(List<CharSequence>) - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Sets the value of the 'args' field
- setArgs(List<CharSequence>) - Method in class org.apache.nutch.storage.ProtocolStatus
-
Sets the value of the 'args' field.
- setArgs(List<CharSequence>) - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Sets the value of the 'args' field.
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setArgs(Map<String, Object>) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setArgument(String, String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setAttribute(String) - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- setBaseHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the baseHref.
- setBaseUrl(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'baseUrl' field
- setBaseUrl(CharSequence) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'baseUrl' field.
- setBaseUrl(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'baseUrl' field.
- setBatchId(String) - Method in class org.apache.nutch.api.model.request.DbFilter
-
- setBatchId(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'batchId' field
- setBatchId(CharSequence) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'batchId' field.
- setBatchId(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'batchId' field.
- setBlackList(String) - Method in class org.apache.nutch.collection.Subcollection
-
Set contents of blacklist from String
- setClazz(String) - Method in class org.apache.nutch.plugin.Extension
-
Sets the class that implements the concrete extension; it is only used until
model creation at system start up.
- setCode(int) - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Sets the value of the 'code' field
- setCode(Integer) - Method in class org.apache.nutch.storage.ProtocolStatus
-
Sets the value of the 'code' field.
- setCode(Integer) - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Sets the value of the 'code' field.
- setCommand(String) - Method in class org.apache.nutch.util.CommandRunner
-
- setConf(Configuration) - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
-
- setConf(Configuration) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.URLPartitioner.FetchEntryPartitioner
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.URLPartitioner.SelectorEntryPartitioner
-
- setConf(Configuration) - Method in class org.apache.nutch.crawl.URLPartitioner
-
- setConf(Configuration) - Method in class org.apache.nutch.host.HostDbUpdateJob
-
- setConf(Configuration) - Method in class org.apache.nutch.host.HostInjectorJob
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.CleaningJob
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.html.HtmlIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.IndexingFiltersChecker
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.jsoup.extractor.JsoupIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.metadata.MetadataIndexer
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- setConf(Configuration) - Method in class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexwriter.hbase.HBaseIndexWriter
-
- setConf(Configuration) - Method in class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- setConf(Configuration) - Method in class org.apache.nutch.microformats.reltag.RelTagIndexingFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.microformats.reltag.RelTagParser
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.html.DOMContentUtils
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.html.HtmlParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.js.JSParseFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.jsoup.extractor.JsoupHtmlParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.metatags.MetaTagsParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.NutchSitemapParser
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ParserChecker
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ParserJob
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ParseUtil
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.tika.DOMContentUtils
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.tika.TikaParser
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.file.File
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.http.api.HttpBase
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.http.Http
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.httpclient.Http
-
Reads the configuration from the Nutch configuration files and sets the
configuration.
- setConf(Configuration) - Method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.RobotRulesParser
-
- setConf(Configuration) - Method in class org.apache.nutch.protocol.sftp.Sftp
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.opic.OPICScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.api.RegexURLFilterBase
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.domain.DomainURLFilter
-
Sets the configuration.
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.prefix.PrefixURLFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setConf(Configuration) - Method in class org.apache.nutch.urlfilter.validator.UrlValidator
-
- setConf(Configuration) - Method in class org.apache.nutch.util.domain.DomainStatistics
-
- setConf(Configuration) - Method in class org.apache.nutch.util.GenericWritableConfigurable
-
- setConf(Configuration) - Method in class org.creativecommons.nutch.CCIndexingFilter
-
- setConf(Configuration) - Method in class org.creativecommons.nutch.CCParseFilter
-
- setConfId(String) - Method in class org.apache.nutch.api.model.request.JobConfig
-
- setConfId(String) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setConfId(String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setConfId(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setConfigId(String) - Method in class org.apache.nutch.api.model.request.NutchConfig
-
- setConfiguration(Set<String>) - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Sets configuration ids
- setConfiguration(Set<String>) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setConnectionStatus(ConnectionStatus) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setContent(byte[]) - Method in class org.apache.nutch.protocol.Content
-
- setContent(Content) - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- setContent(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'content' field
- setContent(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'content' field.
- setContent(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'content' field.
- setContentType(String) - Method in class org.apache.nutch.protocol.Content
-
- setContentType(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'contentType' field
- setContentType(CharSequence) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'contentType' field.
- setContentType(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'contentType' field.
- setCrawlId(String) - Method in class org.apache.nutch.api.model.request.JobConfig
-
- setCrawlId(String) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setCrawlId(String) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setCrawlId(String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setCrawlId(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setCrawlName(String) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setDataTimeout(int) - Method in class org.apache.nutch.protocol.ftp.Client
-
Sets the timeout in milliseconds to use for data connection.
- setDatum(WebPage) - Method in class org.apache.nutch.crawl.URLWebPage
-
- setDefaultValue(String) - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- setDescriptor(PluginDescriptor) - Method in class org.apache.nutch.plugin.Extension
-
Sets the plugin descriptor and is only used until model creation at system
start up.
- setDocumentFields(List<JsoupDocument.DocumentField>) - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument
-
- setDocumentLocator(Locator) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive an object for locating the origin of SAX document events.
- setEndKey(String) - Method in class org.apache.nutch.api.model.request.DbFilter
-
- setFetchInterval(int) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'fetchInterval' field
- setFetchInterval(Integer) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'fetchInterval' field.
- setFetchInterval(Integer) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'fetchInterval' field.
- setFetchSchedule(String, WebPage, long, long, long, long, int) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
Sets the fetchInterval and fetchTime on a successfully fetched page.
- setFetchSchedule(String, WebPage, long, long, long, long, int) - Method in class org.apache.nutch.crawl.AdaptiveFetchSchedule
-
- setFetchSchedule(String, WebPage, long, long, long, long, int) - Method in class org.apache.nutch.crawl.DefaultFetchSchedule
-
- setFetchSchedule(String, WebPage, long, long, long, long, int) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
Sets the fetchInterval and fetchTime on a successfully fetched page.
- setFetchTime(long) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'fetchTime' field
- setFetchTime(Long) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'fetchTime' field.
- setFetchTime(Long) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'fetchTime' field.
- setFields(Set<String>) - Method in class org.apache.nutch.api.model.request.DbFilter
-
- setFileType(int) - Method in class org.apache.nutch.protocol.ftp.Client
-
Sets the file type to be transferred.
- setFilterFromPath(boolean) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setFollowTalk(boolean) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set followTalk
- setForce(boolean) - Method in class org.apache.nutch.api.model.request.NutchConfig
-
- setFParsePluginsFile(String) - Method in class org.apache.nutch.parse.ParsePluginsReader
-
- setHeaders(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'headers' field
- setHeaders(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'headers' field.
- setHeaders(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'headers' field.
- setHost(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setId(Long) - Method in class org.apache.nutch.api.model.request.SeedList
-
- setId(Long) - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- setId(String) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setId(String) - Method in class org.apache.nutch.plugin.Extension
-
Sets the unique extension Id and is only used until model creation at
system start up.
- setId(Long) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setId(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setId(Long) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setId(Long) - Method in class org.apache.nutch.webui.model.SeedList
-
- setId(Long) - Method in class org.apache.nutch.webui.model.SeedUrl
-
- setIDAttribute(String, Element) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Set an ID string to node association in the ID table.
- setIgnoreCase(boolean) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setInfo(JobInfo) - Method in class org.apache.nutch.api.impl.JobWorker
-
- setInlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.Host.Builder
-
Sets the value of the 'inlinks' field
- setInlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.Host
-
Sets the value of the 'inlinks' field.
- setInlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.Host.Tombstone
-
Sets the value of the 'inlinks' field.
- setInlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'inlinks' field
- setInlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'inlinks' field.
- setInlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'inlinks' field.
- setInputStream(InputStream) - Method in class org.apache.nutch.util.CommandRunner
-
- setInstances(List<NutchInstance>) - Method in class org.apache.nutch.webui.config.NutchGuiConfiguration
-
- setJobClassName(String) - Method in class org.apache.nutch.api.model.request.JobConfig
-
- setJobClassName(String) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setJobConfig(JobConfig) - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- setJobInfo(JobInfo) - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- setJobs(Collection<JobInfo>) - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Sets jobs
- setJobs(Collection<JobInfo>) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setKeepConnection(boolean) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set keepConnection
- setKeysReversed(boolean) - Method in class org.apache.nutch.api.model.request.DbFilter
-
- setLastModified(long) - Method in class org.apache.nutch.storage.ProtocolStatus.Builder
-
Sets the value of the 'lastModified' field
- setLastModified(Long) - Method in class org.apache.nutch.storage.ProtocolStatus
-
Sets the value of the 'lastModified' field.
- setLastModified(Long) - Method in class org.apache.nutch.storage.ProtocolStatus.Tombstone
-
Sets the value of the 'lastModified' field.
- setMajorCode(int) - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Sets the value of the 'majorCode' field
- setMajorCode(Integer) - Method in class org.apache.nutch.storage.ParseStatus
-
Sets the value of the 'majorCode' field.
- setMajorCode(Integer) - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Sets the value of the 'majorCode' field.
- setMarkers(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'markers' field
- setMarkers(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'markers' field.
- setMarkers(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'markers' field.
- setMaxContentLength(int) - Method in class org.apache.nutch.protocol.file.File
-
Set the point at which content is truncated.
- setMaxContentLength(int) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set the point at which content is truncated.
- setMeta(String, String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Set metadata.
- setMeta(String, byte[]) - Method in class org.apache.nutch.scoring.ScoreDatum
-
- setMetadata(Metadata) - Method in class org.apache.nutch.protocol.Content
-
Other protocol-specific data.
- setMetadata(Map<CharSequence, ByteBuffer>) - Method in class org.apache.nutch.storage.Host.Builder
-
Sets the value of the 'metadata' field
- setMetadata(Map<CharSequence, ByteBuffer>) - Method in class org.apache.nutch.storage.Host
-
Sets the value of the 'metadata' field.
- setMetadata(Map<CharSequence, ByteBuffer>) - Method in class org.apache.nutch.storage.Host.Tombstone
-
Sets the value of the 'metadata' field.
- setMetadata(Map<CharSequence, ByteBuffer>) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'metadata' field
- setMetadata(Map<CharSequence, ByteBuffer>) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'metadata' field.
- setMetadata(Map<CharSequence, ByteBuffer>) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'metadata' field.
- setMinorCode(int) - Method in class org.apache.nutch.storage.ParseStatus.Builder
-
Sets the value of the 'minorCode' field
- setMinorCode(Integer) - Method in class org.apache.nutch.storage.ParseStatus
-
Sets the value of the 'minorCode' field.
- setMinorCode(Integer) - Method in class org.apache.nutch.storage.ParseStatus.Tombstone
-
Sets the value of the 'minorCode' field.
- setModeAccept(boolean) - Method in class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- setModel(IModel<Crawl>) - Method in class org.apache.nutch.webui.pages.crawls.CrawlPanel
-
- setModel(IModel<NutchInstance>) - Method in class org.apache.nutch.webui.pages.instances.InstancePanel
-
- setModifiedTime(long) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'modifiedTime' field
- setModifiedTime(Long) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'modifiedTime' field.
- setModifiedTime(Long) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'modifiedTime' field.
- setMsg(String) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setMsg(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setName(String) - Method in class org.apache.nutch.api.model.request.SeedList
-
- setName(String) - Method in class org.apache.nutch.webui.model.NutchConfig
-
- setName(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setName(String) - Method in class org.apache.nutch.webui.model.SeedList
-
- setNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noCache to true.
- setNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noFollow to true.
- setNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noIndex to true.
- setNormalizer(Normalizable) - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- setNumberOfRounds(Integer) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setObject(String, Object) - Method in class org.apache.nutch.util.ObjectCache
-
- setOutlinks(Map<Outlink, Metadata>) - Method in class org.apache.nutch.parse.NutchSitemapParse
-
- setOutlinks(Outlink[]) - Method in class org.apache.nutch.parse.Parse
-
- setOutlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.Host.Builder
-
Sets the value of the 'outlinks' field
- setOutlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.Host
-
Sets the value of the 'outlinks' field.
- setOutlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.Host.Tombstone
-
Sets the value of the 'outlinks' field.
- setOutlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'outlinks' field
- setOutlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'outlinks' field.
- setOutlinks(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'outlinks' field.
- setPageGoneSchedule(String, WebPage, long, long, long) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method specifies how to schedule refetching of pages marked as GONE.
- setPageGoneSchedule(String, WebPage, long, long, long) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method specifies how to schedule refetching of pages marked as GONE.
- setPageRetrySchedule(String, WebPage, long, long, long) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method adjusts the fetch schedule if fetching needs to be re-tried due
to transient errors.
- setPageRetrySchedule(String, WebPage, long, long, long) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method adjusts the fetch schedule if fetching needs to be re-tried due
to transient errors.
- setParams(Map<String, String>) - Method in class org.apache.nutch.api.model.request.NutchConfig
-
- setParseStatus(ParseStatus) - Method in class org.apache.nutch.parse.NutchSitemapParse
-
- setParseStatus(ParseStatus) - Method in class org.apache.nutch.parse.Parse
-
- setParseStatus(ParseStatus) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'parseStatus' field
- setParseStatus(ParseStatus) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'parseStatus' field.
- setParseStatus(ParseStatus) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'parseStatus' field.
- setPassword(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setPassword(String) - Method in class org.apache.nutch.webui.pages.auth.User
-
- setPort(Integer) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setPrevFetchTime(long) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'prevFetchTime' field
- setPrevFetchTime(Long) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'prevFetchTime' field.
- setPrevFetchTime(Long) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'prevFetchTime' field.
- setPrevModifiedTime(long) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'prevModifiedTime' field
- setPrevModifiedTime(Long) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'prevModifiedTime' field.
- setPrevModifiedTime(Long) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'prevModifiedTime' field.
- setPrevSignature(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'prevSignature' field
- setPrevSignature(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'prevSignature' field.
- setPrevSignature(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'prevSignature' field.
- setProgress(int) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setProperty(String, String, String) - Method in interface org.apache.nutch.api.ConfManager
-
- setProperty(String, String, String) - Method in class org.apache.nutch.api.impl.RAMConfManager
-
Sets a property for the configuration which has given configuration id.
- setProtocolStatus(ProtocolStatus) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'protocolStatus' field
- setProtocolStatus(ProtocolStatus) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'protocolStatus' field.
- setProtocolStatus(ProtocolStatus) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'protocolStatus' field.
- setRefresh(boolean) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets refresh to the supplied value.
- setRefreshHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the refreshHref.
- setRefreshTime(int) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the refreshTime.
- setRemoteVerificationEnabled(boolean) - Method in class org.apache.nutch.protocol.ftp.Client
-
Enable or disable verification that the remote host taking part in a data
connection is the same as the host to which the control connection is
attached.
- setReprUrl(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'reprUrl' field
- setReprUrl(CharSequence) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'reprUrl' field.
- setReprUrl(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'reprUrl' field.
- setRequestDelay(Duration) - Method in class org.apache.nutch.webui.client.impl.RemoteCommandExecutor
-
- setResult(Map<String, Object>) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setResult(Map<String, Object>) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setRetriesSinceFetch(int) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'retriesSinceFetch' field
- setRetriesSinceFetch(Integer) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'retriesSinceFetch' field.
- setRetriesSinceFetch(Integer) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'retriesSinceFetch' field.
- setRunningJobs(Collection<JobInfo>) - Method in class org.apache.nutch.api.model.response.NutchStatus
-
Sets running jobs
- setRunningJobs(Collection<JobInfo>) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setScore(FloatWritable) - Method in class org.apache.nutch.crawl.UrlWithScore
-
- setScore(float) - Method in class org.apache.nutch.crawl.UrlWithScore
-
- setScore(float) - Method in class org.apache.nutch.indexer.NutchDocument
-
- setScore(float) - Method in class org.apache.nutch.scoring.ScoreDatum
-
- setScore(float) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'score' field
- setScore(Float) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'score' field.
- setScore(Float) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'score' field.
- setSeedDirectory(String) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setSeedList(SeedList) - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- setSeedList(SeedList) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setSeedList(SeedList) - Method in class org.apache.nutch.webui.model.SeedUrl
-
- setSeedUrls(Collection<SeedUrl>) - Method in class org.apache.nutch.api.model.request.SeedList
-
- setSeedUrls(Collection<SeedUrl>) - Method in class org.apache.nutch.webui.model.SeedList
-
- setSignature(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'signature' field
- setSignature(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'signature' field.
- setSignature(ByteBuffer) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'signature' field.
- setSitemaps(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'sitemaps' field
- setSitemaps(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'sitemaps' field.
- setSitemaps(Map<CharSequence, CharSequence>) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'sitemaps' field.
- setStartDate(Date) - Method in class org.apache.nutch.api.model.response.NutchStatus
-
- setStartDate(Date) - Method in class org.apache.nutch.webui.client.model.NutchStatus
-
- setStartKey(String) - Method in class org.apache.nutch.api.model.request.DbFilter
-
- setState(JobInfo.State) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setState(JobInfo.State) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setStatus(ProtocolStatus) - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- setStatus(int) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'status' field
- setStatus(Integer) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'status' field.
- setStatus(Integer) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'status' field.
- setStatus(Crawl.CrawlStatus) - Method in class org.apache.nutch.webui.client.model.Crawl
-
- setStdErrorStream(OutputStream) - Method in class org.apache.nutch.util.CommandRunner
-
- setStdOutputStream(OutputStream) - Method in class org.apache.nutch.util.CommandRunner
-
- setStmPriority(float) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'stmPriority' field
- setStmPriority(Float) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'stmPriority' field.
- setStmPriority(Float) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'stmPriority' field.
- setText(String) - Method in class org.apache.nutch.parse.Parse
-
- setText(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'text' field
- setText(CharSequence) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'text' field.
- setText(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'text' field.
- setTimeout(int) - Method in class org.apache.nutch.protocol.ftp.Ftp
-
Set the timeout.
- setTimeout(int) - Method in class org.apache.nutch.util.CommandRunner
-
- setTimeout(Duration) - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- SettingsPage - Class in org.apache.nutch.webui.pages.settings
-
- SettingsPage() - Constructor for class org.apache.nutch.webui.pages.settings.SettingsPage
-
- setTitle(String) - Method in class org.apache.nutch.parse.Parse
-
- setTitle(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Builder
-
Sets the value of the 'title' field
- setTitle(CharSequence) - Method in class org.apache.nutch.storage.WebPage
-
Sets the value of the 'title' field.
- setTitle(CharSequence) - Method in class org.apache.nutch.storage.WebPage.Tombstone
-
Sets the value of the 'title' field.
- setType(JobManager.JobType) - Method in class org.apache.nutch.api.model.request.JobConfig
-
- setType(JobManager.JobType) - Method in class org.apache.nutch.api.model.response.JobInfo
-
- setType(JobInfo.JobType) - Method in class org.apache.nutch.webui.client.model.JobConfig
-
- setType(String) - Method in class org.apache.nutch.webui.client.model.JobInfo
-
- setup(Mapper<String, WebPage, UrlWithScore, NutchWritable>.Context) - Method in class org.apache.nutch.crawl.DbUpdateMapper
-
- setup(Reducer<UrlWithScore, NutchWritable, String, WebPage>.Context) - Method in class org.apache.nutch.crawl.DbUpdateReducer
-
- setup(Mapper<String, WebPage, GeneratorJob.SelectorEntry, WebPage>.Context) - Method in class org.apache.nutch.crawl.GeneratorMapper
-
- setup(Reducer<GeneratorJob.SelectorEntry, WebPage, String, WebPage>.Context) - Method in class org.apache.nutch.crawl.GeneratorReducer
-
- setup(Mapper<LongWritable, Text, String, WebPage>.Context) - Method in class org.apache.nutch.crawl.InjectorJob.UrlMapper
-
- setup(Mapper<String, WebPage, Text, Text>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableRegexMapper
-
- setup(Reducer<Text, LongWritable, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatCombiner
-
- setup(Mapper<String, WebPage, Text, LongWritable>.Context) - Method in class org.apache.nutch.crawl.WebTableReader.WebTableStatMapper
-
- setup(Mapper<String, WebPage, IntWritable, FetchEntry>.Context) - Method in class org.apache.nutch.fetcher.FetcherJob.FetcherMapper
-
- setup(Mapper<LongWritable, Text, String, Host>.Context) - Method in class org.apache.nutch.host.HostInjectorJob.UrlMapper
-
- setup(Mapper<String, WebPage, String, WebPage>.Context) - Method in class org.apache.nutch.indexer.CleaningJob.CleanMapper
-
- setup(Reducer<String, WebPage, NullWritable, NullWritable>.Context) - Method in class org.apache.nutch.indexer.CleaningJob.CleanReducer
-
- setup(Mapper<String, WebPage, String, NutchDocument>.Context) - Method in class org.apache.nutch.indexer.IndexingJob.IndexerMapper
-
- setup(Reducer<Text, SolrDeleteDuplicates.SolrRecord, Text, SolrDeleteDuplicates.SolrRecord>.Context) - Method in class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- setup(Mapper<String, WebPage, String, WebPage>.Context) - Method in class org.apache.nutch.parse.ParserJob.ParserMapper
-
- setup(Mapper<String, WebPage, Text, LongWritable>.Context) - Method in class org.apache.nutch.util.domain.DomainStatistics.DomainStatisticsMapper
-
- setUrl(String) - Method in class org.apache.nutch.api.model.request.SeedUrl
-
- setUrl(String) - Method in class org.apache.nutch.crawl.URLWebPage
-
- setUrl(Text) - Method in class org.apache.nutch.crawl.UrlWithScore
-
- setUrl(String) - Method in class org.apache.nutch.crawl.UrlWithScore
-
- setUrl(String) - Method in class org.apache.nutch.scoring.ScoreDatum
-
- setUrl(String) - Method in class org.apache.nutch.webui.model.SeedUrl
-
- setUrlPattern(Pattern) - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument
-
- setUser(User) - Method in class org.apache.nutch.webui.pages.auth.SignInPage
-
- setUsername(String) - Method in class org.apache.nutch.webui.model.NutchInstance
-
- setUsername(String) - Method in class org.apache.nutch.webui.pages.auth.User
-
- setValue(String) - Method in class org.apache.nutch.webui.model.NutchConfig
-
- setWaitForExit(boolean) - Method in class org.apache.nutch.util.CommandRunner
-
- setWebPage(WebPage) - Method in class org.apache.nutch.util.WebPageWritable
-
- setWhiteList(ArrayList<String>) - Method in class org.apache.nutch.collection.Subcollection
-
- setWhiteList(String) - Method in class org.apache.nutch.collection.Subcollection
-
Set contents of whitelist from String
- Sftp - Class in org.apache.nutch.protocol.sftp
-
This class uses the JSch package to fetch content using the SFTP protocol.
- Sftp() - Constructor for class org.apache.nutch.protocol.sftp.Sftp
-
- shortestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
-
Returns the shortest prefix of input that is matched, or null if no match exists.
- shortestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
-
Returns the shortest suffix of input that is matched, or null if no match exists.
- shortestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Returns the shortest substring of input that is matched by a pattern in the
trie, or null if no match exists.
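The shortest-prefix contract described for shortestMatch(String) can be illustrated with a minimal sketch. It is not the Nutch implementation: the class name ShortestPrefixSketch is hypothetical, and a plain HashSet stands in for the trie that the real matchers are documented to use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of org.apache.nutch.util.
final class ShortestPrefixSketch {
    private final Set<String> prefixes;

    ShortestPrefixSketch(String... prefixes) {
        this.prefixes = new HashSet<>(Arrays.asList(prefixes));
    }

    /** Returns the shortest registered prefix of input, or null if none matches. */
    String shortestMatch(String input) {
        if (input == null) {
            return null;
        }
        // Try candidates from shortest to longest so the first hit is the shortest.
        // Starting at length 1 means an empty-string prefix is never reported.
        for (int end = 1; end <= input.length(); end++) {
            String candidate = input.substring(0, end);
            if (prefixes.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ShortestPrefixSketch matcher = new ShortestPrefixSketch("http", "http://www");
        System.out.println(matcher.shortestMatch("http://www.example.org")); // prints "http"
        System.out.println(matcher.shortestMatch("ftp://example.org"));      // prints "null"
    }
}
```

A trie walks the input once instead of building a substring per length, which is presumably why the real classes are trie-based. The set-based version trades that efficiency for brevity.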
- shouldFetch(String, WebPage, long) - Method in class org.apache.nutch.crawl.AbstractFetchSchedule
-
This method indicates whether the page is suitable for selection
in the current fetchlist.
- shouldFetch(String, WebPage, long) - Method in interface org.apache.nutch.crawl.FetchSchedule
-
This method indicates whether the page is suitable for selection
in the current fetchlist.
- shouldProcess(CharSequence, Utf8) - Static method in class org.apache.nutch.util.NutchJob
-
- shutDown() - Method in class org.apache.nutch.plugin.Plugin
-
Shutdown the plugin.
- Signature - Class in org.apache.nutch.crawl
-
- Signature() - Constructor for class org.apache.nutch.crawl.Signature
-
- SIGNATURE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- SignatureComparator - Class in org.apache.nutch.crawl
-
- SignatureComparator() - Constructor for class org.apache.nutch.crawl.SignatureComparator
-
- SignatureFactory - Class in org.apache.nutch.crawl
-
Factory class that instantiates a Signature implementation according to the
current Configuration.
- SignInPage - Class in org.apache.nutch.webui.pages.auth
-
Sign in page implementation.
- SignInPage() - Constructor for class org.apache.nutch.webui.pages.auth.SignInPage
-
- SignInSession - Class in org.apache.nutch.webui.pages.auth
-
Checks whether the user is authenticated or not.
- SignInSession(Request) - Constructor for class org.apache.nutch.webui.pages.auth.SignInSession
-
- SimpleStringNormalizer - Class in org.apache.nutch.core.jsoup.extractor.normalizer
-
- SimpleStringNormalizer() - Constructor for class org.apache.nutch.core.jsoup.extractor.normalizer.SimpleStringNormalizer
-
- SITEMAP - Static variable in class org.apache.nutch.fetcher.FetcherJob
-
- SITEMAP_DETECT - Static variable in class org.apache.nutch.fetcher.FetcherJob
-
- size() - Method in class org.apache.nutch.metadata.Metadata
-
Returns the number of metadata names in this metadata.
- SIZEOF_BOOLEAN - Static variable in class org.apache.nutch.util.Bytes
-
Size of boolean in bytes
- SIZEOF_BYTE - Static variable in class org.apache.nutch.util.Bytes
-
Size of byte in bytes
- SIZEOF_CHAR - Static variable in class org.apache.nutch.util.Bytes
-
Size of char in bytes
- SIZEOF_DOUBLE - Static variable in class org.apache.nutch.util.Bytes
-
Size of double in bytes
- SIZEOF_FLOAT - Static variable in class org.apache.nutch.util.Bytes
-
Size of float in bytes
- SIZEOF_INT - Static variable in class org.apache.nutch.util.Bytes
-
Size of int in bytes
- SIZEOF_LONG - Static variable in class org.apache.nutch.util.Bytes
-
Size of long in bytes
- SIZEOF_SHORT - Static variable in class org.apache.nutch.util.Bytes
-
Size of short in bytes
- skip(DataInput) - Static method in class org.apache.nutch.parse.Outlink
-
Skips over one Outlink in the input.
- SKIP_TRUNCATED - Static variable in class org.apache.nutch.parse.ParserJob
-
- skipChildren() - Method in class org.apache.nutch.util.NodeWalker
-
Skips over and removes from the node stack the children of the last node.
- skippedEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of a skipped entity.
- SOLR_PREFIX - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- SOLR_PREFIX - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- SolrConstants - Interface in org.apache.nutch.indexer.solr
-
- SolrConstants - Interface in org.apache.nutch.indexwriter.solr
-
- SolrDeleteDuplicates - Class in org.apache.nutch.indexer.solr
-
Utility class for deleting duplicate documents from a Solr index.
- SolrDeleteDuplicates() - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
-
- SolrDeleteDuplicates.SolrInputFormat - Class in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates.SolrInputSplit - Class in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates.SolrRecord - Class in org.apache.nutch.indexer.solr
-
- SolrDeleteDuplicates.SolrRecordReader - Class in org.apache.nutch.indexer.solr
-
- SolrIndexWriter - Class in org.apache.nutch.indexwriter.solr
-
- SolrIndexWriter() - Constructor for class org.apache.nutch.indexwriter.solr.SolrIndexWriter
-
- SolrInputFormat() - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
-
- SolrInputSplit() - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- SolrInputSplit(int, int) - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputSplit
-
- SolrMappingReader - Class in org.apache.nutch.indexwriter.solr
-
- SolrMappingReader(Configuration) - Constructor for class org.apache.nutch.indexwriter.solr.SolrMappingReader
-
- SolrRecord() - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- SolrRecord(String, float, long) - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecord
-
- SolrRecordReader(SolrDocumentList, int) - Constructor for class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrRecordReader
-
- SolrUtils - Class in org.apache.nutch.indexer.solr
-
- SolrUtils() - Constructor for class org.apache.nutch.indexer.solr.SolrUtils
-
- SolrUtils - Class in org.apache.nutch.indexwriter.solr
-
- SolrUtils() - Constructor for class org.apache.nutch.indexwriter.solr.SolrUtils
-
- sortByValue() - Method in class org.apache.nutch.util.Histogram
-
- sortInverseByValue() - Method in class org.apache.nutch.util.Histogram
-
- SOURCE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A reference to a resource from which the present resource is derived.
- SpellCheckedMetadata - Class in org.apache.nutch.metadata
-
A decorator to Metadata that adds spellchecking capabilities to property
names.
- SpellCheckedMetadata() - Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
-
- split(byte[], byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
Split passed range.
- splitEnd - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- splitLen - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- splitStart - Variable in class org.apache.nutch.tools.arc.ArcRecordReader
-
- SpringConfiguration - Class in org.apache.nutch.webui.config
-
- SpringConfiguration() - Constructor for class org.apache.nutch.webui.config.SpringConfiguration
-
- start() - Method in class org.apache.nutch.api.NutchServer
-
Starts the Nutch server and writes some log output to the log file.
- startCDATA() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report the start of a CDATA section.
- startCrawl(Long, NutchInstance) - Method in interface org.apache.nutch.webui.service.CrawlService
-
- startCrawl(Long, NutchInstance) - Method in class org.apache.nutch.webui.service.impl.CrawlServiceImpl
-
- startDocument() - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of the beginning of a document.
- startDTD(String, String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report the start of DTD declarations, if any.
- startElement(String, String, String, Attributes) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Receive notification of the beginning of an element.
- startEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Report the beginning of an entity.
- startPrefixMapping(String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
-
Begin the scope of a prefix-URI Namespace mapping.
- startsWith(byte[], byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Return true if the byte array on the right is a prefix of the byte array on
the left.
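The prefix test described for startsWith(byte[], byte[]) can be illustrated with a minimal standalone sketch. The class name PrefixCheckSketch and the handling of null arguments are assumptions made for this example; this is not the Nutch Bytes source.

```java
final class PrefixCheckSketch {

    /** Returns true if {@code prefix} (right) is a prefix of {@code bytes} (left). */
    static boolean startsWith(byte[] bytes, byte[] prefix) {
        // Assumption: null arguments and over-long prefixes simply yield false.
        if (bytes == null || prefix == null || prefix.length > bytes.length) {
            return false;
        }
        // Compare only the first prefix.length positions.
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] left = {1, 2, 3, 4};
        System.out.println(startsWith(left, new byte[] {1, 2})); // true
        System.out.println(startsWith(left, new byte[] {2, 3})); // false
    }
}
```

Under this sketch an empty right-hand array counts as a prefix of any non-null array.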
- startUp() - Method in class org.apache.nutch.plugin.Plugin
-
Invoked when the plugin starts up.
- STAT_COUNTERS - Static variable in interface org.apache.nutch.metadata.Nutch
-
Counters.
- STAT_JOBS - Static variable in interface org.apache.nutch.metadata.Nutch
-
Jobs.
- STAT_MESSAGE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Status / result message.
- STAT_PHASE - Static variable in interface org.apache.nutch.metadata.Nutch
-
Phase of processing.
- STAT_PROGRESS - Static variable in interface org.apache.nutch.metadata.Nutch
-
Progress (float).
- StatisticsPage - Class in org.apache.nutch.webui.pages
-
- StatisticsPage() - Constructor for class org.apache.nutch.webui.pages.StatisticsPage
-
- status(String, WebPage) - Method in class org.apache.nutch.parse.ParseUtil
-
- status - Variable in class org.apache.nutch.util.NutchTool
-
- STATUS_BLOCKED - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_FETCHED - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Page was successfully fetched.
- STATUS_GONE - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Page no longer exists.
- STATUS_GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_MODIFIED - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
Page is known to have been modified since our last visit.
- STATUS_NOTFETCHING - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_NOTFOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_NOTMODIFIED - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Fetching successful - page is not modified.
- STATUS_NOTMODIFIED - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
Page is known to remain unmodified since our last visit.
- STATUS_NOTMODIFIED - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_REDIR_EXCEEDED - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_REDIR_PERM - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Page permanently redirects to other page.
- STATUS_REDIR_TEMP - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Page temporarily redirects to other page.
- STATUS_RETRY - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Fetching unsuccessful, needs to be retried (transient errors).
- STATUS_RETRY - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_ROBOTS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatusUtils
-
- STATUS_SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- STATUS_UNFETCHED - Static variable in class org.apache.nutch.crawl.CrawlStatus
-
Page was not fetched yet.
- STATUS_UNKNOWN - Static variable in interface org.apache.nutch.crawl.FetchSchedule
-
It is unknown whether page was changed since our last visit.
- STATUS_WOULDBLOCK - Static variable in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- stop(String, String) - Method in class org.apache.nutch.api.impl.RAMJobManager
-
- stop(String, String) - Method in interface org.apache.nutch.api.JobManager
-
- stop(boolean) - Method in class org.apache.nutch.api.NutchServer
-
Stop the Nutch server.
- stop(boolean) - Method in class org.apache.nutch.api.resources.AdminResource
-
- stop(String, String) - Method in class org.apache.nutch.api.resources.JobResource
-
- stopJob() - Method in class org.apache.nutch.api.impl.JobWorker
-
- stopJob() - Method in class org.apache.nutch.util.NutchTool
-
Stop the job with the possibility to resume.
- StorageUtils - Class in org.apache.nutch.storage
-
Entry point to Gora store/mapreduce functionality.
- StorageUtils() - Constructor for class org.apache.nutch.storage.StorageUtils
-
- store - Variable in class org.apache.nutch.indexer.IndexingJob.IndexerMapper
-
- StringUtil - Class in org.apache.nutch.util
-
A collection of String processing utility methods.
- StringUtil() - Constructor for class org.apache.nutch.util.StringUtil
-
- stripNonCharCodepoints(String) - Static method in class org.apache.nutch.indexer.solr.SolrUtils
-
- stripNonCharCodepoints(String) - Static method in class org.apache.nutch.indexwriter.solr.SolrUtils
-
- Subcollection - Class in org.apache.nutch.collection
-
Subcollection represents a subset of the index; you can define URL patterns
that indicate whether a particular page (URL) is part of the Subcollection.
- Subcollection(String, String, Configuration) - Constructor for class org.apache.nutch.collection.Subcollection
-
Public constructor.
- Subcollection(Configuration) - Constructor for class org.apache.nutch.collection.Subcollection
-
- SubcollectionIndexingFilter - Class in org.apache.nutch.indexer.subcollection
-
- SubcollectionIndexingFilter() - Constructor for class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- SubcollectionIndexingFilter(Configuration) - Constructor for class org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
-
- SUBJECT - Static variable in interface org.apache.nutch.metadata.DublinCore
-
The topic of the content of the resource.
- SUCCESS - Static variable in interface org.apache.nutch.parse.ParseStatusCodes
-
Parsing succeeded.
- SUCCESS - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
-
Content was retrieved without errors.
- SUCCESS_OK - Static variable in interface org.apache.nutch.parse.ParseStatusCodes
-
- SUCCESS_REDIRECT - Static variable in interface org.apache.nutch.parse.ParseStatusCodes
-
Parsed content contains a directive to redirect to another URL.
- SuffixStringMatcher - Class in org.apache.nutch.util
-
A class for efficiently matching Strings against a set of suffixes.
- SuffixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
-
Creates a new SuffixStringMatcher which will match Strings with any suffix
in the supplied array.
- SuffixStringMatcher(Collection<String>) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
-
Creates a new SuffixStringMatcher which will match Strings with any suffix
in the supplied Collection.
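One way to match Strings against a set of suffixes is a character trie built over the reversed suffixes. The sketch below illustrates that general technique only; the class SuffixTrieSketch, its Node type, and the matches method are hypothetical names, and it is not a copy of the Nutch implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class SuffixTrieSketch {

    /** One trie node; children are keyed by the next character, read right to left. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored suffix ends at this node
    }

    private final Node root = new Node();

    SuffixTrieSketch(Iterable<String> suffixes) {
        for (String suffix : suffixes) {
            Node node = root;
            // Insert back to front so lookups can walk the input from its end.
            for (int i = suffix.length() - 1; i >= 0; i--) {
                node = node.children.computeIfAbsent(suffix.charAt(i), c -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Returns true if the input ends with any stored suffix. */
    boolean matches(String input) {
        Node node = root;
        for (int i = input.length() - 1; i >= 0; i--) {
            if (node.terminal) {
                return true; // a complete suffix has already been consumed
            }
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return false; // no stored suffix continues with this character
            }
        }
        return node.terminal; // the whole input was consumed
    }

    public static void main(String[] args) {
        SuffixTrieSketch matcher = new SuffixTrieSketch(Arrays.asList(".html", ".pdf"));
        System.out.println(matcher.matches("index.html")); // true
        System.out.println(matcher.matches("index.htm"));  // false
    }
}
```

A lookup costs at most one map access per character of the input, regardless of how many suffixes are stored.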
- SuffixURLFilter - Class in org.apache.nutch.urlfilter.suffix
-
Filters URLs based on a file of URL suffixes.
- SuffixURLFilter() - Constructor for class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- SuffixURLFilter(Reader) - Constructor for class org.apache.nutch.urlfilter.suffix.SuffixURLFilter
-
- TableUtil - Class in org.apache.nutch.util
-
- TableUtil() - Constructor for class org.apache.nutch.util.TableUtil
-
- TAG_ATTRIBUTE - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_BLACKLIST - Static variable in class org.apache.nutch.collection.Subcollection
-
- TAG_COLLECTION - Static variable in class org.apache.nutch.collection.Subcollection
-
- TAG_COLLECTIONS - Static variable in class org.apache.nutch.collection.Subcollection
-
- TAG_CSS_SELECTOR - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_DEFAULT_VALUE - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_DOCUMENT - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_FIELD - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_FIELD_LIST - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_ID - Static variable in class org.apache.nutch.collection.Subcollection
-
- TAG_NAME - Static variable in class org.apache.nutch.collection.Subcollection
-
- TAG_NORMALIZER - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_NORMALIZER_LIST - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_TYPE - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_TYPE_LIST - Static variable in class org.apache.nutch.core.jsoup.extractor.JsoupExtractorConstants
-
- TAG_WHITELIST - Static variable in class org.apache.nutch.collection.Subcollection
-
- tail(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
- TEMP_MOVED - Static variable in interface org.apache.nutch.protocol.ProtocolStatusCodes
-
Resource has moved temporarily.
- terminal - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
-
- TestbedProxy - Class in org.apache.nutch.tools.proxy
-
- TestbedProxy() - Constructor for class org.apache.nutch.tools.proxy.TestbedProxy
-
- TestJsoupHtmlParser - Class in org.apache.nutch.parse.jsoup.extractor
-
- TestJsoupHtmlParser() - Constructor for class org.apache.nutch.parse.jsoup.extractor.TestJsoupHtmlParser
-
- testJsoupHtmlParser() - Method in class org.apache.nutch.parse.jsoup.extractor.TestJsoupHtmlParser
-
- TextMD5Signature - Class in org.apache.nutch.crawl
-
Default implementation of a page signature.
- TextMD5Signature() - Constructor for class org.apache.nutch.crawl.TextMD5Signature
-
- TextProfileSignature - Class in org.apache.nutch.crawl
-
An implementation of a page signature.
- TextProfileSignature() - Constructor for class org.apache.nutch.crawl.TextProfileSignature
-
- THREADS_KEY - Static variable in class org.apache.nutch.fetcher.FetcherJob
-
- throwBadRequestException(String) - Method in class org.apache.nutch.api.resources.AbstractResource
-
Throws an HTTP 400 Bad Request exception with the given message.
- TikaParser - Class in org.apache.nutch.parse.tika
-
Wrapper for Tika parsers.
- TikaParser() - Constructor for class org.apache.nutch.parse.tika.TikaParser
-
- timeout - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
The network timeout in milliseconds.
- TIMESTAMP_FIELD - Static variable in interface org.apache.nutch.indexer.solr.SolrConstants
-
- TIMESTAMP_FIELD - Static variable in interface org.apache.nutch.indexwriter.solr.SolrConstants
-
- TimingUtil - Class in org.apache.nutch.util
-
- TimingUtil() - Constructor for class org.apache.nutch.util.TimingUtil
-
- TITLE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A name given to the resource.
- TLDIndexingFilter - Class in org.apache.nutch.indexer.tld
-
Adds the top-level domain extensions to the index.
- TLDIndexingFilter() - Constructor for class org.apache.nutch.indexer.tld.TLDIndexingFilter
-
- TLDScoringFilter - Class in org.apache.nutch.scoring.tld
-
Scoring filter to boost TLDs.
- TLDScoringFilter() - Constructor for class org.apache.nutch.scoring.tld.TLDScoringFilter
-
- tlsPreferredCipherSuites - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
Which TLS/SSL cipher suites to support
- tlsPreferredProtocols - Variable in class org.apache.nutch.protocol.http.api.HttpBase
-
Which TLS/SSL protocols to support
- toArgMap(Object...) - Static method in class org.apache.nutch.util.ToolUtil
-
- toASCII(String) - Static method in class org.apache.nutch.util.URLUtil
-
- toBinaryFromHex(byte) - Static method in class org.apache.nutch.util.Bytes
-
Takes an ASCII digit in the range A-F0-9 and returns the corresponding
integer/ordinal value.
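The digit-to-value mapping described for toBinaryFromHex can be sketched as follows. The class HexDigitSketch and the method hexDigitValue are hypothetical, the sketch takes a char rather than a byte, and throwing on any other character is an assumption of this example rather than documented Nutch behavior.

```java
final class HexDigitSketch {

    /** Maps '0'-'9' to 0-9 and 'A'-'F' to 10-15. */
    static int hexDigitValue(char digit) {
        if (digit >= '0' && digit <= '9') {
            return digit - '0';
        }
        if (digit >= 'A' && digit <= 'F') {
            return 10 + (digit - 'A');
        }
        // Assumption: characters outside A-F0-9 are rejected.
        throw new IllegalArgumentException("Not an uppercase hex digit: " + digit);
    }

    public static void main(String[] args) {
        System.out.println(hexDigitValue('7')); // 7
        System.out.println(hexDigitValue('C')); // 12
    }
}
```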
- toBoolean(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
- toByteArrays(String[]) - Static method in class org.apache.nutch.util.Bytes
-
- toByteArrays(String) - Static method in class org.apache.nutch.util.Bytes
-
- toByteArrays(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
- toBytes(ByteBuffer) - Static method in class org.apache.nutch.util.Bytes
-
Returns a new byte array, copied from the passed ByteBuffer.
- toBytes(String) - Static method in class org.apache.nutch.util.Bytes
-
Converts a string to a UTF-8 byte array.
- toBytes(boolean) - Static method in class org.apache.nutch.util.Bytes
-
Convert a boolean to a byte array.
- toBytes(long) - Static method in class org.apache.nutch.util.Bytes
-
Convert a long value to a byte array using big-endian.
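Big-endian encoding places the most significant byte first. The following standalone sketch shows that layout using only java.nio.ByteBuffer; the class BigEndianLongSketch and its method names are assumptions for illustration and do not reproduce the Bytes class.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BigEndianLongSketch {

    /** Encodes a long as 8 bytes, most significant byte first. */
    static byte[] toBytes(long value) {
        // ByteBuffer uses big-endian byte order by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes the first 8 bytes of the array as a big-endian long. */
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes(0x0102030405060708L);
        System.out.println(Arrays.toString(encoded)); // [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(toLong(encoded) == 0x0102030405060708L); // true
    }
}
```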
- toBytes(float) - Static method in class org.apache.nutch.util.Bytes
-
- toBytes(double) - Static method in class org.apache.nutch.util.Bytes
-
Serialize a double as the IEEE 754 double format output.
- toBytes(int) - Static method in class org.apache.nutch.util.Bytes
-
Convert an int value to a byte array
- toBytes(short) - Static method in class org.apache.nutch.util.Bytes
-
- toBytesBinary(String) - Static method in class org.apache.nutch.util.Bytes
-
- toContent() - Method in class org.apache.nutch.protocol.file.FileResponse
-
- toContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
-
- toDate(String) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
-
- toDouble(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
- toDouble(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
- toFloat(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Presumes float encoded as IEEE 754 floating-point "single format"
- toFloat(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
Presumes float encoded as IEEE 754 floating-point "single format"
- toHexString(ByteBuffer) - Static method in class org.apache.nutch.util.StringUtil
-
- toHexString(ByteBuffer, String, int) - Static method in class org.apache.nutch.util.StringUtil
-
Get a text representation of a ByteBuffer as hexadecimal String, where each
pair of hexadecimal digits corresponds to consecutive bytes in the array.
- toHexString(byte[]) - Static method in class org.apache.nutch.util.StringUtil
-
- toHexString(byte[], String, int) - Static method in class org.apache.nutch.util.StringUtil
-
Get a text representation of a byte[] as hexadecimal String, where each
pair of hexadecimal digits corresponds to consecutive bytes in the array.
- toHexString(byte[], int, int, String, int) - Static method in class org.apache.nutch.util.StringUtil
-
Get a text representation of a byte[] as hexadecimal String, where each
pair of hexadecimal digits corresponds to consecutive bytes in the array.
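The hexadecimal text form described for the toHexString overloads, in which each pair of digits corresponds to one byte, can be sketched as below. HexStringSketch and toHex are hypothetical names; the sketch emits lowercase digits and omits the separator and line-length handling that the extra parameters of the listed overloads suggest, both of which are assumptions.

```java
final class HexStringSketch {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Renders each byte as two hexadecimal digits, in array order. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0f]); // high nibble
            sb.append(HEX[b & 0x0f]);        // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0x0f, (byte) 0xa0, 0x7f})); // 0fa07f
    }
}
```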
- toInt(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to an int value
- toInt(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to an int value
- toInt(byte[], int, int) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to an int value
- toLong(String) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
-
- toLong(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to a long value.
- toLong(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to a long value.
- toLong(byte[], int, int) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to a long value.
- ToolUtil - Class in org.apache.nutch.util
-
- ToolUtil() - Constructor for class org.apache.nutch.util.ToolUtil
-
- TopLevelDomain - Class in org.apache.nutch.util.domain
-
(From Wikipedia) A top-level domain (TLD) is the last part of an Internet
domain name; that is, the letters which follow the final dot of any domain
name.
- TopLevelDomain(String, TopLevelDomain.Type, DomainSuffix.Status, float) - Constructor for class org.apache.nutch.util.domain.TopLevelDomain
-
- TopLevelDomain(String, DomainSuffix.Status, float, String) - Constructor for class org.apache.nutch.util.domain.TopLevelDomain
-
- TopLevelDomain.Type - Enum in org.apache.nutch.util.domain
-
- toShort(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to a short value
- toShort(byte[], int) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to a short value
- toShort(byte[], int, int) - Static method in class org.apache.nutch.util.Bytes
-
Converts a byte array to a short value
- toString() - Method in enum org.apache.nutch.api.security.AuthorizationRoleEnum
-
- toString() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument.DocumentField
-
- toString() - Method in class org.apache.nutch.core.jsoup.extractor.JsoupDocument
-
- toString() - Method in class org.apache.nutch.crawl.UrlWithScore
-
- toString() - Method in class org.apache.nutch.fetcher.FetchEntry
-
- toString() - Method in class org.apache.nutch.indexer.NutchDocument
-
A utility-like method which can easily be used to write any NutchDocument
object to string for simple debugging.
- toString() - Method in class org.apache.nutch.metadata.Metadata
-
- toString(Date) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
-
Get the HTTP format of the specified date.
- toString(Calendar) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
-
- toString(long) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
-
- toString() - Method in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
-
- toString() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
- toString() - Method in class org.apache.nutch.parse.Outlink
-
- toString(ParseStatus) - Static method in class org.apache.nutch.parse.ParseStatusUtils
-
- toString() - Method in class org.apache.nutch.protocol.Content
-
- toString(ProtocolStatus) - Static method in class org.apache.nutch.protocol.ProtocolStatusUtils
-
- toString() - Method in class org.apache.nutch.scoring.ScoreDatum
-
- toString() - Method in enum org.apache.nutch.storage.Host.Field
-
Returns the field's attributes as a string.
- toString() - Method in enum org.apache.nutch.storage.ParseStatus.Field
-
Returns the field's attributes as a string.
- toString() - Method in enum org.apache.nutch.storage.ProtocolStatus.Field
-
Returns the field's attributes as a string.
- toString() - Method in enum org.apache.nutch.storage.WebPage.Field
-
Returns the field's attributes as a string.
- toString() - Method in class org.apache.nutch.tools.Benchmark.BenchmarkResults
-
- toString(ByteBuffer) - Static method in class org.apache.nutch.util.Bytes
-
This method will convert utf8 encoded bytes into a string.
- toString(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
- toString(byte[], String, byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Joins two byte arrays together using a separator.
- toString(byte[], int, int) - Static method in class org.apache.nutch.util.Bytes
-
This method will convert utf8 encoded bytes into a string.
- toString() - Method in class org.apache.nutch.util.domain.DomainSuffix
-
- toString(List<E>) - Method in class org.apache.nutch.util.Histogram
-
- toString(CharSequence) - Static method in class org.apache.nutch.util.TableUtil
-
Converts the given Utf8 instance to a String and cleans out any offending
"�" characters from the String.
- toString() - Method in class org.apache.nutch.webui.client.impl.RemoteCommand
-
- toStringArray(Collection<WebPage.Field>) - Static method in class org.apache.nutch.storage.StorageUtils
-
- toStringBinary(ByteBuffer) - Static method in class org.apache.nutch.util.Bytes
-
Write a printable representation of a ByteBuffer.
- toStringBinary(byte[]) - Static method in class org.apache.nutch.util.Bytes
-
Write a printable representation of a byte array.
- toStringBinary(byte[], int, int) - Static method in class org.apache.nutch.util.Bytes
-
Write a printable representation of a byte array.
- toUNICODE(String) - Static method in class org.apache.nutch.util.URLUtil
-
- TRANSFER_ENCODING - Static variable in interface org.apache.nutch.metadata.HttpHeaders
-
- TrieStringMatcher - Class in org.apache.nutch.util
-
TrieStringMatcher is a base class for simple tree-based string matching.
- TrieStringMatcher() - Constructor for class org.apache.nutch.util.TrieStringMatcher
-
- TrieStringMatcher.TrieNode - Class in org.apache.nutch.util
-
Node class for the character tree.
- TYPE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
The nature or genre of the content of the resource.
- valueOf(String) - Static method in enum org.apache.nutch.api.JobManager.JobType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.api.model.response.JobInfo.State
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.api.security.AuthenticationTypeEnum
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.api.security.AuthorizationRoleEnum
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.crawl.InjectType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.parse.ParseUtil.ChangeFrequency
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.protocol.http.HttpResponse.Scheme
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.storage.Host.Field
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.storage.Mark
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.storage.ParseStatus.Field
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.storage.ProtocolStatus.Field
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.storage.WebPage.Field
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.tools.proxy.FakeHandler.Mode
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.ConnectionStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.Crawl.CrawlStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.JobInfo.JobType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.nutch.webui.client.model.JobInfo.State
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.apache.nutch.api.JobManager.JobType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.api.model.response.JobInfo.State
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.api.security.AuthenticationTypeEnum
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.api.security.AuthorizationRoleEnum
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.crawl.InjectType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.parse.ParseUtil.ChangeFrequency
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.protocol.http.HttpResponse.Scheme
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.storage.Host.Field
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.storage.Mark
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.storage.ParseStatus.Field
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.storage.ProtocolStatus.Field
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.storage.WebPage.Field
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.tools.proxy.FakeHandler.Mode
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.util.domain.DomainStatistics.MyCounter
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.util.domain.DomainSuffix.Status
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.util.domain.TopLevelDomain.Type
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.ConnectionStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.Crawl.CrawlStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.JobInfo.JobType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.apache.nutch.webui.client.model.JobInfo.State
-
Returns an array containing the constants of this enum type, in
the order they are declared.
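The valueOf(String) and values() methods listed above are the standard methods that the Java compiler generates for every enum. The sketch below shows their behavior on a hypothetical enum; Phase, EnumLookupSketch, and parseOrDefault are illustrative names and not part of Nutch.

```java
final class EnumLookupSketch {

    /** Hypothetical enum used only for this illustration. */
    enum Phase { INJECT, FETCH, PARSE }

    /** Looks up a constant by exact name, falling back when no constant matches. */
    static Phase parseOrDefault(String name, Phase fallback) {
        try {
            // valueOf requires an exact, case-sensitive match of the constant name.
            return Phase.valueOf(name);
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        for (Phase phase : Phase.values()) {
            System.out.println(phase.ordinal() + " " + phase);
        }
        System.out.println(parseOrDefault("FETCH", Phase.INJECT)); // FETCH
        System.out.println(parseOrDefault("fetch", Phase.INJECT)); // INJECT
    }
}
```

Note that valueOf throws NullPointerException for a null name, which this sketch does not catch.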
- VERSION - Static variable in class org.apache.nutch.indexer.NutchDocument
-
- VerticalMenu - Class in org.apache.nutch.webui.pages.menu
-
- VerticalMenu(String) - Constructor for class org.apache.nutch.webui.pages.menu.VerticalMenu
-
- ViewCountNormalizer - Class in org.apache.nutch.parse.jsoup.extractor
-
- ViewCountNormalizer() - Constructor for class org.apache.nutch.parse.jsoup.extractor.ViewCountNormalizer
-
- vintToBytes(long) - Static method in class org.apache.nutch.util.Bytes
-