Package | Description |
---|---|
org.apache.lucene | Top-level package. |
org.apache.lucene.analysis | Text analysis. |
org.apache.lucene.analysis.standard | Fast, general-purpose grammar-based tokenizer. StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. |
org.apache.lucene.analysis.tokenattributes | General-purpose attributes for text analysis. |
org.apache.lucene.codecs | Codecs API: API for customization of the encoding and structure of the index. |
org.apache.lucene.codecs.blocktree | BlockTree terms dictionary. |
org.apache.lucene.codecs.compressing | StoredFieldsFormat that allows cross-document and cross-field compression of stored fields. |
org.apache.lucene.codecs.lucene50 | Components from the Lucene 5.0 index format. See org.apache.lucene.codecs.lucene80 for an overview of the index format. |
org.apache.lucene.codecs.lucene60 | Components from the Lucene 6.0 index format. |
org.apache.lucene.codecs.lucene80 | Components from the Lucene 8.0 index format. See org.apache.lucene.codecs.lucene84 for an overview of the index format. |
org.apache.lucene.codecs.lucene84 | Components from the Lucene 8.4 index format. |
org.apache.lucene.codecs.lucene86 | Lucene 8.6 file format. |
org.apache.lucene.codecs.lucene87 | Lucene 8.7 file format. |
org.apache.lucene.codecs.perfield | Postings format that can delegate to different formats per-field. |
org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
org.apache.lucene.geo | Geospatial utility implementations for Lucene Core. |
org.apache.lucene.index | Code to maintain and access indices. |
org.apache.lucene.search | Code to search indices. |
org.apache.lucene.search.comparators | Comparators, used to compare hits so as to determine their sort order when collecting the top results with TopFieldCollector. |
org.apache.lucene.search.similarities | This package contains the various ranking models that can be used in Lucene. |
org.apache.lucene.search.spans | The calculus of spans. |
org.apache.lucene.store | Binary I/O API, used for all index data. |
org.apache.lucene.util | Some utility classes. |
org.apache.lucene.util.automaton | Finite-state automaton for regular expressions. |
org.apache.lucene.util.bkd | Block KD-tree, implementing the generic spatial data structure described in this paper. |
org.apache.lucene.util.compress | Compression utilities. |
org.apache.lucene.util.fst | Finite state transducers. |
org.apache.lucene.util.graph | Utility classes for working with token streams as graphs. |
org.apache.lucene.util.mutable | Comparable object wrappers. |
org.apache.lucene.util.packed | Packed integer arrays and streams. |

Apache Lucene is a high-performance, full-featured text search engine library. Here is a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):
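    import static org.junit.Assert.assertEquals;

    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.IOUtils;

    // The statements below belong inside a JUnit test method declared with "throws Exception".
    Analyzer analyzer = new StandardAnalyzer();

    // Create the index in a temporary directory on disk:
    Path indexPath = Files.createTempDirectory("tempIndex");
    Directory directory = FSDirectory.open(indexPath);
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }

    // Release resources and delete the temporary index:
    ireader.close();
    directory.close();
    IOUtils.rm(indexPath);
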
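The example above needs a query parser and a temporary directory on disk. As a minimal sketch of the same index-then-search round trip using only core classes, the variant below keeps the index in memory and builds the query directly as a TermQuery. It assumes Lucene 8.x on the classpath; ByteBuffersDirectory and the class name InMemoryIndexSketch are choices made for this illustration, not something the overview prescribes.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical class name, used only for this sketch.
public class InMemoryIndexSketch {

    /**
     * Indexes one document in memory and returns the stored text of the
     * top hit for a single term, or null if nothing matches.
     */
    public static String indexAndFind(String text, String term) throws IOException {
        // Assumption: an in-memory Directory is enough for a demonstration.
        try (Directory directory = new ByteBuffersDirectory()) {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            // Closing the writer commits the added document.
            try (IndexWriter writer = new IndexWriter(directory, config)) {
                Document doc = new Document();
                doc.add(new TextField("fieldname", text, Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // TermQuery is not analyzed, so the term must already be in
                // the form the analyzer produced at index time (lowercase here).
                TopDocs top = searcher.search(new TermQuery(new Term("fieldname", term)), 10);
                if (top.scoreDocs.length == 0) {
                    return null;
                }
                return searcher.doc(top.scoreDocs[0].doc).get("fieldname");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(indexAndFind("This is the text to be indexed.", "text"));
    }
}
```

Because a TermQuery skips analysis, searching for "Text" with a capital letter would find nothing in this sketch; a query parser applies the analyzer for you, which is one reason the original example uses it.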
The Lucene API is divided into several packages:
- org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a Reader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. analyzers-common provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer.
- org.apache.lucene.codecs provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.
- org.apache.lucene.document provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of Reader.
- org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
- org.apache.lucene.search provides data structures to represent queries (e.g. TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the IndexSearcher, which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or XML.
- org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, but FSDirectory is generally recommended as it tries to use operating system disk buffer caches efficiently.
- org.apache.lucene.util contains a few handy data structures and utility classes, e.g. FixedBitSet and PriorityQueue.
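
The analysis package description above says that an Analyzer turns text into a TokenStream of token Attributes. A minimal sketch of consuming such a stream follows, assuming Lucene 8.x on the classpath. The class name TokenStreamSketch, the helper names, and the field name "fieldname" are placeholders chosen for this illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name, used only for this sketch.
public class TokenStreamSketch {

    /** Collects the term text of every token the analyzer emits for the given text. */
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<>();
        // "fieldname" is an arbitrary placeholder field name.
        try (TokenStream stream = analyzer.tokenStream("fieldname", text)) {
            // The attribute instance is reused; its contents change on each incrementToken().
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                  // required before the first incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();                    // signals that the stream was fully consumed
        }
        return result;
    }

    /** Convenience wrapper that analyzes the text with a StandardAnalyzer. */
    public static List<String> standardTokens(String text) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            return tokens(analyzer, text);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(standardTokens("Clam Chowder, Manhattan style"));
    }
}
```

The reset, incrementToken, end, close sequence is the consumer workflow that TokenStream expects; skipping reset() typically results in an exception rather than an empty result.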

To use Lucene, an application should:
1. Create Documents by adding Fields;
2. Create an IndexWriter and add documents to it with addDocument();
3. Call QueryParser.parse() to build a query from a string; and
4. Create an IndexSearcher and pass the query to its search() method.

The demo classes IndexFiles and SearchFiles exercise these steps from the command line. To try them, run something like:

    > java -cp lucene-core.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles -index index -docs rec.food.recipes/soups
    adding rec.food.recipes/soups/abalone-chowder
      [ ... ]

    > java -cp lucene-core.jar:lucene-demo.jar:lucene-queryparser.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles
    Query: chowder
    Searching for: chowder
    34 total matching documents
    1. rec.food.recipes/soups/spam-chowder
      [ ... thirty-four documents contain the word "chowder" ... ]

    Query: "clam chowder" AND Manhattan
    Searching for: +"clam chowder" +manhattan
    2 total matching documents
    1. rec.food.recipes/soups/clam-chowder
      [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
      [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]

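
The second demo query above combines a phrase and a single word, both required. As a rough sketch of how the same query could be assembled from the org.apache.lucene.search classes instead of being parsed from a string, consider the following. It assumes Lucene 8.x on the classpath; the class name DemoQuerySketch and the field name "contents" are assumptions made for this illustration.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical class name, used only for this sketch.
public class DemoQuerySketch {

    /** Builds a query requiring the phrase "clam chowder" and the word "manhattan" in one field. */
    public static Query clamChowderAndManhattan(String field) {
        return new BooleanQuery.Builder()
            // Occur.MUST corresponds to the "+" prefix (or AND) in the query syntax.
            .add(new PhraseQuery(field, "clam", "chowder"), BooleanClause.Occur.MUST)
            // Terms are given in lowercase because no analyzer runs on hand-built queries.
            .add(new TermQuery(new Term(field, "manhattan")), BooleanClause.Occur.MUST)
            .build();
    }

    public static void main(String[] args) {
        // Assumption: "contents" is the field being searched.
        Query query = clamChowderAndManhattan("contents");
        // Passing the default field to toString() omits the "contents:" prefix.
        System.out.println("Searching for: " + query.toString("contents"));
    }
}
```

Building queries by hand avoids query-syntax escaping issues for programmatically generated input, at the cost of having to normalize terms yourself.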
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.