org.jsoup.parser.Parser

public class Parser
extends Object

Parses HTML into a Document. It is generally best to use one of the more convenient parse methods in Jsoup.

Constructor Summary

Constructors

Constructor Description

Parser(org.jsoup.parser.TreeBuilder treeBuilder)
Create a new Parser, using the specified TreeBuilder

Method Summary

Modifier and Type	Method	Description
`ParseErrorList`	`getErrors()`	Retrieve the parse errors, if any, from the last parse.
`org.jsoup.parser.TreeBuilder`	`getTreeBuilder()`	Get the TreeBuilder currently in use.
`static Parser`	`htmlParser()`	Create a new HTML parser.
`boolean`	`isContentForTagData(String normalName)`	An internal method, visible for Element. For HTML parsing, signals that script and style text should be treated as Data Nodes.
`boolean`	`isTrackErrors()`	Check if parse error tracking is enabled.
`Parser`	`newInstance()`	Creates a new Parser as a deep copy of this one, including initializing a new TreeBuilder.
`static Document`	`parse(String html, String baseUri)`	Parse HTML into a Document.
`static Document`	`parseBodyFragment(String bodyHtml, String baseUri)`	Parse a fragment of HTML into the `body` of a Document.
`static List<Node>`	`parseFragment(String fragmentHtml, Element context, String baseUri)`	Parse a fragment of HTML into a list of nodes.
`static List<Node>`	`parseFragment(String fragmentHtml, Element context, String baseUri, ParseErrorList errorList)`	Parse a fragment of HTML into a list of nodes.
`List<Node>`	`parseFragmentInput(String fragment, Element context, String baseUri)`
`Document`	`parseInput(Reader inputHtml, String baseUri)`
`Document`	`parseInput(String html, String baseUri)`
`static List<Node>`	`parseXmlFragment(String fragmentXml, String baseUri)`	Parse a fragment of XML into a list of nodes.
`ParseSettings`	`settings()`
`Parser`	`settings(ParseSettings settings)`
`Parser`	`setTrackErrors(int maxErrors)`	Enable or disable parse error tracking for the next parse.
`Parser`	`setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)`	Update the TreeBuilder used when parsing content.
`static String`	`unescapeEntities(String string, boolean inAttribute)`	Utility method to unescape HTML entities from a string
`static Parser`	`xmlParser()`	Create a new XML parser.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- Parser
  
  public Parser(org.jsoup.parser.TreeBuilder treeBuilder)
  
  Create a new Parser, using the specified TreeBuilder
  
  Parameters:
  
  treeBuilder - TreeBuilder to use to parse input into Documents.
Method Details
- newInstance
  
  public Parser newInstance()
  
  Creates a new Parser as a deep copy of this one, including initializing a new TreeBuilder. The copy can be used independently of the original, for example on another thread.
  
  Returns:
  
  a copied parser
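
  As an illustration of independent use, the following minimal sketch keeps one template parser and hands each thread its own copy through `newInstance()`. The class `ParserPerThread` and its method `titleOf` are hypothetical names chosen for this example; they are not part of jsoup.

  ```java
  import org.jsoup.nodes.Document;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class ParserPerThread {
      // Shared template parser; it is only copied, never used to parse directly.
      private static final Parser TEMPLATE = Parser.htmlParser();

      // Each thread lazily receives its own deep copy of the template.
      private static final ThreadLocal<Parser> LOCAL =
              ThreadLocal.withInitial(TEMPLATE::newInstance);

      // Parses the HTML with the calling thread's parser and returns the title text.
      static String titleOf(String html) {
          Document doc = LOCAL.get().parseInput(html, "");
          return doc.title();
      }
  }
  ```

  Copying a configured parser this way keeps settings in one place while avoiding a shared TreeBuilder.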
- parseInput
  
  public Document parseInput(String html, String baseUri)
- parseInput
  
  public Document parseInput(Reader inputHtml, String baseUri)
- parseFragmentInput
  
  public List<Node> parseFragmentInput(String fragment, Element context, String baseUri)
- getTreeBuilder
  
  public org.jsoup.parser.TreeBuilder getTreeBuilder()
  
  Get the TreeBuilder currently in use.
  
  Returns:
  
  current TreeBuilder.
- setTreeBuilder
  
  public Parser setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)
  
  Update the TreeBuilder used when parsing content.
  
  Parameters:
  
  treeBuilder - the new TreeBuilder to use
  
  Returns:
  
  this, for chaining
- isTrackErrors
  
  public boolean isTrackErrors()
  
  Check if parse error tracking is enabled.
  
  Returns:
  
  current track error state.
- setTrackErrors
  
  public Parser setTrackErrors(int maxErrors)
  
  Enable or disable parse error tracking for the next parse.
  
  Parameters:
  
  maxErrors - the maximum number of errors to track. Set to 0 to disable.
  
  Returns:
  
  this, for chaining
- getErrors
  
  public ParseErrorList getErrors()
  
  Retrieve the parse errors, if any, from the last parse.
  
  Returns:
  
  list of parse errors, up to the size of the maximum errors tracked.
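
  The error-tracking methods are typically used together: enable tracking, parse, then read the list. The sketch below assumes the input is supplied as a string; `ErrorTrackingSketch` and `countErrors` are hypothetical names used only for this example.

  ```java
  import org.jsoup.parser.ParseError;
  import org.jsoup.parser.ParseErrorList;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class ErrorTrackingSketch {
      // Parses the HTML while tracking up to maxErrors errors and returns how many were recorded.
      static int countErrors(String html, int maxErrors) {
          Parser parser = Parser.htmlParser().setTrackErrors(maxErrors);
          parser.parseInput(html, "");

          ParseErrorList errors = parser.getErrors();
          for (ParseError error : errors) {
              // Print the character position and the message of each recorded error.
              System.out.println(error.getPosition() + ": " + error.getErrorMessage());
          }
          return errors.size();
      }
  }
  ```

  Passing `0` as `maxErrors` disables tracking, so the returned list stays empty even for malformed input.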
- settings
  
  public Parser settings(ParseSettings settings)
- settings
  
  public ParseSettings settings()
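
  The two `settings` overloads carry no description here. As a hedged illustration, the sketch below passes jsoup's `ParseSettings.preserveCase` and `ParseSettings.htmlDefault` constants to an HTML parser and reports the tag name of the first element in the body. `CaseSettingsSketch` and `firstBodyTag` are hypothetical names for this example only.

  ```java
  import org.jsoup.nodes.Document;
  import org.jsoup.parser.ParseSettings;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class CaseSettingsSketch {
      // Parses the HTML with the given settings and returns the tag name of the first body child.
      static String firstBodyTag(String html, ParseSettings settings) {
          Parser parser = Parser.htmlParser().settings(settings);
          Document doc = parser.parseInput(html, "");
          return doc.body().child(0).tagName();
      }
  }
  ```

  With `preserveCase` the tag name keeps the case used in the input; with `htmlDefault` it is normalized to lower case.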
- isContentForTagData
  
  public boolean isContentForTagData(String normalName)
  
  An internal method, visible for Element. For HTML parsing, signals that script and style text should be treated as Data Nodes.
- parse
  
  public static Document parse(String html, String baseUri)
  
  Parse HTML into a Document.
  
  Parameters:
  
  html - HTML to parse
  
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
  
  Returns:
  
  parsed Document
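
  To show what the base URI is used for, the following sketch parses a document and resolves the first link against it. `ParseBaseUriSketch` and `firstLinkAbsUrl` are hypothetical names, and the example.com URL is a placeholder.

  ```java
  import org.jsoup.nodes.Document;
  import org.jsoup.nodes.Element;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class ParseBaseUriSketch {
      // Returns the absolute URL of the first <a> element, or "" if there is none.
      static String firstLinkAbsUrl(String html, String baseUri) {
          Document doc = Parser.parse(html, baseUri);
          Element link = doc.select("a").first();
          return link == null ? "" : link.absUrl("href");
      }
  }
  ```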
- parseFragment
  
  public static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri)
  
  Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
  
  Parameters:
  
  fragmentHtml - the fragment of HTML to parse
  
  context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
  
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
  
  Returns:
  
  list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
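
  A minimal sketch of fragment parsing with a context element follows. It assumes a detached `div` element is an acceptable context; `FragmentSketch` and `parseInto` are hypothetical names for this example.

  ```java
  import java.util.List;
  import org.jsoup.nodes.Element;
  import org.jsoup.nodes.Node;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class FragmentSketch {
      // Parses the fragment as if it were the inner HTML of the context element.
      // The context only guides parsing; the returned nodes are not appended to it.
      static List<Node> parseInto(Element context, String fragmentHtml) {
          return Parser.parseFragment(fragmentHtml, context, "");
      }
  }
  ```

  A caller that wants the nodes inside the context element has to append them itself, since the context is left unmodified.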
- parseFragment
  
  public static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri, ParseErrorList errorList)
  
  Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
  
  Parameters:
  
  fragmentHtml - the fragment of HTML to parse
  
  context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
  
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
  
  errorList - list to add errors to
  
  Returns:
  
  list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
- parseXmlFragment
  
  public static List<Node> parseXmlFragment(String fragmentXml, String baseUri)
  
  Parse a fragment of XML into a list of nodes.
  
  Parameters:
  
  fragmentXml - the fragment of XML to parse
  
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
  
  Returns:
  
  list of nodes parsed from the input XML.
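
  The sketch below parses an XML fragment and collects the name of each top-level node. `XmlFragmentSketch` and `nodeNames` are hypothetical names for this example.

  ```java
  import java.util.ArrayList;
  import java.util.List;
  import org.jsoup.nodes.Node;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class XmlFragmentSketch {
      // Returns the node names of the top-level nodes in the XML fragment, in document order.
      static List<String> nodeNames(String fragmentXml) {
          List<Node> nodes = Parser.parseXmlFragment(fragmentXml, "");
          List<String> names = new ArrayList<>();
          for (Node node : nodes) {
              names.add(node.nodeName());
          }
          return names;
      }
  }
  ```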
- parseBodyFragment
  
  public static Document parseBodyFragment(String bodyHtml, String baseUri)
  
  Parse a fragment of HTML into the body of a Document.
  
  Parameters:
  
  bodyHtml - fragment of HTML
  
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
  
  Returns:
  
  Document, with empty head, and HTML parsed into body
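
  As a brief illustration, the sketch below wraps a fragment in a full Document shell. `BodyFragmentSketch` and `wrap` are hypothetical names, and the base URI is a placeholder.

  ```java
  import org.jsoup.nodes.Document;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class BodyFragmentSketch {
      // Parses the fragment into the body of a new Document; the head is left empty.
      static Document wrap(String bodyHtml) {
          return Parser.parseBodyFragment(bodyHtml, "https://example.com/");
      }
  }
  ```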
- unescapeEntities
  
  public static String unescapeEntities(String string, boolean inAttribute)
  
  Utility method to unescape HTML entities from a string.
  
  Parameters:
  
  string - HTML escaped string
  
  inAttribute - if the string is to be unescaped in strict mode (as attributes are)
  
  Returns:
  
  an unescaped string
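
  A short sketch of unescaping text content (not an attribute value) follows. `UnescapeSketch` and `unescapeText` are hypothetical names for this example.

  ```java
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class UnescapeSketch {
      // Unescapes entities in text content; inAttribute is false because this is not an attribute value.
      static String unescapeText(String escaped) {
          return Parser.unescapeEntities(escaped, false);
      }
  }
  ```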
- htmlParser
  
  public static Parser htmlParser()
  
  Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.
  
  Returns:
  
  a new HTML parser.
- xmlParser
  
  public static Parser xmlParser()
  
  Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input.
  
  Returns:
  
  a new simple XML parser.
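
  The difference between the two factory methods can be seen by parsing the same input with each and inspecting the first element under the document. `ParserKindSketch` and `rootTag` are hypothetical names for this example.

  ```java
  import org.jsoup.nodes.Document;
  import org.jsoup.parser.Parser;

  // Hypothetical helper class, not part of jsoup.
  class ParserKindSketch {
      // Parses the input with the given parser and returns the tag name of the document's first child element.
      static String rootTag(Parser parser, String input) {
          Document doc = parser.parseInput(input, "");
          return doc.child(0).tagName();
      }
  }
  ```

  With `Parser.htmlParser()` the input is normalised into an `html` shell, whereas `Parser.xmlParser()` builds the tree directly from the input, so the first element is the one supplied.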
