Package com.inet.html

Class InetHtmlParser

java.lang.Object
com.inet.html.InetHtmlParser

public class InetHtmlParser extends Object
The HTML parser of JWebEngine. The parser is configured through the configuration of the document it operates on. Calling the parser directly is not recommended, because modifying the document requires a write lock.
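The write-lock requirement can be illustrated with the standard Swing text API, on which the types in this class's signatures (AbstractDocument.BranchElement, BadLocationException) are based. The following is a minimal sketch, not JWebEngine code: WriteLockedDocument and insertAll are hypothetical names, and PlainDocument stands in for the real document class. It shows why direct modification is awkward: writeLock() is protected in AbstractDocument, so only the document (or a subclass) can take the lock.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;

// Hypothetical helper class, not part of JWebEngine. It only demonstrates the
// standard Swing locking pattern around a batch of document modifications.
class WriteLockedDocument extends PlainDocument {

    // Inserts all parts one after another, starting at offset, under one write lock.
    void insertAll(int offset, String... parts) throws BadLocationException {
        writeLock(); // protected in AbstractDocument, hence the subclass
        try {
            int pos = offset;
            for (String part : parts) {
                // insertString takes the lock again; this is allowed for the
                // thread that already holds it
                insertString(pos, part, null);
                pos += part.length();
            }
        } finally {
            writeUnlock(); // always release, even if an insert fails
        }
    }
}
```

Holding the lock for the whole batch keeps other writers and readers from seeing a half-modified document, which is presumably the reason the parser should be driven through the document rather than called directly.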
  • Field Details

    • TABLE_STRUCT_TAGS

      public static final Set<HTML.Tag> TABLE_STRUCT_TAGS
      A Set of all HTML.Tags that are used in a table structure.
  • Constructor Details

    • InetHtmlParser

      public InetHtmlParser()
  • Method Details

    • setMaximumStackSize

      public static void setMaximumStackSize(int size)
      Sets the maximum HTML element stack size. If the parser encounters an element that is nested deeper in the DOM tree than this limit, the element itself is ignored; only its textual content and monolithic elements are parsed. This option can be used to adjust the trade-off between accuracy and stability.
      Default: 200
      Parameters:
      size - the stack size; values below 2 are raised to 2 so that the HTML and BODY elements are always covered
      Since:
      1.07
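The effect of a stack limit can be sketched with a toy tokenizer. This is an illustration of the idea only, not the parser's actual implementation: DepthLimitSketch and textWithPaths are hypothetical names, the input is assumed to be well nested, and attributes and monolithic elements are not modeled. Elements beyond the limit are dropped, while their text is kept and attached to the deepest element that still fits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of a depth-limited element stack.
class DepthLimitSketch {
    private static int maxStackSize = 200; // default named in the documentation

    // Mirrors the documented clamping: the limit is never lower than 2.
    static void setMaximumStackSize(int size) {
        maxStackSize = Math.max(2, size);
    }

    static int getMaximumStackSize() {
        return maxStackSize;
    }

    // Returns one "path: text" entry per text run, where path lists the
    // elements that were actually opened (root first).
    static List<String> textWithPaths(String html) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> result = new ArrayList<>();
        int ignored = 0; // open tags skipped because the stack was full
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) {
                    break; // unterminated tag, stop the toy parser
                }
                String tag = html.substring(i + 1, close);
                if (tag.startsWith("/")) {
                    if (ignored > 0) {
                        ignored--;          // closes an ignored element
                    } else if (!stack.isEmpty()) {
                        stack.removeLast(); // closes a real element
                    }
                } else if (stack.size() < maxStackSize) {
                    stack.addLast(tag);
                } else {
                    ignored++;              // too deep: drop the element, keep its text
                }
                i = close + 1;
            } else {
                int next = html.indexOf('<', i);
                if (next < 0) {
                    next = html.length();
                }
                String text = html.substring(i, next).trim();
                if (!text.isEmpty()) {
                    result.add(String.join("/", stack) + ": " + text);
                }
                i = next;
            }
        }
        return result;
    }
}
```

With a limit of 3, the input `<html><body><div><span><b>deep</b></span></div></body></html>` yields the single entry `html/body/div: deep`: the span and b elements are ignored, but their text survives. A lower limit is more robust against pathologically nested input at the cost of losing structure.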
    • parse

      public void parse(Reader in, InetHtmlDocument doc, AbstractDocument.BranchElement root, int insertOffset, com.inet.html.InetHtmlDocument.EventList eventList, boolean allowSpaces, Object sourceID) throws IOException, BadLocationException
      Parses the input with this parser and adds the content to the document.
      Parameters:
      in - The reader to get the content from
      doc - The document to insert the content into
      root - The root element under which the content is inserted; must not be null
      insertOffset - the offset at which to insert the content
      eventList - optional event list that collects the events of the parsing process
      allowSpaces - if true, whitespace is allowed as the only content (may be important for inserts)
      sourceID - an optional object to identify the source of the content
      Throws:
      IOException - thrown by the reader
      BadLocationException - thrown if the offset is not within the document's content
      Since:
      1.05
    • isMonolithic

      protected static boolean isMonolithic(Element elem)
      Tests whether this element is a compact object that has no end tag, like IMG. This means the element has no children and a content length of 1.
      Parameters:
      elem - the current element
      Returns:
      true if the element is monolithic
      Since:
      1.05
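The structural consequence stated above (no children, content length 1) can be checked against any Swing Element. The helper below is a hedged sketch: MonolithicSketch and hasMonolithicShape are hypothetical names, and the check covers only the shape of the element, not whether its tag really lacks an end tag, which the actual method may also take into account.

```java
import javax.swing.text.Element;

// Hypothetical helper, not the library's implementation.
class MonolithicSketch {

    // True if the element has no child elements and spans exactly one character.
    static boolean hasMonolithicShape(Element elem) {
        boolean noChildren = elem.getElementCount() == 0;
        boolean lengthOne = elem.getEndOffset() - elem.getStartOffset() == 1;
        return noChildren && lengthOne;
    }
}
```

For example, the single line element of an empty javax.swing.text.PlainDocument spans only the implied trailing newline and therefore passes the check, while the root element (which has a child) and a line containing text do not.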
    • getContentOffsets

      public int[] getContentOffsets()
      Returns the start and end offsets of the relevant content of the most recent parse operation.
      Returns:
      the [start, end] offsets, or null if there were no fragment borders
      Since:
      1.07
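Because the result may be null, callers need a fallback. The sketch below illustrates that contract on a plain string. It rests on an assumption that the documentation does not confirm: that "fragment borders" are the `<!--StartFragment-->` and `<!--EndFragment-->` comments used by the HTML clipboard format. The names ContentOffsetsSketch, findContentOffsets and relevantContent are hypothetical, and the offsets here index the source string, whereas the real method presumably reports document offsets.

```java
// Hypothetical illustration of a nullable [start, end] result.
class ContentOffsetsSketch {
    // Assumed marker strings; the documentation does not name them.
    static final String START = "<!--StartFragment-->";
    static final String END = "<!--EndFragment-->";

    // Returns {start, end} of the text between the markers,
    // or null if either marker is missing.
    static int[] findContentOffsets(String source) {
        int startMarker = source.indexOf(START);
        if (startMarker < 0) {
            return null;
        }
        int start = startMarker + START.length();
        int end = source.indexOf(END, start);
        if (end < 0) {
            return null;
        }
        return new int[] { start, end };
    }

    // Caller-side handling: fall back to the whole input when no borders exist.
    static String relevantContent(String source) {
        int[] offsets = findContentOffsets(source);
        return offsets == null ? source : source.substring(offsets[0], offsets[1]);
    }
}
```

The point of the pattern is the explicit null check before indexing into the array; treating a missing result as "use everything" is one possible policy, not necessarily the one an application needs.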