Package com.inet.html
Class InetHtmlParser
java.lang.Object
com.inet.html.InetHtmlParser
The HTML parser of JWebEngine. The parser is configured through the configuration of the document
it operates on. Calling the parser directly is not recommended, because modifying the
document requires a write lock.
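The parse method takes an AbstractDocument.BranchElement as its root, which suggests (an assumption, not stated here) that InetHtmlDocument extends javax.swing.text.AbstractDocument. In Swing, writeLock() and writeUnlock() are protected, so only a subclass can hold the lock across a modification. The minimal sketch below shows that pattern on a PlainDocument stand-in. LockedDocument and insertUnderWriteLock are hypothetical names, and no JWebEngine API is called.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;

// Hypothetical stand-in: PlainDocument in place of InetHtmlDocument.
class LockedDocument extends PlainDocument {

    // Holds the document's write lock for the whole modification.
    void insertUnderWriteLock(int offset, String text) throws BadLocationException {
        writeLock(); // protected in AbstractDocument, so only a subclass can call it
        try {
            // insertString takes the same lock again; the owning thread may re-enter it
            insertString(offset, text, null);
        } finally {
            writeUnlock(); // always release, even if the insert throws
        }
    }
}
```

AbstractDocument's write lock is reentrant for the thread that owns it, which is why insertString can acquire it again inside the locked block.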
Field Summary
Fields: TABLE_STRUCT_TAGS
Constructor Summary
Constructors: InetHtmlParser()
Method Summary
Modifier and Type | Method | Description
int[] | getContentOffsets() | Returns the start and end offsets of the relevant content of the most recent parse operation.
protected static boolean | isMonolithic(Element elem) | Tests whether this element is a compact object without an end tag, such as IMG.
void | parse(Reader in, InetHtmlDocument doc, AbstractDocument.BranchElement root, int insertOffset, com.inet.html.InetHtmlDocument.EventList eventList, boolean allowSpaces, Object sourceID) | Parses the input with this parser and adds the content to the document.
static void | setMaximumStackSize(int size) | Sets the maximum HTML element stack size.
Field Details
TABLE_STRUCT_TAGS
Constructor Details
InetHtmlParser
public InetHtmlParser()
Method Details
setMaximumStackSize
public static void setMaximumStackSize(int size)
Sets the maximum HTML element stack size. If the parser encounters an element nested deeper in the DOM tree than this limit, the element itself is ignored; only its textual content and monolithic elements are parsed. This option can be used to adjust the trade-off between accuracy and stability.
Default: 200
Parameters:
size - the stack size; it is always kept at least 2 so that the HTML and BODY elements are covered
Since:
1.07
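To make the depth rule concrete, here is a small hypothetical model of a depth-limited element stack. It is not JWebEngine's implementation: it works on a pre-split token list using an assumed convention ("<x>" opens, "</x>" closes, "<x/>" is monolithic, anything else is text) and only mirrors the documented behavior: elements beyond the limit are dropped, their text and monolithic elements are kept, and the limit never drops below 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a depth-limited element stack; not JWebEngine's code.
class DepthLimitSketch {
    private final int maxDepth;

    DepthLimitSketch(int size) {
        // Mirrors the documented rule: at least 2 levels for HTML and BODY
        this.maxDepth = Math.max(2, size);
    }

    // Token convention (assumed): "<x>" opens, "</x>" closes, "<x/>" is a
    // monolithic element, anything else is text. Input must be well nested.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        int depth = 0; // nesting depth, counting dropped elements too
        for (String token : tokens) {
            if (token.startsWith("</")) {
                if (depth <= maxDepth) {
                    kept.add(token); // closing tag of an element that was kept
                }
                depth--;
            } else if (token.endsWith("/>")) {
                kept.add(token); // monolithic elements are always kept
            } else if (token.startsWith("<")) {
                depth++;
                if (depth <= maxDepth) {
                    kept.add(token);
                }
            } else {
                kept.add(token); // textual content is always kept
            }
        }
        return kept;
    }
}
```

With a limit of 2, the tokens html, body, div, "deep", img/, /div, /body, /html come out without the DIV tags, while the text "deep" and the IMG survive. That loss of structure in exchange for a bounded stack is the accuracy-versus-stability trade-off described above.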
parse
public void parse(Reader in, InetHtmlDocument doc, AbstractDocument.BranchElement root, int insertOffset, com.inet.html.InetHtmlDocument.EventList eventList, boolean allowSpaces, Object sourceID) throws IOException, BadLocationException
Parses the input with this parser and adds the content to the document.
Parameters:
in - the reader to get the content from
doc - the document to insert the content into
root - the root element under which the content is inserted; must not be null
insertOffset - the offset at which to insert the content
eventList - an optional event list that receives the events of the parsing process
allowSpaces - if true, whitespace is allowed as the only content (may be important for inserts)
sourceID - an optional object that identifies the source of the content
Throws:
IOException - thrown by the reader
BadLocationException - thrown if the offset is not within the document's content
Since:
1.05
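The two checked exceptions of parse come from different sources: the Reader and the document offset. The sketch below reproduces only that contract with standard Swing types; it does not parse HTML, and insertFromReader is a hypothetical helper that inserts raw text into a javax.swing.text.Document as a stand-in for InetHtmlDocument.

```java
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Hypothetical stand-in for the I/O and offset contract of parse; no HTML parsing.
class ReaderInsertSketch {

    static void insertFromReader(Reader in, Document doc, int insertOffset)
            throws IOException, BadLocationException {
        StringBuilder content = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) { // IOException propagates from the reader
            content.append(buf, 0, n);
        }
        // An offset outside the document makes insertString throw BadLocationException
        doc.insertString(insertOffset, content.toString(), null);
    }
}
```

Reading the input before touching the document means an I/O failure leaves the document unchanged; whether JWebEngine reads incrementally instead is not stated.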
isMonolithic
protected static boolean isMonolithic(Element elem)
Tests whether this element is a compact object without an end tag, such as IMG. Such an element has no children and a content length of 1.
Parameters:
elem - the current element
Returns:
true if the element is monolithic
Since:
1.05
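The documented criterion (no children, content length 1) maps directly onto the javax.swing.text.Element interface. The sketch below checks only that criterion; the actual isMonolithic may consult more, such as the element's tag, so treat MonolithicSketch as a hypothetical illustration rather than the library's logic.

```java
import javax.swing.text.Element;

// Hypothetical sketch applying only the documented criterion.
class MonolithicSketch {

    static boolean isMonolithic(Element elem) {
        boolean noChildren = elem.getElementCount() == 0;
        int contentLength = elem.getEndOffset() - elem.getStartOffset();
        return noChildren && contentLength == 1;
    }
}
```

In an empty PlainDocument the single line element spans just the implied trailing newline, so it passes this check, whereas the root (which has a child) and any longer leaf do not.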
getContentOffsets
public int[] getContentOffsets()
Returns the start and end offsets of the relevant content of the most recent parse operation.
Returns:
the [start, end] offsets, or null if there were no fragment borders
Since:
1.07
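A caller of getContentOffsets() has to handle the null case. The hypothetical helper below selects the relevant part of a string. It assumes the end offset is exclusive (as with String.substring) and that a caller falls back to the whole text when no fragment borders were found; neither assumption is stated in the description above.

```java
// Hypothetical caller-side helper; the exclusive end offset is an assumption.
class ContentOffsetsSketch {

    static String relevantContent(String text, int[] offsets) {
        if (offsets == null) {
            return text; // no fragment borders: treat the whole text as relevant
        }
        int start = offsets[0];
        int end = offsets[1]; // assumed exclusive
        return text.substring(start, end);
    }
}
```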