Class HtmlConverter

java.lang.Object
com.inet.editor.HtmlConverter

public class HtmlConverter extends Object
Utility class for converting text/plain to HTML and back.
  • Constructor Details

    • HtmlConverter

      public HtmlConverter()
  • Method Details

    • text2html

      public static String text2html(String plainText, Font font)
      Converts a text/plain string into HTML. The HTML body content will always begin with an opening P tag. Therefore it is recommended to use this method to convert text blocks.
      Parameters:
      plainText - the text to be converted
      font - the base font for the HTML content, may be null for none
      Returns:
      the HTML formatted content
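      The idea behind a plain-text to HTML conversion can be sketched as follows. This is a minimal, self-contained illustration and not the implementation used by HtmlConverter: the class TextToHtmlSketch and its methods are hypothetical names, the font handling is left out, and the exact markup that text2html produces may differ. The trailing comment shows how the documented method would be called instead.

```java
class TextToHtmlSketch {

    /** Escapes the characters that have a special meaning in HTML. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps every line of the plain text in its own P element (an assumption of this sketch). */
    static String toParagraphs(String plainText) {
        StringBuilder html = new StringBuilder();
        for (String line : plainText.split("\r?\n", -1)) {
            html.append("<p>").append(escape(line)).append("</p>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(toParagraphs("a < b\nTom & Jerry"));
        // With the library on the class path, the documented call would be:
        // String html = HtmlConverter.text2html("a < b\nTom & Jerry", null);
    }
}
```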
    • text2html

      public static String text2html(String plainText, Font font, boolean startWithP)
      Converts a text/plain string into HTML.
      Parameters:
      plainText - the text to be converted
      font - the base font for the HTML content, may be null for none
      startWithP - true to start the content in a P element, false to start the content as inline text
      Returns:
      the HTML formatted content
    • text2html

      public static String text2html(String plainText, Font font, boolean startWithP, String defaultClass)
      Converts a text/plain string into HTML.
      Parameters:
      plainText - the text to be converted
      font - the base font for the HTML content, may be null for none
      startWithP - true to start the content in a P element, false to start the content as inline text
      defaultClass - the default class attribute value for all generated elements. This class can be used to set a separate CSS style for all generated elements.
      Returns:
      the HTML formatted content
    • html2text

      public static String html2text(String htmlText)
      Converts HTML to plain text. The conversion replaces BR elements as well as block level elements with line breaks. HR elements are replaced as well. Tables are block level elements and are handled as such; there is no distinct table conversion.
      Parameters:
      htmlText - the text to convert, a null value will return an empty string
      Returns:
      the converted content, never null
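      The replacement rule described above (BR, HR and block level elements become line breaks, null yields an empty string) can be illustrated with a deliberately naive, regex-based sketch. It is not the parser used by HtmlConverter; the class HtmlToTextSketch, the chosen list of block level tags and the entity handling are assumptions made for this example only.

```java
import java.util.regex.Pattern;

class HtmlToTextSketch {

    // BR, HR and the closing tags of a few block level elements (illustrative selection).
    private static final Pattern BREAKS = Pattern.compile(
            "(?i)<br\\s*/?>|<hr\\s*/?>|</(?:p|div|h[1-6]|li|tr|table)\\s*>");

    // Any remaining tag is dropped.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    /** Converts simple HTML to plain text; a null input yields an empty string. */
    static String toText(String html) {
        if (html == null) {
            return "";
        }
        String text = BREAKS.matcher(html).replaceAll("\n");
        text = TAGS.matcher(text).replaceAll("");
        // Decode a few common entities; "&amp;" goes last to avoid double decoding.
        return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.print(toText("<p>one<br>two</p><div>three</div>"));
        // With the library on the class path, the documented call would be:
        // String text = HtmlConverter.html2text("<p>one<br>two</p>");
    }
}
```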
    • html2text

      public static HtmlConverter.ConvertResult html2text(String htmlText, int maxLength)
      Converts HTML to plain text. The conversion replaces BR elements as well as block level elements with line breaks. HR elements are replaced as well. Tables are block level elements and are handled as such; there is no distinct table conversion.
      Parameters:
      htmlText - the text to convert, a null value will return an empty string
      maxLength - the maximum length of the output string, any further content will be discarded
      Returns:
      the conversion result containing the text content and a flag indicating whether maxLength was reached
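      A result that pairs the text with a "limit reached" flag could look like the sketch below. The accessors of HtmlConverter.ConvertResult are not listed in this section, so the class Result, its fields and the method limit are hypothetical stand-ins. The sketch also assumes that the flag is set only when content was actually discarded; the library may define "reached" differently.

```java
class TruncateSketch {

    /** Hypothetical stand-in for a result type: the text plus a "limit reached" flag. */
    static final class Result {
        final String text;
        final boolean limitReached;

        Result(String text, boolean limitReached) {
            this.text = text;
            this.limitReached = limitReached;
        }
    }

    /** Cuts the text to at most maxLength characters and reports whether anything was dropped. */
    static Result limit(String text, int maxLength) {
        if (text.length() <= maxLength) {
            return new Result(text, false);
        }
        return new Result(text.substring(0, maxLength), true);
    }

    public static void main(String[] args) {
        Result r = limit("Hello World", 5);
        System.out.println(r.text + " / " + r.limitReached);
    }
}
```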
    • html2inlinedHtml

      @Nonnull public static @Nonnull String html2inlinedHtml(String htmlText)
      Converts HTML content with CSS references or global CSS definitions to HTML with all styles defined inline (within the style attributes of the elements). The content of all images will be converted to inline images as well.
      Parameters:
      htmlText - html coded content
      Returns:
      html coded content with all styles inline
    • html2inlinedHtml

      @Nonnull public static @Nonnull String html2inlinedHtml(String htmlText, boolean inlineImages)
      Converts HTML content with CSS references or global CSS definitions to HTML with all styles defined inline (within the style attributes of the elements).
      Parameters:
      htmlText - html coded content
      inlineImages - if true, the content of the images will be converted to inlined data as well, eliminating all external references from this document.
      Returns:
      html coded content with all styles inline
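      A common way to embed image content directly in a document, so that no external reference remains, is a base64 data URI. Whether html2inlinedHtml emits exactly this form is an assumption of this sketch; the class DataUriSketch and the method toDataUri are hypothetical names used for illustration.

```java
import java.util.Base64;

class DataUriSketch {

    /** Builds a data URI of the form data:<mime type>;base64,<encoded content>. */
    static String toDataUri(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] bytes = { 72, 105 }; // the ASCII bytes of "Hi", standing in for image data
        System.out.println("<img src=\"" + toDataUri("text/plain", bytes) + "\">");
        // With the library on the class path, the documented call would be:
        // String inlined = HtmlConverter.html2inlinedHtml(htmlText, true);
    }
}
```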
    • html2inlinedHtml

      @Nonnull public static @Nonnull String html2inlinedHtml(String htmlText, boolean inlineImages, URL baseURL)
      Converts HTML content with CSS references or global CSS definitions to HTML with all styles defined inline (within the style attributes of the elements).
      Parameters:
      htmlText - html coded content
      inlineImages - if true, the content of the images will be converted to inlined data as well, eliminating all external references from this document.
      baseURL - the base url of the html content. Required to resolve relative URIs within the document. This parameter should be set if inlineImages is set to true!
      Returns:
      html coded content with all styles inline
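      Why a base URL is needed becomes clear when relative references are resolved. The sketch below uses java.net.URI from the JDK to show the resolution step; the class BaseUrlSketch and the example addresses are made up for illustration, and the library may resolve references differently internally.

```java
import java.net.URI;

class BaseUrlSketch {

    /** Resolves a (possibly relative) reference against the base URL of the document. */
    static String resolve(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/docs/page.html";
        System.out.println(resolve(base, "img/logo.png"));   // relative to the document
        System.out.println(resolve(base, "/css/site.css"));  // relative to the server root
        // With the library on the class path, the documented call would be:
        // String inlined = HtmlConverter.html2inlinedHtml(htmlText, true, new URL(base));
    }
}
```

      Without a base URL, a reference such as img/logo.png cannot be mapped to a location, which is why the parameter should be set when inlineImages is true.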
    • html2inlinedHtml

      @Nonnull public static @Nonnull String html2inlinedHtml(String htmlText, boolean inlineImages, boolean contentOnly, URL baseURL)
      Converts HTML content with CSS references or global CSS definitions to HTML with all styles defined inline (within the style attributes of the elements).
      Parameters:
      htmlText - html coded content
      inlineImages - if true, the content of the images will be converted to inlined data as well, eliminating all external references from this document.
      contentOnly - if true, only the content of the body will be returned. This is recommended if the content will be inserted into another document. If false, a valid HTML document will be returned.
      baseURL - the base url of the html content. Required to resolve relative URIs within the document. This parameter should be set if inlineImages is set to true!
      Returns:
      html coded content with all styles inline
    • html2inlinedHtml

      @Nonnull public static @Nonnull String html2inlinedHtml(String htmlText, boolean inlineImages, boolean contentOnly, boolean compact, URL baseURL)
      Converts HTML content with CSS references or global CSS definitions to HTML with all styles defined inline (within the style attributes of the elements).
      Parameters:
      htmlText - html coded content
      inlineImages - if true, the content of the images will be converted to inlined data as well, eliminating all external references from this document.
      contentOnly - if true, only the content of the body will be returned. This is recommended if the content will be inserted into another document. If false, a valid HTML document will be returned.
      compact - if true, the output will have no indentation or fill spaces. This does not affect the rendering; it only reduces the output size.
      baseURL - the base url of the html content. Required to resolve relative URIs within the document. This parameter should be set if inlineImages is set to true!
      Returns:
      html coded content with all styles inline
    • getCompactHtmlText

      @Nonnull public static @Nonnull String getCompactHtmlText(String htmlText, Map<String,String> imageMap)
      Parses an HTML string and removes anything which does not influence the visual appearance of the content. The SRC links of all IMG tags will be replaced using the imageMap.
      Parameters:
      htmlText - the html content to be converted
      imageMap - a map to replace image source links
      Returns:
      the compacted content or the original one in case of an error
    • getCompactHtmlText

      @Nonnull public static @Nonnull String getCompactHtmlText(String htmlText, Map<String,String> imageMap, Map<HTML.Tag,Boolean> tagWritingOptions)
      Parses an HTML string and removes anything which does not influence the visual appearance of the content. The SRC links of all IMG tags will be replaced using the imageMap.
      Parameters:
      htmlText - the html content to be converted
      imageMap - a map to replace image source links
      tagWritingOptions - map with special writing settings
      Returns:
      the compacted content or the original one in case of an error
    • getCompactHtmlText

      @Nonnull public static @Nonnull String getCompactHtmlText(String htmlText, Map<String,String> imageMap, Map<HTML.Tag,Boolean> tagWritingOptions, boolean trustedImagePath)
      Parses an HTML string and removes anything which does not influence the visual appearance of the content. The SRC links of all IMG tags will be replaced using the imageMap.
      Parameters:
      htmlText - the html content to be converted
      imageMap - a map to replace image source links
      tagWritingOptions - map with special writing settings
      trustedImagePath - indicates whether the image paths in the image map are already encoded. If true, the paths will not be URL path encoded, which may lead to a corrupted file if the paths are not properly encoded. When in doubt, set to false or don't use this method.
      Returns:
      the compacted content or the original one in case of an error
    • getCompactHtmlText

      @Nonnull public static @Nonnull String getCompactHtmlText(String htmlText, Map<String,String> imageMap, Map<String,String> hrefMap, Map<HTML.Tag,Boolean> tagWritingOptions, boolean trustedImagePath)
      Parses an HTML string and removes anything which does not influence the visual appearance of the content. The SRC links of all IMG tags will be replaced using the imageMap.
      Parameters:
      htmlText - the html content to be converted
      imageMap - a map to replace image source links
      hrefMap - a map to replace a-href links
      tagWritingOptions - map with special writing settings
      trustedImagePath - indicates whether the image paths in the image map are already encoded. If true, the paths will not be URL path encoded, which may lead to a corrupted file if the paths are not properly encoded. When in doubt, set to false or don't use this method.
      Returns:
      the compacted content or the original one in case of an error
      Since:
      1.12
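      The trustedImagePath flag matters because a path such as "images/my logo.png" is not a valid URL path until it is encoded. The sketch below shows one way to pre-encode the values of an image map with java.net.URI from the JDK. The class ImagePathSketch, the method encodePath and the example paths are hypothetical, and the assumption that the map goes from the original SRC value to its replacement is not confirmed by this section.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

class ImagePathSketch {

    /** URL path encodes a relative path, e.g. a space becomes %20. */
    static String encodePath(String path) {
        try {
            // The multi-argument constructor quotes characters that are illegal in a path.
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Assumed layout: original SRC value -> replacement SRC value.
        Map<String, String> imageMap = new HashMap<>();
        imageMap.put("file:/tmp/upload-1.png", encodePath("images/my logo.png"));
        System.out.println(imageMap);
        // With already encoded values, the documented call could pass trustedImagePath = true:
        // String compact = HtmlConverter.getCompactHtmlText(htmlText, imageMap, null, null, true);
        // Whether null is accepted for hrefMap and tagWritingOptions is not stated in this section.
    }
}
```

      If the values are not encoded beforehand, passing false for trustedImagePath leaves the encoding to the converter, which is the safer choice according to the parameter description.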
    • getCompactHtmlText

      public static String getCompactHtmlText(InetHtmlDocument doc, Map<String,String> imageMap) throws BadLocationException
      Writes the content of a document and removes anything which does not influence the visual appearance of the content. All IMG tags with SRC-link will be replaced using the imageMap
      Parameters:
      doc - the document content to be converted, must not be null
      imageMap - a map to replace image source links
      Returns:
      the compacted content or the original one in case of an error
      Throws:
      BadLocationException - thrown in case of a corrupt model
    • getCompactHtmlText

      public static String getCompactHtmlText(InetHtmlDocument doc, Map<String,String> imageMap, Map<HTML.Tag,Boolean> tagWritingOptions) throws BadLocationException
      Writes the content of a document and removes anything which does not influence the visual appearance of the content. All IMG tags with SRC-link will be replaced using the imageMap
      Parameters:
      doc - the document content to be converted, must not be null
      imageMap - a map to replace image source links
      tagWritingOptions - map with special writing settings
      Returns:
      the compacted content or the original one in case of an error
      Throws:
      BadLocationException - thrown in case of a corrupt model
    • getCompactHtmlText

      public static String getCompactHtmlText(InetHtmlDocument doc, Map<String,String> imageMap, Map<HTML.Tag,Boolean> tagWritingOptions, boolean trustedImagePath) throws BadLocationException
      Writes the content of a document and removes anything which does not influence the visual appearance of the content. All IMG tags with SRC-link will be replaced using the imageMap
      Parameters:
      doc - the document content to be converted, must not be null
      imageMap - a map to replace image source links
      tagWritingOptions - map with special writing settings
      trustedImagePath - indicates whether the image paths in the image map are already encoded. If true, the paths will not be URL path encoded, which may lead to a corrupted file if the paths are not properly encoded. When in doubt, set to false or don't use this method.
      Returns:
      the compacted content or the original one in case of an error
      Throws:
      BadLocationException - thrown in case of a corrupt model
    • getCompactHtmlText

      @Nonnull public static @Nonnull String getCompactHtmlText(InetHtmlDocument doc, Map<String,String> imageMap, Map<String,String> hrefMap, Map<HTML.Tag,Boolean> tagWritingOptions, boolean trustedImagePath) throws BadLocationException
      Writes the content of a document and removes anything which does not influence the visual appearance of the content. All IMG tags with SRC-link will be replaced using the imageMap
      Parameters:
      doc - the document content to be converted, must not be null
      imageMap - a map to replace image source links
      hrefMap - a map to replace a-href links
      tagWritingOptions - map with special writing settings
      trustedImagePath - indicates whether the image paths in the image map are already encoded. If true, the paths will not be URL path encoded, which may lead to a corrupted file if the paths are not properly encoded. When in doubt, set to false or don't use this method.
      Returns:
      the compacted content or the original one in case of an error
      Throws:
      BadLocationException - thrown in case of a corrupt model
      Since:
      1.12
    • getInlinedHtml

      public static String getInlinedHtml(String htmlText)
      Returns the inlined html from the text.
      NOTE: The content is inlined without HTML or SPAN container elements, so the styles are not encapsulated and may be affected by the target context.
      Parameters:
      htmlText - the html text
      Returns:
      the inlined html text