This filter option controls how i-net PDFC detects the layout of PDF files. By default, i-net PDFC tries to detect the layout of the document pages to some extent. With this option you can change how i-net PDFC retrieves the layout information.
The layout is detected by the filters of i-net PDFC. All layout filters will be applied.
The content will be compared in the order it was printed to the PDF document. This approach assumes that the print order reflects the reading order. It can yield better results for very complex layouts.
This option advises i-net PDFC to use the optional meta information about the structure of the document. Usually this includes information about, for instance, paragraphs, tables and figures. If the structure data is present and accurate, it will be used to improve the result. If it is not present, the original PDF text order will be used.
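The fallback described above can be sketched roughly as follows. This is only an illustration and not the i-net PDFC implementation: the function name `choose_text_order`, the list-of-strings data model and the simple accuracy check are all assumptions made for this example.

```python
from typing import List, Optional


def choose_text_order(print_order: List[str],
                      structure_order: Optional[List[str]]) -> List[str]:
    """Pick the order in which text chunks are compared (hypothetical helper)."""
    # No structure data in the file: keep the original PDF text order.
    if not structure_order:
        return list(print_order)
    # Assumed accuracy check: the structure data must cover exactly the
    # same text chunks as the page content, only in a different order.
    if sorted(structure_order) != sorted(print_order):
        return list(print_order)
    return list(structure_order)


# A two-column page whose content was printed row by row.
printed = ["Left 1", "Right 1", "Left 2", "Right 2"]
# The structure data lists the left column first, then the right column.
tagged = ["Left 1", "Left 2", "Right 1", "Right 2"]

print(choose_text_order(printed, tagged))  # structure order is used
print(choose_text_order(printed, None))    # falls back to print order
```

In this sketch, structure data that does not match the page content is treated like missing data, so the comparison never ends up worse than with the plain text order.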
A more detailed explanation can be found on the page of this parser extension.
With this option, the PDF parser will drop the mapping from character numbers to readable text. This often solves issues with intentionally obfuscated PDF files, which don't have this mapping in the first place. On the downside, it may make the difference messages unreadable, and it won't work if the two documents use different CMAPs. So it's not a general solution, but it often works for PDFs generated by the same application.
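The trade-off can be illustrated with a small sketch. It is not taken from i-net PDFC; the character numbers, the helper names `to_text` and `same_by_codes`, and the dictionary used as a stand-in for a CMAP are assumptions made for this example.

```python
from typing import Dict, List

REPLACEMENT = "\ufffd"  # shown when a character number has no readable mapping


def to_text(codes: List[int], cmap: Dict[int, str]) -> str:
    """Translate character numbers to readable text (hypothetical helper)."""
    return "".join(cmap.get(code, REPLACEMENT) for code in codes)


def same_by_codes(codes_a: List[int], codes_b: List[int]) -> bool:
    """Compare the raw character numbers and ignore any mapping."""
    return codes_a == codes_b


# Two obfuscated documents from the same application: no usable mapping,
# but the same character numbers for the same text.
doc_a = [17, 4, 9, 9, 22]
doc_b = [17, 4, 9, 9, 22]
# A document from another application that numbers its characters differently.
doc_c = [3, 8, 1, 1, 5]

print(to_text(doc_a, {}))           # unreadable without a mapping
print(same_by_codes(doc_a, doc_b))  # True: raw numbers still match
print(same_by_codes(doc_a, doc_c))  # False: different numbering
```

The last comparison reports a difference even if both documents display the same word, which is why the option is only useful when both files share the same character numbering.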
This option can also be combined with the "Text recovery by OCR" filter plugin. This filter uses optical character recognition to restore readable text. By default, this recognition is only performed for fonts that do not have a character mapping table. With the "Disable CMAP" option, however, the recognition is performed for all fonts in the document.
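The selection rule for the recognition can be written down as a short sketch. The `Font` class, its fields and the function `fonts_for_ocr` are hypothetical and only model the rule stated above; they are not part of the i-net PDFC API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Font:
    name: str
    has_cmap: bool  # True if the font has a character mapping table


def fonts_for_ocr(fonts: List[Font], disable_cmap: bool) -> List[str]:
    """Return the names of the fonts whose text would be recognized by OCR."""
    # With "Disable CMAP" every font is selected; otherwise only the
    # fonts that lack a character mapping table.
    return [f.name for f in fonts if disable_cmap or not f.has_cmap]


fonts = [Font("Body", has_cmap=True), Font("Obfuscated", has_cmap=False)]

print(fonts_for_ocr(fonts, disable_cmap=False))  # only the font without a mapping
print(fonts_for_ocr(fonts, disable_cmap=True))   # all fonts
```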