The idea behind PDFBox's data organization

In a word: data and operations are separated.
A class is only responsible for operating on a dictionary, which cuts down on the POJO-style members the class would otherwise have to hold.
Thinking it over: with this approach, the rules for organizing the data on output are not coupled to the structure of the classes. On the other hand, because the data is organized entirely through dictionaries, the layout of the dictionaries is itself a second set of rules, one that the class hierarchy does not describe.
Another article of notes covers the relationships among these classes. They all live in org.apache.pdfbox.pdmodel.
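
A minimal sketch of this separation, in plain Java rather than PDFBox itself: the hypothetical TitleView class below holds no data fields of its own, only a reference to a dictionary (a Map standing in for COSDictionary), so two views over the same dictionary always agree.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a PD* class: it owns operations, not data.
class TitleView {
    // The only member is the dictionary being operated on.
    private final Map<String, Object> dict;

    TitleView(Map<String, Object> dict) {
        this.dict = dict;
    }

    // No 'title' field: the value is read from the dictionary on every call.
    String getTitle() {
        Object value = dict.get("Title");
        return value instanceof String ? (String) value : null;
    }

    void setTitle(String title) {
        dict.put("Title", title);
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        TitleView first = new TitleView(data);
        TitleView second = new TitleView(data);

        first.setTitle("Report");
        // Both views and the raw dictionary see the same state.
        System.out.println(second.getTitle());
        System.out.println(data);
    }
}
```

The trade-off mentioned above shows up even here: nothing in the class signature says that the key is "Title"; that rule lives only in the dictionary layout.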

If you are interested, read on.


The PDDocument constructor initializes its member COSDocument document,
along with that member's internal COSDictionary trailer,
and organizes the data as a tree of dictionaries (a value can itself be a dictionary).

    public PDDocument() {
        this(MemoryUsageSetting.setupMainMemoryOnly());
    }

    public PDDocument(MemoryUsageSetting memUsageSetting) {
        // Scratch file: temporary storage, falling back to main memory on failure
        ScratchFile scratchFile = null;
        try {
            scratchFile = new ScratchFile(memUsageSetting);
        } catch (IOException ioe) {
            LOG.warn("Error initializing scratch file: " + ioe.getMessage() +
                     ". Fall back to main memory usage only.");
            try {
                scratchFile = new ScratchFile(MemoryUsageSetting.setupMainMemoryOnly());
            } catch (IOException ioe2) {}
        }
        document = new COSDocument(scratchFile);
        pdfSource = null;

        // First we need a trailer
        COSDictionary trailer = new COSDictionary();
        document.setTrailer(trailer);

        // Next we need the root dictionary.
        COSDictionary rootDictionary = new COSDictionary();
        trailer.setItem(COSName.ROOT, rootDictionary);
        rootDictionary.setItem(COSName.TYPE, COSName.CATALOG);
        rootDictionary.setItem(COSName.VERSION, COSName.getPDFName("1.4"));

        // Next we need the pages tree structure
        COSDictionary pages = new COSDictionary();
        rootDictionary.setItem(COSName.PAGES, pages);
        pages.setItem(COSName.TYPE, COSName.PAGES);
        COSArray kidsArray = new COSArray();
        pages.setItem(COSName.KIDS, kidsArray);
        pages.setItem(COSName.COUNT, COSInteger.ZERO);
    }
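
To make the resulting tree concrete, here is a rough model of it using nested LinkedHashMaps in place of COSDictionary and a List in place of COSArray. The DictTreeSketch class and its lookup helper are illustrative names, not PDFBox API; only the key names (Root, Type, Version, Pages, Kids, Count) come from the constructor above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

class DictTreeSketch {
    // Builds the same shape of tree that the PDDocument constructor builds.
    static Map<String, Object> newTrailer() {
        Map<String, Object> trailer = new LinkedHashMap<>();

        Map<String, Object> root = new LinkedHashMap<>();
        trailer.put("Root", root);
        root.put("Type", "Catalog");
        root.put("Version", "1.4");

        Map<String, Object> pages = new LinkedHashMap<>();
        root.put("Pages", pages);
        pages.put("Type", "Pages");
        pages.put("Kids", new ArrayList<Object>());
        pages.put("Count", 0);
        return trailer;
    }

    // Walks a path of keys; returns null if a step is missing or is not a dictionary.
    static Object lookup(Map<String, Object> dict, String... path) {
        Object current = dict;
        for (String key : path) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> trailer = newTrailer();
        System.out.println(lookup(trailer, "Root", "Pages", "Count"));
        System.out.println(trailer);
    }
}
```

The key path Root, Pages, Count is the kind of rule that lives in the dictionary layout rather than in the class hierarchy.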


PDDocumentCatalog is constructed, and first used, in PDDocument.getDocumentCatalog():
1). If it is built from a PDDocument object alone, that object is stored in the member PDDocument document as the parent. A new COSDictionary is created and registered in the document's trailer as the root.
2). If both a PDDocument document and a COSDictionary root are passed in, the root is not rebuilt.
The code is as follows:

    public PDDocumentCatalog(PDDocument doc) {
        document = doc;
        root = new COSDictionary();
        root.setItem(COSName.TYPE, COSName.CATALOG);
        document.getDocument().getTrailer().setItem(COSName.ROOT, root);
    }

    public PDDocumentCatalog(PDDocument doc, COSDictionary rootDictionary) {
        document = doc;
        root = rootDictionary;
    }

PDDocumentCatalog is, in effect, the class that operates on the COSDictionary root, a dictionary that PDDocument creates but does not keep as a member variable.
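
The two construction paths can be modelled in plain Java as follows. This is a sketch of the pattern only: MiniDocument, MiniCatalog and getCatalog are hypothetical names, a Map stands in for COSDictionary, and the lookup in getCatalog is an assumption about how the choice between the two constructors might be made.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MiniDocument {
    final Map<String, Object> trailer = new LinkedHashMap<>();
    private MiniCatalog catalog; // cached wrapper, created on first request

    MiniCatalog getCatalog() {
        if (catalog == null) {
            Object existing = trailer.get("Root");
            if (existing instanceof Map) {
                // A root is already in the tree: wrap it, do not rebuild it.
                @SuppressWarnings("unchecked")
                Map<String, Object> root = (Map<String, Object>) existing;
                catalog = new MiniCatalog(this, root);
            } else {
                // No root yet: the wrapper creates and registers one.
                catalog = new MiniCatalog(this);
            }
        }
        return catalog;
    }
}

class MiniCatalog {
    private final MiniDocument document;
    private final Map<String, Object> root;

    MiniCatalog(MiniDocument doc) {
        document = doc;
        root = new LinkedHashMap<>();
        root.put("Type", "Catalog");
        document.trailer.put("Root", root);
    }

    MiniCatalog(MiniDocument doc, Map<String, Object> existingRoot) {
        document = doc;
        root = existingRoot;
    }

    Map<String, Object> getRoot() {
        return root;
    }
}

class CatalogDemo {
    public static void main(String[] args) {
        MiniDocument doc = new MiniDocument();
        MiniCatalog catalog = doc.getCatalog();
        // The wrapper and the trailer point at the very same dictionary.
        System.out.println(doc.trailer.get("Root") == catalog.getRoot());
    }
}
```

Either way, the wrapper ends up holding a reference into the document's dictionary tree rather than a private copy of the data.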


By analogy, PDPageTree is in effect the class that operates on the COSDictionary pages, which is likewise a local variable rather than a member of PDDocument. Its constructors follow the same construction logic.
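
As a rough illustration of what operating on the pages dictionary means, the sketch below models adding a page. MiniPageTree is a hypothetical class over plain Maps, and it only handles a flat, single-level tree; a real page tree can have nested nodes, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MiniPageTree {
    private final Map<String, Object> pages;

    // Create a fresh, empty pages dictionary.
    MiniPageTree() {
        pages = new LinkedHashMap<>();
        pages.put("Type", "Pages");
        pages.put("Kids", new ArrayList<Map<String, Object>>());
        pages.put("Count", 0);
    }

    // Wrap an existing pages dictionary without rebuilding it.
    MiniPageTree(Map<String, Object> pagesDictionary) {
        pages = pagesDictionary;
    }

    @SuppressWarnings("unchecked")
    private List<Map<String, Object>> kids() {
        return (List<Map<String, Object>>) pages.get("Kids");
    }

    // Adding a page is purely a dictionary operation:
    // link the parent, append to Kids, bump Count.
    void add(Map<String, Object> page) {
        page.put("Parent", pages);
        kids().add(page);
        pages.put("Count", getCount() + 1);
    }

    int getCount() {
        Object count = pages.get("Count");
        return count instanceof Integer ? (Integer) count : 0;
    }
}

class PageTreeDemo {
    public static void main(String[] args) {
        MiniPageTree tree = new MiniPageTree();
        Map<String, Object> page = new LinkedHashMap<>();
        page.put("Type", "Page");
        tree.add(page);
        // Print only the count: the Parent/Kids links form a cycle,
        // so printing the maps themselves would recurse without end.
        System.out.println(tree.getCount());
    }
}
```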


PDPage has a member COSDictionary page, which holds everything a single page needs.
Of course, the same rule still applies: the class operates on the dictionary.

Tags: Java

Posted by gabriellevierneza on Wed, 11 May 2022 17:46:34 +0300