Conversion of word documents to PDF and various document formats

Project address: https://gitee.com/Jakewabc/word-of-pdf.git

Related examples:

https://github.com/aspose-words/Aspose.Words-for-Java.git

https://github.com/aspose-pdf/Aspose.PDF-for-Java.git

Cloning from GitHub can be very slow. A workaround is to create a project on Gitee: click "Create project", choose the "Import existing project" option below it, and give it the GitHub address. Gitee then clones the repository, and you can pull from Gitee quickly. Both of these projects also already have addresses on Gitee.

1, Get the jars and install them into the local Maven repository

1.1. Required jars

The jars to use are aspose-words-19.5jdk.jar and aspose-pdf-18.9.jar.

1.2. Install the jars

Copy aspose-pdf-18.9.jar out of its original directory before installing it. It ships in a directory named "Aspose PDF for Java 18.9 perfect cracked version, no watermark, no use time limit"; that name contains spaces, and the jar could not be installed into the local repository from there.

For how to install a jar into the local repository, see: https://blog.csdn.net/u010393325/article/details/84314543

  • Command
mvn install:install-file -DgroupId=<groupId> -DartifactId=<artifactId> -Dversion=<version> -Dpackaging=jar -Dfile=<path to the jar>

The groupId and artifactId are the names the project will use to reference the dependency. The version number is the one at the end of the jar file name.

  • Commands I ran

Change the -Dfile path to the directory where your jar is located.

mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-words -Dversion=19.5 -Dpackaging=jar -Dfile="E:\Temporary documents\aspose-words-19.5jdk.jar"

mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-pdf -Dversion=18.9 -Dpackaging=jar -Dfile="E:\Temporary documents\aspose-pdf-18.9.jar"
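
Since the -Dversion value comes from the jar file name, it can also be read programmatically. The following is only a minimal sketch: JarVersion and versionOf are hypothetical names, not part of Maven or Aspose, and the sketch assumes the version is the first run of digits and dots that follows a hyphen in the file name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Maven or Aspose.
class JarVersion {
    // Assumption: the version is the first "digits.digits..." run after a hyphen.
    private static final Pattern VERSION = Pattern.compile("-(\\d+(?:\\.\\d+)*)");

    // Returns the version embedded in a jar file name.
    static String versionOf(String jarFileName) {
        Matcher m = VERSION.matcher(jarFileName);
        if (!m.find()) {
            throw new IllegalArgumentException("No version found in " + jarFileName);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(versionOf("aspose-pdf-18.9.jar"));      // 18.9
        System.out.println(versionOf("aspose-words-19.5jdk.jar")); // 19.5
    }
}
```

The trailing "jdk" in aspose-words-19.5jdk.jar is not part of the version, which is why the pattern stops at the last digit.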

1.3. Add the jars to the Maven project

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>18.9</version>
</dependency>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.5</version>
</dependency>

2, Convert Word to PDF

Both .doc and .docx files can be converted.

  • Full code

The full source is pasted here so that the package and import statements are visible.

package com.word.pdf;

import com.aspose.words.*;

/**
 * @author: stars
 * @date: July 30, 2020 13:58
 **/
public class TestDemo {

    public static void main(String[] args) throws Exception {
        String inputPath = "C:\\Users\\stars\\Desktop\\powerdesigner use.doc" ;
        String outPath = "C:\\Users\\stars\\Desktop";
        String fileName = "powerdesigner use";
        setShowInBalloons(inputPath,outPath,fileName);
    }
    /**
     * Convert word document to PDF document
     *
     * @param inputPath path of the Word document
     * @param outPath directory in which the PDF is generated
     * @param fileName name of the generated PDF. <b>Note: do not include a file extension; ".pdf" is appended below.</b>
     * @throws Exception if the document cannot be loaded or saved
     */
    private static void setShowInBalloons(String inputPath,String outPath,String fileName) throws Exception {
        Document doc = new Document(inputPath);
        // Gets the RevisionOptions object that controls the appearance of the revision
        RevisionOptions revisionOptions = doc.getLayoutOptions().getRevisionOptions();
        // Show deleted revisions in balloons
        revisionOptions.setShowInBalloons(ShowInBalloons.FORMAT_AND_DELETE);
        // output
        doc.save(outPath + "\\" + fileName+ ".pdf");
    }
}

The output is not shown here, but the conversion works.
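
The rule that fileName must not carry a suffix can also be enforced in code rather than by convention. The following is a minimal sketch using only the JDK; OutputPaths and pdfPath are hypothetical names that do not appear in the project. It appends ".pdf" only when it is missing and lets java.nio join the directory and file name instead of concatenating "\\" by hand.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original project.
class OutputPaths {
    // Builds the output path, adding ".pdf" only if the name does not already end with it.
    static Path pdfPath(String outDir, String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? fileName
                : fileName + ".pdf";
        // Paths.get inserts the platform's separator between the two parts.
        return Paths.get(outDir, name);
    }

    public static void main(String[] args) {
        System.out.println(pdfPath("out", "powerdesigner use"));
    }
}
```

The result could then be passed to doc.save(...) as pdfPath(outPath, fileName).toString().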

3, Append one Word document to another

import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;
/**
 * Append one Word document to the end of another
 */
public static void AppendDocuments() throws Exception {
    // Directory containing the files
    String dataDir = "C:\\Users\\stars\\Desktop\\";
    // Destination document (the one that is appended to)
    Document dstDoc = new Document(dataDir + "powerdesigner use.doc");
    // Document to append (srcDoc is appended after dstDoc)
    Document srcDoc = new Document(dataDir + "ZS-1000 Quality manual.docx");
    // Append the source document to the destination document while keeping the original formatting of the source document.
    dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    // Save the combined document
    dstDoc.save(dataDir + "TestFile Out.docx");

    System.out.println("Documents appended successfully.");
}

4, Find and replace in a Word document

import com.aspose.words.*;

import java.util.regex.Pattern;
/**
 * Find and replace
 * @param replaceMark placeholder to search for (compiled as a case-insensitive regular expression)
 * @param replaceAfter text that replaces the placeholder
 * @throws Exception
 */
public static void FindAndReplace(String replaceMark,String replaceAfter) throws Exception {
    // Directory containing the file
    String dataDir = "C:\\Users\\stars\\Desktop\\" ;
    // Source document
    Document doc = new Document(dataDir + "powerdesigner use.doc");
    // Print the text of the document before replacement
    System.out.println("Original document text: " + doc.getRange().getText());

    Pattern regex = Pattern.compile(replaceMark, Pattern.CASE_INSENSITIVE);
    // Replace the text in the document.
    doc.getRange().replace(regex, replaceAfter, new FindReplaceOptions());
    // Check the replacement was made.
    System.out.println("Document text after replace: " + doc.getRange().getText());
    // Save the document after replacement
    doc.save(dataDir + "ReplaceSimpleOut.doc");
}

A note on replacement

Suppose the placeholders are name and namea. If name is replaced first, the name inside namea is replaced as well, leaving a stray a behind. The namea placeholder then no longer matches anything, so its value is never inserted and the output is wrong. To avoid this, use placeholders that cannot be prefixes of each other, for example a UUID with the hyphens removed, or an ID from a distributed ID generator.
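
The prefix problem can be reproduced with plain strings, without Aspose. The following is a minimal sketch: PlaceholderDemo, replaceAll and newMark are hypothetical names, and the values "Alice" and "Bob" are made up for illustration. It applies the replacements in order, first with the clashing placeholders and then with UUID-based ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical demo for illustration; uses plain strings instead of a Word document.
class PlaceholderDemo {
    // Applies each placeholder -> value replacement in insertion order.
    static String replaceAll(String text, Map<String, String> values) {
        String result = text;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    // A UUID with the hyphens removed, used as a placeholder.
    static String newMark() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        // "name" is a prefix of "namea", so the first replacement damages the second placeholder.
        Map<String, String> naive = new LinkedHashMap<>();
        naive.put("name", "Alice");
        naive.put("namea", "Bob");
        System.out.println(replaceAll("name / namea", naive)); // Alice / Alicea

        // UUID-based placeholders do not overlap, so the order no longer matters.
        String markName = newMark();
        String markNameA = newMark();
        Map<String, String> safe = new LinkedHashMap<>();
        safe.put(markName, "Alice");
        safe.put(markNameA, "Bob");
        System.out.println(replaceAll(markName + " / " + markNameA, safe)); // Alice / Bob
    }
}
```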

5, Get a specified page of a PDF, or the total number of pages

Some pages cannot be obtained this way. Using a different jar is recommended; see Chapter 6 for details.

import com.aspose.pdf.Document;
import com.aspose.pdf.Page;

import java.io.IOException;
/**
 * Get a specified page of a PDF
 * <p>To return the whole PDF, use pdfDocument directly: pdfDocument.save(response.getOutputStream());
 * To return only the specified page, use newDocument: newDocument.save(response.getOutputStream());
 * </p>
 * @param page number of the page to get
 * @return total number of pages in the PDF
 */
public static Integer GetPageCountWithoutSavingPDF(Integer page) throws IOException {
    // Open a document
    Document pdfDocument = new Document("C:\\Users\\stars\\Desktop\\a.pdf");
    // Get the specified page
    Page pdfPage = pdfDocument.getPages().getUnrestricted(page);
    // Create a new Document object to hold the specified page
    Document newDocument = new Document();
    // Add the page to the Pages collection of new document object
    newDocument.getPages().add(pdfPage);
    // Save the new file
    newDocument.save("C:\\Users\\stars\\Desktop\\page_" + pdfPage.getNumber() + ".pdf");
    // Return total pages
    return pdfDocument.getPages().size();
}

6, Get a specified page of a PDF document

With the approach in Chapter 5, some pages cannot be obtained, so a different jar is used here.

6.1. Add the dependency

<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>1.7</version>
</dependency>

If you only need the PDF of a specified page, this one dependency is enough. There are two more related jars, shown below. All of them are open source, so no manual installation is needed; Maven pulls them directly. Look up the specific usage of the following two yourself.

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.12</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox-tools</artifactId>
    <version>2.0.12</version>
</dependency>

6.2. Code

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.web.bind.annotation.RestController;

import javax.servlet.http.HttpServletResponse;
import java.io.File;
import java.io.IOException;
import java.util.List;
/**
 * Write one page of a PDF to the HTTP response
 * @param response HTTP response that the page is written to
 * @param pageNum number of the page to read
 * @param filePath path of the PDF file
 */
public void splitPdf(HttpServletResponse response, int pageNum, String filePath) {
    // The PDF file to read
    File indexFile = new File(filePath);
    // try-with-resources closes the source document even if an exception is thrown
    try (PDDocument document = PDDocument.load(indexFile)) {
        // Get total pages
        int numberOfPages = document.getNumberOfPages();
        System.out.println("Total number of pages: " + numberOfPages);

        Splitter splitter = new Splitter();
        // Start page
        splitter.setStartPage(pageNum);
        // End page (same as the start page, so exactly one page is extracted)
        splitter.setEndPage(pageNum);
        List<PDDocument> pages = splitter.split(document);
        for (PDDocument pdDocument : pages) {
            // Save each split document before the source document is closed
            pdDocument.save(response.getOutputStream());
            pdDocument.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

7, Convert PDF to word

When a document is large, we do not send all of it to the front end for rendering; we send it one page at a time. So how should a large document template be handled? Save the large template as a PDF and fetch it page by page. Because a PDF cannot be edited and is meant as the final form of a text, convert the fetched PDF page to Word, replace the placeholders with the real content, and then convert the Word document back to PDF for display at the front end.

7.1. Advantages and disadvantages of PDF and Word

PDF

  • Advantages

A specified page of a PDF can be read on its own, and the front end can render PDF directly. So there is no need to read the whole of a large document; reading the specified page is enough.

  • Disadvantages

PDF does not support editing.

Word

  • Advantages

Supports editing and replacement.

  • Disadvantages

A specified page cannot be fetched on its own, and the front end cannot display the format directly.

7.2. PDF to doc

This conversion method only converts 4 pages.

import com.aspose.pdf.*;

import java.io.IOException;
/**
 * pdf Convert to doc
 */
public static void savingToDoc(){
    // Read pdf document
    Document pdfDocument = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Convert to doc document
    pdfDocument.save("C:\\Users\\stars\\Desktop\\TableHeightIssue.doc", SaveFormat.Doc);
}

7.3. PDF to docx

This method only converts 4 pages.

This method can also produce a doc file; just change the file suffix.

import com.aspose.pdf.*;

import java.io.IOException;
/**
 * pdf Convert to docx
 */
public static void savingToDOCX() {
    // Load source PDF file
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Instantiate Doc SaveOptions instance
    DocSaveOptions saveOptions = new DocSaveOptions();
    // Set output file format as DOCX
    saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
    // Save resultant DOCX file
    doc.save("C:\\Users\\stars\\Desktop\\TableHeightIssue.docx", saveOptions);
}

8, Convert PDF to PPTX

import com.aspose.pdf.*;

import java.io.IOException;
/**
 * Convert PDF to ppt
 */
public static void ConvertPDFToPPTX(){
    // Load PDF document
    Document doc =new Document("C:\\Users\\stars\\Desktop\\t.pdf");
    // Instantiate PptxSaveOptions instance
    PptxSaveOptions pptx_save = new PptxSaveOptions();
    // Save the output in PPTX format
    doc.save("C:\\Users\\stars\\Desktop\\output.pptx", pptx_save);
}

9, Convert PDF to SVG image

import com.aspose.pdf.*;

import java.io.IOException;
/**
 * Convert pdf to SVG image
 */
public static void ConvertPDFToSVGFormat(){
    // load PDF document
    Document doc = new Document("C:\\Users\\stars\\Desktop\\t.pdf");
    // instantiate an object of SvgSaveOptions
    SvgSaveOptions saveOptions = new SvgSaveOptions();
    // do not compress SVG image to Zip archive
    saveOptions.CompressOutputToZipArchive = false;
    // resultant file name
    String outFileName ="C:\\Users\\stars\\Desktop\\Output.svg";
    // save the output in SVG files
    doc.save(outFileName, saveOptions);
}

10, Convert SVG to PDF

import com.aspose.pdf.*;

import java.io.IOException;
/**
 * Convert SVG to PDF
 */
public static void ConvertSVGFileToPDFFormat(){
    String file = "C:\\Users\\stars\\Desktop\\Output.svg";
    // Instantiate LoadOption object using SVG load option
    LoadOptions options = new SvgLoadOptions();
    // Create Document object
    Document document = new Document(file, options);
    // Save the resultant PDF document
    document.save("C:\\Users\\stars\\Desktop\\Result.pdf");
}

11, Convert PDF to HTML

Idea for displaying a large document template

For displaying large texts, the large document can be stored in PDF format. Fetch the specified page of the PDF template, convert it to HTML, and replace the placeholders with the real content during the conversion, so that it can be displayed at the front end.

11.1. Method 1

The result is acceptable.

import com.aspose.pdf.*;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
 * pdf Convert to HTML format
 */
public static void EscapeHTMLTagsAndSpecialCharacters(){
    // Load existing PDf file
    Document pdfDoc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    final Map names = new HashMap();
    /*pdfDoc.FontSubstitution.add(new Document.FontSubstitutionHandler() {
        public void invoke(Font font, Font newFont) {
            // add substituted FontNames into map.
            names.put(font.getFontName(), newFont.getFontName());
            // or print the message into console
            System.out.println("Warning: Font " + font.getFontName() + " was substituted with another font -> " + newFont.getFontName());
        }
    });*/
    // instantiate HTMLSave option to save output in HTML
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    // save resultant file
    pdfDoc.save("C:\\Users\\stars\\Desktop\\output.html", htmlSaveOps);
}

11.2. Method 2

The result of this conversion is poor.

import com.aspose.pdf.*;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
/**
 * pdf Convert to HTML
 * @throws Exception
 */
public static void DefaultFontWhenSpecificFontMissing() throws Exception {
    String myDir = "C:\\Users\\stars\\Desktop\\";
    Document pdf = new Document(myDir + "t.pdf");
    // configure font substitution
    CustomSubst1 subst1 = new CustomSubst1();
    FontRepository.getSubstitutions().add(subst1);
    // Configure notifier to console
    pdf.FontSubstitution.add(new Document.FontSubstitutionHandler() {
        public void invoke(Font font, Font newFont) {
            // print substituted FontNames into console
            System.out.println("Warning: Font " + font.getFontName() + " was substituted with another font -> " + newFont.getFontName());
        }
    });
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    pdf.save(myDir + "Redis_1150_substitutedWithMSGothic_release.html", htmlSaveOps);
}
private static class CustomSubst1 extends CustomFontSubstitutionBase {
    public boolean trySubstitute(OriginalFontSpecification originalFontSpecification, /* out */com.aspose.pdf.Font[] substitutionFont) {
        substitutionFont[0] = FontRepository.findFont("MSGothic");
        return true;
    }
}

12, Convert PDF to an EMF file

Only one page can be converted.

import com.aspose.pdf.*;
import com.aspose.pdf.devices.EmfDevice;
import com.aspose.pdf.devices.Resolution;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
 * Convert a PDF to an EMF file (only one page can be converted)
 */
public static void PDFToEMF(){
    // instantiate EmfDevice object
    EmfDevice device = new EmfDevice(new Resolution(96));
    // load existing PDF file
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // save first page of PDF file as Emf image
    device.process(doc.getPages().get_Item(1), "C:\\Users\\stars\\Desktop\\output.emf");
}

13, Add tables to PDF documents

import com.aspose.pdf.*;
import com.aspose.pdf.devices.EmfDevice;
import com.aspose.pdf.devices.Resolution;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
 * Add data to an existing PDF document
 * <p>Add table to specified page</p>
 */
public static void addTableInExistingPDFDocument(){
    // Load source PDF document
    Document doc =  new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Initializes a new instance of the table
    Table table = new Table();
    // Set the table border color to light gray
    table.setBorder(new BorderInfo(BorderSide.All, .5f, Color.getLightGray()));
    // Sets the border of a table cell
    table.setDefaultCellBorder(new BorderInfo(BorderSide.All, .5f, Color.getLightGray()));

    // Add ten rows of three columns each
    for (int row_count = 1; row_count <= 10; row_count++) {
        // Add row to table
        Row row = table.getRows().add();
        // Add table row data
        row.getCells().add("Column (" + row_count + ", 1)");
        row.getCells().add("Column (" + row_count + ", 2)");
        row.getCells().add("Column (" + row_count + ", 3)");
    }
    // Add the table on the second page of the PDF document
    doc.getPages().getUnrestricted(2).getParagraphs().add(table);
    doc.save( "C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}

14, Create PDF documents and add tables

import com.aspose.pdf.*;
import com.aspose.pdf.devices.EmfDevice;
import com.aspose.pdf.devices.Resolution;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
 * Create PDF documents and add tables
 */
public static void setAutoFitToWindowPropertyInColumnAdjustmentTypeEnumeration(){
    //Instantiate the PDF object by calling its empty constructor
    Document doc = new Document();
    //Create the section in the PDF object
    Page page = doc.getPages().add();

    //Instantiate a table object
    Table tab = new Table();
    //Add the table in paragraphs collection of the desired section
    page.getParagraphs().add(tab);

    //Set with column widths of the table
    tab.setColumnWidths("50 50 50");
    tab.setColumnAdjustment(ColumnAdjustment.AutoFitToWindow);

    //Set default cell border using BorderInfo object
    tab.setDefaultCellBorder(new com.aspose.pdf.BorderInfo(com.aspose.pdf.BorderSide.All, 0.1F));

    //Set table border using another customized BorderInfo object
    tab.setBorder(new com.aspose.pdf.BorderInfo(com.aspose.pdf.BorderSide.All, 1F));
    //Create MarginInfo object and set its left, bottom, right and top margins
    com.aspose.pdf.MarginInfo margin = new com.aspose.pdf.MarginInfo();
    margin.setTop(5f);
    margin.setLeft(5f);
    margin.setRight(5f);
    margin.setBottom(5f);

    //Set the default cell padding to the MarginInfo object
    tab.setDefaultCellPadding(margin);

    //Create rows in the table and then cells in the rows
    com.aspose.pdf.Row row1 = tab.getRows().add();
    row1.getCells().add("col1");
    row1.getCells().add("col2");
    row1.getCells().add("col3");
    com.aspose.pdf.Row row2 = tab.getRows().add();
    row2.getCells().add("item1");
    row2.getCells().add("item2");
    row2.getCells().add("item3");

    //Save the PDF
    doc.save( "C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}

15, Create a PDF and add data

import com.aspose.pdf.*;
import com.aspose.pdf.devices.EmfDevice;
import com.aspose.pdf.devices.Resolution;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
 * Create a new PDF and add data to it
 */
public static void ForceTableRenderingOnNewPage(){
    // Create a new document
    Document doc = new Document();
    PageInfo pageInfo = doc.getPageInfo();
    MarginInfo marginInfo = pageInfo.getMargin();
    marginInfo.setLeft(37);
    marginInfo.setRight(37);
    marginInfo.setTop(37);
    marginInfo.setBottom(37);
    pageInfo.setLandscape(true);
    Table table = new Table();
    table.setColumnWidths("50 100");
    // Add a page
    Page curPage = doc.getPages().add();
    for (int i = 1; i <= 120; i++) {
        Row row = table.getRows().add();
        row.setFixedRowHeight(15);
        Cell cell1 = row.getCells().add();
        cell1.getParagraphs().add(new TextFragment("Content 1"));
        Cell cell2 = row.getCells().add();
        cell2.getParagraphs().add(new TextFragment("HHHHH"));
    }
    Paragraphs paragraphs = curPage.getParagraphs();
    paragraphs.add(table);
    /********************************************/
    Table table1 = new Table();
    table1.setColumnWidths("100 100");
    for (int i = 1; i <= 10; i++) {
        Row row = table1.getRows().add();
        Cell cell1 = row.getCells().add();
        cell1.getParagraphs().add(new TextFragment("LAAAAAAA"));
        Cell cell2 = row.getCells().add();
        cell2.getParagraphs().add(new TextFragment("LAAGGGGGG"));
    }
    table1.setInNewPage(true);
    // setInNewPage(true) above makes table1 start on the next page
    paragraphs.add(table1);
    //Save the PDF
    doc.save( "C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}

16, Replace specified content in a PDF

It is best to replace text with text of the same length. If one character is replaced with many characters, some content will not be displayed, because the surrounding text does not move to make room. Suggestion: convert the PDF to a Word document, do the replacement there, and then convert the Word document back to PDF.

PDF replacement only works on the first three pages.
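
If the replacement has to be done in the PDF itself, one possible workaround for the length advice above is to force the replacement to the same number of characters as the text it replaces. The following is only a sketch under that assumption: SameLengthReplacement and fitToLength are hypothetical names, not part of Aspose.PDF, and the placeholder "${customer}" is made up. Equal character counts do not guarantee equal rendered widths in a proportional font, so the Word round trip remains the safer route.

```java
// Hypothetical helper for illustration; not part of Aspose.PDF.
class SameLengthReplacement {
    // Pads the replacement with trailing spaces up to the original length.
    // Rejects replacements that are longer, since there would be no room for them.
    static String fitToLength(String replacement, int originalLength) {
        if (replacement.length() > originalLength) {
            throw new IllegalArgumentException(
                    "Replacement is longer than the text it replaces");
        }
        StringBuilder sb = new StringBuilder(replacement);
        while (sb.length() < originalLength) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "${customer}" is a made-up placeholder used only for this demo.
        String padded = fitToLength("Bob", "${customer}".length());
        System.out.println("[" + padded + "]");
    }
}
```

The padded string would then be the value passed as AfterReplacement.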

import com.aspose.pdf.*;
import com.aspose.pdf.devices.EmfDevice;
import com.aspose.pdf.devices.Resolution;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
 * Replace text
 * <b>Replacing text with text of the same length works, because the rest of the layout is unchanged. Only the first three pages can be replaced.</b>
 * @param BeforeReplacement text to search for
 * @param AfterReplacement text to replace it with
 */
public static void replaceTextOnAllPages(String BeforeReplacement,String AfterReplacement){
    // Load the PDF document
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Create an absorber that searches for the text to be replaced
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(BeforeReplacement);
    // Apply the absorber to all pages of the document
    doc.getPages().accept(textFragmentAbsorber);
    // Get the extracted text fragments into collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
    // Loop through the fragments
    for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
        // Update text and other properties
        textFragment.setText(AfterReplacement);
        // font size
        textFragment.getTextState().setFontSize(10);
        textFragment.getTextState().setForegroundColor(Color.getBlack());
        // The background color is now white
        textFragment.getTextState().setBackgroundColor(Color.getWhite());
    }
    // Save the document with the replaced text
    doc.save( "C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}

Posted by NickTyson on Wed, 25 May 2022 16:14:20 +0300