Converting Word documents to PDF and other document formats
Project address: https://gitee.com/Jakewabc/word-of-pdf.git
Related examples:
https://github.com/aspose-words/Aspose.Words-for-Java.git
https://github.com/aspose-pdf/Aspose.PDF-for-Java.git
Cloning from GitHub can be very slow, so it is recommended to create a project on Gitee instead. Click "Create project"; below it there is an "Import existing project" option. Give it the GitHub address and Gitee clones the project, after which you can pull it quickly. These two projects, however, already have addresses on Gitee.
1, Get the jars and install them into the local Maven repository
1.1 Required jars
The jars to use are aspose-words-19.5jdk.jar and aspose-pdf-18.9.jar.
1.2. Install the jars
Before installing aspose-pdf-18.9.jar, copy it out of the folder it was downloaded in. That folder (the Aspose PDF for Java 18.9 package) has spaces in its name, and an unquoted install command fails on a path that contains spaces.
For installing a jar into the local repository, see: https://blog.csdn.net/u010393325/article/details/84314543
- Command
mvn install:install-file -DgroupId=<groupId to reference> -DartifactId=<artifactId to reference> -Dversion=<version number> -Dpackaging=jar -Dfile=<path to the jar>
The version number appears at the end of the jar file name.
- The commands I ran
Change the path to the directory where your jar is located, and quote it if it contains spaces.
mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-words -Dversion=19.5 -Dpackaging=jar -Dfile="E:\Temporary documents\aspose-words-19.5jdk.jar"
mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-pdf -Dversion=18.9 -Dpackaging=jar -Dfile="E:\Temporary documents\aspose-pdf-18.9.jar"
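Since the version sits at the end of the jar file name, the install command can be derived from the file name alone. The following is a minimal sketch under that assumption, using only the JDK; `InstallCommandSketch` and `installCommand` are hypothetical names, and the file-name pattern is a guess that fits the two jars used here, not a general rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InstallCommandSketch {

    // "<artifactId>-<version><optional letters>.jar", e.g. "aspose-words-19.5jdk.jar"
    private static final Pattern JAR_NAME =
            Pattern.compile("^(.+)-(\\d+(?:\\.\\d+)*)[A-Za-z]*\\.jar$");

    /** Builds the install command for one jar; the groupId is supplied by the caller. */
    static String installCommand(String groupId, String jarPath) {
        // Take the file name after the last '/' or '\' so Windows paths work on any OS.
        int cut = Math.max(jarPath.lastIndexOf('/'), jarPath.lastIndexOf('\\'));
        Matcher m = JAR_NAME.matcher(jarPath.substring(cut + 1));
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized jar name: " + jarPath);
        }
        return "mvn install:install-file"
                + " -DgroupId=" + groupId
                + " -DartifactId=" + m.group(1)
                + " -Dversion=" + m.group(2)
                + " -Dpackaging=jar"
                // Quoted so that a path containing spaces stays a single argument.
                + " -Dfile=\"" + jarPath + "\"";
    }

    public static void main(String[] args) {
        System.out.println(installCommand("com.aspose",
                "E:\\Temporary documents\\aspose-words-19.5jdk.jar"));
        System.out.println(installCommand("com.aspose",
                "E:\\Temporary documents\\aspose-pdf-18.9.jar"));
    }
}
```

The sketch only prints the commands; running them is still a manual step.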
1.3. Add the jars to the Maven project
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>18.9</version>
</dependency>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.5</version>
</dependency>
2, Convert Word to PDF
Word documents in both doc and docx format can be converted.
- Full code
The full code is pasted here so that the package and imports are visible.
package com.word.pdf;

import com.aspose.words.*;

/**
 * @author: stars
 * @date: July 30, 2020 13:58
 **/
public class TestDemo {

    public static void main(String[] args) throws Exception {
        String inputPath = "C:\\Users\\stars\\Desktop\\powerdesigner use.doc";
        String outPath = "C:\\Users\\stars\\Desktop";
        String fileName = "powerdesigner use";
        setShowInBalloons(inputPath, outPath, fileName);
    }

    /**
     * Convert a Word document to a PDF document
     *
     * @param inputPath path of the Word document
     * @param outPath   directory in which the PDF is written
     * @param fileName  name of the generated PDF <b>Note: do not include a suffix; ".pdf" is appended below</b>
     * @throws Exception
     */
    private static void setShowInBalloons(String inputPath, String outPath, String fileName) throws Exception {
        Document doc = new Document(inputPath);
        // Get the RevisionOptions object that controls the appearance of revisions
        RevisionOptions revisionOptions = doc.getLayoutOptions().getRevisionOptions();
        // Show deleted revisions in balloons
        revisionOptions.setShowInBalloons(ShowInBalloons.FORMAT_AND_DELETE);
        // Write the output
        doc.save(outPath + "\\" + fileName + ".pdf");
    }
}
The result is not shown here; in any case, the conversion works.
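The conversion method expects the output file name without a suffix. A minimal sketch of deriving that name from the input path instead of passing it separately, using only the JDK; `PdfNameSketch` and `pdfNameFor` are hypothetical names and are not part of the project or of Aspose.

```java
final class PdfNameSketch {

    /** Returns the input file's name with its extension replaced by ".pdf". */
    static String pdfNameFor(String inputPath) {
        // Accept both '/' and '\' so Windows paths are handled on any OS.
        int slash = Math.max(inputPath.lastIndexOf('/'), inputPath.lastIndexOf('\\'));
        String name = inputPath.substring(slash + 1);
        // A dot at index 0 is a hidden-file prefix, not an extension.
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base + ".pdf";
    }

    public static void main(String[] args) {
        // prints "powerdesigner use.pdf"
        System.out.println(pdfNameFor("C:\\Users\\stars\\Desktop\\powerdesigner use.doc"));
    }
}
```

String operations are used rather than `java.nio.file.Path` so that the Windows-style paths in this post behave the same on every platform.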
3, Append one Word file to another
import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;

/**
 * Append one Word file to another
 */
public static void AppendDocuments() throws Exception {
    // Directory containing the files
    String dataDir = "C:\\Users\\stars\\Desktop\\";
    // Destination document
    Document dstDoc = new Document(dataDir + "powerdesigner use.doc");
    // Document to append (srcDoc is appended after dstDoc)
    Document srcDoc = new Document(dataDir + "ZS-1000 Quality manual.docx");
    // Append the source document to the destination document while keeping the original formatting of the source document.
    dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    // Save the combined file
    dstDoc.save(dataDir + "TestFile Out.docx");
    System.out.println("Documents appended successfully.");
}
4, Find and replace in a Word document
import com.aspose.words.*;
import java.util.regex.Pattern;

/**
 * Find and replace
 * @param replaceMark  placeholder to search for (compiled as a regular expression)
 * @param replaceAfter text that replaces the placeholder
 * @throws Exception
 */
public static void FindAndReplace(String replaceMark, String replaceAfter) throws Exception {
    // Directory containing the files
    String dataDir = "C:\\Users\\stars\\Desktop\\";
    // Source file
    Document doc = new Document(dataDir + "powerdesigner use.doc");
    // Check the text of the document
    System.out.println("Original document text: " + doc.getRange().getText());
    Pattern regex = Pattern.compile(replaceMark, Pattern.CASE_INSENSITIVE);
    // Replace the text in the document.
    doc.getRange().replace(regex, replaceAfter, new FindReplaceOptions());
    // Check the replacement was made.
    System.out.println("Document text after replace: " + doc.getRange().getText());
    // Save the file with the replacements applied
    doc.save(dataDir + "ReplaceSimpleOut.doc");
}
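Because the marker is passed to `Pattern.compile`, any regex metacharacters in it are interpreted rather than matched literally. A minimal JDK-only sketch of treating the marker as literal text with `Pattern.quote`; it works on plain strings, not on an Aspose `Document`, and `LiteralMarkerSketch` and `replaceLiteral` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralMarkerSketch {

    /** Replaces every case-insensitive occurrence of marker, treating it as literal text. */
    static String replaceLiteral(String text, String marker, String value) {
        // Pattern.quote turns off the special meaning of characters such as [ ] $ { }.
        Pattern literal = Pattern.compile(Pattern.quote(marker), Pattern.CASE_INSENSITIVE);
        // quoteReplacement does the same for '$' and '\' in the replacement string.
        return literal.matcher(text).replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String text = "Dear [name], welcome.";
        // Unquoted, "[name]" is a character class matching single letters n, a, m or e.
        System.out.println(text.replaceAll("[name]", "Alice"));
        // Quoted, the whole marker is matched as written.
        System.out.println(replaceLiteral(text, "[name]", "Alice"));
    }
}
```

The same quoted `Pattern` could presumably be handed to the `replace` call above, but that is untested here.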
A caveat when replacing
Suppose the placeholders are name and namea. If name is replaced first, the name inside namea is replaced as well, leaving a stray trailing a. That a matches no placeholder, so it stays in the document and the result is wrong. It is therefore recommended to use a UUID with the hyphens removed, or a generated distributed ID, as the placeholder.
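The collision and the UUID-based fix can be reproduced on plain strings. This is a minimal JDK-only sketch, not the author's code; `PlaceholderSketch`, `newPlaceholder`, and `fill` are hypothetical names, and the sample sentence is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class PlaceholderSketch {

    /** A UUID with the hyphens removed: 32 hexadecimal characters. */
    static String newPlaceholder() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** Applies the replacements one after another, in map order. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Naive placeholders: "name" is a prefix of "namea".
        Map<String, String> naive = new LinkedHashMap<>();
        naive.put("name", "Alice");
        naive.put("namea", "42");
        // prints "Hello Alice, your code is Alicea."
        System.out.println(fill("Hello name, your code is namea.", naive));

        // UUID placeholders all have the same length, so none is a prefix of another.
        String nameToken = newPlaceholder();
        String codeToken = newPlaceholder();
        Map<String, String> safe = new LinkedHashMap<>();
        safe.put(nameToken, "Alice");
        safe.put(codeToken, "42");
        // prints "Hello Alice, your code is 42."
        System.out.println(fill("Hello " + nameToken + ", your code is " + codeToken + ".", safe));
    }
}
```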
5, Get a specified page or the total page count of a PDF
Some pages cannot be retrieved this way. It is recommended to use another jar instead, which is described in detail in section 6.
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import java.io.IOException;

/**
 * Get a specified page of a PDF
 * <p>To return the whole PDF, use pdfDocument directly: pdfDocument.save(response.getOutputStream());
 * To return only the specified page, use newDocument: newDocument.save(response.getOutputStream());
 * </p>
 * @param page number of the page to get
 * @return total number of pages in the PDF
 */
public static Integer GetPageCountWithoutSavingPDF(Integer page) throws IOException {
    // Open the document
    Document pdfDocument = new Document("C:\\Users\\stars\\Desktop\\a.pdf");
    // Get the specified page
    Page pdfPage = pdfDocument.getPages().getUnrestricted(page);
    // Create a new Document object to hold the specified page
    Document newDocument = new Document();
    // Add the page to the Pages collection of the new document
    newDocument.getPages().add(pdfPage);
    // Save the new file
    newDocument.save("C:\\Users\\stars\\Desktop\\page_" + pdfPage.getNumber() + ".pdf");
    // Return the total number of pages
    return pdfDocument.getPages().size();
}
6, Get a specified page of a PDF document
With the approach in section 5, some pages cannot be retrieved, so a different jar is used here.
6.1. Add the dependencies
<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>1.7</version>
</dependency>
If you only need the specified page of a PDF document, this jar is enough, but there are two more jars. All of them are open source, so no special preparation is needed; Maven pulls them directly. For how to use the following two, refer to their own documentation.
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.12</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox-tools</artifactId>
    <version>2.0.12</version>
</dependency>
6.2. The code
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.web.bind.annotation.RestController;
import javax.servlet.http.HttpServletResponse;
import java.io.File;
import java.io.IOException;
import java.util.List;

/**
 * Write one page of a PDF to the HTTP response
 *
 * @param response response whose output stream receives the page
 * @param pageNum  number of the page to read
 * @param filePath path of the PDF file
 */
public void splitPdf(HttpServletResponse response, int pageNum, String filePath) {
    File indexFile = new File(filePath);
    // try-with-resources closes the document even if an exception is thrown
    try (PDDocument document = PDDocument.load(indexFile)) {
        // Get the total number of pages
        int numberOfPages = document.getNumberOfPages();
        System.out.println("Total number of pages: " + numberOfPages);
        Splitter splitter = new Splitter();
        // Start page
        splitter.setStartPage(pageNum);
        // End page
        splitter.setEndPage(pageNum);
        List<PDDocument> pages = splitter.split(document);
        for (PDDocument pdDocument : pages) {
            pdDocument.save(response.getOutputStream());
            pdDocument.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
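The method above passes `pageNum` to the splitter without checking it against the page count it has just read. A minimal JDK-only sketch of such a check, assuming page numbers start at 1; `PageRangeSketch` and `requirePage` are hypothetical names, and presumably an out-of-range page would otherwise yield no output.

```java
final class PageRangeSketch {

    /** Returns pageNum if it lies in 1..totalPages, otherwise throws. */
    static int requirePage(int pageNum, int totalPages) {
        if (pageNum < 1 || pageNum > totalPages) {
            throw new IllegalArgumentException(
                    "Page " + pageNum + " is outside 1.." + totalPages);
        }
        return pageNum;
    }

    public static void main(String[] args) {
        int totalPages = 12; // stand-in for document.getNumberOfPages()
        System.out.println(requirePage(3, totalPages)); // prints 3
        try {
            requirePage(13, totalPages);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Page 13 is outside 1..12"
        }
    }
}
```

In a web endpoint, the exception would typically be mapped to a client error response rather than printed.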
7, Convert PDF to Word
For a large document, the whole file is not handed to the front end for rendering; it is served one page at a time. What if the large document is a template? Save the template in PDF format and fetch it page by page. A PDF cannot be changed, because it is a final-form format, so convert the fetched PDF to Word, replace the placeholders to produce the final document, and then convert the Word document back to PDF for display on the front end.
7.1 Advantages and disadvantages of PDF and Word
PDF
- Advantage
PDF supports reading a specified page, and the front end can render PDF content directly. There is therefore no need to read an entire large document; reading the specified page is enough.
- Disadvantage
PDF does not support changes.
Word
- Advantage
Supports changes and replacement.
- Disadvantage
Getting a specified page of the document is not supported, and the front end cannot display it.
7.2. PDF to doc
This conversion method only supports 4 pages.
import com.aspose.pdf.*;

/**
 * Convert PDF to doc
 */
public static void savingToDoc() {
    // Read the PDF document
    Document pdfDocument = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Save it as a doc document
    pdfDocument.save("C:\\Users\\stars\\Desktop\\TableHeightIssue.doc", SaveFormat.Doc);
}
7.3. PDF to docx
This method only supports converting 4 pages.
This method can also convert to doc: just change the file suffix.
import com.aspose.pdf.*;

/**
 * Convert PDF to docx
 */
public static void savingToDOCX() {
    // Load source PDF file
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Instantiate DocSaveOptions instance
    DocSaveOptions saveOptions = new DocSaveOptions();
    // Set output file format as DOCX
    saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
    // Save resultant DOCX file
    doc.save("C:\\Users\\stars\\Desktop\\TableHeightIssue.docx", saveOptions);
}
8, PDF to PPT
import com.aspose.pdf.*;

/**
 * Convert PDF to PPTX
 */
public static void ConvertPDFToPPTX() {
    // Load PDF document
    Document doc = new Document("C:\\Users\\stars\\Desktop\\t.pdf");
    // Instantiate PptxSaveOptions instance
    PptxSaveOptions pptx_save = new PptxSaveOptions();
    // Save the output in PPTX format
    doc.save("C:\\Users\\stars\\Desktop\\output.pptx", pptx_save);
}
9, Convert PDF to SVG image
import com.aspose.pdf.*;

/**
 * Convert PDF to SVG image
 */
public static void ConvertPDFToSVGFormat() {
    // Load PDF document
    Document doc = new Document("C:\\Users\\stars\\Desktop\\t.pdf");
    // Instantiate an object of SvgSaveOptions
    SvgSaveOptions saveOptions = new SvgSaveOptions();
    // Do not compress SVG image to Zip archive
    saveOptions.CompressOutputToZipArchive = false;
    // Resultant file name
    String outFileName = "C:\\Users\\stars\\Desktop\\Output.svg";
    // Save the output in SVG files
    doc.save(outFileName, saveOptions);
}
10, Convert SVG to PDF
import com.aspose.pdf.*;

/**
 * Convert SVG to PDF
 */
public static void ConvertSVGFileToPDFFormat() {
    String file = "C:\\Users\\stars\\Desktop\\Output.svg";
    // Instantiate LoadOptions object using SVG load option
    LoadOptions options = new SvgLoadOptions();
    // Create Document object
    Document document = new Document(file, options);
    // Save the resultant PDF document
    document.save("C:\\Users\\stars\\Desktop\\Result.pdf");
}
11, Convert PDF to HTML
Idea for displaying a large document template
To display a large document, store it in PDF format. Get the specified page of the PDF template, convert it to HTML, and replace the placeholders with the real content during conversion, so that the page can be displayed on the front end.
11.1 Method 1
The result is acceptable.
import com.aspose.pdf.*;
import java.util.HashMap;
import java.util.Map;

/**
 * Convert PDF to HTML format
 */
public static void EscapeHTMLTagsAndSpecialCharacters() {
    // Load existing PDF file
    Document pdfDoc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    final Map names = new HashMap();
    /*pdfDoc.FontSubstitution.add(new Document.FontSubstitutionHandler() {
        public void invoke(Font font, Font newFont) {
            // add substituted FontNames into map.
            names.put(font.getFontName(), newFont.getFontName());
            // or print the message into console
            System.out.println("Warning: Font " + font.getFontName()
                    + " was substituted with another font -> " + newFont.getFontName());
        }
    });*/
    // Instantiate HtmlSaveOptions to save output in HTML
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    // Save resultant file
    pdfDoc.save("C:\\Users\\stars\\Desktop\\output.html", htmlSaveOps);
}
11.2 Method 2
The result of this conversion is poor.
import com.aspose.pdf.*;
import com.aspose.pdf.text.CustomFontSubstitutionBase;

/**
 * Convert PDF to HTML
 * @throws Exception
 */
public static void DefaultFontWhenSpecificFontMissing() throws Exception {
    String myDir = "C:\\Users\\stars\\Desktop\\";
    Document pdf = new Document(myDir + "t.pdf");
    // Configure font substitution
    CustomSubst1 subst1 = new CustomSubst1();
    FontRepository.getSubstitutions().add(subst1);
    // Configure notifier to console
    pdf.FontSubstitution.add(new Document.FontSubstitutionHandler() {
        public void invoke(Font font, Font newFont) {
            // Print substituted FontNames into console
            System.out.println("Warning: Font " + font.getFontName()
                    + " was substituted with another font -> " + newFont.getFontName());
        }
    });
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    pdf.save(myDir + "Redis_1150_substitutedWithMSGothic_release.html", htmlSaveOps);
}

private static class CustomSubst1 extends CustomFontSubstitutionBase {
    public boolean trySubstitute(OriginalFontSpecification originalFontSpecification,
                                 /* out */ com.aspose.pdf.Font[] substitutionFont) {
        substitutionFont[0] = FontRepository.findFont("MSGothic");
        return true;
    }
}
12, PDF to EMF file
Only one page can be converted
import com.aspose.pdf.*;
import com.aspose.pdf.devices.EmfDevice;
import com.aspose.pdf.devices.Resolution;

/**
 * Convert PDF to an EMF file; only one page can be converted
 */
public static void PDFToEMF() {
    // Instantiate EmfDevice object
    EmfDevice device = new EmfDevice(new Resolution(96));
    // Load existing PDF file
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Save first page of PDF file as EMF image
    device.process(doc.getPages().get_Item(1), "C:\\Users\\stars\\Desktop\\output.emf");
}
13, Add a table to an existing PDF document
import com.aspose.pdf.*;

/**
 * Add data to an existing PDF document
 * <p>Add a table to a specified page</p>
 */
public static void addTableInExistingPDFDocument() {
    // Load source PDF document
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Initialize a new instance of the table
    Table table = new Table();
    // Set the table border color to light gray
    table.setBorder(new BorderInfo(BorderSide.All, .5f, Color.getLightGray()));
    // Set the border of the table cells
    table.setDefaultCellBorder(new BorderInfo(BorderSide.All, .5f, Color.getLightGray()));
    // Add nine rows of three columns each (row_count runs from 1 to 9)
    for (int row_count = 1; row_count < 10; row_count++) {
        // Add a row to the table
        Row row = table.getRows().add();
        // Add the row's cells
        row.getCells().add("Column (" + row_count + ", 1)");
        row.getCells().add("Column (" + row_count + ", 2)");
        row.getCells().add("Column (" + row_count + ", 3)");
    }
    // Add the table to the second page of the PDF document
    doc.getPages().getUnrestricted(2).getParagraphs().add(table);
    doc.save("C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}
14, Create a PDF document and add a table
import com.aspose.pdf.*;

/**
 * Create a PDF document and add a table
 */
public static void setAutoFitToWindowPropertyInColumnAdjustmentTypeEnumeration() {
    // Instantiate the PDF object by calling its empty constructor
    Document doc = new Document();
    // Add a page to the PDF object
    Page page = doc.getPages().add();
    // Instantiate a table object
    Table tab = new Table();
    // Add the table to the paragraphs collection of the page
    page.getParagraphs().add(tab);
    // Set the column widths of the table
    tab.setColumnWidths("50 50 50");
    tab.setColumnAdjustment(ColumnAdjustment.AutoFitToWindow);
    // Set default cell border using BorderInfo object
    tab.setDefaultCellBorder(new com.aspose.pdf.BorderInfo(com.aspose.pdf.BorderSide.All, 0.1F));
    // Set table border using another customized BorderInfo object
    tab.setBorder(new com.aspose.pdf.BorderInfo(com.aspose.pdf.BorderSide.All, 1F));
    // Create MarginInfo object and set its left, bottom, right and top margins
    com.aspose.pdf.MarginInfo margin = new com.aspose.pdf.MarginInfo();
    margin.setTop(5f);
    margin.setLeft(5f);
    margin.setRight(5f);
    margin.setBottom(5f);
    // Set the default cell padding to the MarginInfo object
    tab.setDefaultCellPadding(margin);
    // Create rows in the table and then cells in the rows
    com.aspose.pdf.Row row1 = tab.getRows().add();
    row1.getCells().add("col1");
    row1.getCells().add("col2");
    row1.getCells().add("col3");
    com.aspose.pdf.Row row2 = tab.getRows().add();
    row2.getCells().add("item1");
    row2.getCells().add("item2");
    row2.getCells().add("item3");
    // Save the PDF
    doc.save("C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}
15, Create a new PDF and add data
import com.aspose.pdf.*;

/**
 * Create a new PDF and add data
 */
public static void ForceTableRenderingOnNewPage() {
    // New document
    Document doc = new Document();
    PageInfo pageInfo = doc.getPageInfo();
    MarginInfo marginInfo = pageInfo.getMargin();
    marginInfo.setLeft(37);
    marginInfo.setRight(37);
    marginInfo.setTop(37);
    marginInfo.setBottom(37);
    pageInfo.setLandscape(true);

    Table table = new Table();
    table.setColumnWidths("50 100");
    // New page
    Page curPage = doc.getPages().add();
    for (int i = 1; i <= 120; i++) {
        Row row = table.getRows().add();
        row.setFixedRowHeight(15);
        Cell cell1 = row.getCells().add();
        cell1.getParagraphs().add(new TextFragment("Content 1"));
        Cell cell2 = row.getCells().add();
        cell2.getParagraphs().add(new TextFragment("HHHHH"));
    }
    Paragraphs paragraphs = curPage.getParagraphs();
    paragraphs.add(table);
    /********************************************/
    Table table1 = new Table();
    // Set the widths on table1 (the original set them on table by mistake)
    table1.setColumnWidths("100 100");
    for (int i = 1; i <= 10; i++) {
        Row row = table1.getRows().add();
        Cell cell1 = row.getCells().add();
        cell1.getParagraphs().add(new TextFragment("LAAAAAAA"));
        Cell cell2 = row.getCells().add();
        cell2.getParagraphs().add(new TextFragment("LAAGGGGGG"));
    }
    // Start table1 on a new page
    table1.setInNewPage(true);
    paragraphs.add(table1);
    // Save the PDF
    doc.save("C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}
16, Replace specified content in a PDF
It is best to replace text with text of the same length, character for character. If the replacement is longer than the text it replaces, some content will not be displayed, because the surrounding text does not move to make room. Suggestion: convert the PDF to a Word document, do the replacement there, and then convert the Word document back to PDF.
PDF replacement only supports the first three pages.
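The advice above amounts to a choice between two routes depending on the length of the replacement. A minimal JDK-only sketch of that decision; `ReplacementGuardSketch`, `Route`, and `chooseRoute` are hypothetical names, and character count is only a rough stand-in for rendered width, which also depends on the font.

```java
final class ReplacementGuardSketch {

    /** The two routes suggested in the text. */
    enum Route { REPLACE_IN_PDF, ROUND_TRIP_VIA_WORD }

    /** Picks in-place PDF replacement only when the new text is no longer than the old. */
    static Route chooseRoute(String before, String after) {
        return after.length() <= before.length()
                ? Route.REPLACE_IN_PDF
                : Route.ROUND_TRIP_VIA_WORD;
    }

    public static void main(String[] args) {
        // Same length: safe to replace directly in the PDF.
        System.out.println(chooseRoute("AAAA", "2020")); // prints REPLACE_IN_PDF
        // Longer replacement: convert to Word, replace, convert back.
        System.out.println(chooseRoute("AAAA", "July 30, 2020")); // prints ROUND_TRIP_VIA_WORD
    }
}
```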
import com.aspose.pdf.*;

/**
 * Replace text
 * <b>Replacing text with text of the same length is fine, because the rest of the layout is unchanged. Only the first three pages can be replaced</b>
 * @param BeforeReplacement text to search for
 * @param AfterReplacement  text that replaces it
 */
public static void replaceTextOnAllPages(String BeforeReplacement, String AfterReplacement) {
    // Load the PDF document
    Document doc = new Document("C:\\Users\\stars\\Desktop\\abc.pdf");
    // Absorber that searches for the text to be replaced
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(BeforeReplacement);
    // Accept the absorber for all pages of the document
    doc.getPages().accept(textFragmentAbsorber);
    // Get the extracted text fragments into a collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
    // Loop through the fragments
    for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
        // Update text and other properties
        textFragment.setText(AfterReplacement);
        // Font size
        textFragment.getTextState().setFontSize(10);
        textFragment.getTextState().setForegroundColor(Color.getBlack());
        // Set the background color to white
        textFragment.getTextState().setBackgroundColor(Color.getWhite());
    }
    // Save the document with the replaced text
    doc.save("C:\\Users\\stars\\Desktop\\Annotation_output.pdf");
}