Sunday, November 24, 2013

From Word to PDF using LibreOffice

Problem

This post collects links and tips for the following scenario. I need to produce PDF documentation from a set of Word files in .doc and .docx formats. Some of their content changes, so I would create a "template" file whose content is adjusted on the fly. The documents are then converted to PDF for distribution. Implementation details are below.

Working with Template Files

Template files are just formatted text documents in .docx, .doc, or .odt format. Custom tags are inserted into the file when it is authored and replaced when customized output is required. The aim is to keep a visual designer for the documents while having a system that is easy to customize and easy for a machine to read and manipulate.
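The tag replacement step can be sketched in a few lines. The post does not say what the custom tags look like, so the `{{name}}` syntax, the `render` function, and the sample values below are all assumptions made for illustration. The sketch is in Python rather than .NET, purely to keep it short.

```python
import re

# Hypothetical tag syntax: {{name}}. The original templates may use a different one.
TAG_PATTERN = re.compile(r"\{\{(\w+)\}\}")


def render(template_text, values):
    """Replace every {{name}} tag with its value; fail loudly on unknown tags."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for tag {name!r}")
        return values[name]

    # A function replacement is inserted literally, so values may contain backslashes.
    return TAG_PATTERN.sub(substitute, template_text)
```

Raising on an unknown tag is a design choice: a template that silently keeps an unreplaced tag would otherwise end up in the distributed PDF.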

Word

Word XML (.docx)

There is a library for working with the .docx format on the .NET platform.

Word (.doc)

These files can be read using MS Office or LibreOffice.

Open Document Format (.odt)

  • Writer Demo using CLI (link).
  • AODL library for manipulating OpenDocument (.odt) documents.

Direct Manipulation

Although the template files can be read with native libraries, the templates are simple enough that a quick find/replace on the underlying text is sufficient. Both .docx and .odt files are, in fact, Zip-compressed collections of XML files.
  • Zip packages are supported directly by the .NET Framework through the ZipPackage class (System.IO.Packaging).
  • Plain Zip files are accessible through the ZipFile class (System.IO.Compression).
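The direct-manipulation idea can be sketched as follows: open the document as a Zip archive, rewrite one XML part, and copy every other entry unchanged. This is a Python sketch rather than the .NET ZipPackage/ZipFile route listed above. The function name `fill_template` and the tag strings passed to it are hypothetical. The default part name `word/document.xml` is where a .docx keeps its main text; for an .odt file the corresponding part is `content.xml`. The sketch assumes the part is UTF-8 encoded.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_bytes, values, part_name="word/document.xml"):
    """Return a copy of a zipped document with tags in one XML part replaced."""
    source = zipfile.ZipFile(io.BytesIO(template_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w") as target:
        for info in source.infolist():
            data = source.read(info.filename)
            if info.filename == part_name:
                text = data.decode("utf-8")  # assumption: the part is UTF-8
                for tag, value in values.items():
                    # Escape &, < and > so the part stays well-formed XML.
                    text = text.replace(tag, escape(value))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps each entry's name, order
            # and compression method.
            target.writestr(info, data)
    return output.getvalue()
```

A call might look like `fill_template(open("template.docx", "rb").read(), {"{{customer}}": "Smith & Sons"})`, with the returned bytes written to a new file. One caveat: a word processor may split a tag across several formatting runs in the XML, in which case a plain-text search will not find it, so it helps to type each tag in one go with uniform formatting.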

Converting to PDF

Word files are converted to PDF using LibreOffice.
  • The AODL library mentioned earlier can export files as PDF.
  • The CLI-UNO language bindings can also be used to export .odt files to PDF directly from .NET.
  • Converting Microsoft Word Document to PDF format using OpenOffice.org (Portable), on CodeProject (link)
  • HOW TO: Convert office documents to PDF using Open Office in C# (link)
  • Programmatically convert Word (docx) to PDF, on StackOverflow (link)
The CLI-UNO library is included in the default LibreOffice installation and is (usually) found at
C:\Program Files (x86)\LibreOffice 4\URE\bin\cli_uno.dll
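
As an alternative to the CLI-UNO bindings, LibreOffice can also convert documents from its own command line, using the `--headless`, `--convert-to` and `--outdir` options. The Python sketch below wraps that command. It is not the CLI-UNO route described above, and the function names, the default executable name `soffice`, and the timeout value are assumptions. On Windows the full path to `soffice.exe` would probably need to be passed in.

```python
import subprocess
from pathlib import Path


def build_convert_command(document, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless PDF conversion."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(document),
    ]


def convert_to_pdf(document, out_dir, soffice="soffice", timeout=120):
    """Run the conversion and return the path where the PDF is expected."""
    command = build_convert_command(document, out_dir, soffice)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(command, check=True, timeout=timeout)
    return Path(out_dir) / (Path(document).stem + ".pdf")
```

Keeping the command construction in its own function makes it possible to inspect or log the exact command line without having LibreOffice installed.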

Conclusion

The links above should provide enough material to build the solution described at the beginning of the post.