Convert Office files to PDF with LibreOffice and Python

Sometimes a developer needs to convert a document, whether Word, Excel, or CSV, to PDF. Many libraries and tools can do this. Some document types require some “juggling” to convert, for example converting a document X to HTML first and then to PDF.

Today, I will explain how to use the LibreOffice command line, which makes it easy to convert many types of documents to PDF, from Python. You can adapt the same approach to your preferred programming language.

Tools and technologies used: LibreOffice 7.3, Python, and Ubuntu.

First, download and install LibreOffice, either with the ‘apt’ command or from the official site, libreoffice.org. Then locate the directory that contains the executable named ‘soffice’, which performs the conversion. LibreOffice offers a CLI (command-line interface) that can be used from the operating system’s shell to perform various tasks.

In my case, the ‘soffice’ executable was in the folder ‘/opt/libreoffice7.3/program/’. You can also create a symbolic link to it if you prefer. On Ubuntu, after installation, the application was also available as the ‘libreoffice’ command. Here, we will use the full path.
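If you would rather not hard-code the location, a small helper can look for the executable first. The sketch below is an assumption-based example: it searches the PATH for the names ‘soffice’ and ‘libreoffice’ with shutil.which() and falls back to the installation folder used in this article. The function name find_soffice is hypothetical.

```python
import os
import shutil


def find_soffice(candidates=("soffice", "libreoffice"),
                 fallback="/opt/libreoffice7.3/program/soffice"):
    """Return the path of a LibreOffice executable, or None (hypothetical helper)."""
    # First try the names on the PATH, e.g. the 'libreoffice' command on Ubuntu
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    # Fall back to the installation folder used in this article
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    # Nothing usable was found
    return None
```

Because find_soffice() returns None when nothing is found, the caller can report a clear error instead of failing later inside subprocess.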

In the shell, you can pass the ‘--help’ option to list the available command-line options, or consult the official documentation.

/opt/libreoffice7.3/program/soffice --help

or

libreoffice --help

LibreOffice offers various filters for document conversion. We will use the simplest form of the conversion command:

/opt/libreoffice7.3/program/soffice --headless --convert-to pdf --outdir /tmp FILE-TO-CONVERT.docx

or

libreoffice --headless --convert-to pdf --outdir /tmp FILE-TO-CONVERT.docx
--headless - Starts in "headless mode", allowing the application to be used without a user interface.
--convert-to - Converts files to a selected filter, in our case, "pdf".
--outdir - Indicates the destination folder of the converted file.
FILE-TO-CONVERT.docx - The path to the file to be converted.
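The same command can be expressed in Python as a list of arguments, one element per option or value, which is the form subprocess accepts without going through a shell. This is a minimal sketch; the function name build_convert_command and its default values are assumptions for illustration.

```python
def build_convert_command(file_path, output_dir,
                          soffice="/opt/libreoffice7.3/program/soffice",
                          target_format="pdf"):
    """Build the argument list for a headless LibreOffice conversion (hypothetical helper)."""
    return [
        soffice,
        "--headless",                   # no user interface
        "--convert-to", target_format,  # output filter, here "pdf"
        "--outdir", output_dir,         # destination folder
        file_path,                      # file to convert
    ]
```

Keeping each option and its value as separate list items means a path containing spaces reaches LibreOffice as a single argument, with no shell quoting needed.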

After the command runs, a file with the same name as the original but with the .pdf extension is created in the output folder, and a success or error message is displayed, for example:

convert /home/tarik/docs/arquivo.docx -> /tmp/arquivo.pdf using filter : writer_pdf_Export
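If you capture this message in a program, a regular expression can extract its three parts. This sketch assumes the message keeps exactly the format shown in the example above; that format is taken from the example, not from any specification. The name parse_conversion_output is hypothetical.

```python
import re

# Pattern of the message shown in the example above (assumed to stay the same)
CONVERT_LINE = re.compile(
    r"convert (?P<source>.+?) -> (?P<target>.+?) using filter : (?P<filter>\S+)"
)


def parse_conversion_output(output):
    """Return (source, target, filter) from soffice's output, or None (hypothetical helper)."""
    match = CONVERT_LINE.search(output)
    if match is None:
        return None  # no conversion message found, e.g. an error was printed instead
    return match.group("source"), match.group("target"), match.group("filter")
```

The returned target is the path of the PDF, so the caller does not need to rebuild it from the input file name.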

The output shows the file being converted, the path of the converted file, and the filter used. Unfortunately, this type of conversion does not let you choose the name of the destination file. Now that we know how to use the LibreOffice CLI from the OS shell, let’s write a Python function that runs this command and returns the path of the PDF.

For this task, we’ll use Python’s built-in “subprocess” module, which can run external applications through its run() function.

Our code for conversion will be as follows:

import os  # os.path is used to build the PDF path and check that it exists
import subprocess  # Runs the LibreOffice executable as an external process


def convert_file_to_pdf(file_path, output_dir):
    # Run LibreOffice without a user interface and write the PDF to output_dir
    subprocess.run([
        '/opt/libreoffice7.3/program/soffice',
        '--headless',
        '--convert-to', 'pdf',
        '--outdir', output_dir,
        file_path,
    ])

    # LibreOffice keeps the original base name and only changes the extension
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    pdf_file_path = os.path.join(output_dir, f'{base_name}.pdf')

    if os.path.exists(pdf_file_path):
        return pdf_file_path
    return None


file_path = '/home/tarik/docs/file.docx'
output_dir = '/tmp/'
file = convert_file_to_pdf(file_path, output_dir)
if file:
    print(f'File converted to {file}')
else:
    print('Unable to convert the file.')

The convert_file_to_pdf function takes two parameters: the location of the file to be converted and the output folder for the PDF. It calls subprocess.run() with the command passed as a list of arguments; because no shell is involved, file paths containing spaces or special characters do not need to be quoted. It then builds the expected path of the converted file (the output folder plus the original file name with a .pdf extension) and checks whether that file exists to verify that the conversion succeeded. The expected result is as shown below:
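The function above does not look at LibreOffice’s exit status and has no time limit. A stricter variant, sketched below under the assumption that you prefer exceptions to a None return, captures LibreOffice’s messages, sets a timeout, and raises an error when something goes wrong. The name convert_file_to_pdf_checked and the 120-second timeout are arbitrary choices; capture_output and text require Python 3.7 or later.

```python
import os
import subprocess


def convert_file_to_pdf_checked(file_path, output_dir,
                                soffice="/opt/libreoffice7.3/program/soffice",
                                timeout=120):
    """Convert file_path to PDF and raise on failure (hypothetical variant)."""
    result = subprocess.run(
        [soffice, "--headless", "--convert-to", "pdf", "--outdir", output_dir, file_path],
        capture_output=True,  # keep LibreOffice's messages for error reports
        text=True,            # decode stdout/stderr as str
        timeout=timeout,      # raises subprocess.TimeoutExpired if LibreOffice hangs
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"LibreOffice exited with code {result.returncode}: {result.stderr.strip()}"
        )

    # The exit code alone may not reveal a failed conversion, so also check the file
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    pdf_file_path = os.path.join(output_dir, f"{base_name}.pdf")
    if not os.path.exists(pdf_file_path):
        raise RuntimeError(f"No PDF produced for {file_path}: {result.stdout.strip()}")
    return pdf_file_path
```

If the executable itself is missing, subprocess.run() raises FileNotFoundError before LibreOffice ever starts, which is usually a sign that the path to ‘soffice’ is wrong.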

convert /home/tarik/docs/file.docx -> /tmp/file.pdf using filter : writer_pdf_Export
Overwriting: /tmp/file.pdf
File converted to /tmp/file.pdf
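Because LibreOffice does not let you choose the output file name, one workaround is to rename the PDF after the conversion. This is a minimal sketch, assuming the new name should stay in the same folder; rename_pdf is a hypothetical helper built on os.replace().

```python
import os


def rename_pdf(pdf_file_path, new_name):
    """Rename a converted PDF inside its folder and return the new path (hypothetical helper)."""
    new_path = os.path.join(os.path.dirname(pdf_file_path), new_name)
    # os.replace overwrites new_path if a file with that name already exists
    os.replace(pdf_file_path, new_path)
    return new_path
```

For example, rename_pdf('/tmp/file.pdf', 'report.pdf') would move the result to /tmp/report.pdf.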

This article was originally written in Portuguese in 2022. Thank you for your time. If you have any questions, feel free to send me a message.