Linux doc to pdf converter

How to convert Word (doc) to PDF in linux?

I have a set of files in .doc format, that need to be converted to .pdf format. I am using Ubuntu linux.

10 Answers

Then navigate to System > Administration > Printing and create a new printer, set it as a PDF file printer, and name it as «pdf».

Now you’ll find your .pdf file in

If the tetex-extra package is not available with your distribution, try texlive-base plus texlive-latex-base:

/PDF path to somewhere else ?

Printing to PDF loses a lot of the document metadata (title, authorship, the headings tree that is used for navigation, and so on).

Install unoconv, convert with: unoconv -fpdf file1.doc file2.doc…
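For a whole directory of files, the unoconv call above can be wrapped in a small loop. This is only a sketch: `convert_docs_unoconv` is a hypothetical helper name, and it assumes unoconv is installed and writes each PDF next to its input file.

```shell
# Hypothetical helper: convert every .doc file in a directory with unoconv.
convert_docs_unoconv() {
    dir=${1:-.}
    for f in "$dir"/*.doc; do
        # Skip the unexpanded pattern when the directory has no .doc files.
        [ -e "$f" ] || continue
        # Quote the name so that file names containing spaces survive.
        unoconv -f pdf "$f" || echo "failed: $f" >&2
    done
}
```

Called as `convert_docs_unoconv /path/to/docs`, it reports failed files on stderr and keeps going with the rest.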

If you’re running X then you can do it through Open Office. Since you’re about to object to doing it manually, remember there are some nice macro scripts in Open Office so you can automate it. You can do something similar with AbiWord (abiword --to=pdf).

If you’ve not got X then there is antiword, but that just extracts the text — doesn’t do any formatting or graphics. There’s also wvWare which I’ve used to bulk extract images from doc files, but I’ve never tried using it to convert doc files to pdfs.

Oh and .docx files may well need something different, but since they’re just zipped xml files it shouldn’t be too difficult to do something useful with them. For bulk extracting images you just unzip them and copy the images directory, but I’ve never needed to convert them in Linux.
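The "unzip and copy the images directory" step might look like the sketch below. It assumes the usual .docx layout, in which embedded images sit under `word/media/` inside the zip container; `extract_docx_images` is a hypothetical helper name, and the default output directory `images` is an arbitrary choice.

```shell
# Hypothetical helper: pull the embedded images out of a .docx file.
# Assumption: images are stored under word/media/ inside the archive.
extract_docx_images() {
    docx=$1
    out=${2:-images}
    mkdir -p "$out" || return 1
    # -j drops the archive's directory structure, -o overwrites without prompting.
    unzip -j -o "$docx" 'word/media/*' -d "$out"
}
```

For example, `extract_docx_images report.docx report-images` would leave the pictures in `report-images/`.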

Convert docx to PDF

I am trying to convert docx files to pdf on my Ubuntu server using the command line, but none of the converters I have tried so far seems to convert Word 2007/2010/2013 files correctly.

Apparently online converters can manage it without any problems, but web services are not an option because the files contain sensitive data. For tests I use this Word 2007 file because it contains some important elements (formulas, vector graphics, images, lists, etc.). I tested the following tools (partly from this post):

lowriter (LibreOffice Writer): incorrect output (the circle is supposed to be on the last page, not the first one)

unoconv: the same as LibreOffice, since it doesn’t use its own converter. Converting to odt first and then to pdf messes the file up completely.

abiword --to=pdf filename.doc: incorrect and incomplete (many elements are missing)

OpenOffice Writer: same result as for abiword

wvPDF: crashes with the following error message:

$ wvPDF 2007_Office_DocEncryption.docx test.pdf

Current directory: /home/webmt/dev/test/

Some problem running latex.

Check for Errors in test.log

Conversion into dvi failed

Is there any way to convert docx files to PDF on Linux correctly? It would also help me if I knew it works for someone with any of the programs I already mentioned. I will start a bounty as soon as SE lets me.

p.s. I’m using Ubuntu server 12.04

Conclusion:

I had to conclude that, for now, there is no reliable tool on Ubuntu that handles the newer MS Word formats with all their kinds of elements and produces a one-to-one copy of a docx file. None of the tools I tested could convert the sample file properly. Since I will be facing very different document versions and contents, and output quality is one of the highest priorities, I will end up performing the conversions with VB macros in Word on a Windows server connected to my Linux machine.

I will set the post getting the best results as the accepted answer. However, the bounty was intended for a solution with absolutely correct conversion. Thanks to everyone, again.

7 Answers

This answer passes all the tests in your test document except the flow chart one.

Why is this better than the other methods suggested thus far?

I have tested the other methods suggested so far (especially oowriter and ebook-convert), but they pass fewer tests than this method. The ebook-convert method strips the margins and part of the text out of the document.

This method even yields better results than a professional converter such as rainbowpdf.

I also tried converting it to html, but the drawing with the square in the circle and the flow chart are incorrect.

Why does the flow chart test fail?

It seems that libreoffice and unoconv have problems rendering the flow chart in the .docx file correctly, probably because it was made using smart art in Microsoft Office. This is a bug that is also discussed on this thread. The textual and visual information is present in the pdf resulting from the above method, as you can see (I had to select the text, though).

The font color, for instance, is not read properly and some lines are too long. I am not aware of any Linux solution that is able to display smart art correctly. 🙁

This is also the reason why all the print solutions posted on this page will not satisfy you.

In short

In short, what you are doing is really hard and there are at present no solutions that will fully satisfy you. The achilles’ heel of docx2pdf conversions is the smart art. If you can live without that or if you can find a way to spot smart art and convert it somehow into an image, you can reach your goal.
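One possible way to spot smart art before converting is to look inside the .docx zip container. The sketch below rests on an assumption: that Word stores SmartArt parts under `word/diagrams/` (for example `drawing1.xml`). `has_smartart` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the .docx archive appears to contain smart art.
# Assumption: SmartArt parts are stored under word/diagrams/ in the zip container.
has_smartart() {
    unzip -l "$1" 2>/dev/null | grep -q 'word/diagrams/'
}
```

A batch job could then run `if has_smartart "$f"; then echo "needs special handling: $f"; fi` and route only the flagged files to a slower or manual path.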

Option 1. Force your users to deal with the problem

This is a very inelegant solution. Your content creators could save their smart art as jpg as described in the office help pages and hence the conversion would be possible on your server.

Option 2. Hack your way around the problem

If the flow charts are often very similar, and depending on how good a developer you are, you could try to convert the smart art separately. You could extract the drawing1.xml file from the .docx cluster of documents and then use natural language processing and some crazy hacks to rebuild the smart art. For instance, you’d have to mess with this type of xml:

Option 3. Use a third party service

I have done some more research over the past few days and found a service that does the conversion perfectly: zamzar. Zamzar allows you to upload a docx file and then emails you a link. They also have a (paying?) service where you can send any file to pdf@zamzar.com and then get the converted file back in your inbox. You could easily build a system around this where you automatically send the file and parse it from the email. This is not much work, and the end result is the best.

Notes

  • If anyone has other services that do the same, please feel free to edit them in.
  • I have mailed the zamzar support to ask whether they have an api. That would be even easier.
  • Maybe Aspose for .NET and Java could also help out? Or docx4java as in this very related SO post.
  • Another option is to look into the odf-converter, which seems dated and depends on openoffice rather than libreoffice.
  • I can now confirm that the java jodconverter also fails the flow chart conversion.

I have actually taken the time to test the different methods proposed on this page. Please back any comments up with actual tests.

This is a command-line solution that works decently — but uses proprietary software.

I think the basic problem is that Microsoft Word formats are fully understood only by Microsoft Word (and even there, there are differences between versions: some older Word files open incorrectly formatted in newer versions). All the other solutions are approximations and hacks, so they will work or not depending on the file.

So to be sure, you need to process your .docx files with a Microsoft Word installation (and yes, I think it’s their option and it’s fair. If you do not want to use Word, don’t use it. I go with LaTeX for my work, but it’s difficult to convince the rest of the world).

I have been using CrossOver for ages to run Microsoft Office on my Linux desktop (1), and I find it quite useful. Maybe it works with Wine too; I have never tried.

I do the conversion using this configuration:

1) I have Crossover installed

2) I have my version of Microsoft Office installed under Crossover

3) In Microsoft Word, disable «background printing»

4) I have cups-pdf printer installed and selected as default printer.

5) To do the conversion, run (hints here):

6) Your converted file will appear in

Your document comes out almost perfectly (there is some misalignment in answer #2, which also shows in my Word 2007 when running under CrossOver; I do not know if it’s related to my Windows version).

Now, the problem is that the graphical Word interface will pop up, and I do not know how to make it «headless». Command-line options for Word didn’t help.

(1) I am in no way related to CodeWeavers, just a happy user.

If you have Libreoffice installed, you can try to convert using that. Just press Ctrl + Alt + T on your keyboard to open Terminal. When it opens, run the command(s) below:

Another option is to install Cups PDF.

To do so just press Ctrl + Alt + T on your keyboard to open Terminal. When it opens, run the command(s) below:

Then create a new printer, set it as a PDF file printer, and name it whatever you want, as long as you know the name, then run:

And your PDF file will be in

I also had this problem in the past, haven’t had to use it lately, so I don’t know if it still is affecting me.

As for answering the question:

This question: How to batch convert .doc or .docx to .pdf gives a reason in the comments why your conversion with lowriter might be failing:

Beware of using «space» character from command line. When you get to the space character simply press «tab» 😉 – Pitto Nov 16 ’12 at 13:11

This question’s answer also might possibly help:

You would run libreoffice --headless --convert-to pdf *.odt . You can get more info on libreoffice with the command man libreoffice if you need help understanding or tweaking the command to work.

However, you can’t have LibreOffice open at the time, as per this bug: https://bugs.freedesktop.org/show_bug.cgi?id=37531
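The headless conversion and the warning about spaces in file names can be combined in a small wrapper. This is a sketch rather than a tested recipe for every LibreOffice version: `convert_to_pdf` is a hypothetical helper name, and it relies on LibreOffice's `--outdir` option to collect the PDFs in one directory.

```shell
# Hypothetical wrapper: convert each given file to PDF with headless LibreOffice.
# First argument is the output directory, the rest are input files.
convert_to_pdf() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        # Quoting "$f" keeps file names with spaces in one piece.
        libreoffice --headless --convert-to pdf --outdir "$outdir" "$f" \
            || echo "failed: $f" >&2
    done
}
```

For example, `convert_to_pdf out "my report.docx" notes.doc` would place the resulting PDFs in `out/`. As noted above, close any running LibreOffice instance first.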

The first answer has two options, one using CUPS and creating a PDF printer, the other using LaTeX, though you did say that LaTeX was failing.

As for converting to PDF via CUPS PDF, you would run sudo apt-get install cups-pdf followed by oowriter -pt pdf your_word_file.doc(x) . This might help with your oowriter issue.

The problem is probably that you are trying to convert to PDF from DOC/DOCX, when most of the tools use ODT, as they are related to LibreOffice/OpenOffice/AbiWord. Thus, they fail either when reading Microsoft's DOCX format or in the conversion to ODT.

There are several bugs with a conversion from .docx w. Word Art (version is included):

This is from the LibreOffice forum regarding conversion from .doc and somewhat .docx: http://en.libreofficeforum.org/node/5096. It’s from January of 2013, so it should apply somewhat.

Beyond all this, I really don’t know. Hope you solve your problem!

Here is the bitter truth: Office solutions for Linux are total failures! I’ve been a full-time GNU/Linux user for many years and I’ve constantly searched and tried different office solutions, from the old Open-Office, to the later Libre-Office, Abi-Word, etc. They have all failed to help me do my office work. It even gets worse when it comes to non-Latin languages (right-to-left languages like Persian, Arabic, etc). The user has to fight with these software to get his/her work done! And Microsoft office compatibility is just not there. I can talk hours and hours of how much I’ve tried and they have all failed me, but this is not the point of this question.

I’ve also tried installing and running Microsoft Office using WINE. It was somewhat successful, but it did not work out well and mostly crashed when I tried to open my office files.

LaTeX is fine, but it’s not an office solution. LaTeX is for type-setting, and it’s more like a pro’s tool, and there’s no spread-sheets, nor presentations.

So what’s the solution?

This is not a command-line solution. The only solution that I have come up with in all these years, to keep me inside my GNU/Linux OS and also get my office work done, is to use a minimal Microsoft Windows installation in a virtual machine (like VirtualBox) and install a Microsoft Office suite.

It might not sound pretty but it’s the only solution that works flawlessly and saves me from fighting with bad-office-solutions in my precious time. At first, I myself thought this was not a good solution, but after failing with all the others and doing this VM stuff for more than 2 years, I’m really happy with it 🙂

NOTE-1: I’m not advertising Microsoft products! Just trying to help solve the problem and move-on with life.

NOTE-2: As emphasized above, this is NOT a command-line solution. So why post the answer? Because it’s a TESTED and WELL-WORKING option! If no WORKING command-line solution is available (which I highly suspect is the case), then having an ALTERNATIVE option is better than NO options.
