Content Extraction & Rebuilding from PDF

As part of our additional services, Foliage Solutions offers content extraction and document rebuilding from PDF files.

Every language service provider receives requests to translate content that is supplied only as PDF files, without the native file(s).

The right solution for these requests depends on how the translated documents will be used and what the end client expects.

Our desktop publishing specialists have years of experience and use a range of tools and solutions to extract content supplied only as PDF files, or to rebuild it, in an editable format.


Often, PDF files cannot be processed directly by CAT (computer-assisted translation) tools, or the layout of the translated documents generated by the CAT tool does not match the source.

This is usually the case with scanned PDF files, where the CAT tool cannot read or extract the content correctly, or where the formatting of the translated files comes out of the CAT tool badly broken.

The same issues can occur even with "live" text PDFs. The best solution is always to prepare the content that needs translation in an editable file format (Word, InDesign, etc.) before processing it in the CAT tool.



Content extraction for translation when only a PDF file, and not the native file (e.g. InDesign), is available.

Content extraction

Whether the deliverables are translated files with a matching layout or just the content extracted into an editable format, our desktop publishing specialists can help.

Using Optical Character Recognition (OCR) tools such as ABBYY FineReader, the Export PDF feature in Adobe Acrobat Pro and many others, we export the translatable content from the PDF files into an editable format such as Word or InDesign.

The exported content is then cleaned of garbage characters, unnecessary hard returns, tabs and any other elements that could, for example, interfere with proper segmentation or introduce unnecessary tags in the CAT tool.
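To illustrate the kind of clean-up described above, here is a minimal Python sketch that normalizes text exported from a PDF before it goes into a CAT tool. It is a hypothetical example, not Foliage Solutions' actual workflow: the function name `clean_extracted_text`, the list of "garbage" characters and the rule that every single line break inside a paragraph is a layout artifact are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical list of invisible characters that often survive PDF/OCR export
# and can create stray tags or broken segments in a CAT tool.
GARBAGE_CHARS = {
    "\u00ad",  # soft hyphen
    "\u200b",  # zero-width space
    "\ufeff",  # byte order mark / zero-width no-break space
    "\ufffd",  # replacement character left by failed decoding
}


def clean_extracted_text(raw: str) -> str:
    """Illustrative clean-up of text exported from a PDF, before CAT processing.

    Assumes paragraphs are separated by a blank line and that any single
    line break inside a paragraph is a layout artifact of the PDF.
    """
    # Normalize Windows/old Mac line endings; treat form feeds (page breaks) as line breaks.
    text = raw.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")
    # Rejoin words split by a hyphen or soft hyphen at a line end
    # ("transla-\ntion" -> "translation"). Heuristic: it also merges
    # genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)[-\u00ad]\n(\w)", r"\1\2", text)
    # Remove control characters (except newlines and tabs) and the
    # invisible characters listed above.
    text = "".join(
        ch for ch in text
        if ch in "\n\t"
        or (unicodedata.category(ch) != "Cc" and ch not in GARBAGE_CHARS)
    )
    # Tabs usually come from the PDF layout, not from the content.
    text = text.replace("\t", " ")
    cleaned = []
    # A blank line (possibly containing spaces) marks a paragraph break.
    for para in re.split(r"\n[ ]*\n", text):
        # Turn hard returns inside the paragraph into spaces and collapse
        # runs of spaces; non-breaking spaces are deliberately left untouched.
        joined = re.sub(r"[ \n]+", " ", para).strip(" ")
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "Content extrac-\ntion for\ttranslation\r\nfrom PDF\u200b files.\n\nSecond para-\ngraph."
    print(clean_extracted_text(sample))
    # Content extraction for translation from PDF files.
    #
    # Second paragraph.
```

Joining every single line break is deliberately aggressive; documents containing lists, headings or tables would need rules that keep those breaks, so the output of a script like this still needs a manual check.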

Document rebuilding in various formats based on a PDF of a scanned document (such as a fax).

Document rebuilding

Sometimes it is necessary to recreate the file structure from scratch and use OCR tools to retrieve the text or, in extreme cases, manually retype the legible text.

This may be necessary when the scanned PDF is of very poor quality, has a complicated layout, or both.

It is also the solution when the native files are not available and you need editable files that match the formatting and layout of the PDF.

Benefits you get when working with Foliage Solutions

Desktop publishing services tailored to your needs.

Modern, constantly updated software that helps our desktop publishing experts deliver high-quality final output.

An experienced team of experts in multilingual desktop publishing.

A firm commitment to accuracy, punctuality and proficiency in every desktop publishing project.

An established and proven desktop publishing workflow that ensures outstanding results.

What is Optical Character Recognition?

Optical Character Recognition (OCR) is a technology that converts different types of files, such as scanned paper documents, images or PDF files, into editable, machine-readable text that can then be translated.

Foliage’s desktop publishing specialists are skilled in using OCR tools to extract content and create editable files from the PDF files our clients send us.

Contact us for a free quote!

Desktop Publishing Service Provider Who Handles All Languages

Tell us briefly about your project and we’ll get back to you as soon as possible.

About Foliage Solutions SRL

We are a new company founded on a collaboration between a former translation project manager, with more than a decade of experience at a multinational translation provider, and a group of desktop publishing (DTP) professionals. Together, we provide specialized multilingual DTP solutions to other language service providers (LSPs).

Foliage Solutions SRLS
Via Vincenzo Monti, 32
20123 Milano