This article is for users whose PDF files did not get smaller after being processed by file compression software like NXPowerLite Desktop. This often happens when most of a PDF's data is held in what are known as content streams.
If you aren't sure whether your file contains content streams, you can check with NXPowerLite's View Details feature.
1. What are Content Streams and Why Can't They Be Compressed?
Content streams are the raw data that describes the entire layout of a PDF page, including text and line drawings, expressed through page description commands. A single PDF page can contain one or more of these streams.
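To make "page description commands" concrete, the sketch below holds a small hand-written content stream in a Python bytes literal and lists the operators it uses. The sample stream and the `list_operators` helper are illustrative assumptions, not taken from a real file. In real PDFs, content streams are usually stored in an encoded form, so they would need decoding before they look like this.

```python
import re

# Hand-written sample for illustration only (not extracted from a real PDF).
SAMPLE_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Hello) Tj
ET
72 700 m
300 700 l
S
"""

def list_operators(stream: bytes) -> list[str]:
    """Return the operator keywords found in an uncompressed content stream."""
    # Blank out literal strings such as (Hello) so their text is not read as operators.
    without_strings = re.sub(rb"\((?:\\.|[^\\()])*\)", b" ", stream)
    tokens = without_strings.split()
    # Operators are bare keywords; names start with "/" and operands are numbers.
    return [t.decode("ascii") for t in tokens if re.fullmatch(rb"[A-Za-z'\"*]+", t)]

print(list_operators(SAMPLE_STREAM))
# ['BT', 'Tf', 'Td', 'Tj', 'ET', 'm', 'l', 'S']
```

Here `BT` and `ET` begin and end a text block, `Tf` selects a font and size, `Td` positions the text, `Tj` draws it, and `m`, `l` and `S` move to a point, add a line and stroke it.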
The key difference that prevents effective compression is how the content is represented:
- Standard Text and Images: When a PDF is created optimally (e.g., using the "Save As PDF" feature in Microsoft Office), text is stored as editable text objects, and images are stored as separate, compressible media objects. NXPowerLite can target and optimize these individual components.
- Content Streams: When a document is converted into large content streams, the text and images are no longer recognized as separate, editable elements. For these files, common compression tools like NXPowerLite Desktop and WeCompress are currently unable to directly manipulate or compress this specific type of data stream. Because this content is opaque to editing tools, trying to edit it in software like Adobe Acrobat may produce an error saying the content cannot be edited.
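As a rough way to see where a file's bytes sit, the following sketch scans raw PDF bytes and totals stream data by a guessed category. It is a heuristic under stated assumptions, not a PDF parser and not how NXPowerLite's View Details works: the `stream_byte_totals` name and the category rules are hypothetical, and "other" (streams with no image, font or structural marker) is only an approximation of page content streams.

```python
import re

STREAM_START = re.compile(rb"stream\r?\n")
IMAGE = re.compile(rb"/Subtype\s*/Image\b")
FONT = re.compile(rb"/Length1\b|/Subtype\s*/(?:Type1C|CIDFontType0C|OpenType)\b")
STRUCTURE = re.compile(rb"/Type\s*/(?:ObjStm|XRef|Metadata)\b")

def stream_byte_totals(pdf_bytes: bytes) -> dict[str, int]:
    """Total the stored stream bytes per guessed category (heuristic only)."""
    totals = {"image": 0, "font": 0, "structure": 0, "other": 0}
    pos = 0
    while True:
        start = STREAM_START.search(pdf_bytes, pos)
        if start is None:
            break
        end = pdf_bytes.find(b"endstream", start.end())
        if end == -1:
            break
        # Assumption: the stream's dictionary lies between the nearest
        # preceding "obj" keyword and the "stream" keyword.
        obj_at = pdf_bytes.rfind(b"obj", pos, start.start())
        header = pdf_bytes[max(obj_at, pos):start.start()]
        if IMAGE.search(header):
            kind = "image"
        elif FONT.search(header):
            kind = "font"
        elif STRUCTURE.search(header):
            kind = "structure"
        else:
            kind = "other"  # includes page content streams and form XObjects
        # Counts the bytes as stored, including the line break before "endstream".
        totals[kind] += end - start.end()
        pos = end + len(b"endstream")
    return totals
```

To try it on a real file, read the file with `pathlib.Path("example.pdf").read_bytes()` (a placeholder name) and pass the result in. A large "other" total next to a small "image" total suggests, but does not prove, that content streams account for most of the file.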
2. The Primary Cause (and the Reliable Solution)
The issue of large, incompressible content streams is primarily a result of the PDF creation method, especially when it involves custom embedded fonts.
The Cause: How Content Streams Are Created
Content streams are created when an original document is converted to a PDF using a method that turns text and images into raw, static data.
This problem is frequently observed when the original document, often an Office file containing fully embedded fonts, is converted to a PDF using the Microsoft Print to PDF option on Windows. The "Print to PDF" method, especially with embedded fonts, converts the entire document into a heavy PDF in which the text and images become non-editable content streams. For instance, one document converted with the "Print to PDF" option produced a 17 MB file, while the exact same Office document converted with the "Save As PDF" option produced a file of only 500 KB with fully editable text.
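The difference in that example can be worked out as follows. This is a minimal sketch: the `compare_exports` helper is a hypothetical name, and the byte counts assume 1 MB = 1000 KB.

```python
def compare_exports(print_to_pdf_bytes: int, save_as_pdf_bytes: int) -> tuple[float, float]:
    """Return (size ratio, percentage saved) for two exports of the same document."""
    ratio = print_to_pdf_bytes / save_as_pdf_bytes
    saved_percent = (1 - save_as_pdf_bytes / print_to_pdf_bytes) * 100
    return ratio, saved_percent

# Figures from the example: 17 MB via "Print to PDF", 500 KB via "Save As PDF".
ratio, saved = compare_exports(17_000_000, 500_000)
print(f"{ratio:.0f}x larger, {saved:.1f}% saved by using Save As PDF")
# 34x larger, 97.1% saved by using Save As PDF
```

To compare two exports of your own document, pass in the sizes reported by `os.path.getsize` for each file.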
While this is the primary cause we have identified, content streams can also be generated by certain third-party libraries or business software used in the PDF production process. We will expand this content as we discover and confirm more specific examples.
The Solution: Controlling the PDF Creation Process
The only certain solution is to prevent content streams from being created in the first place by changing how the PDF is generated from the source document. If you have access to the original document from before it was converted to PDF, start there.
Document Creation Best Practices:
- Always use the "Save As PDF" option in Microsoft Office and similar applications. This creates a clean, optimized PDF file with editable content that NXPowerLite can successfully compress.
- Avoid fully embedding fonts in the original document where possible. Instead, "subset" the fonts, which leaves out unused characters and saves space.
- Use standard system fonts like Arial, which are less likely to be converted into static images during the PDF creation process.
3. Other Workarounds for Content Stream PDFs
Once a PDF is created with heavy content streams, there is no single reliable way to reduce its size.
Less Reliable Workarounds
Although the solution above is the most certain fix, some users have reported occasional success with the following method, which involves creating a new PDF from the problematic one:
Reprint the PDF through a web browser's PDF printer. This process can create a new PDF where the content is arranged outside of the large streams, making it more accessible for compression software.
Process: Open the file in a modern web browser (like Google Chrome), select the print option, and choose "Save as PDF" from the destination menu. Once the new file is saved, try compressing it again with NXPowerLite or the WeCompress online tool. Note: Users report that this method is inconsistent.
Ineffective Reduction Methods
The other workarounds we have tested either give inconsistent results or damage the quality of the file:
- Converting back to an Office file: Attempting to convert the PDF back to a DOCX file often results in inconsistent quality of images and unrecognizable text.
- Optical Character Recognition (OCR): While running an OCR process on the document may reduce the size (by converting image-like text back into actual text), it is known to negatively impact the legibility of the text, potentially making the document unusable.
Related Content
Check out our blog post "Why is my PDF so big" for more tips on analysing and reducing PDF files based on their specific content.