iTextSharp is giving me the error: "PDF header signature not found"

Views: 418 Published: 2022-09-21 Tags: pdf c# itextsharp .net-4.5

Problem description

Everything I have read about this error says the file must be missing a "%PDF-1.4" or something similar at the top; however, my file includes it. I am not an expert in PDF formatting, but I did double check that I don't have multiple %%EOF or trailer tags, so now I'm at a loss as to what is causing my PDF header signature to be bad. Here is a link to the file if you would like to look at it: Poorly formatted PDF

Here is what I'm doing. I am getting each page of the PDF in the form of a MemoryStream, so I have to append each page to the end of the previous pages. In order to do this, I am using iTextSharp's PdfCopy class. Here is the code I am using:

    /// <summary>
    /// Takes two PDF streams and appends the second onto the first.
    /// </summary>
    /// <param name="firstPdf">The PDF to which the other document will be appended.</param>
    /// <param name="secondPdf">The PDF to append.</param>
    /// <returns>A new stream with the second PDF appended to the first.</returns>
    public Stream ConcatenatePdfs(Stream firstPdf, Stream secondPdf)
    {
        // If either PDF is null, then return the other one
        if (firstPdf == null) return secondPdf;
        if (secondPdf == null) return firstPdf;
        var destStream = new MemoryStream();

        // Set the PDF copier up.
        using (var document = new Document())
        {
            using (var copy = new PdfCopy(document, destStream))
            {
                document.Open();
                copy.CloseStream = false;

                // Copy the first document
                using (var reader = new PdfReader(firstPdf))
                {
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        copy.AddPage(copy.GetImportedPage(reader, i));
                    }
                }

                // Copy the second document
                using (var reader = new PdfReader(secondPdf))
                {
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        copy.AddPage(copy.GetImportedPage(reader, i));
                    }
                }
            }
        }
        return destStream;
    }

Every time I receive a new PDF page, I pass the previously concatenated pages (firstPdf) along with the new page (secondPdf) to this function. For the first page, I don't have any previously concatenated pages, so firstPdf is null, thereby resulting in secondPdf being returned as the result. The second time I go through, the first page is passed in as firstPdf and the new second page is passed in as secondPdf. The concatenation works just fine and the results are actually in the First.pdf file linked above.

The problem arises when I go to add a third page. I am using the output of the previous pass (the first two pages) as the input for the third pass, along with a new PDF stream. The exception occurs when I try to initialize the PdfReader with the previously concatenated pages.

What I find particularly interesting is that it fails to read its own output. I feel like I must be doing something wrong, but I can neither figure out how to avoid the problem, nor why there is a problem with the header; it looks perfectly normal to me. If someone could show me either what I'm doing wrong with my code or at least what is wrong with the PDF file, I would really appreciate it.

Recommended answer

(Answered in a comment)

I strongly recommend not passing the raw streams themselves around and instead passing around a byte array obtained by calling .ToArray() on your MemoryStream. iTextSharp assumes that it has a dedicated empty stream for writing to, since it can't edit existing files "in-place". Although streams essentially map to bytes, they also have inherent properties like Open and Closed and Position that can mess things up.
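
As a rough sketch of that advice, the method from the question could be reworked to take and return byte arrays. The likely culprit in the original code (my reading, not something the comment spells out) is that the returned MemoryStream still has its Position at the end of the data, so the next PdfReader finds nothing to read and reports a missing header. The class name PdfConcatenator and the helper AppendAllPages are made up for illustration; the iTextSharp calls are the same ones the question already uses, plus the PdfReader constructor that accepts a byte[].

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical wrapper class; the name is not from the original post.
public static class PdfConcatenator
{
    /// <summary>
    /// Appends the second PDF onto the first, using byte arrays so that
    /// no stream state (Position, Open/Closed) leaks between calls.
    /// </summary>
    public static byte[] ConcatenatePdfs(byte[] firstPdf, byte[] secondPdf)
    {
        // If either PDF is null, return the other one unchanged.
        if (firstPdf == null) return secondPdf;
        if (secondPdf == null) return firstPdf;

        // A fresh, empty stream dedicated to this one write.
        using (var destStream = new MemoryStream())
        {
            using (var document = new Document())
            using (var copy = new PdfCopy(document, destStream))
            {
                document.Open();
                AppendAllPages(copy, firstPdf);
                AppendAllPages(copy, secondPdf);
            }

            // ToArray() returns the full contents regardless of Position,
            // and it still works after the writer has closed the stream.
            return destStream.ToArray();
        }
    }

    // Hypothetical helper: copies every page of one PDF into the copier.
    private static void AppendAllPages(PdfCopy copy, byte[] pdf)
    {
        using (var reader = new PdfReader(pdf))
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                copy.AddPage(copy.GetImportedPage(reader, i));
            }
        }
    }
}
```

Each incoming page would then be converted once with pageStream.ToArray() and the running result kept as a byte[] between calls. If you really need to keep streams, setting destStream.Position = 0 before returning might also work, but the byte-array route avoids having to think about stream state at all.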
