C# GhostScript - Not able to successfully convert from PDF to TXT file

Developer https://www.devze.com 2023-03-20 19:54 Source: web
I have been successful with extracting text from a PDF file using GhostScript along with the following command line arguments:

gswin32c.exe ^
  -q -dNODISPLAY -dSAFER -dDELAYBIND ^
  -dWRITESYSTEMDICT ^
  -dSIMPLE ^
  -c save ^
  -f ps2ascii.ps ^
   "test.pdf" ^
  -c quit ^
  >"test.txt"

Points to note: I copied the following three files from the installation directory into my C:\ directory.

1) gsdll32.dll
2) gsdll32.lib
3) gswin32

When manually running GhostScript through the command line I do the following: Run > CMD > cd C:\ (then proceed to input the above arguments).

(The above command works and a new file named "test.txt" appears in my C:\ drive with the appropriate pdf data).

When I attempt to execute GhostScript with the same arguments from my C# application, however, I am not successful.

In my C# Winform Application, I am using the following code to execute GhostScript:

Process p1 = new Process();
p1.StartInfo.FileName = @"C:\test.exe";
p1.StartInfo.UseShellExecute = false;
p1.StartInfo.WorkingDirectory = @"C:\";
p1.StartInfo.Arguments = " -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps " + quote + @"C:\h.pdf" + quote + " -c quit >" + quote + @"C:\test.txt" + quote;
File.WriteAllText(@"C:\hhh.txt", p1.StartInfo.Arguments);
p1.Start();

Does anyone see any obvious errors with my code? I would appreciate any help I can get here.

Thanks,

Evan


  1. Make a batch file named batch.bat as follows.

    rem batch.bat
    rem %1 represents input file name without extension.
    echo off
    gswin32c -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps %1.pdf -c quit >%1.txt
    
  2. Compile the following code to get a console application named myapp.exe

    using System.Diagnostics;
    
    class myapp
    {
        public static void Main()
        {
            Process p1 = new Process();
            p1.StartInfo.FileName = "batch.bat";
            p1.StartInfo.Arguments = "test";
            p1.StartInfo.UseShellExecute = false;
    
            p1.Start();
            p1.WaitForExit();
        }
    }
    
  3. Put batch.bat, myapp.exe, and the PDF in the same directory, then double-click myapp.exe. Done!
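
A note on why the batch file helps: the `>` redirection is handled by the command shell (cmd.exe), not by GhostScript. When a process is started with UseShellExecute = false, the argument string goes to the program as-is, so `>"C:\test.txt"` is never treated as a redirection. If you would rather not keep a batch file around, a minimal sketch of the alternative is to let .NET capture standard output and write the file yourself. The class name GsTextExtractor and its two methods are made up for illustration, and the code assumes gswin32c.exe can find ps2ascii.ps the same way it does in the working command line from the question.

```csharp
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of GhostScript or of the original answer.
static class GsTextExtractor
{
    // Builds the argument string from the question, minus the ">" redirection,
    // which only a shell such as cmd.exe would interpret.
    public static string BuildArguments(string pdfPath)
    {
        return "-q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE "
             + "-c save -f ps2ascii.ps \"" + pdfPath + "\" -c quit";
    }

    // Runs GhostScript and saves whatever it prints on stdout to txtPath.
    // Returns the process exit code.
    public static int Extract(string gsExe, string workingDir, string pdfPath, string txtPath)
    {
        var psi = new ProcessStartInfo
        {
            FileName = gsExe,
            Arguments = BuildArguments(pdfPath),
            WorkingDirectory = workingDir,
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,  // takes the place of the shell's ">"
            CreateNoWindow = true
        };

        using (Process p = Process.Start(psi))
        {
            // Read to the end before WaitForExit so a full pipe cannot stall the child.
            string text = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            File.WriteAllText(txtPath, text);
            return p.ExitCode;
        }
    }
}
```

With the files laid out as in the question, the call would look like `GsTextExtractor.Extract(@"C:\gswin32c.exe", @"C:\", @"C:\test.pdf", @"C:\test.txt");` (the executable path is an assumption based on the files copied to C:\).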
