I am using the Popen function from the subprocess module to execute a command-line tool:
subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)
The tool I am using takes a list of files that it then processes. In some cases, this list of files can be very long. Is there a way to find the max length that the args parameter can be? With a large number of files being passed to the tool, I am getting the following error:
Traceback (most recent call last):
  File "dump_output_sopuids.py", line 68, in <module>
    uid_map = create_sopuid_to_path_dict_dcmdump(dicom_files)
  File "dump_output_sopuids.py", line 41, in create_sopuid_to_path_dict_dcmdump
    dcmdump_output = subprocess.Popen(cmd,stdout=subprocess.PIPE).communicate(0)[0]
  File "c:\python26\lib\subprocess.py", line 621, in __init__
    errread, errwrite)
  File "c:\python26\lib\subprocess.py", line 830, in _execute_child
    startupinfo)
WindowsError: [Error 206] The filename or extension is too long
Is there a general way to find this max length? I found the following article on MSDN: Command prompt (Cmd.exe) command-line string limitation, but I don't want to hard-code the value. I would rather get it at run time so I can break the command up into multiple calls.
I am using Python 2.6 on Windows XP 64.
Edit: adding code example
paths = ['file1.dat','file2.dat',...,'fileX.dat']
cmd = ['process_file.exe','+p'] + paths
cmd_output = subprocess.Popen(cmd,stdout=subprocess.PIPE).communicate(0)[0]
The problem occurs because each entry in the paths list is usually a very long file path AND there are several thousand of them. I don't mind breaking the command up into multiple calls to process_file.exe. I am looking for a general way to get the max length that args can be so I know how many paths to send in for each run.
If you're passing shell=False, then Cmd.exe does not come into play.
On Windows, subprocess uses the CreateProcess function from the Win32 API to create the new process. The documentation for that function states that its second argument (which is built by subprocess.list2cmdline) has a maximum length of 32,768 characters, including the Unicode terminating null character. If lpApplicationName is NULL, the module name portion of lpCommandLine is limited to MAX_PATH characters.
Given your example, I suggest providing a value for executable (args[0]) and using args for the first parameter. If my reading of the CreateProcess documentation and of the subprocess module source code is correct, this should solve your problem.
[edit: removed the args[1:] bit after getting my hands on a Windows machine and testing]
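Since that 32,768-character ceiling applies to the rendered command line as a whole, one possible approach (a sketch that is not part of the original answer) is to batch the paths so that each invocation stays under the limit. It leans on subprocess.list2cmdline, the internal helper subprocess itself uses to build the Windows command line, and reuses the process_file.exe invocation from the question purely as an illustration:
import subprocess

WIN_CMDLINE_MAX = 32768  # documented CreateProcess limit, including the trailing null

def chunk_paths(base_cmd, paths, limit=WIN_CMDLINE_MAX):
    """Yield lists of paths such that base_cmd plus the chunk stays under
    `limit` characters once rendered by subprocess.list2cmdline."""
    chunk = []
    for path in paths:
        candidate = base_cmd + chunk + [path]
        if chunk and len(subprocess.list2cmdline(candidate)) >= limit:
            yield chunk          # current batch is full; start a new one
            chunk = [path]
        else:
            chunk.append(path)
    if chunk:
        yield chunk

# Hypothetical usage with the command from the question:
# for chunk in chunk_paths(['process_file.exe', '+p'], paths):
#     subprocess.Popen(['process_file.exe', '+p'] + chunk,
#                      stdout=subprocess.PIPE).communicate()[0]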
For Unix-like platforms, the kernel constant ARG_MAX is defined by POSIX. It is required to be at least 4096 bytes, though on modern systems it's probably a megabyte or more. On many systems, getconf ARG_MAX will reveal its value at the shell prompt.
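If you would rather read the same value from Python than from the shell, os.sysconf exposes it at run time:
import os

# SC_ARG_MAX is the POSIX name for the same kernel limit queried by
# `getconf ARG_MAX`; the exact number is system-dependent.
print(os.sysconf("SC_ARG_MAX"))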
The shell utility xargs conveniently allows you to break up a long command line. For example, if
python myscript.py *
fails in a large directory because the list of files expands to a value whose length in bytes exceeds ARG_MAX, you can work around it with something like
printf '%s\0' * | xargs -0 python myscript.py
(The option -0 is a GNU extension, but really the only completely safe way to unambiguously pass a list of file names which could contain newlines, quoting characters, etc.) Maybe also explore
find . -maxdepth 1 -type f -exec python myscript.py {} +
The way these work around the restriction is that they divide up the argument list if it's too long, and run myscript.py multiple times on as many arguments as they can fit onto the command line at a time. Depending on what myscript.py does, this can be exactly what you want, or catastrophically wrong. (For example, if it sums the numbers in the files you pass in, you will get one partial result per invocation instead of a single total.)
Conversely, to pass a long list of arguments to subprocess.Popen() and friends, use something like
p = subprocess.Popen(['xargs', '-0', 'command'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, err = p.communicate('\0'.join(long_long_argument_list))
... where in most scenarios you should probably avoid raw Popen() and let a wrapper function like run() or check_call() do most of the work:
r = subprocess.run(['xargs', '-0', 'command'],
                   input='\0'.join(long_long_argument_list),
                   stdout=subprocess.PIPE,
                   universal_newlines=True)
out = r.stdout
subprocess.run() supports text=True in 3.7+ as the new name of universal_newlines=True. Python versions older than 3.5 didn't have run, so you need to fall back to the older legacy functions check_output, check_call, or (rarely) call.
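For example, on Python 3.4 (where check_output already accepts the input= keyword) the same call could be written as below, still using the placeholder command; on 2.x you would stay with the Popen pattern shown above instead:
# Python 3.4: check_output accepts input= and returns the captured stdout.
out = subprocess.check_output(
    ['xargs', '-0', 'command'],
    input='\0'.join(long_long_argument_list),
    universal_newlines=True)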
If you wanted to reimplement xargs in Python, it might look something like this:
import os

def arg_max_args(args):
    """
    Split up the list in `args` into a list of lists
    where each list contains fewer than ARG_MAX bytes
    (including room for a terminating null byte for each
    entry)
    """
    arg_max = os.sysconf("SC_ARG_MAX")
    result = []
    sublist = []
    count = 0
    for arg in args:
        argl = len(arg) + 1
        if count + argl > arg_max:
            result.append(sublist)
            sublist = [arg]
            count = argl
        else:
            sublist.append(arg)
            count += argl
    if sublist:
        result.append(sublist)
    return result
Like the real xargs, you'd run a separate subprocess on each sublist returned by this function. A proper implementation should raise an error if any one argument is larger than ARG_MAX, but this is just a quick demo.
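For instance (a hedged sketch reusing the placeholder command from the examples above), the per-sublist calls might be:
import subprocess

# Run the command once per chunk; check_call raises if any invocation fails.
for sublist in arg_max_args(long_long_argument_list):
    subprocess.check_call(['command'] + sublist)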