
Using 100% of all cores with the multiprocessing module

I have two pieces of code that I'm using to learn about multiprocessing in Python 3.1. My goal is to use 100% of all the available processors. However, the code snippets here only reach 30%-50% utilization across all processors.

Is there any way to 'force' Python to use all 100%? Is the OS (Windows 7, 64-bit) limiting Python's access to the processors? While the code snippets below are running, I open the Task Manager and watch the processors spike, but they never reach and maintain 100%. In addition, I can see multiple python.exe processes created and destroyed along the way. How do these processes relate to processors? For example, if I spawn 4 processes, each process isn't using its own core. Instead, what are the processes using? Are they sharing all cores? And if so, is it the OS that is forcing the processes to share the cores?

code snippet 1

import multiprocessing

def worker():
    # worker function
    print('Worker')
    x = 0
    while x < 1000:
        print(x)
        x += 1
    return

if __name__ == '__main__':
    jobs = []
    for i in range(50):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

code snippet 2

from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('worker ', i)
    x = 0
    while x < 1000:
        print(x)
        x += 1
    l.release()

if __name__ == '__main__': 
    lock = Lock()
    for num in range(50):
        Process(target=f, args=(lock, num)).start()


To use 100% of all cores, do not create and destroy new processes.

Create a few processes per core and link them with a pipeline.

At the OS level, all pipelined processes run concurrently.

The less you write (and the more you delegate to the OS), the more likely you are to use as many resources as possible.

python p1.py | python p2.py | python p3.py | python p4.py ...

This will make maximal use of your CPU.
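
For illustration, a minimal sketch of one such pipeline stage, assuming hypothetical stage scripts p1.py, p2.py, and so on that each read lines from stdin, do their CPU-bound work, and write to stdout:

import sys

# p2.py -- one hypothetical stage of the pipeline p1.py | p2.py | p3.py ...
# Each stage is an independent OS process, so the shell pipeline keeps
# several cores busy at once without any explicit process management.

def transform(line):
    # placeholder for this stage's real CPU-bound work
    return line.upper()

for line in sys.stdin:
    sys.stdout.write(transform(line))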


You can use psutil to pin each process spawned by multiprocessing to a specific CPU:

import multiprocessing as mp
import psutil


def spawn():
    procs = list()
    n_cpus = psutil.cpu_count()
    for cpu in range(n_cpus):
        affinity = [cpu]
        d = dict(affinity=affinity)
        p = mp.Process(target=run_child, kwargs=d)
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
        print('joined')


def run_child(affinity):
    proc = psutil.Process()  # with no arguments, this is the current process
    print(f'PID: {proc.pid}')
    aff = proc.cpu_affinity()
    print(f'Affinity before: {aff}')
    proc.cpu_affinity(affinity)
    aff = proc.cpu_affinity()
    print(f'Affinity after: {aff}')


if __name__ == '__main__':
    spawn()

Note: As commented, psutil.Process.cpu_affinity is not available on macOS.


Minimal example in pure Python:

def f(x):
    while 1:
        # ---bonus: gradually use up RAM---
        x += 10000  # linear growth; use exponential for faster ending: x *= 1.01
        y = list(range(int(x))) 
        # ---------------------------------
        pass  # infinite loop, use up CPU

if __name__ == '__main__':  # name guard to avoid recursive fork on Windows
    import multiprocessing as mp
    n = mp.cpu_count() * 32  # multiplier guards against cpu_count() reporting only active cores
    with mp.Pool(n) as p:
        p.map(f, range(n))

Usage: to warm up on a cold day (but feel free to change the loop to something less pointless).

Warning: to exit, don't pull the plug or hold the power button; press Ctrl-C instead.


Regarding code snippet 1: How many cores/processors do you have on your test machine? It isn't doing you any good to run 50 of these processes if you only have 2 CPU cores. In fact, you're forcing the OS to spend more time context switching, moving processes on and off the CPU, than doing actual work.

Try reducing the number of spawned processes to the number of cores. So "for i in range(50):" should become something like:

import os
# assuming you're on windows:
for i in range(int(os.environ["NUMBER_OF_PROCESSORS"])):
    ...
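
As a side note, multiprocessing.cpu_count() reports the same number and works on any platform, not just Windows; a minimal sketch:

import multiprocessing

# cpu_count() returns the number of logical CPUs on Windows, Linux and macOS
for i in range(multiprocessing.cpu_count()):
    ...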

Regarding code snippet 2: You're using a multiprocessing.Lock, which can only be held by a single process at a time, so you're completely limiting all the parallelism in this version of the program. You've serialized things so that processes 1 through 50 start, and then a random process (say process 7) acquires the lock. Processes 1-6 and 8-50 all sit on the line:

l.acquire()

While they sit there, they are just waiting for the lock to be released. Depending on the implementation of the Lock primitive, they are probably not using any CPU; they're just sitting there consuming system resources like RAM while doing no useful work. Process 7 counts and prints to 1000 and then releases the lock. The OS is then free to schedule one of the remaining 49 processes at random. Whichever one it wakes up first will acquire the lock next and run while the remaining 48 wait on the lock. This continues for the whole program.

Basically, code snippet 2 is an example of what makes concurrency hard. You have to manage access by lots of processes or threads to some shared resource. In this particular case there really is no reason that these processes need to wait on each other though.
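
To make that concrete, here is a sketch of snippet 2 with the lock dropped entirely and the process count matched to the cores, so every worker can run at the same time (the interleaved prints are harmless here):

from multiprocessing import Process, cpu_count

def f(i):
    # no lock: each worker counts independently, so all of them
    # can execute on separate cores simultaneously
    print('worker ', i)
    x = 0
    while x < 1000:
        print(x)
        x += 1

if __name__ == '__main__':
    # one process per core instead of 50
    for num in range(cpu_count()):
        Process(target=f, args=(num,)).start()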

So of these two, snippet 1 is closer to efficiently utilizing the CPU. I think properly tuning the number of processes to match the number of cores will yield a much improved result.


I'd recommend the Joblib library; it's a good library for multiprocessing, used in many ML applications, in sklearn, etc.

from joblib import Parallel, delayed

Parallel(n_jobs=-1, prefer="processes", verbose=6)(
    delayed(function_name)(parameter1, parameter2, ...)
    for parameter1, parameter2, ... in object
)

Where n_jobs is the number of concurrent jobs. Set n_jobs=-1 if you want to use all available cores on the machine you're running your code on.

More details on parameters here: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

In your case, a possible implementation would be:

def worker(i):
    print('worker ', i)
    x = 0
    while x < 1000:
        print(x)
        x += 1

Parallel(n_jobs=-1, prefer="processes", verbose=6)(
    delayed(worker)(num)
    for num in range(50)
)


To answer your question(s):

Is there any way to 'force' Python to use all 100%?

Not that I've heard of.

Is the OS (Windows 7, 64-bit) limiting Python's access to the processors?

Yes and no. Yes: if Python took 100%, Windows would freeze. No: you can grant Python admin privileges, which can result in a lockup.

How do these processes relate to processors?

There is no fixed mapping. Each python.exe you see in Task Manager is a real OS process, and the OS scheduler decides, moment to moment, which core each one runs on.

Instead, what are the processes using? Are they sharing all cores? And if so, is it the OS that is forcing the processes to share the cores?

They are sharing all cores. Unless you set a process's affinity to a specific core (on a multicore system), your processes will be scheduled onto whichever core is free. So yes, the OS is forcing the processes to share the cores by default.

If you are interested in CPU affinity in Python, check out the affinity package.
