Tutorials on optimizing non-trivial Python applications with C extensions or Cython

开发者 https://www.devze.com 2023-01-25 00:54 Source: web

The Python community has published helpful reference material showing how to profile Python code, and the technical details of Python extensions in C or in Cython. I am still searching for tutorials which show, however, for non-trivial Python programs, the following:

  1. How to identify the hotspots which will benefit from optimization by conversion to a C extension
  2. Just as importantly, how to identify the hotspots which will not benefit from conversion to a C extension
  3. Finally, how to make the appropriate conversion from Python to C, either using the Python C-API or (perhaps even preferably) using Cython.

A good tutorial would provide the reader with a methodology on how to reason through the problem of optimization by working through a complete example. I have had no success finding such a resource.

Do you know of (or have you written) such a tutorial?

For clarification, I'm not interested in tutorials that cover only the following:

  • Using (c)Profile to profile Python code to measure running times
  • Using tools to examine profiles (I recommend RunSnakeRun)
  • Optimizing by selecting more appropriate algorithms or Python constructs (e.g., sets for membership tests instead of lists); the tutorial should assume the algorithm and Python code are already optimal, and we are at a point where a C extension is the next logical step
  • Recapitulating the Python documentation on writing C extensions, which is already excellent as a reference but not useful as a resource for showing when and how to move from Python to C.


Points 1 and 2 are just basic optimization rules of thumb. I would be very surprised if the kind of tutorial you are looking for existed anywhere; maybe that's why you haven't found one. My short list:

  • Rule number one of optimization: don't.
  • Rule number two: measure.
  • Rule number three: identify the limiting factor (if it's I/O or database bound, no code optimization may help much anyway).
  • Rule number four: think; use better algorithms and data structures...
  • Considering a change of language is quite low on the list...
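Rule number three can be checked cheaply before reaching for a profiler: compare wall-clock time with CPU time for the slow operation. Below is a minimal sketch using the standard `time.perf_counter()` and `time.process_time()`; the names `cpu_fraction`, `busy`, and `waiting` are hypothetical, and `time.sleep()` merely stands in for real I/O. A CPU/wall ratio far below 1 suggests the code is mostly waiting, which a C rewrite will not fix.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds, cpu/wall).

    A ratio well below 1.0 suggests the call spends most of its time waiting
    (disk, network, database, sleep) rather than computing.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return result, wall, cpu, ratio


def busy(n):
    # Hypothetical pure-Python arithmetic loop: CPU bound.
    return sum(i * i for i in range(n))


def waiting(seconds):
    # Hypothetical stand-in for an I/O wait (network, disk, database).
    time.sleep(seconds)


if __name__ == "__main__":
    for name, fn, arg in [("busy", busy, 1_000_000), ("waiting", waiting, 0.5)]:
        _, wall, cpu, ratio = cpu_fraction(fn, arg)
        print(f"{name:8s} wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={ratio:.2f}")
```

`process_time()` counts the CPU time of the whole process, so multithreaded code can push the ratio above 1; the check is only meant as a rough first triage.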

Just start by profiling your Python code with the usual Python tools. Find where your code needs to be optimized. Then try to optimize it while sticking with Python. If it is still too slow, try to understand why. If it's I/O bound, a C program is unlikely to be better. If the problem comes from the algorithm, C is also unlikely to perform better. Really, the "good" cases where C could help are quite rare: the runtime should not be too far from what you want (say, you need a 2 or 3 times speedup), the data structures are simple and would benefit from a low-level representation, and you really, really need that speedup. In most other cases, using C instead of Python will be an unrewarding job.
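To illustrate the point about algorithms: in CPython, `x in some_list` already runs as a C loop inside the interpreter, so rewriting the calling code in C would not remove the linear scan, whereas switching to a set (the example from the question) changes the algorithm itself. A small `timeit` sketch; `membership_timings` is a hypothetical helper and the sizes `n` and `repeats` are arbitrary choices.

```python
import timeit


def membership_timings(n=100_000, repeats=200):
    """Time `target in container` for a list and a set holding the same n ints."""
    data_list = list(range(n))
    data_set = set(data_list)
    target = n - 1  # worst case for the list: it must scan every element
    t_list = timeit.timeit(lambda: target in data_list, number=repeats)
    t_set = timeit.timeit(lambda: target in data_set, number=repeats)
    return t_list, t_set


if __name__ == "__main__":
    t_list, t_set = membership_timings()
    print(f"list: {t_list:.4f}s  set: {t_set:.6f}s  ratio: {t_list / t_set:.0f}x")
```

The list lookup is O(n) per test and the set lookup is O(1) on average, so the gap grows with `n` regardless of which language performs the scan.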

Really, it is quite rare for C code to be called from Python with performance as the primary goal. More often the goal is to interface Python with some existing C code.
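When the goal is interfacing rather than speed, the standard library's `ctypes` module can call an existing C function without writing an extension module at all. A minimal sketch, assuming a POSIX system (Linux or macOS) where the C library can be loaded; `c_strlen` is a hypothetical helper name. Each `ctypes` call carries noticeable per-call overhead, so wrapping a tiny C function like this is likely to be slower than Python's built-in `len()`, not faster.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library() returns None, CDLL(None)
# opens the symbols already loaded into the process, which include libc.
_libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(_libc_name)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the existing C function strlen() on a bytes object."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"hello, world"))  # 12
```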

And as another poster said, you would probably be better off using Cython.

If you still want to write a C module for Python, everything you need is in the official documentation.


O'Reilly has a tutorial (freely available as far as I can tell; I was able to read the whole thing) that illustrates how to profile a real project (they use an EDI parsing project as a subject for profiling) and identify hotspots. The O'Reilly article doesn't go into much detail on writing the C extension that would fix the bottleneck. It does, however, cover the first two things that you want with a non-trivial example.

The process of writing C extensions is fairly well documented here. The hard part is coming up with ways to replicate what Python code is doing in C, and that takes something that would be hard to teach in a tutorial: ingenuity, knowledge of algorithms, hardware, and efficiency, and considerable C skill.

Hope this helps.


For points 1 and 2, I would use a Python profiler, for example cProfile. See here for a quick tutorial.
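A minimal cProfile sketch for points 1 and 2, using only the standard library (`cProfile`, `pstats`, `io`). The workload (`parse_line`, `checksum`, `main`) and the helper `profile_report` are hypothetical stand-ins for real application code. Sorting by `tottime` ranks functions by the time spent in their own bodies, which is roughly what a C rewrite of that function would replace; `cumulative` also includes the time of everything they call.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Hypothetical per-record work standing in for real application code.
    return [int(field) for field in line.split(",")]


def checksum(rows):
    return sum(sum(row) for row in rows)


def main():
    lines = [",".join(str(i + j) for j in range(10)) for i in range(20_000)]
    rows = [parse_line(line) for line in lines]
    return checksum(rows)


def profile_report(func, sort_key="cumulative", limit=10):
    """Profile func() and return (result, text of the top `limit` entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_report(main, sort_key="tottime")
    print(result)
    print(report)
```

cProfile adds overhead to every function call it records, so small, frequently called functions can look relatively more expensive under the profiler than they are in a normal run.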

If you already have an existing Python program, for point 3 you might want to consider using Cython. Of course, rather than rewriting in C, you may be able to think up an algorithmic improvement that will increase execution speed.


I will try to address your points 1 and 2, and your first 3 bullet points, but not in order.

The third bullet point says "assume the algorithm and python code is already optimal". When code is in that state, if one takes stack samples (as outlined here), the samples show exactly what the program is doing, from a time perspective, and there seems to be nothing that could be improved without language change. However, since you know how it is spending its time, you know which low-level algorithm (which could consist of more than one function, not just a hotspot) could benefit by being made to take less time, i.e. by being converted to C.
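A minimal sketch of stack sampling in pure Python, assuming CPython (it relies on the private `sys._current_frames()`): a helper thread periodically grabs the target thread's stack and records which functions appear anywhere on it. The fraction of samples that contain a function approximates the fraction of wall-clock time that function is responsible for, callees included. `sample_stacks`, `cpu_part`, `io_part`, and `workload` are hypothetical names; sampling resolution is also limited by how often the interpreter lets the helper thread run.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(target, interval=0.005):
    """Run target() in the calling thread while a helper thread samples its stack.

    Returns (result, counts, n_samples); counts maps each function name to the
    number of samples in which that function was anywhere on the stack.
    """
    counts = collections.Counter()
    n_samples = 0
    watched_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        nonlocal n_samples
        # Event.wait() returns False on timeout, so this loops until done is set.
        while not done.wait(interval):
            frame = sys._current_frames().get(watched_id)  # CPython-specific
            if frame is None:
                continue
            # Count each function once per sample, even if it recurses.
            names = {f.f_code.co_name for f, _lineno in traceback.walk_stack(frame)}
            counts.update(names)
            n_samples += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = target()
    finally:
        done.set()
        thread.join()
    return result, counts, n_samples


def cpu_part(n):
    # Hypothetical CPU-bound work: a candidate for conversion to C.
    total = 0
    for i in range(n):
        total += i % 7
    return total


def io_part(seconds):
    # Hypothetical stand-in for waiting on disk, network, or a database.
    time.sleep(seconds)


def workload():
    io_part(0.2)
    return cpu_part(2_000_000)


if __name__ == "__main__":
    result, counts, n = sample_stacks(workload)
    for name, hits in counts.most_common(6):
        print(f"{name:15s} on the stack in {100 * hits / n:5.1f}% of {n} samples")
```

In a run like this, `io_part` should show up in a sizeable share of samples too, which is the point of the answer's point 2: being on the stack often is not by itself a reason to convert code to C.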

Regarding point 1, this method shows which parts of the code will benefit by conversion to C, and they may or may not be hotspots. (The first thing that comes to mind is any sort of recursive function or set of functions. Or, a small group of functions that together accomplish some purpose, such as a hill-climber.)

Regarding point 2, the code that will not benefit is any code which does not appear on a healthy percentage of stack samples, or which does appear but clearly will not benefit from being converted to C, such as I/O.

Regarding the first and second bullet points, I would agree that measuring is not the primary objective, but a by-product of the process of finding the code to optimize. Presenting such measurements also is beside the point.

I have been in similar situations, except not between python and C, but between C and hardware.**

Just to give an example, if the total run time is 10 seconds, and the algorithm is on the stack roughly 50% of the time, then it is responsible for roughly 5 of the 10 seconds. If converting the algorithm to C would give a 10x speedup, then that 5 seconds would shrink to 0.5 seconds, so the overall time would shrink to 5.5 seconds. (Roughly - it's more important to achieve the time reduction than to know in advance precisely how big it will be.) Notice, at this point, the whole process could be repeated, and it might make sense to convert something else to C also. You can stop this process when samples show that the python code is doing what it's good at, and the C code is doing what it's good at.
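The arithmetic in this example is what is usually called Amdahl's law. A small sketch, where `projected_runtime` is a hypothetical helper: the share of the runtime during which the algorithm is on the stack (`fraction`) is divided by `speedup`, and the rest is left unchanged.

```python
def projected_runtime(total, fraction, speedup):
    """Estimate the new runtime when the part of the program that is on the
    stack `fraction` of the time is made `speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    untouched = total * (1.0 - fraction)    # code that stays in Python
    accelerated = total * fraction / speedup  # code converted to C
    return untouched + accelerated


if __name__ == "__main__":
    new_time = projected_runtime(10.0, 0.5, 10.0)
    print(new_time)          # 5.5
    print(10.0 / new_time)   # overall speedup, roughly 1.82x
    # Even an infinitely fast C version of that half cannot beat 5 seconds:
    print(projected_runtime(10.0, 0.5, float("inf")))  # 5.0
```

The last line shows why repeating the process matters: once the converted part is fast, the remaining 50% dominates, and the next round of sampling decides whether anything else is worth converting.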

** e.g. Floating-point math, library vs. chip, or graphics, drawing text & polygons.
