开发者

Python lists with STL like interface

开发者 https://www.devze.com 2022-12-15 01:19 出处:网络
I have to port a C++ STL application to Python. I am a Python newbie, but have been programming for over a decade. I have a great deal of experience with the STL, and find that it keeps me hooked to u

I have to port a C++ STL application to Python. I am a Python newbie, but have been programming for over a decade. I have a great deal of experience with the STL, and find that it keeps me hooked to using C++. I have been searching for the following items on Google these past several days:

  1. Python STL (in hope of leveraging my years of STL experience)
  2. Python linked lists
  3. Python advanced list usage
  4. Python list optimization
  5. Python开发者_如何学编程 ordered sets

And have found posts about the above topic, tutorials on Python lists that are decidedly NOT advanced, or dead ends. I am really surprised at my lack of success, I think I am just burned out from overworking and entering bad search terms!

(MY QUESTION) Can I get a Python STL wrapper, or an interface to Python lists that works like the STL? If not can someone point me to a truly advanced tutorial or paper on managing very large sorted collections of non trivial objects?

P.S. I can easily implement workarounds for one or two uses, but if management wants to port more code, I want to be ready to replace any STL code I find with equivalent Python code immediately. And YES I HAVE MEASURED AND DO NEED TO HAVE TOTALLY OPTIMAL CODE! I CANT JUST DO REDUNDANT SORTS AND SEARCHES!

(ADDENDUM) Thanks for the replies, I have checked out some of the references and am pleased. In response to some of the comments here:

1 - It is being ported to python because managements says so, I would just as soon leave it alone - if it aint broke, why fix it?

2 - Advanced list usage with non trivial objects, what I mean by that is: Many different ways to order and compare objects, not by one cmp method. I want to splice, sort, merge, search, insert, erase, and combine the lists extensively. I want lists of list iterators, I want to avoid copying.

3 - I now know that built in lists are actually arrays, and I should be looking for a different python class. I think this was the root of my confusion.

4 - Of course I am learning to do things in the Python way, but I also have deadlines. The STL code I am porting is working right, I would like to change it as little as possible, because that would introduce bugs.

Thanks to everyone for their input, I really appreciate it.


Python's "lists" are not linked lists -- they're like Java ArrayLists or C++'s std::vectors, i.e., in lower-level terms, a resizable compact array of pointers.

A good "advanced tutorial" on such subjects is Hettinger's Core Python containers: under the hood presentation (the video at the URL is of the presentation at an Italian conference, but it's in English; another, shorter presentation of essentially the same talk is here).

So the performance characteristics of Python lists are essentially those of C++'s std::vector: Python's .append, like C++'s push_back, is O(1), but insertion or removal "in the middle" is O(N). Consequently, keeping a list sorted (as can be easily done with the help of functions in Python's standard library module bisect) is costly (if items arrive and/or depart randomly, each insertion and removal is O(N), just like similarly maintaining order in an std::vector would be. For some purposes, such as priority queues, you may get away with a "heap queue", also easy to maintain with the help of functions in Python's standard library module heapq -- but of course that doesn't afford the same range of uses as a completely sorted list (or vector) would.

So for purposes for which in C++ you'd use a std::set (and rely on its being ordered, i.e., a hashset wouldn't do -- Python's sets are hash-based, not ordered) you may be better off avoiding Python builtin containers in favor of something like this module (if you need to keep things pure-Python), or this one (which offers AVL trees, not RB ones, but is coded as a C-implemented Python extension and so may offer better performance) if C-coded extensions are OK.

If you do end up using your own module (be it pure Python, or C-coded), you may, if you wish, give it an STL-like veneer/interface (with .begin, .end, iterator objects that are advanced by incrementing rather than, as per normal Python behavior, by calling their next methods, ...), although it will never perform as well as "going with the grain" of the language would (the for statement is optimized to use normal Python iterators, i.e., one with next methods, and it will be faster than wrapping a somewhat awkward while around non-Python-standard, STL-like iterators).

To give an STL-like veneer to any Python built-in container, you'll incur substantial wrapping overhead, so the performance hit may be considerable. If you, as you say, "DO NEED TO HAVE TOTALLY OPTIMAL CODE", using such a veneer just for "syntax convenience" purposes would therefore seem to be a very bad choice.

Boost Python, the Python extension package that wraps the powerful C++ Boost library, might perhaps serve your purposes best.


If I were you I would take the time to learn how to properly use the various data structures available in Python instead of looking for things that are similar to what you know from C++.

It's not like you're looking for something fancy, just working with some data structures. In that case I would refer you to Python's documentation on the subject.

Doing this the 'Python' way would help you and more importantly future maintainers who will wonder why you try to program C++ in Python.

Just to whet your appetite, there's also no reason to prefer STL's style to Python (and for the record, I'm also a C++ programmer who knows STL throughly), consider the most trivial example of constructing a list and traversing it:

The Pythonic way:

mylist = [1, 2, 3, 4]

for value in mylist:
    # playaround with value

The STL way (I made this up, to resemble STL) in Python:

mylist = [1, 2, 3, 4]
mylistiter = mylist.begin()

while mylistiter != mylist.end():
    value = mylistiter.item()
    mylistiter.next()


For linked-list-like operations people usually use collections.deque.

What operations do you need to perform fast? Bisection? Insertion?


I would say that your issues go beyond just STL porting. Since the list, dict, and set data structures, which are bolted on to C++ via the STL, are native to core Python, then their usage is incorporated into common Python code idioms. If you want to give Google another shot, try looking for references for "Python for C++ Programmers". One of your hits will be this presentation by Alex Martelli. It's a little dated, from way back in ought-three, but there is a side-by-side comparison of some basic Python code that reads through a text file, and how it would look using STL.

From there, I would recommend that you read up on these Python features:

  • iterators
  • generators
  • list and generator comprehensions

And these builtin functions:

  • zip
  • map

Once you are familiar with these, then you will be able to construct your own translation/mapping between STL usage and Python builtin data structures.

As others have said, if you are looking for a "plug-and-chug" formula to convert STL C++ code to Python, you will just end up with bad Python. Such a brute force approach will never result in the power, elegance, and brevity of a single-line list comprehension. (I had this very experience when introducing Python to one of our managers, who was familiar with Java and C++ iterators. When I showed him this code:

numParams = 1000
paramRequests = [ ("EqptEmulator/ProcChamberI/Sensors", 
                   "ChamberIData%d"%(i%250)) for i in range(numParams) ]
record.internalArray = [ParameterRequest(*pr) for pr in paramRequests]

and I explained that these replaced this code (or something like it, this might be a mishmash of C++ and Java APIs, sorry):

std::vector<ParameterRequest> prs = new std::vector<ParameterRequest>();
for (int i = 0; i<1000; ++i) {
    string idstr;
    strstream sstr(idstr);
    sstr << "ChamberIData" << (i%250);
    prs.add(new ParameterRequest("EqptEmulator/ProcChamberI/Sensors", idstr));
}
record.internalArray = new ParameterRequest[prs.size];
prs.toArray(record.internalArray);

One of your instincts from working with C++ will be a reluctance to create new lists from old, but rather to update or filter a list in place. We even see this on many forums from Python developers asking about how to modify a list while iterating over it. In Python, you are much better off building a new list from the old with a list comprehension.

allItems = [... some list of items, perhaps from a database query ...]
validItems = [it for it in allItems if it.isValid()]

As opposed to:

validItems = []
for it in allItems:
    if it.isValid():
        validItems.add(it)

or worse:

# get list of indexes of items to be removed
removeIndexes = []
for i in range(len(allItems)):
    if not allItems[i].isValid():
        removeIndexes.add(i)

# don't forget to remove items in descending order, or later indexes
# will be invalidated by earlier removals
sort(removeIndexes,reverse=True)

# copy list
validItems = allItems[:]

# now remove the items from allItems
for idx in removeIndexes:
    del validItems[i]


Python STL (in hope of leveraging my years of STL experience) - Start with the collections ABC's to learn what Python has. http://docs.python.org/library/collections.html

Python linked lists. Python lists have all the features you would want from a linked list.

Python advanced list usage. What does this mean?

Python list optimization. What does this mean?

Python ordered sets. You have several choices here; you could invent your own "ordered set" as a list that discards duplicates. You can subclass the heapq and add methods that discard duplicates: http://docs.python.org/library/heapq.html.

In many cases, however, the cost of maintaing an ordered set is actually excessive because it must only be ordered once at the end of the algorithm. In other cases, the "ordered set" really is a heapq -- you never needed the set-like features and only needed the ordering.

Non-Trivial. (I'm guessing at what you meant by "non-trivial"). All Python objects are equivalent. There's no "trivial" vs. "non-trivial" objects. They're all first-class objects and can all have "non-trivial" complexity without any real work. This is not C++ where there are primitive (non-object) values floating around. Everything's an object in Python.

Management Expectations. For the most part the C++ brain-cramping doesn't exist in Python. Use the obvious Python classes the obvious way and you'll have much less code. The reduction in code volume is the big win. Often, the management reason for converting C++ to Python is to get rid of the C++ complexity.

Python code will be much simpler, making it much more reliable and much easier to maintain.

While it's generally true that Python is slower than C++, it's also true that picking the right algorithm and data structure can have dramatic improvements on performance. In one benchmark, someone found that Python was actually faster than C because the C program had such a poorly chosen data structure.

It's possible that your C++ has a really poor algorithm and you will see comparable performance from Python.

It's also possible that your C++ program is I/O bound, or has other limitations that will leave the Python running at a comparable speed.


The design of Python is quite intentionally "you can use just a few data structures (arrays and hash tables) for whatever you want to do, and if that isn't fast enough there's always C".

Python's standard library doesn't have a sorted-list data structure like std::set. You can download a red/black tree implementation or roll your own. (For small data sets, just using a list and periodically sorting it is a totally normal thing to do in Python.)

Rolling your own linked list is very easy.

0

精彩评论

暂无评论...
验证码 换一张
取 消