Predefined text replacements in C++ and Python_问答_开发者

As a hobby project, I had like to implement a Morse code encoder and decoder in C++ and Python (both). I was wonderi开发者_Python百科ng the right data structure I should use for that. Not only is this question related to this specific project, but in general, when one has to make predefined text replacements, what is the best and the fastest way of doing it?

I would avoid re-inventing any data structure if possible (and I think it is). Please note that this is purely a learning exercise and I have always wondered what would be the best way of doing this. I can store the code and the corresponding character in a dictionary perhaps, then iterate over the text and make replacements. Is this the best way of doing this or can I do better?

from collections import defaultdict

morsecode = [('a','._'), ('b','_...'), ('c','_._.')]
codedict = defaultdict(lambda:' ')
for k,v in morsecode:
    codedict[k] = v

tomorse = lambda x: ' '.join([codedict[chr] for chr in x])

print tomorse('bab cab')

Gives:

_... ._ _...   _._. ._ _...

You could use str.translate:

m = {ord('S'): '---', ord('O'): '...'}
print('S O S'.translate(m))

will print:

--- ... ---

In Python, strings are immutable, so probably (depending on what you're doing with the output), you want to create a list of all the simple substitution results. Something like:

MORSE = {'A': '.-', ...}

def morsify(data):
    return [MORSE[c] for c in data if c in MORSE]

You'd need to get corresondingly fancier if you wanted to support different national versions of Morse code etc.

(Edited to deal with the fact that Morse code is apparently not a prefix code.)

On the Python side the string class' translate function is the way to go. On the C++ side I would go with a std::map to hold the character mapping. Then I would probably use std::for_each to do the look up and swap.

There's no simple optimal structure - for any given fixed mapping, there might be fiendish bit-twiddling optimisations for that precise mapping, that are better or worse on different architectures and different inputs. A map/dictionary should be pretty good all the time, and the code is pretty simple.

My official advice is to stick with that. Lookup performance is very unlikely to be an issue for code like this, since most likely you can easily encode/decode faster than you can input/output.

As it's a learning exercise, and you want to try out different possibilities: for the text -> morse you could use an array rather than a map/dictionary. Perhaps surprisingly, this is difficult to do in C++ and be completely portable. The following assumes that all uppercase letters have char values greater than A, which is not guaranteed by the standard:

std::string encode['Z'-'A'];
encode['A' - 'A'] = ".-";
encode['B' - 'A'] = "-...";
// etc.
encode['Z' - 'A'] = "--..";

If you're willing to assume that your code will only ever run on machines whose basic character set has the letters in a continuous run (true of ASCII, but not EBCDIC), you can tidy it up a bit:

std::string encode[26] = {".-", "-...", /* etc */ "--.."};

To look up the character stored in the variable c:

morse = encode[c - 'A'];

The Python version can assume ASCII (I think), and you'd have to throw in some use of ord.

To cope with anything other than upper case letters, you need either a bigger array (to contain an entry for every possible char value), or else multiple arrays with bounds checks, special case code for punctuation and so on.

The first hit on Google.

Depending on the quantity of text, 'foo'.replace('f', '..-.').replace('o','---') would work.

Unless you're translating thousands of lines of text, you probably won't notice a whole lot of difference in any method that you use - though you can easily use the timeit module to test each different method.

Python (expects a string):

def m(t):
 m=0xBFAFA7AEA1A0B0B8BCBE121A021D11120C41888A642082668040876584434267868D626021618163898B8C
 r=[]
 for c in t.upper():
  val=int((m/(256**(90-ord(c))))%256)
  r.append("".join([str((val>>y)&1) for y in range(val/32-1,-1,-1)]))
 return " ".join(r)