Is there any function in the STL that converts ASCII data to the integer form of its hex representation? For example: "abc" -> 0x616263.
I have the most basic way I can think of:
uint64_t tointeger(const std::string& str) {
    uint64_t value = 0; // allow max of 8 chars
    for (std::size_t x = 0; x < str.size(); x++)
        value = (value << 8) + str[x]; // shift previous chars up one byte, append the next
    return value;
}
As stated above, tointeger("abc") returns the value 0x616263.
But this is too slow. Because I have to call this function hundreds of thousands of times, it has slowed down my program significantly. Four or five functions rely on this one, and each of those is called thousands of times, in addition to this function itself being called thousands of times.
What is a faster way to do this?
You want to pack ASCII characters from a string into a 64-bit integer.
Since std::string is not an intrinsic type, for safety, copy the data into a buffer:
uint64_t values[100] = {0}; // Zeroed memory on a 64-bit boundary.
char * p = (char *) values; // Point to the memory as characters.
std::string example("beethoven");
std::copy(example.begin(), example.end(), p); // Copy the characters into the aligned buffer.
Copying is safer as far as alignment goes. To be faster, but more dangerous, just avoid the copy:
uint64_t danger;
danger = *((const uint64_t *) example.c_str());
The std::string::c_str method returns a pointer to a C-style string representation of the text, but that pointer is only valid until the string is modified or destroyed, thus the need to copy. Also, the pointer is only guaranteed to have character alignment. If it happens to reside at address 0x1003, the processor may generate an alignment fault (or slow down because it has to fetch across an unaligned boundary).
Edit 1:
This method does not take endianness into consideration; it uses the endianness of the platform. Converting the result to a different endianness will cost some performance.
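As a rough sketch of a middle ground between the aligned-buffer copy and the direct cast, std::memcpy can copy the bytes straight into a zeroed 64-bit integer. This avoids the unaligned read without needing a separate buffer. The name pack_native is made up for this illustration, and truncating to the first 8 characters is an assumption. As noted above, the resulting value depends on the platform's endianness.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: packs up to 8 characters in the platform's native byte order.
std::uint64_t pack_native(const std::string& s) {
    std::uint64_t value = 0;  // unused bytes stay zero
    // Assumption: strings longer than 8 characters are truncated to the first 8.
    const std::size_t n = std::min<std::size_t>(s.size(), sizeof value);
    std::memcpy(&value, s.data(), n);  // byte-wise copy, no alignment requirement on s.data()
    return value;
}
```

Compilers commonly turn a small memcpy like this into plain load instructions, so it is usually not slower than the cast.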
Have you tried multi-character constants? For example:
int value = 'abc';
EDIT: Rereading the question, it looks like the intention is a BCD-esque conversion for up to an 8-character string, except using 8 bits instead of 4 for each character.
Your approach looks reasonable, or you could use memcpy (string as-is on big-endian; you'd have to reverse the string on little-endian).
However, if this is a performance bottleneck for you, I think you may wish to reconsider why you need to do this hundreds of thousands of times. Perhaps a fundamental change to the algorithm would yield a far greater performance increase than trying to micro-optimize a conversion. For example, store the values internally as uint64_t and only convert to string form when needed for display/interface. Alternatively, just store it permanently as a string and eliminate the need to convert it into the pseudo-BCD format.
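For what it's worth, the memcpy variant mentioned above might look something like the sketch below. The names is_little_endian and pack_memcpy are invented for this example, the endianness check is done at run time for simplicity, and strings longer than 8 characters are assumed to be truncated to their first 8 characters (the original loop would instead keep the last 8).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: true when the least significant byte is stored first.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: gives the same result as the shift loop for up to 8 chars,
// e.g. "abc" -> 0x616263 on either kind of platform.
std::uint64_t pack_memcpy(const std::string& s) {
    const std::size_t n = std::min<std::size_t>(s.size(), 8);
    const char* first = s.data();
    unsigned char bytes[8] = {0};
    if (is_little_endian()) {
        // Last character becomes the lowest byte, so reverse into the front.
        std::reverse_copy(first, first + n, bytes);
    } else {
        // Big-endian: copy as-is, right-aligned in the 8-byte buffer.
        std::copy(first, first + n, bytes + (8 - n));
    }
    std::uint64_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

Whether this beats the simple loop would need to be measured; for strings this short the difference may well be negligible.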
The fastest way to do something is not to do it at all.
Maybe you can store your data as integers, and only convert it to strings when you have to? Would you still need to convert the data hundreds of thousands of times?
If you really must, I'd probably use a simple fixed-size array (not a string) and unroll the loop. But this is a micro-optimisation; in most cases it's better just to find a different way to do what you're trying to do.
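A minimal sketch of the fixed-size array idea, assuming the 8 bytes are stored right-aligned and zero-padded on the left so that the result matches the original loop. The name pack_unrolled and the storage layout are assumptions made for this illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: the loop over 8 bytes written out by hand.
// Assumption: c holds the characters right-aligned, with leading zero bytes.
std::uint64_t pack_unrolled(const std::array<unsigned char, 8>& c) {
    return (static_cast<std::uint64_t>(c[0]) << 56) |
           (static_cast<std::uint64_t>(c[1]) << 48) |
           (static_cast<std::uint64_t>(c[2]) << 40) |
           (static_cast<std::uint64_t>(c[3]) << 32) |
           (static_cast<std::uint64_t>(c[4]) << 24) |
           (static_cast<std::uint64_t>(c[5]) << 16) |
           (static_cast<std::uint64_t>(c[6]) << 8)  |
            static_cast<std::uint64_t>(c[7]);
}
```

An optimizing compiler will often unroll the original 8-iteration loop by itself, so it is worth profiling before committing to this form.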
If you had constraints on how your string was stored, you could cast the data directly to an int or long. If you knew your strings were padded at the end with NUL (0) bytes to at least an 8-byte alignment, then the following would work.
uint64_t value = *(uint64_t*)str;
There is nothing inherently inefficient about your current code snippet. The operations are not slow. Since the max amount of characters you allow is 8 you can use a switch case and loop unrolling.
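const char* p = str.c_str(); // NUL-terminated, so the byte after an odd length is 0
uint64_t value = 0;
switch(str.size()) {
    case 0:
        value = 0;
        break;
    case 1: // the 2nd char is a null anyways
    case 2:
        value = *(const uint16_t*)p;
        break;
    case 3: // the 4th char would be null
    case 4:
        value = *(const uint32_t*)p;
        break;
    case 5: // the 6th char would be null
    case 6:
        // put bytes 4-5 above the first four bytes (little-endian layout)
        value = *(const uint32_t*)p | ((uint64_t)*(const uint16_t*)(p + 4) << 32);
        break;
    case 7: // the 8th char would be null
    case 8:
    default: // 8 or more do the first 8
        value = *(const uint64_t*)p;
        break;
}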
Because we use a switch statement, the compiled code will typically be a jump table instead of a loop (where each iteration would require a comparison operation). Also, because we cast the memory to a different type, we don't need to loop through each string character/byte separately.
MEMORY VALUE
0x8000 0x61,0x62,0x63,0x00 -> "abc",0
The size is 3, but the null terminator makes it 4 bytes long, so we can cast the memory value directly to a uint32. Note that the value read this way is in the platform's byte order, so on a little-endian machine "abc" comes out as 0x00636261 rather than 0x616263.
I don't code in C++, so hopefully the casting semantics are correct.