I am facing a rather peculiar problem. I am working on a compiler for an architecture that doesn't support bitwise operations. However, it handles signed 16-bit integer arithmetics and I was wondering if it would be possible to implement bitwise operations using only:
- Addition (c = a + b)
- Subtraction (c = a - b)
- Division (c = a / b)
- Multiplication (c = a * b)
- Modulus (c = a % b)
- Minimum (c = min(a, b))
- Maximum (c = max(a, b))
- Comparisons (c = (a < b), c = (a == b), c = (a <= b), et.c.)
- Jumps (goto, for, et.c.)
The bitwise operations I want to be able to support are:
- Or (c = a | b)
- And (c = a & b)
- Xor (c = a ^ b)
- Left Shift (c = a << b)
- Right Shift (c = a >> b)
- (All integers are signed so this is a problem)
- Signed Shift (c = a >>> b)
- One's Complement (a = ~b)
- (Already found a solution, see below)
Normally the problem is the other way around; how to achieve arit开发者_StackOverflowhmetic optimizations using bitwise hacks. However not in this case.
Writable memory is very scarce on this architecture, hence the need for bitwise operations. The bitwise functions themselves should not use a lot of temporary variables. However, constant read-only data & instruction memory is abundant. A side note here also is that jumps and branches are not expensive and all data is readily cached. Jumps cost half the cycles as arithmetic (including load/store) instructions do. On other words, all of the above supported functions cost twice the cycles of a single jump.
Some thoughts that might help:
I figured out that you can do one's complement (negate bits) with the following code:
// Bitwise one's complement
b = ~a;
// Arithmetic one's complement
b = -1 - a;
I also remember the old shift hack when dividing with a power of two so the bitwise shift can be expressed as:
// Bitwise left shift
b = a << 4;
// Arithmetic left shift
b = a * 16; // 2^4 = 16
// Signed right shift
b = a >>> 4;
// Arithmetic right shift
b = a / 16;
For the rest of the bitwise operations I am slightly clueless. I wish the architects of this architecture would have supplied bit-operations.
I would also like to know if there is a fast/easy way of computing the power of two (for shift operations) without using a memory data table. A naive solution would be to jump into a field of multiplications:
b = 1;
switch (a)
{
case 15: b = b * 2;
case 14: b = b * 2;
// ... exploting fallthrough (instruction memory is magnitudes larger)
case 2: b = b * 2;
case 1: b = b * 2;
}
Or a Set & Jump approach:
switch (a)
{
case 15: b = 32768; break;
case 14: b = 16384; break;
// ... exploiting the fact that a jump is faster than one additional mul
// at the cost of doubling the instruction memory footprint.
case 2: b = 4; break;
case 1: b = 2; break;
}
First solutions for shifting (shift is the shift distance, must not be negative, a is the operand to be shifted and contains also the result when done). The power table is used by all three shift operations.
// table used for shift operations
powtab = { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, -32768 };
// logical shift left
if (shift > 15) {
a = 0; // if shifting more than 15 bits to the left, value is always zero
} else {
a *= powtab[shift];
}
// logical shift right (unsigned)
if (shift > 15) {
a = 0; // more than 15, becomes zero
} else if (shift > 0) {
if (a < 0) {
// deal with the sign bit (15)
a += -32768;
a /= powtab[shift];
a += powtab[15 - shift];
} else {
a /= powtab[shift];
}
}
// arithmetic shift right (signed)
if (shift >= 15) {
if (a < 0) {
a = -1;
} else {
a = 0;
}
} else if (shift > 0) {
if (a < 0) {
// deal with the sign bit
a += -32768;
a /= powtab[shift];
a -= powtab[15 - shift];
} else {
// same as unsigned shift
a /= powtab[shift];
}
}
For AND, OR and XOR i could not come up with a simple solution, so i'll do it with looping over each single bit. There might be a better trick to do this. Pseudocode assumes a and b are input operands, c is the result value, x is the loop counter (each loop must run exactly 16 times):
// XOR (^)
c = 0;
for (x = 0; x <= 15; ++x) {
c += c;
if (a < 0) {
if (b >= 0) {
c += 1;
}
} else if (b < 0) {
c += 1;
}
a += a;
b += b;
}
// AND (&)
c = 0;
for (x = 0; x <= 15; ++x) {
c += c;
if (a < 0) {
if (b < 0) {
c += 1;
}
}
a += a;
b += b;
}
// OR (|)
c = 0;
for (x = 0; x <= 15; ++x) {
c += c;
if (a < 0) {
c += 1;
} else if (b < 0) {
c += 1;
}
a += a;
b += b;
}
Thats assuming that all variables are 16 bits and all operations behave as signed (so a<0 actually is true when bit 15 is set).
EDIT: i actually tested all possible operand values (-32768 to 32767) for shifts ranging from 0 to 31 for correctness and it works correctly (assuming integer divides). For the AND/OR/XOR code an exhaustive test takes too long on my machine, but since the code for these is pretty simple there should be no edge cases anyway.
In this environment it might be best if you could set up to actually use arithmatic operators to peel out components of integers.
E.G.
if (a & 16) becomes if ((a % 32) > 15)
a &= 16 becomes if ((a % 32) < 15) a += 16
The transforms for these operators are obvious enough if you restrict RHS to a constant power of 2.
Peeling off two or four bits is also easy to do.
An incomplete answer on an old question, here concentrating on AND, OR, XOR. Once a solution is found for one of these bitwise operations, the other two can be derived. There are several ways, one is shown in the following test program (compiled on gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)).
In December 2018 I discovered an error in the solution. The XOR commented below only works because intermediate results in a+b-2*AND(a,b)
are promoted to int
, which is larger than 16 bits for all modern compilers.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
//#define XOR(a,b) (a + b - 2*AND(a,b)) // Error. Intermediate overflow
#define XOR(a,b) (a - AND(a,b) + b - AND(a,b) )
#define IOR(a,b) XOR(XOR(a,b),AND(a,b)) // Credit to Jan Gray, Gray Research LLC, for IOR
static const uint16_t andlookup[256] = {
#define C4(a,b) ((a)&(b)), ((a)&(b+1)), ((a)&(b+2)), ((a)&(b+3))
#define L(a) C4(a,0), C4(a,4), C4(a,8), C4(a,12)
#define L4(a) L(a), L(a+1), L(a+2), L(a+3)
L4(0), L4(4), L4(8), L4(12)
#undef C4
#undef L
#undef L4
};
uint16_t AND(uint16_t a, uint16_t b) {
uint16_t r=0, i;
for ( i = 0; i < 16; i += 4 ) {
r = r/16 + andlookup[(a%16)*16+(b%16)]*4096;
a /= 16;
b /= 16;
}
return r;
}
int main( void ) {
uint16_t a = 0, b = 0;
do {
do {
if ( AND(a,b) != (a&b) ) return printf( "AND error\n" );
if ( IOR(a,b) != (a|b) ) return printf( "IOR error\n" );
if ( XOR(a,b) != (a^b) ) return printf( "XOR error\n" );
} while ( ++b != 0 );
if ( (a & 0xff) == 0 )
fprintf( stderr, "." );
} while ( ++a != 0 );
return 0;
}
You can operate bit-by-bit (as Mark Byers suggested), by extracting every bit which will be slow.
Or you could accelerate process and use 2d lookup tables that store results, say, for two 4-bit operands and operate on those. You'll need less extractions than if you were operating on bits.
You can also do everything using addition, subtraction and >= operation. Every bitwise operation can be unrolled into something like this using macros:
/*I didn't actually compile/test it, it is just illustration for the idea*/
uint16 and(uint16 a, uint16 b){
uint16 result = 0;
#define AND_MACRO(c) \
if (a >= c){ \
if (b >= c){\
result += c;\
b -= c;\
}\
a -= c;\
}\
else if (b >= c)\
b -= c;
AND_MACRO(0x8000)
AND_MACRO(0x4000)
AND_MACRO(0x2000)
AND_MACRO(0x1000)
AND_MACRO(0x0800)
AND_MACRO(0x0400)
AND_MACRO(0x0200)
AND_MACRO(0x0100)
AND_MACRO(0x0080)
AND_MACRO(0x0040)
AND_MACRO(0x0020)
AND_MACRO(0x0010)
AND_MACRO(0x0008)
AND_MACRO(0x0004)
AND_MACRO(0x0002)
AND_MACRO(0x0001)
#undef AND_MACRO
return result;
}
You'll need 3 variables to implement this.
Every bitwise operation will revolve around macros similar to AND_MACRO
- you compare remaining values of a and b to the "mask" (which is "c" parameter). then add mask to the result in the if branch that is suitable for your operation. And you subtract mask from values, if bit is set.
Depending on your platform, it may be faster than extracting every bit using % and / , and then putting it back using multiplication.
See for yourself whichever is better for you.
As long as you're willing for it to be very expensive, yes.
Basically, you'll explicitly put a number into a base-2 representation. You do this just as you would put a number into base-10 (e.g., to print it out), that is, by repeated division.
This turns your number into an array of bools (or ints in the range 0,1), then we add functions to operate on those arrays.
again, not that this is tremendously more expensive than bitwise operations, and that almost any architecture will supply bitwise operators.
In C (of course, in C you have bitwise operators, but...) an implementation might be:
include <limits.h>
const int BITWIDTH = CHAR_BIT;
typedef int[BITWIDTH] bitpattern;
// fill bitpattern with base-2 representation of n
// we used an lsb-first (little-endian) representation
void base2(char n, bitpattern array) {
for( int i = 0 ; i < BITWIDTH ; ++i ) {
array[i] = n % 2 ;
n /= 2 ;
}
}
void bitand( bitpattern op1, bitpattern op2, bitpattern result ) {
for( int i = 0 ; i < BITWIDTH ; ++i ) {
result[i] = op1[i] * op2[i];
}
}
void bitor( bitpattern op1, bitpattern op2, bitpattern result ) {
for( int i = 0 ; i < BITWIDTH ; ++i ) {
result[i] = (op1[i] + op2[i] != 0 );
}
}
// assumes compiler-supplied bool to int conversion
void bitxor( bitpattern op1, bitpattern op2, bitpattern result ) {
for( int i = 0 ; i < BITWIDTH ; ++i ) {
result[i] = op1[i] != op2[i] ;
}
}
Just some other approaches
For example a 16-bit AND:
int and(int a, int b) {
int d=0x8000;
int result=0;
while (d>0) {
if (a>=d && b>=d) result+=d;
if (a>=d) a-=d;
if (b>=d) b-=d;
d/=2;
}
return result;
}
double solution 2-bit AND without loops or table lookups:
int and(int a, int b) {
double x=a*b/12;
return (int) (4*(sign(ceil(tan(50*x)))/6+x));
}
32-bit integer solution 2-bit AND:
int and(int a, int b) {
return ((684720128*a*a -b) * a) % (b+1);
}
16-bit integer solution 2-bit AND:
int and(int a, int b) {
return ((121 * a) % 16) % (b+1);
}
16-bit integer solution 3-bit AND:
int and(int a, int b) {
return sign(a) * ((((-23-a) * (40+b)) % 2)+40+b) % ((10624 * ((((-23-a) * (40+b))%2)+40+b)) % (a%2 - 2 -a) - a%2 + 2 +a);
}
here's a method i came up with to process bitwise XOR 16-bits in parallel using Double-64 integer adds :
[gmn]awk '{ CONVFMT = OFMT = "%.20g"
c = (a=3e15+("1011000111110101"))+
(b=3e15+("1101010010101110"))
sub(/[7]/, "1",c)
gsub(/[268]/ ,"0",c)
sub(/^[^01]+/,"",c); print c }'
The bit-strings look like these (i took out the 3e15
guard digit here for clarity) :
a = 1011 0001 1111 0101
b = 1101 0100 1010 1110
c = 8112 0101 2121 1211 (intermediate)
-------------------------------------------
c = 0110 0101 0101 1011 (output)
one 52-bit unsigned integer add, and barely a handful of string substitution calls, and the output is already in a state that can be passed downstream.
The absolute highest value this add will climb to is 8222,2222,222,222, just shy of the 53-bit hard-limit.
For a bit-wise AND, convert all the 1's, leading 6 or 7, down to 0s : only 2's and the leading 8 are true bits that should then be converted to 1s.
For bit-wise OR, it's the reverse - anything not a 0 or 6 is a "1" in the output string.
For bit-wise complement, even easier - start with 1,111,111,111,111,111, and substract the concatenated bit strings of 2 bytes to obtain it.
精彩评论