Registers are the fastest memories in a computer. So if we want to build a computer with just registers and not even caches, is it possible? I'm even thinking of replacing the magnetic disks with registers, although they are naturally volatile memories. Do we have some nonvolatile registers for that use? It would become so fast! I'm just wondering whether that could happen or not.
The very short answer is yes, you could in theory, but it doesn't really work in real life. Let me explain...
The reason the memory hierarchy exists is that the small, fast memory stores are very expensive per bit (registers), while the big, slow memory stores are very cheap per bit (hard drives).
Another reason why huge numbers of registers are highly impractical is that instructions need to reference memory locations. When you only have a handful of registers, you can store the register number (or numbers) and an opcode in a handful of bits, which means that low numbers of registers make for short and fast instructions. If you're going to have a multi-gigabyte collection of registers, you will need to be able to reference them in instructions, and these will be much longer (and therefore slower) instructions. Keep in mind that if everything were a register, some things would be much faster, but by having a smaller number of registers, certain things (i.e., most of what you do with a computer) are much faster.
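A quick back-of-the-envelope sketch (a Python illustration, not from the answer) of how the operand field in an instruction grows with the register count:

```python
import math

def operand_bits(num_registers):
    """Bits needed to name one register in an instruction."""
    return math.ceil(math.log2(num_registers))

# A classic 16-register machine: each operand field is 4 bits, so a
# two-operand instruction spends only 8 bits naming its operands.
print(operand_bits(16))       # 4

# If 8 GiB of 64-bit "registers" replaced RAM, there would be 2**30 of
# them, and a two-operand instruction would spend 60 bits on operands
# alone -- before the opcode is even encoded.
print(operand_bits(2**30))    # 30
```

This is the quantitative version of the point above: every doubling of the register file adds a bit to every operand field of every instruction.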
Having vast numbers of registers would also add a great amount of complexity to the hardware which processes the reading and writing to registers, which would make everything slower.
Of course, while most of us think in terms of computers, there are surely simple devices which do only have registers, but they would also only have a very limited amount of memory, and aren't made for general purpose computation.
You may also be interested in my answer to Assembly: Why are we bothering with registers?
Registers are fast because most of the registers are connected directly to most of the functional units. While a program is loading one register, another register is feeding the ALU and yet another register is writing a result from some other functional unit.
Registers are made with logic elements such as flip-flops, so that most of the values from most of the registers are all available at the same time, all the time. This is different from a memory where only a selected address is available at any one time and only a very limited number of read ports is available. Typically, it's just one read circuit.
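A toy timing model (my illustration, with made-up cycle counts) of the contrast just described: a register file exposes all its values at once, while a single-ported memory must serialize every read through its one read circuit.

```python
def cycles_register_file(operands):
    # Every register output is wired directly to the functional units,
    # so all requested operands are available in the same cycle.
    return 1 if operands else 0

def cycles_single_port_memory(operands):
    # One read port means one selected address per cycle,
    # so each operand costs a separate access.
    return len(operands)

print(cycles_register_file([3, 7, 9]))       # 1
print(cycles_single_port_memory([3, 7, 9]))  # 3
```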
However this kind of implementation and interconnection is what uses up the die space on the microprocessor. When that is used up, you start adding memory for additional storage.
There have been architectures with extra banks of registers. (SPARC!)
Modern GPUs have about 5 MB of registers and very little cache (compared to CPUs). So yes, it is possible to have a processor with lots of registers.
But you still need a memory hierarchy (registers -> scratchpad/caches -> device memory -> CPU memory). Note also that GPUs are completely different beasts in the sense that they were built with massive parallelism in mind from day one, and that GPUs are not general-purpose processors but coprocessors.
Each GPU thread eats up some registers - the whole GPU program is register-allocated - resulting in thousands of threads that can execute/pause/resume in parallel. Threads are used for hiding memory latency on GPUs, whereas on CPUs huge caches are used for that purpose. Think of it as Hyper-Threading pushed to the extreme.
The problem with that is that registers live inside the CPU. Being inside the CPU, and being small, they have minimal latency. If you increase their size - say you build one big processor with lots of transistors (flip-flops) holding the registers - then the heat dissipation, the energy consumption, the cost, and so on become enormous. And as the area grows, the latency grows with it. So there isn't much to gain by doing this; it's actually worse.
Most of these answers address whether it would be practical. David Johnstone's also mentions the fact that a register name needs to be mentioned in each instruction that touches it. Further to this, in most modern instruction sets an instruction always has its operand registers coded into it. E.g. there's the mov %eax, %ebx instruction, and there's the mov %eax, %ecx instruction. It may so happen that their binary representations look like:

| mov | source reg | dest reg |
|  2  |     3      |    3     |

and differ only in that dest reg equals 3 rather than 2 -- but they also may not! (I haven't checked how these particular instructions are represented on the 386, but I recall there are examples in that instruction set of instructions easily broken down into fields like this, and examples that aren't.)
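A minimal sketch of that field layout, using a hypothetical 8-bit encoding with the widths from the table above (2-bit opcode, 3-bit source, 3-bit destination) and made-up register numbers; this is an illustration, not the real 386 encoding:

```python
MOV = 0b01                         # hypothetical opcode

def encode(opcode, src, dst):
    # Pack the three fields: [opcode:2][source reg:3][dest reg:3]
    return (opcode << 6) | (src << 3) | dst

def decode(word):
    # Unpack the same three fields.
    return (word >> 6) & 0b11, (word >> 3) & 0b111, word & 0b111

eax, ebx, ecx = 0, 1, 2            # made-up register numbers
a = encode(MOV, eax, ebx)          # "mov %eax, %ebx"
b = encode(MOV, eax, ecx)          # "mov %eax, %ecx"
print(bin(a), bin(b))              # differ only in the dest-reg field
```

With 3-bit fields you can name at most 8 registers; widening the register file widens every such field in every instruction.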
The problem is that most interesting programs are going to want to operate on locations of information, determined at runtime. E.g. in this iteration of the loop, we want to look at byte 37; the next iteration we will be interested in byte 38, etc.
I won't prove it but I suspect that in order to get anything approaching Turing completeness, your programs would need either:
- instructions that address registers based on the value in some other register, e.g. "Move from register X to register Y where X and Y are indicated by the values in registers 1 and 2.", or
- self modifying code.
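A minimal sketch of the first option (my illustration; the register numbers and loop are made up): a single "indirect move" instruction whose source and destination are themselves held in registers 1 and 2.

```python
regs = [0] * 16

def mov_indirect():
    # "Move from register X to register Y, where X and Y are indicated
    # by the values in registers 1 and 2."
    regs[regs[2]] = regs[regs[1]]

# Sum regs[8..11] into regs[0] without naming 8..11 in the instructions:
regs[8:12] = [10, 20, 30, 40]
regs[1], regs[2] = 8, 3            # source "pointer", scratch destination
for _ in range(4):
    mov_indirect()                 # regs[3] = regs[regs[1]]
    regs[0] += regs[3]
    regs[1] += 1                   # advance the pointer arithmetically
print(regs[0])                     # 100
```

Without the indirection (or self-modification), each iteration would need a distinct, hard-coded instruction per source register.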
At school we had a theoretical computer with 100 registers (plus an accumulator) and 10 instructions, each of which was a three-digit decimal number. The first digit indicated the operation (load, save, arithmetic, jump, conditional jump, halt), and the last two the register to operate on. Many sample programs could be written for it, like the factorial function. But it soon became apparent that a static program could only operate on a fixed set of data. If you wanted to write a loop to sum the values in a list, you would need a LOAD instruction that pointed to a different input register on each iteration. This meant arithmetically calculating the new code for the load instruction each time, and patching the code just prior to running that instruction.
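A toy interpreter in that spirit (a sketch with made-up opcode digits, not the actual school machine): each instruction is a three-digit number DRR, where D is the operation and RR the register; opcode 9 writes the accumulator back into the program itself, which is the "patch the code" step described above.

```python
def run(program, regs):
    acc, pc = 0, 0
    while True:
        op, r = divmod(program[pc], 100)
        pc += 1
        if op == 0: return regs        # halt
        if op == 1: acc = regs[r]      # load register -> accumulator
        if op == 2: regs[r] = acc      # store accumulator -> register
        if op == 3: acc += regs[r]     # add register to accumulator
        if op == 9: program[r] = acc   # patch program slot r (self-modify!)

regs = [0] * 100
regs[1], regs[2] = 1, 310              # 310 is the code for "add regs[10]"
regs[10], regs[11] = 5, 6              # the data to sum
program = [
    102,  # load regs[2]: acc = 310, the code for "add regs[10]"
    301,  # acc += regs[1]: compute 311, the code for "add regs[11]"
    904,  # patch slot 4 with the freshly computed instruction
    110,  # acc = regs[10]
    0,    # placeholder: becomes 311 ("add regs[11]") at runtime
    200,  # regs[0] = acc
    0,    # halt
]
print(run(program, regs)[0])           # 11  (= 5 + 6)
```

The program literally computes its own next instruction (310 + 1 = 311) and stores it into the code before executing it, just as the summing loop required.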
For each 32-bit register you need at least 9x32 gates. That is a lot of gates.
The bigger problem comes when you want the register data to pass over the bus: which register gets to drive the bus? Do you want to add more buses?
Let's say we have 10 registers. Do we build a 10-line bus, meaning 10 bus connections that run to most of the system? That is a lot of wiring - and you want each register to mean something, right?
Just think how much bus we would need for 1 KB of data:
1024 bits = 1024x9x32 gates and 1024 bus lines in the CPU.
We know Intel is working at around 30 nm for one gate, so the gate count is the smaller problem - but how do you intend to solve the bus problem?
You don't even need registers - it's possible to create something like a Turing machine that takes a stream of input code and data and produces an output stream. This is close to what computers started with.
It is possible, but utterly impractical - even low-end computers today have 2 gigabytes of RAM. How would you handle two billion registers in code (and where would you put them, physically)?
Also, what would you do with it, that the speed of RAM (and even processor caches) is a concern? Either run the system off RAM (fast enough), or build a special-purpose processor.
Hot off the rouncer hardware theory plate:
If you manage to link every permutation of the address bits to the individual words, then you could have a RAM-register system. Imagine using NAND to form the address groups (in other words, link the opposite of the address to the flop). One NOT, and you've done the addressing with wires alone plus the little NOT switch, which could be a solenoid-type coil that inverts the value. Then every register ORs into the same output - the content pins - and only the address that was passed will get power to the output content pins.
Simples.
The reason you get so little register memory is because it is incredibly expensive. This is why we have the memory hierarchy.