Optimizing for space instead of speed in C++_问答_开发者

When you say "optimization", people tend to think "speed". But what about embedded systems where speed isn't all that critical, but memory is a major constraint? What are some guidelines, techniques, and tricks that can be used for shaving off those extra kilobytes in ROM and RAM? How does one "profile" code to see where the memory bloat is?

P.S. One could argue that "prematurely" optimizing for space in embedded systems isn't all that evil, because you leave yourself more room for data storage and feature creep. It also allows you to cut hardware production costs because your code can run on smaller ROM/RAM.

P.P.S. References to articles and books开发者_如何转开发 are welcome too!

P.P.P.S. These questions are closely related: 404615, 1561629

My experience from an extremely constrained embedded memory environment:

Use fixed size buffers. Don't use pointers or dynamic allocation because they have too much overhead.
Use the smallest int data type that works.
Don't ever use recursion. Always use looping.
Don't pass lots of function parameters. Use globals instead. :)

There are many things you can do to reduce your memory footprints, I'm sure people have written books on the subject, but a few of the major ones are:

Compiler options to reduce code size (including -Os and packing/alignment options)
Linker options to strip dead code
If you're loading from flash (or ROM) to ram to execute (rather than executing from flash), then use a compressed flash image, and decompress it with your bootloader.
Use static allocation: a heap is an inefficient way to allocate limited memory, and if it might fail due to fragmentation if it is constrained.
Tools to find the stack high-watermark (typically they fill the stack with a pattern, execute the program, then see where the pattern remains), so you can set the stack size(s) optimally
And of course, optimising the algorithms you use for memory footprint (often at expense of speed)

A few obvious ones

If speed isn't critical, execute the code directly from flash.
Declare constant data tables using const. This will avoid the data being copied from flash to RAM
Pack large data tables tightly using the smallest data types, and in the correct order to avoid padding.
Use compression for large sets of data (as long as the compression code doesn't outweigh the data)
Turn off exception handling and RTTI.
Did anybody mention using -Os? ;-)

Folding knowledge into data

One of the rules of Unix philosophy can help make code more compact:

Rule of Representation: Fold knowledge into data so program logic can be stupid and robust.

I can't count how many times I've seen elaborate branching logic, spanning many pages, that could've been folded into a nice compact table of rules, constants, and function pointers. State machines can often be represented this way (State Pattern). The Command Pattern also applies. It's all about the declarative vs imperative styles of programming.

Log codes + binary data instead of text

Instead of logging plain text, log event codes and binary data. Then use a "phrasebook" to reconstitute the event messages. The messages in the phrasebook can even contain printf-style format specifiers, so that the event data values are displayed neatly within the text.

Minimize the number of threads

Each thread needs it own memory block for a stack and TSS. Where you don't need preemption, consider making your tasks execute co-operatively within the same thread (cooperative multi-tasking).

Use memory pools instead of hoarding

To avoid heap fragmentation, I've often seen separate modules hoard large static memory buffers for their own use, even when the memory is only occasionally required. A memory pool could be used instead so the the memory is only used "on demand". However, this approach may require careful analysis and instrumentation to make sure pools are not depleted at runtime.

Dynamic allocation only at initialization

In embedded systems where only one application runs indefinitely, you can use dynamic allocation in a sensible way that doesn't lead to fragmentation: Just dynamically allocate once in your various initialization routines, and never free the memory. reserve() your containers to the correct capacity and don't let them auto-grow. If you need to frequently allocate/free buffers of data (say, for communication packets), then use memory pools. I once even extended the C/C++ runtimes so that it would abort my program if anything tried to dynamically allocate memory after the initialization sequence.

As with all optimization, first optimize algorithms, second optimize the code and data, finally optimize the compiler.

I don't know what your program does, so I can't advice on algorithms. Many others have written about the compiler. So, here's some advice on code and data:

Eliminate redundancy in your code. Any repeated code that's three or more lines long, repeated three times in your code, should be changed to a function call.
Eliminate redundancy in your data. Find the most compact representation: merge read-only data, and consider using compression codes.
Run the code through a regular profiler; eliminate all code that isn't used.

Generate a map file from your linker. It will show how the memory is allocated. This is a good start when optimizing for memory usage. It also will show all the functions and how the code-space is laid out.

Here's a book on the subject Small Memory Software: Patterns for systems with limited memory.

Compile in VS with /Os. Often times this is even faster than optimizing for speed anyway, because smaller code size == less paging.

Comdat folding should be enabled in the linker (it is by default in release builds)

Be careful about data structure packing; often time this results in the compiler generated more code (== more memory) to generate the assembly to access unaligned memory. Using 1 bit for a boolean flag is a classic example.

Also, be careful when choosing a memory efficient algorithm over an algorithm with a better runtime. This is where premature optimizations come in.

Ok most were mentioned already, but here is my list anyway:

Learn what your compiler can do. Read compiler documentation, experiment with code examples. Check settings.
Check generated code at target optimization level. Sometimes results are surprising and often it turns out optimization actually slows things down (or just take too much space).
choose suitable memory model. If you target really small tight system, large or huge memory model might not be the best choice (but usually easisest to program for...)
Prefer static allocation. Use dynamic allocation only on startup or over statically allocated buffer (pool or maximum instance sized static buffer).
Use C99 style data types. Use smallest sufficient data type, for storage types. Local variables like loop variables are sometimes more efficient with "fast" data types.
Select inline candidates. Some parameter heavy function with relatively simple bodies are better off when inlined. Or consider passing structure of parameters. Globals are also option, but be careful - tests and maintenance can become difficult if anyone in them isn't disciplned enough.
Use const keyword well , be aware of array initialization implications.
Map file, ideally also with module sizes. Check also what is included from crt (is it really neccessary?).
Recursion just say no (limited stack space)
Floating point numbers - prefer fixed point math. Tends to include and call a lot of code (even for simple addition or multiplication).
C++ you should know C++ VERY WELL. If you don't, program constrainted embedded systems in C, please. Those who dare must be careful with all advanced C++ constructs (inheritance, templates, exceptions, overloading, etc.). Consider close to HW code to be rather Super-C and C++ is used where it counts: in high level logic, GUI, etc.
Disable whatever you don't need in compiler settings (be it parts of libraries, language constructs, etc.)

Last but not least - while hunting for smallest possible code size - don't overdo it. Watch out also for performance and maintainability. Over-optimized code tends to decay very quickly.

Firstly, tell your compiler to optimize for code size. GCC has the -Os flag for this.

Everything else is at the algorithmic level - use similar tools that you would for finding memory leaks, but instead look for allocs and frees that you could avoid.

Also take a look at commonly used data structure packing - if you can shave a byte or two off them, you can cut down memory use substantially.

If you're looking for a good way to profile your application's heap usage, check out valgrind's massif tool. It will let you take snapshots of your app's memory usage profile over time, and you can then use that information to better see where the "low hanging fruit" is, and aim your optimizations accordingly.

Profiling code or data bloat can be done via map files: for gcc see here, for VS see here.
I have yet to see a useful tool for size profiling though (and don't have time to fix my VS AddIn hack).

on top what others suggest:

Limit use of c++ features, write like in ANSI C with minor extensions. Standard (std::) templates use a large system of dynamic allocation. If you can, avoid templates altogether. While not inherently harmful, they make it way too easy to generate lots and lots of machine code from just a couple simple, clean, elegant high-level instructions. This encourages writing in a way that - despite all the "clean code" advantages - is very memory hungry.

If you must use templates, write your own or use ones designed for embedded use, pass fixed sizes as template parameters, and write a test program so you can test your template AND check your -S output to ensure the compiler is not generating horrible assembly code to instantiate it.

Align your structures by hand, or use #pragma pack

{char a; long b; char c; long d; char e; char f; } //is 18 bytes, 
{char a; char c; char d; char f; long b; long d; } //is 12 bytes.

For the same reason, use a centralized global data storage structure instead of scattered local static variables.

Intelligently balance usage of malloc()/new and static structures.

If you need a subset of functionality of given library, consider writing your own.

Unroll short loops.

for(i=0;i<3;i++){ transform_vector[i]; }

is longer than

transform_vector[0];
transform_vector[1];
transform_vector[2];

Don't do that for longer ones.

Pack multiple files together to let the compiler inline short functions and perform various optimizations Linker can't.

Don't be afraid to write 'little languages' inside your program. Sometimes a table of strings and an interpreter can get a LOT done. For instance, in a system I've worked on, we have a lot of internal tables, which have to be accessed in various ways (loop through, whatever). We've got an internal system of commands for referencing the tables that forms a sort of half-way language that's quite compact for what it gets donw.

But, BE CAREFUL! Know that you are writing such things (I wrote one accidentally, myself), and DOCUMENT what you are doing. The original developers do NOT seem to have been conscious of what they were doing, so it's much harder to manage than it should be.

Optimizing is a popular term but often technically incorrect. It literally means to make optimal. Such a condition is never actually achieved for either speed or size. We can simply take measures to move toward optimization.

Many (but not all) of the techniques used to move toward minimum time to a computing result sacrifices memory requirement, and many (but not all) of the techniques used to move toward minimum memory requirement lengthens the time to result.

Reduction of memory requirements amounts to a fixed number of general techniques. It is difficult to find a specific technique that does not neatly fit into one or more of these. If you did all of them, you'd have something very close to the minimal space requirement for the program if not the absolute minimum possible. For a real application, it could take a team of experienced programmers a thousand years to do it.

Remove all redundancy from stored data, including intermediates.
Remove all need for storing data that could be streamed instead.
Allocate only the number of bytes needed, never a single more.
Remove all unused data.
Remove all unused variables.
Free data as soon as it is no longer possibly needed.
Remove all unused algorithms and branches within algorithms.
Find the algorithm that is represented in the minimally sized execution unit.
Remove all unused space between items.

This is a computer science view of the topic, not a developer's one.

For instance, packing a data structure is an effort that combines (3) and (9) above. Compressing data is a way to at least partly achieve (1) above. Reducing overhead of higher level programming constructs is a way to achieve some progress in (7) and (8). Dynamic allocation is an attempt to exploit a multitasking environment to employ (3). Compilation warnings, if turned on, can help with (5). Destructors attempt to assist with (6). Sockets, streams, and pipes can be used to accomplish (2). Simplifying a polynomial is a technique to gain ground in (8).

Understanding of the meaning of nine and the various ways to achieve them is the result of years of learning and checking memory maps resulting from compilation. Embedded programmers often learn them more quickly because of limited memory available.

Using the -Os option on a gnu compiler makes a request to the compiler to attempt to find patterns that can be transformed to accomplish these, but the -Os is an aggregate flag that turns on a number of optimization features, each of which attempts to perform transformations to accomplish one of the 9 tasks above.

Compiler directives can produce results without programmer effort, but automated processes in the compiler rarely correct problems created by lack of awareness in the writers of the code.

Bear in mind the implementation cost of some C++ features, such as virtual function tables and overloaded operators that create temporary objects.

Along with that everyone else said, I'd just like to add don't use virtual functions because with virtual functions a VTable must be created which can take up who knows how much space.

Also watch out for exceptions. With gcc, I don't believe there is a growing size for each try-catch block(except for 2 function calls for each try-catch), but there is a fixed size function which must be linked in which could be wasting precious bytes