I am developing a video playback application that displays video files containing raw planar image data, frame by frame. The data is 8-bit RGB (no alpha at the moment). The hardware only accepts interleaved image data, so I need to convert the planar data to interleaved. I currently do this with memmove or memcpy on the planar data. However, for HD content this takes a lot of time because the data is raw. I tried using two threads, one for processing and one for display. Displaying already-converted interleaved data is very quick, but the processing thread cannot keep up, so the frame rate suffers badly.
I also considered pre-processing everything and keeping it in memory (the clips have relatively few frames), then simply displaying the converted data when needed. I tested this approach and it is fairly fast (60 fps). However, it seems suboptimal: either the first run is very slow, or I have to wait for some time before playback starts. Moreover, for large files it is impossible because of memory limits.
So I am looking for an image processing library or algorithm that does the planar->interleaved conversion quickly. I tried GIL from Boost, but its performance was not good enough.
I have had to solve the same problem, but with the added constraint that I needed to perform the conversion "in place" (i.e., I had to leave the image data in the same buffer). In the image below I demonstrate how the pixels need to be moved from a planar to an interleaved representation:
So we see we can alter the image "in place" with a sequence of swaps.
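To make the "sequence of swaps" concrete, here is a small sketch that computes where each planar element has to go and groups those moves into cycles. The helper names interleavedIndexOf and permutationCycles are made up for this illustration; only the index formula comes from the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Destination (interleaved) index of the element stored at planarIndex.
// Planar layout:      channel k, pixel i  ->  k * numPixels + i
// Interleaved layout: channel k, pixel i  ->  i * numChannels + k
std::size_t interleavedIndexOf(std::size_t planarIndex,
                               std::size_t numPixels,
                               std::size_t numChannels) {
    assert(numPixels > 0 && numChannels > 0);
    const std::size_t i = planarIndex % numPixels;  // pixel number
    const std::size_t k = planarIndex / numPixels;  // channel number
    return numChannels * i + k;
}

// Splits the index permutation into its cycles. A cycle of length n
// can be resolved in place with n - 1 swaps.
std::vector<std::vector<std::size_t>> permutationCycles(std::size_t numPixels,
                                                        std::size_t numChannels) {
    const std::size_t size = numPixels * numChannels;
    std::vector<bool> visited(size, false);
    std::vector<std::vector<std::size_t>> cycles;
    for (std::size_t start = 0; start < size; ++start) {
        if (visited[start]) continue;
        std::vector<std::size_t> cycle;
        // Follow the moves until we come back to an already visited index.
        for (std::size_t j = start; !visited[j];
             j = interleavedIndexOf(j, numPixels, numChannels)) {
            visited[j] = true;
            cycle.push_back(j);
        }
        cycles.push_back(cycle);
    }
    return cycles;
}
```

For a 2-pixel RGB buffer (R0 R1 G0 G1 B0 B1), the first and last elements already sit in the right place, and the four in the middle form a single cycle 1 -> 3 -> 4 -> 2 -> 1.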
Here is my C++ implementation, which runs in linear time. The template parameter T is the image channel type (e.g., uint8_t for byte-sized channels).
#include <vector>
#include <cstdint>
#include <algorithm>

template <typename T>
void planarToInterleaved(int numPixels, int numChannels, T pixels[]) {
    const int size = numPixels * numChannels;
    std::vector<bool> visited(size);
    std::fill(visited.begin(), visited.end(), false);
    auto nextUnvisited = [&](int index) -> int {
        int i;
        for (i = index; i < size && visited[i]; i++)
            ;
        return i;
    };
    auto interleavedIndex = [=](int planarIndex) -> int {
        const int i = planarIndex % numPixels;
        const int k = planarIndex / numPixels;
        return numChannels*i + k;
    };
    int J = 0;
    int Jnext = 0;
    while ( (J = nextUnvisited(Jnext++)) < size ) {
        visited[J] = true;
        const int Jstart = J;
        T tmp = pixels[J];
        while ( true ) {
            const int I = interleavedIndex(J);
            if ( I == J ) break; // 1-node cycle
            std::swap(pixels[I],tmp);
            if ( I == Jstart ) break;
            J = I;
            visited[J] = true;
        }
    }
}
Here I convert a WxH RGB image stored in the buffer image (which holds W*H*3 values) from planar to interleaved:
planarToInterleaved(W*H, 3, image);
Anyway, this was fun to figure out.
(Adding code next to my comments above)
This, compiled with g++ 4.2.1 with -O2 on a 2.4GHz Intel Core 2 Duo, processes 2000 frames in under 10 seconds.
// planarp: input buffer holding three consecutive planes of kWidth*kHeight bytes each
// interleavedp: output buffer of the same total size
int const kWidth = 1920;
int const kHeight = 1080;
for (std::size_t i = 0; i != kWidth*kHeight; ++i) {
    interleavedp[i*3+0] = planarp[i+0*kWidth*kHeight];
    interleavedp[i*3+1] = planarp[i+1*kWidth*kHeight];
    interleavedp[i*3+2] = planarp[i+2*kWidth*kHeight];
}
Note that writing it this way allows the compiler to optimize better. Breaking it up into lines (or 12-byte blocks) only makes things go slower.
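For anyone who wants to drop the loop into a project, it could be wrapped into a self-contained function roughly like this. It is only a sketch: the name planarToInterleavedRGB and its parameters are my own choice, and it assumes the three planes are stored back to back in one buffer and that the input and output buffers do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies numPixels pixels from three consecutive planes in `planar`
// into `interleaved` as 3 bytes per pixel, taken from plane 0, 1, 2.
// Both buffers must hold 3 * numPixels bytes and must not overlap.
void planarToInterleavedRGB(const std::uint8_t* planar,
                            std::uint8_t* interleaved,
                            std::size_t numPixels) {
    assert(planar != nullptr && interleaved != nullptr);
    const std::uint8_t* p0 = planar;                  // first plane
    const std::uint8_t* p1 = planar + numPixels;      // second plane
    const std::uint8_t* p2 = planar + 2 * numPixels;  // third plane
    // One simple loop over all pixels, as in the snippet above.
    for (std::size_t i = 0; i != numPixels; ++i) {
        interleaved[i * 3 + 0] = p0[i];
        interleaved[i * 3 + 1] = p1[i];
        interleaved[i * 3 + 2] = p2[i];
    }
}
```

For a 1920x1080 frame you would call it as planarToInterleavedRGB(planarp, interleavedp, 1920 * 1080).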
libswscale (part of FFmpeg) can do this, as far as I know. A good tutorial can be found here
It should be fairly straightforward to write this function using vector intrinsics. I don't know what processor, compiler, or packed pixel format you're using, so I'll give an example implementation using GCC and MMX intrinsics for x86. It should also be easy to translate this code into ARM NEON, PowerPC Altivec, or x86/x64 SSE code.
This should convert RGB planar to 32-bit RGBA packed, although ARGB is actually more common. If you need 24-bit RGB you're going to have to get a bit creative. The "Software Developer Manual" for your processor will be your best friend when writing this small piece of code, and you'll also need to read the documentation for your compiler.
SIMD handles this very well; you can tell by how short the code below is. Note that the code is actually C99, not C++, because C99 provides the restrict keyword, which can reduce the number of loads and stores generated.
Also note that this code has strict alignment requirements.
#include <stddef.h>

#if defined(USE_MMX)

typedef char v8qi __attribute__ ((vector_size(8)));

/* Packs n pixels (n must be a multiple of 8) from three 8-byte-aligned
   planes into 4 bytes per pixel: src[0], src[1], src[2], 0. */
void pack_planes3(void *dest, const void *src[3], size_t n)
{
    v8qi *restrict dp = dest, x, y, zero = { 0, 0, 0, 0, 0, 0, 0, 0 };
    const v8qi *restrict sp1 = src[0];
    const v8qi *restrict sp2 = src[1];
    const v8qi *restrict sp3 = src[2];
    size_t i;

    for (i = 0; i < n; i += 8) {
        /* The low halves hold pixels 0..3 of this group, so they go first. */
        x = __builtin_ia32_punpcklbw(*sp1, *sp3);
        y = __builtin_ia32_punpcklbw(*sp2, zero);
        dp[0] = __builtin_ia32_punpcklbw(x, y);
        dp[1] = __builtin_ia32_punpckhbw(x, y);
        /* The high halves hold pixels 4..7. */
        x = __builtin_ia32_punpckhbw(*sp1, *sp3);
        y = __builtin_ia32_punpckhbw(*sp2, zero);
        dp[2] = __builtin_ia32_punpcklbw(x, y);
        dp[3] = __builtin_ia32_punpckhbw(x, y);
        sp1++;
        sp2++;
        sp3++;
        dp += 4;
    }
    /* Clear the MMX state so later floating-point code works. */
    __builtin_ia32_emms();
}
#else
/* Scalar implementation goes here */
#endif
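The scalar fallback left as a placeholder above could look roughly like the following. This is a sketch written in C++ rather than C99 (so no restrict), the name packPlanes3Scalar is made up for illustration, and it assumes the intended output is 4 bytes per pixel with a zero padding byte, i.e. src[0], src[1], src[2], 0. Unlike the MMX version it has no alignment requirement and n does not need to be a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs n pixels from three separate planes into 4 bytes per pixel:
// src[0][i], src[1][i], src[2][i], 0.
// `dest` must hold 4 * n bytes and must not overlap the source planes.
void packPlanes3Scalar(std::uint8_t* dest,
                       const std::uint8_t* const src[3],
                       std::size_t n) {
    assert(dest != nullptr && src != nullptr);
    const std::uint8_t* p0 = src[0];
    const std::uint8_t* p1 = src[1];
    const std::uint8_t* p2 = src[2];
    for (std::size_t i = 0; i < n; ++i) {
        dest[4 * i + 0] = p0[i];
        dest[4 * i + 1] = p1[i];
        dest[4 * i + 2] = p2[i];
        dest[4 * i + 3] = 0;  // padding byte (where alpha would go)
    }
}
```

A version like this is also handy as a reference to check the output of a SIMD implementation against.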
There is the Simd Library, which contains many image conversion algorithms. It supports conversion between the following image formats: NV12, YUV420P, YUV422P, YUV444P, BGR-24, BGRA-32, HSL-24, HSV-24, Gray-8, Bayer, and some others. The algorithms are optimized using various SIMD CPU extensions. In particular, the library supports SSE, SSE2, SSSE3, SSE4.1, SSE4.2, AVX, and AVX2 for x86/x64, and VMX (Altivec) and VSX (Power7) for PowerPC.