Fastest way to read numerical values from text file in C++ (double in this case)_问答_开发者

Currently, my code is simply this:

void ReadFile(double Cst[][1000], char* FileName, int height)

FILE* ifp;
double value;
int nRead = 0;
int mRead = 0;

//open the file, check if successful
ifp = fopen( FileName, "r" );
if (ifp==NULL){
    ...
}


for (nRead开发者_C百科 = 0; nRead < height; nRead++){
    for (mRead = 0; mRead < 1000; mRead++){
        fscanf(ifp, "%le",&value);
        Cst[nRead][mRead]=value;
    }
}

fclose(ifp);

What can I change to make it the fastest possible?

Boost.Spirit.QI comes with a benchmark that compares the performance of std::atof, std::strtod, and boost::spirit::qi::double_. Here are the results on my system, using VC++ 2010 SP1 x64 and Boost 1.46.1:

atof_test: 4.1579 seconds
strtod_test: 4.2339 seconds
spirit_qi_double_test: 1.2822 seconds

This puts Spirit.QI at 230% faster than the next fastest verifiable^* option and 224% faster than the next fastest unverifiable option – pretty fast, I'd say!

_{* Unlike std::atof, std::strtod and Boost.Spirit will let you know whether or not the input was valid.}

Update: I've rerun the benchmark, additionally using Boost.Spirit.X3's boost::spirit::x3::double_; here are the results on my present system, using VC++ 2015 Update 3 x64 and Boost 1.61.0:

atof_test: 2.2874 seconds
strtod_test: 2.2923 seconds
spirit_qi_double_test: 0.4849 seconds
spirit_x3_double_test: 0.4308 seconds

This puts Spirit.QI at 373% faster than the next fastest verifiable option and 372% faster than the next fastest unverifiable option, and Spirit.X3 at 432% faster than the next fastest verifiable option and 431% faster than the next fastest unverifiable option – things have improved significantly for Spirit, and on top of that, the X3-based code compiles in about ⅕ of the time as the QI-based code, so wins all around there as well!

Additionally, I've benchmarked the code in @Potatoswatter's answer (modified with double-precision exponent table and support for negative numbers (code)), @6502's answer, and @Mehrdad's answer, with the same build and test environment. Here are the results (@6502's code excluded as half of my sample inputs use scientific notation, which his code does not support):

potatoswatter_test: 0.2358 seconds
mehrdad_test: 0.3415 seconds

If all inputs are converted to fixed notation, we can test @6502's code as well:

atof_test: 3.6249 seconds
strtod_test: 3.7023 seconds
spirit_qi_double_test: 1.0763 seconds
spirit_x3_double_test: 2.3657 seconds
potatoswatter_test: 0.8347 seconds
6502_test: 4.1463 seconds
mehrdad_test: 1.3471 seconds

One note of interest is that QI fails to parse some very long fixed-notation inputs; X3 parses these correctly, but runs significantly slower than with short scientific-notation inputs.

For an example, here is a very fast number parser from one of my projects. It only handles a very small subset of the actual features of the Standard Library numeric parsing.

uint64_t mystrtol( char *&pen, uint64_t val = 0 ) {
    for ( char c; ( c = *pen ^ '0' ) <= 9; ++ pen ) val = val * 10 + c;
    return val;
}

value_t mystrtof( char *&pen ) {
    static value_t const exp_table[]
     = { 1e5, 1e4, 1e3, 1e2, 10, 1, 0.1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9, 1e-10, 1e-11, 1e-12, 1e-13, 1e-14, 1e-15, 1e-16, 1e-17 },
     * exp_lookup = & exp_table[ 5 ];

    while ( iswspace( * ++ pen ) ) ;
    //if ( *pen == '-' ) ++ pen; // don't think we ever care about negative numbers
    uint64_t val = mystrtol( pen );
    int neg_exp = 0;
    if ( *pen == '.' ) { // mainly happens when val = 0
        char const *fracs = ++ pen;
        val = mystrtol( pen, val );
        neg_exp = pen - fracs;
    }
    if ( ( *pen | ('E'^'e') ) == 'e' ) {
        neg_exp += *++pen == '-'? mystrtol( ++ pen ) : - mystrtol( ++ pen );
    }
    return val * exp_lookup[ neg_exp ];
}

C/C++ parsing numbers from text is very slow. Streams are horribly slow but even C number parsing is slow because it's quite difficult to get it correct down to the last precision bit.

In a production application where reading speed was important and where data was known to have at most three decimal digits and no scientific notation I got a vast improvement by hand-coding a floating parsing function handling only sign, integer part and any number of decimals (by "vast" I mean 10x faster compared to strtod).

If you don't need exponent and the precision of this function is enough this is the code of a parser similar to the one I wrote back then. On my PC it's now 6.8 times faster than strtod and 22.6 times faster than sstream.

double parseFloat(const std::string& input)
{
    const char *p = input.c_str();
    if (!*p || *p == '?')
        return NAN_D;
    int s = 1;
    while (*p == ' ') p++;

    if (*p == '-') {
        s = -1; p++;
    }

    double acc = 0;
    while (*p >= '0' && *p <= '9')
        acc = acc * 10 + *p++ - '0';

    if (*p == '.') {
        double k = 0.1;
        p++;
        while (*p >= '0' && *p <= '9') {
            acc += (*p++ - '0') * k;
            k *= 0.1;
        }
    }
    if (*p) die("Invalid numeric format");
    return s * acc;
}

atof is probably much faster, it doesn't have to process the format string.

If you don't need to support all 1001 recognized input formats (with and without exponents, etc) then a custom function may be faster yet. If atof is still too slow for you, say so and I can clean up the code I use (it's not really suitable for public posting at the moment).

I just remembered the problem with atof -- it doesn't tell you where the number ended, so reading several numbers in sequence is difficult. strtod is better in that regard.

A fast way is to allocate a text buffer or string, read as much as you can into the string, then parse the string.

Your first bottleneck is file I/O. Second (in order) is converting text into numbers. You should profile your program as to whether sscanf or std::istringstream is faster. Modifications to the I/O portion will yield the biggest performance changes.

To make the process even faster, using multiple threads and double buffering. One thread reads data into one or more buffers, while another thread parses data out of those buffers.

Additional improvements can be made by changing the data to fixed size fields and records.

For C++, working with streams is both much easier and nearly always much slower than using the C interfaces. However, I suspect that the speed of various C interfaces will depend on their implementation. atof may be faster than strtod on one platform, and slower on another.

Personally, I would look at fast ways to read the file, not necessarily fast ways to parse the doubles. And your fastest ways to read files are almost always platform specific APIs (memory mapped files, scatter/gather I/O, etc.). So it's very hard to give you an answer that will be the fastest way possible, because that's very platform specific and will change in the future.

My version that handles exponents too:

template<class It>
double mystrtod(It &s, It const end)
{
    static double const pow10[] = { 1E-323, 1E-322, 1E-321, 1E-320, 1E-319, 1E-318, 1E-317, 1E-316, 1E-315, 1E-314, 1E-313, 1E-312, 1E-311, 1E-310, 1E-309, 1E-308, 1E-307, 1E-306, 1E-305, 1E-304, 1E-303, 1E-302, 1E-301, 1E-300, 1E-299, 1E-298, 1E-297, 1E-296, 1E-295, 1E-294, 1E-293, 1E-292, 1E-291, 1E-290, 1E-289, 1E-288, 1E-287, 1E-286, 1E-285, 1E-284, 1E-283, 1E-282, 1E-281, 1E-280, 1E-279, 1E-278, 1E-277, 1E-276, 1E-275, 1E-274, 1E-273, 1E-272, 1E-271, 1E-270, 1E-269, 1E-268, 1E-267, 1E-266, 1E-265, 1E-264, 1E-263, 1E-262, 1E-261, 1E-260, 1E-259, 1E-258, 1E-257, 1E-256, 1E-255, 1E-254, 1E-253, 1E-252, 1E-251, 1E-250, 1E-249, 1E-248, 1E-247, 1E-246, 1E-245, 1E-244, 1E-243, 1E-242, 1E-241, 1E-240, 1E-239, 1E-238, 1E-237, 1E-236, 1E-235, 1E-234, 1E-233, 1E-232, 1E-231, 1E-230, 1E-229, 1E-228, 1E-227, 1E-226, 1E-225, 1E-224, 1E-223, 1E-222, 1E-221, 1E-220, 1E-219, 1E-218, 1E-217, 1E-216, 1E-215, 1E-214, 1E-213, 1E-212, 1E-211, 1E-210, 1E-209, 1E-208, 1E-207, 1E-206, 1E-205, 1E-204, 1E-203, 1E-202, 1E-201, 1E-200, 1E-199, 1E-198, 1E-197, 1E-196, 1E-195, 1E-194, 1E-193, 1E-192, 1E-191, 1E-190, 1E-189, 1E-188, 1E-187, 1E-186, 1E-185, 1E-184, 1E-183, 1E-182, 1E-181, 1E-180, 1E-179, 1E-178, 1E-177, 1E-176, 1E-175, 1E-174, 1E-173, 1E-172, 1E-171, 1E-170, 1E-169, 1E-168, 1E-167, 1E-166, 1E-165, 1E-164, 1E-163, 1E-162, 1E-161, 1E-160, 1E-159, 1E-158, 1E-157, 1E-156, 1E-155, 1E-154, 1E-153, 1E-152, 1E-151, 1E-150, 1E-149, 1E-148, 1E-147, 1E-146, 1E-145, 1E-144, 1E-143, 1E-142, 1E-141, 1E-140, 1E-139, 1E-138, 1E-137, 1E-136, 1E-135, 1E-134, 1E-133, 1E-132, 1E-131, 1E-130, 1E-129, 1E-128, 1E-127, 1E-126, 1E-125, 1E-124, 1E-123, 1E-122, 1E-121, 1E-120, 1E-119, 1E-118, 1E-117, 1E-116, 1E-115, 1E-114, 1E-113, 1E-112, 1E-111, 1E-110, 1E-109, 1E-108, 1E-107, 1E-106, 1E-105, 1E-104, 1E-103, 1E-102, 1E-101, 1E-100, 1E-099, 1E-098, 1E-097, 1E-096, 1E-095, 1E-094, 1E-093, 1E-092, 1E-091, 1E-090, 1E-089, 1E-088, 1E-087, 1E-086, 1E-085, 1E-084, 1E-083, 1E-082, 1E-081, 1E-080, 1E-079, 1E-078, 1E-077, 1E-076, 1E-075, 1E-074, 1E-073, 1E-072, 1E-071, 1E-070, 1E-069, 1E-068, 1E-067, 1E-066, 1E-065, 1E-064, 1E-063, 1E-062, 1E-061, 1E-060, 1E-059, 1E-058, 1E-057, 1E-056, 1E-055, 1E-054, 1E-053, 1E-052, 1E-051, 1E-050, 1E-049, 1E-048, 1E-047, 1E-046, 1E-045, 1E-044, 1E-043, 1E-042, 1E-041, 1E-040, 1E-039, 1E-038, 1E-037, 1E-036, 1E-035, 1E-034, 1E-033, 1E-032, 1E-031, 1E-030, 1E-029, 1E-028, 1E-027, 1E-026, 1E-025, 1E-024, 1E-023, 1E-022, 1E-021, 1E-020, 1E-019, 1E-018, 1E-017, 1E-016, 1E-015, 1E-014, 1E-013, 1E-012, 1E-011, 1E-010, 1E-009, 1E-008, 1E-007, 1E-006, 1E-005, 1E-004, 1E-003, 1E-002, 1E-001, 1E+000, 1E+001, 1E+002, 1E+003, 1E+004, 1E+005, 1E+006, 1E+007, 1E+008, 1E+009, 1E+010, 1E+011, 1E+012, 1E+013, 1E+014, 1E+015, 1E+016, 1E+017, 1E+018, 1E+019, 1E+020, 1E+021, 1E+022, 1E+023, 1E+024, 1E+025, 1E+026, 1E+027, 1E+028, 1E+029, 1E+030, 1E+031, 1E+032, 1E+033, 1E+034, 1E+035, 1E+036, 1E+037, 1E+038, 1E+039, 1E+040, 1E+041, 1E+042, 1E+043, 1E+044, 1E+045, 1E+046, 1E+047, 1E+048, 1E+049, 1E+050, 1E+051, 1E+052, 1E+053, 1E+054, 1E+055, 1E+056, 1E+057, 1E+058, 1E+059, 1E+060, 1E+061, 1E+062, 1E+063, 1E+064, 1E+065, 1E+066, 1E+067, 1E+068, 1E+069, 1E+070, 1E+071, 1E+072, 1E+073, 1E+074, 1E+075, 1E+076, 1E+077, 1E+078, 1E+079, 1E+080, 1E+081, 1E+082, 1E+083, 1E+084, 1E+085, 1E+086, 1E+087, 1E+088, 1E+089, 1E+090, 1E+091, 1E+092, 1E+093, 1E+094, 1E+095, 1E+096, 1E+097, 1E+098, 1E+099, 1E+100, 1E+101, 1E+102, 1E+103, 1E+104, 1E+105, 1E+106, 1E+107, 1E+108, 1E+109, 1E+110, 1E+111, 1E+112, 1E+113, 1E+114, 1E+115, 1E+116, 1E+117, 1E+118, 1E+119, 1E+120, 1E+121, 1E+122, 1E+123, 1E+124, 1E+125, 1E+126, 1E+127, 1E+128, 1E+129, 1E+130, 1E+131, 1E+132, 1E+133, 1E+134, 1E+135, 1E+136, 1E+137, 1E+138, 1E+139, 1E+140, 1E+141, 1E+142, 1E+143, 1E+144, 1E+145, 1E+146, 1E+147, 1E+148, 1E+149, 1E+150, 1E+151, 1E+152, 1E+153, 1E+154, 1E+155, 1E+156, 1E+157, 1E+158, 1E+159, 1E+160, 1E+161, 1E+162, 1E+163, 1E+164, 1E+165, 1E+166, 1E+167, 1E+168, 1E+169, 1E+170, 1E+171, 1E+172, 1E+173, 1E+174, 1E+175, 1E+176, 1E+177, 1E+178, 1E+179, 1E+180, 1E+181, 1E+182, 1E+183, 1E+184, 1E+185, 1E+186, 1E+187, 1E+188, 1E+189, 1E+190, 1E+191, 1E+192, 1E+193, 1E+194, 1E+195, 1E+196, 1E+197, 1E+198, 1E+199, 1E+200, 1E+201, 1E+202, 1E+203, 1E+204, 1E+205, 1E+206, 1E+207, 1E+208, 1E+209, 1E+210, 1E+211, 1E+212, 1E+213, 1E+214, 1E+215, 1E+216, 1E+217, 1E+218, 1E+219, 1E+220, 1E+221, 1E+222, 1E+223, 1E+224, 1E+225, 1E+226, 1E+227, 1E+228, 1E+229, 1E+230, 1E+231, 1E+232, 1E+233, 1E+234, 1E+235, 1E+236, 1E+237, 1E+238, 1E+239, 1E+240, 1E+241, 1E+242, 1E+243, 1E+244, 1E+245, 1E+246, 1E+247, 1E+248, 1E+249, 1E+250, 1E+251, 1E+252, 1E+253, 1E+254, 1E+255, 1E+256, 1E+257, 1E+258, 1E+259, 1E+260, 1E+261, 1E+262, 1E+263, 1E+264, 1E+265, 1E+266, 1E+267, 1E+268, 1E+269, 1E+270, 1E+271, 1E+272, 1E+273, 1E+274, 1E+275, 1E+276, 1E+277, 1E+278, 1E+279, 1E+280, 1E+281, 1E+282, 1E+283, 1E+284, 1E+285, 1E+286, 1E+287, 1E+288, 1E+289, 1E+290, 1E+291, 1E+292, 1E+293, 1E+294, 1E+295, 1E+296, 1E+297, 1E+298, 1E+299, 1E+300, 1E+301, 1E+302, 1E+303, 1E+304, 1E+305, 1E+306, 1E+307, 1E+308 };
    long long b = 0, e1 = 0, e2 = 0;
    bool is_exp = false;
    do
    {
        bool negate = s != end && *s == '-';
        if (s != end && (*s == '-' || *s == '+')) { ++s; }
        bool decimal = false;
        long long &r = is_exp ? e2 : b;
        while (s != end && (*s == '.' || '0' <= *s && *s <= '9'))
        {
            if (*s != '.')
            {
                e1 -= decimal;
                char const digit = *s - '0';
                if (static_cast<unsigned long long>(r) < static_cast<unsigned long long>(r) * 10 + static_cast<unsigned char>(digit))
                {
                    r *= 10;
                    r += digit;
                }
            }
            else { decimal = true; }
            ++s;
        }
        r = negate ? -r : +r;
    } while ((is_exp = !is_exp, is_exp) && s != end && ((*s | ('e' ^ 'E')) == 'e') && (++s, is_exp));
    double const result = b * pow10[323 + (e1 + e2)];
    return result;
}

It seems to me you've written a lot of brittle code in the hope that it'll be super efficient. Before attempting all this code, have you even attempted the simple C++ idiomatic solution and determined that it's not fast enough?

std::ifstream input("/path/to/file");
if ( !input.is_open() ) {
    // handle error.
}
std::vector<double> numbers;
std::copy(std::istream_iterator<double>(input),
    std::istream_iterator(), std::back_inserter(numbers));

// Access item at position (i,j).
double x = numbers[1000*j+i];

Keep in mind that the developers behind your standard library vendor's implementation give their best at making this simple code as fast as possible. It's very likely you'll be able to reach your performance requirements with this trivial piece of code.

On top of that, you get a bunch of freebies: