Does sin_addr.s_addr = INADDR_ANY; need htonl at all?

Source: https://www.devze.com, 2023-03-07 23:11 (origin: the web)

I came across two threads:

Socket with recv-timeout: What is wrong with this code?

Reading / Writing to a socket using a FILE stream in c

One uses htonl and the other doesn't.

Which is right?


Since other constants like INADDR_LOOPBACK are in host byte order, I submit that all the constants in this family should have htonl applied to them, including INADDR_ANY.

(Note: I wrote this answer while @Mat was editing; his answer now also says it's better to be consistent and always use htonl.)

Rationale

It is a hazard to future maintainers of your code if you write it like this:

if (some_condition)
    sa.s_addr = htonl(INADDR_LOOPBACK);
else
    sa.s_addr = INADDR_ANY;

If I were reviewing this code, I would immediately question why one of the constants has htonl applied and the other does not. And I would report it as a bug, whether or not I happened to have the "inside knowledge" that INADDR_ANY is always 0 so converting it is a no-op.

The code you write is not only about having the correct runtime behavior, it should also be obvious where possible and easy to believe it is correct. For this reason you should not strip out the htonl around INADDR_ANY. The three reasons for not using htonl that I can see are:

  1. It may offend experienced socket programmers to use htonl because they will know it does nothing (since they know the value of the constant by heart).
  2. It requires less typing to omit it.
  3. A bogus "performance" optimization (clearly it won't matter).


INADDR_ANY is the "any address" in IPv4. That address is 0.0.0.0 in dotted notation, so 0x00000000 in hex on any endianness. Passing it through htonl has no effect.

Now if you want to wonder about other macro constants, look at INADDR_LOOPBACK if it's defined on your platform. Chances are it will be a macro like this:

#define INADDR_LOOPBACK     0x7f000001  /* 127.0.0.1   */

(from linux/in.h, equivalent definition in winsock.h).

So for INADDR_LOOPBACK, an htonl is necessary.

For consistency, it could thus be better to use htonl in all cases.
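To make the byte-order point concrete, here is a minimal sketch that inspects the in-memory bytes of the converted constant. The helper names `wire_bytes` and `loopback_is_correct_after_htonl` are made up for illustration; only `htonl`, `INADDR_ANY` and `INADDR_LOOPBACK` come from the system headers.

```c
#include <arpa/inet.h>   /* htonl */
#include <netinet/in.h>  /* INADDR_ANY, INADDR_LOOPBACK */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of a 32-bit value
   into out[0..3], in the order they would go on the wire. */
void wire_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}

/* Hypothetical helper: returns 1 if htonl(INADDR_LOOPBACK) is laid out
   in memory as the bytes 127, 0, 0, 1 (network byte order). */
int loopback_is_correct_after_htonl(void)
{
    unsigned char b[4];

    wire_bytes(htonl(INADDR_LOOPBACK), b);
    return b[0] == 127 && b[1] == 0 && b[2] == 0 && b[3] == 1;
}
```

On a little-endian host, the same check without htonl would see the bytes reversed (1, 0, 0, 127), while INADDR_ANY is all zero bytes either way.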


Neither is right, in the sense that both INADDR_ANY and htonl are deprecated, and lead to complex, ugly code that only works with IPv4. Switch to using getaddrinfo for all of your socket address creation needs:

struct addrinfo *ai, hints = { .ai_flags = AI_PASSIVE|AI_ADDRCONFIG };
getaddrinfo(0, "1234", &hints, &ai);

Replace "1234" with your port number or service name.
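A slightly fuller sketch of the same idea, with error handling and cleanup, might look like the following. The function name `open_listener` and the backlog of 16 are assumptions for illustration, and this version sets only AI_PASSIVE plus a stream socket type (it leaves out AI_ADDRCONFIG so that it also works on a host with only a loopback interface configured).

```c
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to the
   wildcard address for the given port or service name, or -1 on failure. */
int open_listener(const char *service)
{
    struct addrinfo hints = { .ai_flags = AI_PASSIVE, .ai_socktype = SOCK_STREAM };
    struct addrinfo *list, *ai;
    int fd = -1;

    /* A NULL node with AI_PASSIVE asks for the wildcard ("any") address. */
    if (getaddrinfo(NULL, service, &hints, &list) != 0)
        return -1;

    /* Try each returned address (IPv4 and/or IPv6) until one works. */
    for (ai = list; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0 && listen(fd, 16) == 0)
            break;              /* success: keep this socket */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(list);
    return fd;
}
```

Note that no INADDR_* constant and no htonl/htons call appears anywhere: getaddrinfo fills in the address and port in the right byte order.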


Stevens uses htonl(INADDR_ANY) consistently in the book UNIX Network Programming (my copy is from 1990).

The current release version of FreeBSD defines 12 INADDR_ constants in netinet/in.h; 9 of the 12 require htonl() for proper functionality. (The 9 are INADDR_LOOPBACK and 8 other multicast group addresses such as INADDR_ALLHOSTS_GROUP and INADDR_ALLMDNS_GROUP.)

In practice, it makes no difference whether you use INADDR_ANY or htonl(INADDR_ANY), other than the possible performance hit from htonl(). And even that possible performance hit may not exist -- with my 64-bit gcc 4.2.1, turning on any level of optimization at all seems to activate compile-time htonl() conversion of constants.

In theory it would be possible for some implementer to redefine INADDR_ANY to a value where htonl() actually does something, but such a change would break tens of thousands of existing pieces of code out there and wouldn't survive in the "real world"... Too much code exists which depends explicitly or implicitly on INADDR_ANY being defined as some sort of zero-valued integer. Stevens likely didn't intend for anyone to assume that INADDR_ANY is always zero when he wrote:

cli_addr.sin_addr.s_addr = htonl(INADDR_ANY);
cli_addr.sin_port        = htons(0);

In assigning a local address for the client using bind, we set the Internet address to INADDR_ANY and the 16-bit Internet port to zero.


Was going to add this as a comment, but it got a little long-winded ...

I think it's clear from the answers and the commentary here that htonl() needs to be used on these constants (although calling it on INADDR_ANY and INADDR_NONE is tantamount to a no-op). The confusion arises, as I see it, because this is not explicitly called out in the documentation. Someone please correct me if I simply missed it, but I have not seen anywhere in the man pages or in the include header an explicit statement that the INADDR_* defines are in host order. Again, not a big deal for INADDR_ANY, INADDR_NONE, and INADDR_BROADCAST, but it is significant for INADDR_LOOPBACK.

Now, I've done quite a bit of low-level socket work in C, but the loopback address rarely, if ever, gets used in my code. Although this topic is over a year old, this very problem just jumped up to bite me in the behind today, and it was because I went on the mistaken assumption that the addresses defined in the include header are in network order. Not sure why I had that idea - probably because the in_addr structure needs to have the address in network order, inet_aton and inet_addr return their values in network order, and so my logical assumption was that these constants would be usable as-is. Throwing together a quick 5-liner to test that theory showed me otherwise. If any of the powers-that-be happen to see this, I would make the suggestion to explicitly call out that the values are, in fact, in host order, not network order, and that htonl() should be applied to them. For consistency's sake, I would also suggest, as others have done so already here, that htonl() be used for all of the INADDR_* values, even if it does nothing to the value.
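This is not the original 5-liner, but a small sketch in the same spirit: it checks which constants read the same in either byte order, independent of the host's endianness. The helper names `swap_bytes32` and `is_byte_order_neutral` are invented for this example.

```c
#include <netinet/in.h>  /* INADDR_* constants */
#include <stdint.h>

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: returns 1 if the value is the same in host and
   network byte order on every platform, i.e. htonl() cannot change it. */
int is_byte_order_neutral(uint32_t host_value)
{
    return swap_bytes32(host_value) == host_value;
}
```

With this check, INADDR_ANY, INADDR_NONE and INADDR_BROADCAST come out neutral, while INADDR_LOOPBACK (0x7f000001) does not, which is exactly the case that bites when htonl() is forgotten.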


Let's summarize it a little bit, as none of the previous answers seems to be up to date and I may not be the last person who will see this question page. There have been opinions both for and against usage of htonl around INADDR_ANY constant or avoiding it entirely.

Nowadays (and it's been nowadays for quite some time now) system libraries are mostly IPv6 ready, so we use IPv4 as well as IPv6. The situation with IPv6 is much easier as the data structures and constants don't suffer from byte order. One would use 'in6addr_any' as well as 'in6addr_loopback' (both struct in6_addr type) and both of them are constant objects in the network byte order.

See why IPv6 doesn't suffer from the same problem (if IPv4 addresses were defined as four byte arrays they wouldn't suffer either):

struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};

struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};
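As a rough illustration of how the IPv6 constants can be used directly, the following sketch compares an address against in6addr_loopback with memcmp, with no byte-order conversion anywhere. The function name `is_ipv6_loopback` is an assumption for this example.

```c
#include <netinet/in.h>  /* struct in6_addr, in6addr_loopback */
#include <string.h>

/* Hypothetical helper: returns 1 if addr is the IPv6 loopback address ::1.
   The address is a plain 16-byte array, so a byte-wise compare is enough. */
int is_ipv6_loopback(const struct in6_addr *addr)
{
    return memcmp(addr, &in6addr_loopback, sizeof *addr) == 0;
}
```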

For IPv4, it would be nice to also have 'inaddr_any' and 'inaddr_loopback' as 'struct in_addr' constants (so that they can also be compared with memcmp or copied with memcpy). Indeed it might be a good idea to create them in your program as they aren't provided by glibc and other libraries:

const struct in_addr inaddr_loopback = { htonl(INADDR_LOOPBACK) };

With glibc, this only works for me inside a function (and I can't make it static), as htonl is not a macro but an ordinary function.

The problem is that glibc (in contrast with what was claimed in other answers) doesn't provide htonl as a macro but rather as a function. Therefore you would have to spell out the value for each byte order yourself:

static const struct in_addr inaddr_any = { 0 };
#if BYTE_ORDER == BIG_ENDIAN
static const struct in_addr inaddr_loopback = { 0x7f000001 };
#elif BYTE_ORDER == LITTLE_ENDIAN
static const struct in_addr inaddr_loopback = { 0x0100007f };
#else
    #error Neither big endian nor little endian
#endif

That would be a really nice addition to the headers and then you could work with IPv4 constants as easily as you can with IPv6.
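One possible alternative, sketched here as an assumption rather than anything the headers provide, is to initialize the constant from its four bytes through a union. That avoids both htonl and the byte-order #if, and still works as a static initializer. The names `inaddr_loopback_bytes` and `loopback_in_addr` are hypothetical.

```c
#include <netinet/in.h>  /* struct in_addr */

/* Hypothetical constant; not provided by any system header.
   A union initializer sets the first member, so the bytes are stored
   exactly as written: 127.0.0.1 in network byte order. */
static const union {
    unsigned char bytes[4];
    struct in_addr addr;
} inaddr_loopback_bytes = { { 127, 0, 0, 1 } };

/* Hypothetical accessor: the loopback address, ready for sin_addr. */
struct in_addr loopback_in_addr(void)
{
    return inaddr_loopback_bytes.addr;
}
```

This relies on struct in_addr being exactly four bytes wide, which holds for the definition shown above.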

But then to implement that, I had to use some constants to initialize that. When I know the respective bytes exactly, I don't need any constants. Just as some people claim that htonl() is redundant for a constant that evaluates to zero, anyone else could claim that the constant itself is redundant as well. And he would be right.

In the code I prefer to be explicit rather than implicit. Therefore, if those constants (like INADDR_ANY, INADDR_BROADCAST, INADDR_LOOPBACK) are all consistently in host byte order, then the only correct approach is to treat them that way. See for example (when not using the above constant):

struct in_addr address4 = { htonl(use_loopback ? INADDR_LOOPBACK : INADDR_ANY) };

Of course you could say that you don't need to call htonl for INADDR_ANY and therefore you could:

struct in_addr address4 = { use_loopback ? htonl(INADDR_LOOPBACK) : INADDR_ANY };

But if you ignore the byte order of the constant because it's zero anyway, then I don't see much logic in using the constant at all. The same applies to INADDR_BROADCAST, as it's just as easy to type 0xffffffff.

Another way to get around it is to avoid setting those values directly altogether:

struct in_addr address4;

inet_pton(AF_INET, "127.0.0.1", &address4);

This adds a little bit of useless processing but it has no byte order problems and it is virtually the same for IPv4 and IPv6 (you just change the address string).
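A small sketch of that approach, assuming two made-up helpers (`ipv4_from_text` and `address_family_of`), shows that the result of inet_pton is already in network byte order and that the same call handles both families:

```c
#include <arpa/inet.h>   /* inet_pton */
#include <netinet/in.h>  /* struct in_addr, struct in6_addr */
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Hypothetical helper: parse a dotted-quad string and return the address
   in network byte order, or INADDR_NONE if the text is not valid IPv4
   (note that INADDR_NONE is also the value of 255.255.255.255). */
in_addr_t ipv4_from_text(const char *text)
{
    struct in_addr parsed;

    if (inet_pton(AF_INET, text, &parsed) != 1)
        return INADDR_NONE;
    return parsed.s_addr;
}

/* Hypothetical helper: returns AF_INET or AF_INET6 depending on which
   family the text parses as, or -1 if it parses as neither. */
int address_family_of(const char *text)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, text, &v4) == 1)
        return AF_INET;
    if (inet_pton(AF_INET6, text, &v6) == 1)
        return AF_INET6;
    return -1;
}
```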

But the question is why are you doing that at all. If you want to connect() to IPv4 localhost (but sometimes to IPv6 localhost, or just any hostname), getaddrinfo() (mentioned in one of the answers) is much better for that, as:

  1. It is a function for translating any hostname/service/family/socktype/protocol to a list of matching struct addrinfo records.

  2. Each struct addrinfo includes a polymorphic pointer to a struct sockaddr that you can use directly with connect(). Therefore you don't need to care about the construction of struct sockaddr_in, typecasting (via a pointer) to struct sockaddr, etc.

    struct addrinfo *ai, hints = { .ai_family = AF_INET };
    getaddrinfo(0, "1234", &hints, &ai);

So, the conclusion is:

1) The standard API fails to provide directly usable struct in_addr constants (instead it provides rather useless unsigned integer constants in host order).

struct addrinfo *result, *item, hints = { .ai_family = AF_INET, .ai_protocol = IPPROTO_TCP };
int error, sock;

/* The service argument is a string: a port number or a service name. */
error = getaddrinfo(NULL, "80", &hints, &result);
if (error)
    ...

/* Try each returned address until one connects. */
for (item = result; item; item = item->ai_next) {
    sock = socket(item->ai_family, item->ai_socktype, item->ai_protocol);

    if (sock == -1)
        continue;

    if (connect(sock, item->ai_addr, item->ai_addrlen) != -1) {
        fprintf(stderr, "Connected successfully.\n");
        break;
    }

    close(sock);
}

freeaddrinfo(result);

When you are sure your query is selective enough that it only returns one result, you could do (omitting error handling for brevity) the following:

struct addrinfo *result, hints = { .ai_family = AF_INET, .ai_protocol = IPPROTO_TCP };
getaddrinfo(NULL, "80", &hints, &result);
int sock = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
connect(sock, result->ai_addr, result->ai_addrlen);

If you're afraid getaddrinfo() might be significantly slower than using the constants, the system library is the best place to fix that. A good implementation would just return the requested loopback address when the node (host name) argument is NULL and hints.ai_family is set.
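The loopback behavior can be checked with a short sketch like the one below, which asks getaddrinfo for an IPv4 stream address with a NULL node and no AI_PASSIVE flag, then compares the result with htonl(INADDR_LOOPBACK). The function name `null_node_gives_loopback` is an assumption for illustration.

```c
#include <arpa/inet.h>   /* htonl */
#include <netdb.h>       /* getaddrinfo, freeaddrinfo */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if getaddrinfo(NULL, service, ...) without
   AI_PASSIVE yields the IPv4 loopback address, 0 if it yields something
   else, and -1 if the lookup fails. */
int null_node_gives_loopback(const char *service)
{
    struct addrinfo hints = { .ai_family = AF_INET, .ai_socktype = SOCK_STREAM };
    struct addrinfo *result;
    const struct sockaddr_in *sin;
    int is_loopback;

    if (getaddrinfo(NULL, service, &hints, &result) != 0)
        return -1;

    /* ai_family was restricted to AF_INET, so ai_addr is a sockaddr_in. */
    sin = (const struct sockaddr_in *)result->ai_addr;
    is_loopback = (sin->sin_addr.s_addr == htonl(INADDR_LOOPBACK));

    freeaddrinfo(result);
    return is_loopback;
}
```

With a numeric service such as "80", this lookup needs no name resolution at all.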


I don't usually like to answer when there is already a "decent" answer. In this case, I am going to make an exception because information I added to these answers is being misconstrued.

INADDR_ANY is defined as an all-zero-bits IPv4 address, 0.0.0.0 or 0x00000000. Calling htonl() on this value will result in the same value, zero. Therefore, calling htonl() on this constant value is not technically necessary.

INADDR_BROADCAST is defined as an all-one-bits IPv4 address, 255.255.255.255 or 0xFFFFFFFF. Calling htonl() with INADDR_BROADCAST will return INADDR_BROADCAST. Again, calling htonl() is not technically necessary.

Another constant defined in the header files is INADDR_LOOPBACK, defined as 127.0.0.1, or 0x7F000001. This address is given in host byte order, and cannot be passed to the sockets interface without htonl(). You must use htonl() with this constant.

Some would suggest that consistency and code readability demand that programmers use htonl() for any constant named INADDR_* -- because it is required for some of them. These posters are wrong.

An example given in this thread is:

if (some_condition)
    sa.s_addr = htonl(INADDR_LOOPBACK);
else
    sa.s_addr = INADDR_ANY;

Quoting from "John Zwinck":

"If I were reviewing this code, I would immediately question why one of the constants has htonl applied and the other does not. And I report it as a bug, whether or not I happened to have the "inside knowledge" that INADDR_ANY is always 0 so converting it is a no-op. And I think (and hope) many other maintainers would do the same."

If I were receiving such a bug report, I would immediately throw it away. This process would save me a lot of time fielding bug reports from people who don't have the "basic minimum knowledge" that INADDR_ANY is always 0. (Suggesting that knowing the values of INADDR_ANY et al. somehow violates encapsulation or whatever is another non-starter -- the same numbers are used in the netstat output and inside the kernel. Programmers need to know the actual numerical values. People who don't know aren't lacking inside knowledge, they are lacking basic knowledge of the area.)

Really, if you have a programmer maintaining sockets code, and that programmer doesn't know the bit patterns of INADDR_ANY and INADDR_BROADCAST, you are already in trouble. Wrapping 0 in a macro which returns 0 is the kind of mentality that is a slave to meaningless consistency and doesn't respect domain knowledge.

Maintaining sockets code is about more than understanding C. If you don't understand the difference between INADDR_LOOPBACK and INADDR_ANY at a level compatible with netstat output, then you are dangerous in that code and shouldn't be changing it.

Straw-man arguments proposed by Zwinck regarding the needless use of htonl():

  1. It may offend experienced socket programmers to use htonl because they will know it does nothing (since they know the value of the constant by heart).

This is a straw argument because it portrays only experienced socket programmers as knowing the value of INADDR_ANY by heart. This is like writing that only an experienced C programmer knows the value of NULL by heart. Writing "by heart" gives the impression that the number is slightly difficult to memorize, perhaps a few digits, such as 127.0.0.1. But no, we are hyperbolically discussing the difficulty of memorizing the patterns named "all zero bits" and "all one bits."

Considering that these numerical values appear in the output of, e.g., netstat and other system utilities, and also considering that some of these values appear in IP headers, there is no such thing as a competent sockets programmer who does not know these values, whether by heart or by brain. In fact, attempting sockets programming without knowing these basics can be dangerous to the network availability.

  2. It requires less typing to omit it.

This argument is intended to be absurd and dismissive, so it doesn't need much refuting.

  3. A bogus "performance" optimization (clearly it won't matter).

It's hard to know where this argument came from. It could be an attempt to supply stupid-seeming arguments to the opposition. In any case, not using the htonl() macro makes no difference to performance when you provide a constant and use a typical C compiler -- the constant expressions are reduced to a constant in either case.


A reason not to use htonl() with INADDR_ANY is that most experienced sockets programmers know that it is not needed. What's more, those programmers who do not know need to learn. There is no extra "cost" in using htonl(); the trouble is the cost of establishing a coding standard which fosters ignorance of such critically important values.

By definition, encapsulation fosters ignorance. That very ignorance is the usual benefit of using an encapsulated interface -- knowledge is expensive and finite, therefore encapsulation is usually good. The question becomes: which efforts of programming are best enhanced via encapsulation? Are there programming tasks which are disserved by encapsulation?

It is not technically incorrect to use htonl(), because it has no effect on this value. However, arguments that you should use it may be misleading.

There are those who would argue that a better situation would be one in which the developer did not need to know that INADDR_ANY is all zeroes and so on. This land of ignorance is worse, not better. Consider that these "magic values" are used throughout various interfaces with TCP/IP. For example, when configuring Apache, if you would like to listen only to IPv4 (and not IPv6), you must specify:

Listen 0.0.0.0:80

I have run into programmers who mistakenly supplied the local IP address instead of INADDR_ANY (0.0.0.0) above. These programmers don't know what INADDR_ANY is, and they probably wrap it in htonl() while they are at it. This is the land of abstraction-thinking and encapsulating.

The ideas of "encapsulation" and "abstraction" have been widely accepted and too-widely applied, but they do not always apply. In the domain of IPv4 addressing, it's not appropriate to treat these constant values as "abstract" -- they are converted directly into bits on the wire.


My point is this: there is no "correct" usage of INADDR_ANY with htonl() -- both are equivalent. I would not recommend adopting a requirement that the value be used any particular way, because the INADDR_X family of constants only have four members, and only one of them, INADDR_LOOPBACK has a value which is different depending on byte ordering. It is better to just know this fact than to establish a standard for using the values which turns a "blind eye" to the bit patterns of the values.

In many other APIs, it is valuable for programmers to proceed without knowing the numeric value or bit patterns of constants used by the APIs. In the case of the sockets API, these bit patterns and values are used as input and displayed pervasively. It is better to know these values numerically than to spend time thinking about using htonl() on them.

When programming in C, especially, most "use" of the sockets API involves grabbing some other person's source code, and adapting it. This is another reason it is so important to know what INADDR_ANY is before touching a line which uses it.
