Understanding C built-in library function implementations

Developer https://www.devze.com 2023-03-19 17:37 Source: web
So I was going through K&R second edition doing the exercises. Feeling pretty confident after doing a few exercises, I thought I'd check the actual implementations of these functions. That was when my confidence fled the scene. I could not understand any of it.

For example, I checked getchar().

Here is the prototype in libio/stdio.h:

extern int getchar (void);

So I followed it through and got this:

__STDIO_INLINE int
getchar (void)
{
  return _IO_getc (stdin);
}

Again I followed it, this time to libio/getc.c:

int
_IO_getc (fp)
     FILE *fp;
{
  int result;
  CHECK_FILE (fp, EOF);
  _IO_acquire_lock (fp);
  result = _IO_getc_unlocked (fp);
  _IO_release_lock (fp);
  return result;
}

And I'm taken to another header file libio/libio.h, which is pretty cryptic:

#define _IO_getc_unlocked(_fp) \
       (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) \
    ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)

Which is where I finally ended my journey.

My question is pretty broad. What does all this mean? I could not for the life of me figure out anything logical by looking at the code. It looks like a bunch of code abstracted away layer after layer.

More importantly, when does it actually get the character from stdin?


_IO_getc_unlocked is an inlinable macro. The idea is that you can get a character from the stream without having to call a function, making it hopefully fast enough to use in tight loops, etc.

Let's take it apart one layer at a time. First, what is _IO_BE?

/usr/include/libio.h:# define _IO_BE(expr, res) __builtin_expect ((expr), res)

_IO_BE is a hint to the compiler that expr will usually evaluate to res. Here res is 0, so the hint says the "buffer is empty" test is usually false. The hint is used to lay out the code so that it runs faster when the expectation holds, but it has no other semantic effect. So we can get rid of it, leaving us with:

#define _IO_getc_unlocked(_fp) \
  ( ( (_fp)->_IO_read_ptr >= (_fp)->_IO_read_end ) \
    ? __uflow(_fp) : *(unsigned char *)(_fp)->_IO_read_ptr++ )

Let's turn this into an inline function for clarity:

inline int _IO_getc_unlocked(FILE *fp) {
  if (fp->_IO_read_ptr >= fp->_IO_read_end)
    return __uflow(fp);   /* buffer exhausted: refill it */
  else
    return *(unsigned char *)(fp->_IO_read_ptr++);   /* next buffered byte */
}

In short, we have a pointer into a buffer, and a pointer to the end of the buffer. We check if the pointer is outside the buffer; if not, we increment it and return whatever character was at the old value. Otherwise we call __uflow to refill the buffer and return the newly read character.

As such, this allows us to avoid the overhead of a function call until we actually need to do IO to refill the input buffer.
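To make the fast path and slow path concrete, here is a minimal sketch of the same idea on a toy stream. Everything in it (toy_file, toy_open, toy_uflow, toy_getc, the 4-byte buffer) is a hypothetical name chosen for illustration, not glibc's real structure, and a string in memory stands in for the underlying file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stream; none of these names come from glibc. */
#define TOY_EOF (-1)
#define TOY_BUFSIZ 4

typedef struct {
    const char *src;               /* stands in for the underlying file */
    size_t src_len;                /* total bytes available in src */
    size_t src_pos;                /* bytes of src already copied out */
    unsigned char buf[TOY_BUFSIZ]; /* the stream's read buffer */
    unsigned char *read_ptr;       /* next unread byte in buf */
    unsigned char *read_end;       /* one past the last valid byte in buf */
    int refills;                   /* slow-path calls, for illustration only */
} toy_file;

static void toy_open(toy_file *f, const char *src) {
    assert(f != NULL && src != NULL);
    f->src = src;
    f->src_len = strlen(src);
    f->src_pos = 0;
    f->read_ptr = f->read_end = f->buf; /* buffer starts out empty */
    f->refills = 0;
}

/* Slow path: refill the buffer and return the first new byte.
   Plays the role that __uflow plays in the macro above. */
static int toy_uflow(toy_file *f) {
    size_t n = f->src_len - f->src_pos;
    if (n == 0)
        return TOY_EOF;
    if (n > TOY_BUFSIZ)
        n = TOY_BUFSIZ;
    memcpy(f->buf, f->src + f->src_pos, n);
    f->src_pos += n;
    f->read_ptr = f->buf;
    f->read_end = f->buf + n;
    f->refills++;
    return *f->read_ptr++;
}

/* Fast path: one comparison and one pointer increment when data is buffered. */
static inline int toy_getc(toy_file *f) {
    if (f->read_ptr >= f->read_end)
        return toy_uflow(f);
    return *f->read_ptr++;
}
```

Reading the six bytes of "hello!" through toy_getc takes the slow path only twice with this 4-byte buffer; every other call is served straight from buf.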

Keep in mind that standard library functions can be complicated like this; they can also use extensions to the C language (such as __builtin_expect) that are NOT standard and may NOT work on all compilers. They do this because they need to be fast, and because they can make assumptions about what compiler they're using. Generally speaking your own code should not use such extensions unless absolutely necessary, as it'll make porting to other platforms more difficult.
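If you do want such a hint in your own code, one common approach is to hide the extension behind a macro that falls back to the plain expression on other compilers. The sketch below assumes that checking __GNUC__ or __clang__ is enough to detect the builtin; MY_UNLIKELY and count_nonzero are hypothetical names for illustration, and only __builtin_expect itself is a real GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: use the hint where available, plain expression elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define MY_UNLIKELY(x) (x)
#endif

/* Counts the non-zero bytes in p[0..n-1], hinting that zero bytes are rare.
   The hint affects only code layout, never the result. */
static size_t count_nonzero(const unsigned char *p, size_t n) {
    size_t count = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (MY_UNLIKELY(p[i] == 0))
            continue;
        count++;
    }
    return count;
}
```

With or without the hint, count_nonzero returns the same value; whether it runs any faster depends on the compiler and the data.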


Going from pseudo-code to real code we can break it down:

if (there is a character in the buffer)
  return (that character)
else
   call a function to refill the buffer and return the first character
end

Let's use the ?: operator:

#define getc(f) (is_there_buffered_stuff(f) ? *pointer++ : refill())

A bit closer:

#define getc(f) (is_there_buffered_stuff(f) ? *f->pointer++ : refill(f))

Now we are almost there. To determine whether something is already buffered, it compares the read pointer, reached through the file structure pointer, against the end of the buffered data:

 _fp->_IO_read_ptr >= _fp->_IO_read_end ?

This actually tests the opposite of the condition in my pseudo-code: it asks "is the buffer empty?". If so, it calls __uflow(_fp) (the name stands for "underflow"); otherwise it reaches directly into the buffer with a pointer, gets the character, and then increments the pointer:

? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)


I can highly recommend The Standard C Library by P.J. Plauger. He provides background on the standard and provides an implementation of every function. The implementation is simpler than what you'll see in glibc or a modern C compiler, but does still make use of macros like the _IO_getc_unlocked() you posted.

The macro is going to pull a character from buffered data (which could be the ungetc buffer) or read it from the stream (which may read and buffer multiple bytes).
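As a small illustration of the pushed-back case, the standard ungetc() function lets you build a "peek" on top of getc(): the character is read, pushed back, and then returned again by the next read. peek_char is a hypothetical helper name; getc and ungetc are standard C, which guarantees at least one character of pushback.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the next character on fp without consuming it,
   or EOF if there is none. */
static int peek_char(FILE *fp) {
    int c;
    assert(fp != NULL);
    c = getc(fp);
    if (c != EOF)
        ungetc(c, fp); /* the next getc(fp) returns c again */
    return c;
}
```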


The reason there is a standard library is that you should not need to know the exact implementation details of these functions. The code that implements the library calls at some point has to use nonstandard system calls, which have to deal with issues you may not be concerned with. If you are learning C, first make sure you can understand other C programs besides the stdlib. Once you are a little more advanced, look at the stdlib, but it still won't make a lot of sense until you understand the system calls involved.


The definition of getchar() redefines the request as a specific request for a character from stdin.

The definition of _IO_getc() first runs CHECK_FILE, a sanity check on the FILE* that makes the function return EOF if the pointer is not a valid stream. It then locks the stream so that other threads cannot interfere while _IO_getc_unlocked() runs, and releases the lock afterwards.

The macro definition of _IO_getc_unlocked() simply checks whether the read pointer is at or past the end of the buffered data (not the end of the file). If it is, the macro calls __uflow; if it is not, it returns the char at the read pointer and advances the pointer.
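The same lock, read, unlock pattern is available to user code through the POSIX functions flockfile(), getc_unlocked() and funlockfile(). The sketch below assumes a POSIX system; count_lines_locked is a hypothetical helper name. It takes the stream lock once for the whole loop instead of once per character, which is the reason the unlocked variant exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts '\n' characters from the current position of fp
   to end of input, holding the stream lock for the entire loop. */
static long count_lines_locked(FILE *fp) {
    long lines = 0;
    int c;
    assert(fp != NULL);
    flockfile(fp);                          /* acquire the stream lock once */
    while ((c = getc_unlocked(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    funlockfile(fp);                        /* release it when done */
    return lines;
}
```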

This kind of layering is standard for stdlib implementations. You are not expected to ever need to look at it. In fact, many stdlib implementations use assembly language for performance-critical routines, which is even more cryptic.
