Getting the end of a web address in C?

Developer https://www.devze.com 2023-01-28 16:47 Source: Internet
Say I pass the argument www.bbc.co.uk/news/world-us-canada-11893886. I need to separate www.bbc.co.uk from /news/world-us-canada-11893886 for an HTTP GET.

I have tried using strtok and strcat, but I come across weird splits at runtime. I can get www.bbc.co.uk just fine using strtok(host, "/");

I have tried using a combination of strtok and strcat to get the rest of the string after the first "/", but I get output like this:

request: da-11893886
tempString: news/world!
host: www.bbc.co.uk
Path: news/world!da-11893886

If you look at this output, the strangest part is that it always cuts out the middle section. In this case, that is "-us-cana".

The relevant section of the code is below:

// testing purposes
 printf("argv[1]: %s\n", argv[1] );

 host = malloc(sizeof(argv[1]));
 strcpy(host, argv[1]);
 host = strtok(host, "/");

 // get the request
 request = malloc(sizeof(argv[1]) + sizeof(char)*6);

 char *tok, *tempString;
 tempString = malloc(sizeof(argv[1]));

 tok = strtok( NULL, "\0");

 while( tok ) {
  strcpy(tempString, tok);
  printf("request: %s\n", request);
  request = strcat(tempString, request);
  tok = strtok(NULL, "\0");
 }

 printf("host: %s\n", host);
 printf("Path: %s\n", request);

Thanks for looking over this. Any direction or even a link to a site where I can figure out how to do this would be much appreciated.


Here's some code that does more than you want. Note that it modifies the original string (you may want to make copies instead), and that the path it returns does not include the leading '/':

void split_request(char *request, char **protocol, char **addr, char **path)
{
  char *ptr = strstr(request, "://");

  if(NULL == ptr)
  {
    *protocol = NULL;
    *addr = request;
  }
  else
  {
    *protocol = request;
    *addr = ptr + 3;
    *ptr = '\0';
  }

  ptr = strchr(*addr, '/');
  if(NULL == ptr)
  {
    *path = NULL;
  }
  else
  {
    *path = ptr + 1;
    *ptr = '\0';
  }
}

Please excuse any typos/obvious errors. I'm typing this in a hurry as I have work to do :P It should get you started though.
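
If you would rather make copies, as suggested above, here is a minimal sketch of a non-destructive variant. The name split_url_copy and the choice to report an empty path as "/" are my own assumptions, not part of the answer above. It keeps the leading '/' in the path, which is what an HTTP GET request line needs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the answer above): copies the host and path
 * parts of url into caller-supplied buffers and leaves url untouched.
 * Returns 0 on success, -1 if either buffer is too small.
 */
int split_url_copy(const char *url,
                   char *host, size_t host_size,
                   char *path, size_t path_size)
{
    const char *sep = strstr(url, "://");
    const char *slash = strchr(url, '/');

    /* "://" only marks a scheme if its ':' comes right before the first '/' */
    const char *start = (sep != NULL && slash == sep + 1) ? sep + 3 : url;

    size_t host_len = strcspn(start, "/");   /* length up to the first '/' */
    const char *rest = start + host_len;     /* "" or "/path..." */
    if (*rest == '\0')
        rest = "/";                          /* assumption: empty path means "/" */

    if (host_len + 1 > host_size || strlen(rest) + 1 > path_size)
        return -1;

    memcpy(host, start, host_len);
    host[host_len] = '\0';
    strcpy(path, rest);
    return 0;
}
```

Call it with two char arrays and their sizes, for example split_url_copy(argv[1], host, sizeof host, path, sizeof path), and check the return value before using the buffers.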


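I have modified your code to work the way you are expecting:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char *buffer, *host, *request, *tok;

    if (argc < 2) {
        fprintf(stderr, "usage: %s host/path\n", argv[0]);
        return 1;
    }
    printf("argv[1]: %s\n", argv[1]);

    /* strlen() excludes the terminating '\0', so allocate one extra byte */
    buffer = malloc(strlen(argv[1]) + 1);
    request = malloc(strlen(argv[1]) + 1);
    if (buffer == NULL || request == NULL)
        return 1;

    strcpy(buffer, argv[1]);
    host = strtok(buffer, "/");

    /* an empty delimiter set makes strtok return the rest of the string */
    tok = strtok(NULL, "");
    if (tok == NULL)
        tok = "";   /* no '/' in the argument: empty path */
    printf("sizeof(tok) %zu\n", strlen(tok));
    strcpy(request, tok);

    /* append any further tokens (normally there are none) */
    while ((tok = strtok(NULL, "")) != NULL)
        strcat(request, tok);

    printf("host: %s\n", host ? host : "");
    printf("Path: %s\n", request);

    free(buffer);
    free(request);
    return 0;
}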

Output

$ ./tmp www.bbc.co.uk/news/world-us-canada-11893886/tmp.html
argv[1]: www.bbc.co.uk/news/world-us-canada-11893886/tmp.html
sizeof(tok) 38
host: www.bbc.co.uk
Path: news/world-us-canada-11893886/tmp.html
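
The change that matters most is probably the switch from sizeof to strlen. In the question's code, sizeof(argv[1]) is the size of a char * pointer (typically 4 or 8 bytes), not the length of the text, so every buffer was too small and the copies overran it. That is the likely cause of the missing middle section. A minimal sketch of sizing a copy correctly (copy_string is a made-up name for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL if malloc fails. */
char *copy_string(const char *s)
{
    /* sizeof(s) would give the size of the pointer, not of the text,
       so measure the text with strlen() and add one byte for the
       terminating '\0'. */
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);   /* copies the text and its '\0' */
    return copy;
}
```

The caller owns the returned buffer and should free() it when done.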


Use strrchr() to find the last occurrence of '/' from the rear. You will then have a pointer to the start of 'the end of the web address' if you add one to that returned pointer.

Update

Assuming your URL does not start with http://, this ought to work:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char url[] = "www.bbc.co.uk/news/world-us-canada-11893886";
    int  cnt;
    char host[100];
    char path[100];
    char request[100];

    strcpy(request, strrchr(url, '/'));

    strcpy(host, url);
    host[cnt = strcspn(url, "/")] = '\0';

    strcpy(path, &url[cnt]);

    printf("host: %s\npath: %s\nrequest: %s\n", host, path, request);

    return 0;
}

Output

$ ./a.out
host: www.bbc.co.uk
path: /news/world-us-canada-11893886
request: /world-us-canada-11893886


strrchr() returns the LAST instance of the character. He wants the FIRST instance after any http:// string.

The answer is simple:

char *address_start = strchr(in_string+8, '/');

If it's non-NULL, then you are at the first / of the path.

Why +8? Because "https://" is 8 characters long, so the search starts past the slashes of the scheme ("http://" is 7 characters, and skipping one extra character of the host does no harm). Without a scheme, though, a host shorter than 8 characters breaks this: for "t.co/abc" the first / is skipped, and for an address like "1.1.1.1" (7 characters) in_string+8 points past the end of the string. So the fixed offset is only safe once you know the string begins with a scheme.

Anyway, you can always pre-validate the string to see if it begins with "http" or not to determine the offset to start searching at.
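
Here is a rough sketch of that pre-validation, assuming only http:// and https:// need to be recognised. The function name find_path is made up for illustration.

```c
#include <string.h>

/*
 * Hypothetical helper: returns a pointer to the first '/' of the path,
 * or NULL if the URL has no path.
 */
const char *find_path(const char *url)
{
    const char *host = url;

    /* Skip a leading scheme instead of assuming a fixed offset. */
    if (strncmp(url, "http://", 7) == 0)
        host = url + 7;
    else if (strncmp(url, "https://", 8) == 0)
        host = url + 8;

    /* The first '/' after the host starts the path. */
    return strchr(host, '/');
}
```

Because the offset is applied only when the prefix actually matches, short hosts such as "t.co/x" are handled without reading past the end of the string.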
