Ignoring characters in a file while parsing

Developer https://www.devze.com 2022-12-27 04:23 Source: web

I need to parse a text file and process the data. Valid data is denoted either by a timestamp, TS followed by 10 digits (TS1040501134), or by a value, a letter followed by nine digits (A098098098). So the input will look like TS1040501134A111111111B222222222...........TS1020304050A000000000........

However, where there is no data the file contains filler 0s. Such a case might be:

00000000000000000000TS1040501134A111111111B2222222220000000000TS1020304050A000000000........

Now, as we can see, I need to ignore these zeros. How might I do this? I am using GNU C.

My first attempt at something 'C' like in about 20 years... So what follows is, at best, pseudo-code!

Read in a line of text, then...

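#include <ctype.h>
#include <string.h>

/* Scan one line, skipping the filler zeros between records. */
void parse_line(const char *text)
{
    char timestamp[11];   /* 10 digits + terminating '\0' */
    char number[10];      /* 9 digits + terminating '\0' */
    size_t len = strlen(text);
    size_t i = 0;

    while (i < len) {
        if (isalpha((unsigned char)text[i])) {
            if (text[i] == 'T' && text[i + 1] == 'S') {
                if (i + 12 > len)
                    break;   /* timestamp is cut off at the end of the line */
                memcpy(timestamp, &text[i + 2], 10);
                timestamp[10] = '\0';
                /* do whatever you do with a timestamp */
                i += 12;     /* skip over "TS" and its 10 digits */
            } else {
                if (i + 10 > len)
                    break;   /* value is cut off at the end of the line */
                memcpy(number, &text[i + 1], 9);
                number[9] = '\0';
                /* do whatever you do with a number */
                i += 10;     /* skip over the letter and its 9 digits */
            }
        } else {
            if (text[i] != '0') {
                /* handle the error - should not get here */
            }
            i++;             /* move to next character */
        }
    }
}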

If a record can be split across lines (e.g. one line ends with TS10405 and the next line begins with 01134), you will have to write extra code to carry the incomplete tail over when the text buffer is refilled.
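One way to handle records that straddle a line break is to keep the unfinished tail in a small buffer and prepend it to the next line. The following is only a sketch under assumptions: the names `struct parser`, `drain`, `feed_line` and `PENDING_MAX` are made up for illustration, the records are just counted instead of processed, and any non-letter character is skipped as filler.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define PENDING_MAX 256   /* hypothetical limit: longest line plus carried tail */

/* Hypothetical parser state: characters read but not yet turned into records. */
struct parser {
    char pending[PENDING_MAX];
    size_t len;          /* number of valid characters in pending */
    size_t timestamps;   /* complete "TS" + 10 digit records seen so far */
    size_t values;       /* complete letter + 9 digit records seen so far */
};

/* Process every complete record in p->pending and keep only the unfinished tail. */
static void drain(struct parser *p)
{
    size_t i = 0;

    while (i < p->len) {
        int is_ts;
        size_t need;

        if (!isalpha((unsigned char)p->pending[i])) {
            i++;             /* filler zero (or a stray character) */
            continue;
        }
        if (p->pending[i] == 'T' && i + 1 >= p->len)
            break;           /* cannot tell yet whether this starts "TS" */
        is_ts = p->pending[i] == 'T' && p->pending[i + 1] == 'S';
        need = is_ts ? 12 : 10;
        if (i + need > p->len)
            break;           /* record continues on the next line */
        if (is_ts)
            p->timestamps++; /* a real program would process the timestamp here */
        else
            p->values++;     /* ... or the value here */
        i += need;
    }
    memmove(p->pending, p->pending + i, p->len - i);
    p->len -= i;
}

/* Append one line (without its line ending) and process whatever is complete.
   Returns 0 on success, -1 if the line does not fit in the buffer. */
int feed_line(struct parser *p, const char *line)
{
    size_t n;

    assert(p != NULL && line != NULL);
    n = strcspn(line, "\r\n");
    if (n > PENDING_MAX - p->len)
        return -1;
    memcpy(p->pending + p->len, line, n);
    p->len += n;
    drain(p);
    return 0;
}
```

Start from a zero-initialised `struct parser` and call `feed_line` once per line returned by `fgets`. Whatever is left in `pending` after the last line is an incomplete record.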

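You should be able to read the file into a string, then use strstr() to locate the "TS" substring in it (strnstr() is a BSD function that glibc does not provide, so on GNU C the standard strstr() is the portable choice). The pointer it returns will be the start of your timestamp, with the ten digits beginning two characters later.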

To find the next timestamp, call strstr() again on the same buffer, starting at a pointer just past the timestamp you just found. If the data arrives in multiple strings, you'll have to deal with the situation where a single timestamp is split across two of them.
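The search loop described above could look roughly like this. It is a minimal sketch, assuming the whole input sits in one NUL-terminated buffer and that "TS" never occurs anywhere except at the start of a timestamp; the function name `find_timestamps` and its output array are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the 10 characters after each "TS" marker in buf into out[k].
   Returns the number of timestamps found (at most max_out). */
size_t find_timestamps(const char *buf, char out[][11], size_t max_out)
{
    size_t count = 0;
    const char *p = buf;

    assert(buf != NULL && out != NULL);
    while (count < max_out && (p = strstr(p, "TS")) != NULL) {
        if (strlen(p + 2) < 10)
            break;               /* timestamp is cut off at the end of the buffer */
        memcpy(out[count], p + 2, 10);
        out[count][10] = '\0';
        count++;
        p += 12;                 /* resume the search just past this timestamp */
    }
    return count;
}
```

This only locates the timestamps. The letter-plus-nine-digit values between them still have to be picked out separately, for example by scanning from the end of one timestamp up to the next "TS".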