I'm trying to extract functions and function headers from some source code files. Here's an example of the type of code:
################################################################################
# test module
#
# Description : Test module
#
DATABASE test
###
# Global Vars
GLOBALS
DEFINE G_test_string STRING
END GLOBALS
###
# Modular Vars
DEFINE M_counter INTEGER
###
# Constants
CONSTANT MAX_ARR_SIZE = 100
##################################
# Alternative header
##################################
FUNCTION test_function_1()
DEFINE F_x INTEGER
LET F_x = 1
RETURN F_x
END FUNCTION
###################################
# Function:
# This is a test function
#
# Parameters:
# in - test
#
# Returns:
# out - result
#
FUNCTION test_function_2( P_in_var )
DEFINE P_in_var INTEGER
DEFINE F_out_var INTEGER
LET F_out_var = P_in_var
RETURN F_out_var
END FUNCTION
FUNCTION test_init_array()
DEFINE F_array ARRAY[ MAX_ARR_SIZE ] OF INTEGER
DEFINE F_element INTEGER
FOR F_element = 1 TO MAX_ARR_SIZE
LET F_array[ F_element ] = F_element * F_element
END FOR
END FUNCTION
Functions may or may not have a header above them. I'm trying to capture the function source, function header, function name and any parameters passed into the function in separate groups. Here's the expression I came up with (I'm using .NET regex and testing with Regex Hero):
^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION)
This seems to work OK for every function in the file except the first (test_function_1). For test_function_1, the first group captures everything from the first line (the top of the source file) up to where its FUNCTION keyword begins. I realise this happens because other comments in the file also start with #, but I only want to capture the function header.
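A minimal Python reproduction of the behaviour described above. It uses Python's `re` module rather than .NET, with `re.MULTILINE | re.DOTALL` standing in for `RegexOptions.Multiline | RegexOptions.Singleline`. That both options were on in the original test is an assumption; it is the combination that lets `^` match later functions and `.*?` reach END FUNCTION. The sample is trimmed from the question's file.

```python
import re

# Abridged copy of the question's sample file (trimmed for brevity).
SOURCE = """\
################################################################################
# test module
#
DATABASE test
###
# Constants
CONSTANT MAX_ARR_SIZE = 100
##################################
# Alternative header
##################################
FUNCTION test_function_1()
DEFINE F_x INTEGER
LET F_x = 1
RETURN F_x
END FUNCTION
"""

# The question's pattern; MULTILINE | DOTALL is assumed to mirror
# RegexOptions.Multiline | RegexOptions.Singleline in the original test.
ORIGINAL = re.compile(
    r"^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION)",
    re.MULTILINE | re.DOTALL,
)

first = ORIGINAL.search(SOURCE)
# ^ matches at offset 0 and group 1 can expand from there, so the match
# starts at the top of the file and group 1 swallows every line up to
# the first FUNCTION keyword.
print(first.group(3))
print(first.group(1))
```

Making `.*?` lazy does not help here. The engine reports the leftmost match, and a match that starts at the top of the file is found before any later starting position is tried.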
If I understand correctly, your problem is identifying the lines that start with #.
To do that, turn on the RegexOptions.Multiline flag so that ^ matches at the start of every line instead of only at the start of the input, and match the function header with
((?:^#.*\s)*)
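A minimal sketch of the header part on its own, again using Python's `re` as a stand-in for .NET (`re.MULTILINE` plays the role of `RegexOptions.Multiline`). The literal FUNCTION after the group is added here only so the search lands on a header that actually precedes a function. The snippet is trimmed from the question's file.

```python
import re

SNIPPET = """\
###
# Constants
CONSTANT MAX_ARR_SIZE = 100
##################################
# Alternative header
##################################
FUNCTION test_function_1()
"""

# ^ anchors each "#" line at a line start. .* stops at the newline because
# DOTALL is off, and \s consumes the line break before the next iteration.
HEADER = re.compile(r"((?:^#.*\s)*)FUNCTION", re.MULTILINE)

m = HEADER.search(SNIPPET)
# Only the run of comment lines directly above FUNCTION is captured. The
# CONSTANT line breaks the run, so "# Constants" is left out.
print(m.group(1))
```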
Edit:
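For this to work, you'd have to switch OFF RegexOptions.Singleline. Otherwise the .* in the header pattern can run past the end of a comment line. Then replace .*? with [\s\S]*? in the function-body part of your pattern. [\s\S] matches any character, including a newline, whether or not Singleline is set.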