Current work on the parser

At the moment I am busy increasing the parser's independence from predefined patterns. The biggest improvement will be the ability to find variable definitions and store them, together with the corresponding value, in a dictionary:

    import re

    if line.startswith("$"):
        # Split "$name = value" into the variable and its value
        var, _, val = line.partition("=")
        var = var.strip()
        val = val.strip()
        # If there is any known variable in the dictionary
        if self.var_dict:
            # Escape the dictionary keys so they can serve as a regexp pattern,
            # longest names first so that "$uname" is tried before "$un"
            keys = sorted(self.var_dict, key=len, reverse=True)
            var_pattern = re.compile("|".join(re.escape(key) for key in keys))
            # Replace every known variable with its stored value; the function
            # form avoids backslashes in the value being read as escapes
            val = var_pattern.sub(lambda match: self.var_dict[match.group(0)], val)
        # Replacing the function call with a predefined value
        parsed_val = varparser.parsevar(val)
        # Adding the new variable value pair to the dictionary
        self.var_dict[var] = parsed_val

With this in place, we no longer depend on patterns built from variable names chosen by black hats.
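To try the idea outside of the honeypot, here is a minimal, self-contained sketch of the dictionary approach. The function names `substitute_known` and `collect_variables` are made up for this example, and the handling of quotes and the trailing semicolon is a simplifying assumption. The `varparser.parsevar` step (replacing PHP function calls) is left out on purpose.

```python
import re


def substitute_known(value, var_dict):
    """Replace every known PHP variable name in value with its stored value."""
    if not var_dict:
        return value
    # Longest names first, so "$uname" is tried before "$un"
    names = sorted(var_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: var_dict[match.group(0)], value)


def collect_variables(lines):
    """Build a dictionary from PHP lines of the form '$name = value;'."""
    var_dict = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("$") or "=" not in line:
            continue
        var, _, val = line.partition("=")
        # Assumption: drop the trailing semicolon of the PHP statement
        val = val.strip().rstrip(";").strip()
        # Assumption: strip the double quotes of simple string literals
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]
        var_dict[var.strip()] = substitute_known(val, var_dict)
    return var_dict


php_lines = [
    '$host = "debian";',
    '$banner = "Linux $host 2.6";',
]
variables = collect_variables(php_lines)
print(variables)
```

The second variable refers to the first one, so its stored value already contains the substituted text. No variable name has to be known in advance.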

Improved Parser

The number of attacks against the Webhoneypot depends strongly on its PHP parser, so keeping the pattern matching mechanism up to date was one of the major items of future work. One of my goals for the Google Summer of Code was to improve the parser and to make it less sensitive to upcoming changes in attack patterns. The old parser was very simple: collect all lines containing echo calls, look for known patterns and generate the appropriate response.

Here is a simple example.
The injected file:

    <?php
    $un = @php_uname();
    echo "uname -a: $un<br />";
    ?>

The parser:

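    import re

    def parse(injected_file):
        # Known pattern the old parser looks for
        pattern = "uname"
        for line in injected_file:
            if re.search(pattern, line):
                # Predefined value returned instead of the real system data
                uname = "Linux debian 2.6.8... "
                response = "uname -a: " + uname + "<br />"
                return response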

Looks very simple? But what about Uname, Kernel, UNAME or even zname (Noticed the typo? Remember, they are just kids ;) )? So another goal is to leave pattern matching behind and invest some time and sweat into a cleverer parser.
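A quick check makes the problem visible. The list of labels below is taken from the examples just mentioned; the rest is only an illustration, not code from the honeypot:

```python
import re

pattern = "uname"  # the fixed pattern used by the old parser
labels = ["uname", "Uname", "UNAME", "Kernel", "zname"]

# Case-sensitive search, as in the old parser
matched = [label for label in labels if re.search(pattern, label)]
# Ignoring case only helps with the spelling variants
matched_ignorecase = [
    label for label in labels if re.search(pattern, label, re.IGNORECASE)
]

print(matched)
print(matched_ignorecase)
```

Even with `re.IGNORECASE`, "Kernel" and "zname" slip through, so tweaking the pattern only postpones the problem.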

Things get even more complicated when echo calls occur inside functions. A simple example:

    <?php
    function echothis($e, $c) { echo "$e: "; echo $c; echo "<br />"; }
    echothis("uname -a: ", @php_uname());
    ?>

The new parser I'm using with Glastopf is able to recognize functions containing echo calls and parses all calls of these functions. Here is a pseudocode example of how this works:

    for line in injected_file:
        if "function fu(bar)" in line:
            store function into functionlist
    for function in functionlist:
        if "echo" in function:
            echofunction = function
    for line in injected_file:
        if name_of_echofunction in line:
            send the line to the echo_parser
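The pseudocode above could be turned into runnable Python roughly like this. It is only a sketch under simplifying assumptions: the regular expression `FUNCTION_DEF` and the function name `find_echo_function_calls` are made up for this example, and it only handles function definitions written on a single line. It is not the parser from the repository.

```python
import re

# Assumption: a PHP function definition that fits on one line, body included
FUNCTION_DEF = re.compile(r"function\s+(\w+)\s*\(([^)]*)\)\s*\{(.*)\}")


def find_echo_function_calls(injected_lines):
    """Return the lines that call a user-defined function containing echo."""
    # Step 1: store every function definition, keyed by its name
    functions = {}
    for line in injected_lines:
        match = FUNCTION_DEF.search(line)
        if match:
            functions[match.group(1)] = match.group(3)
    # Step 2: keep the functions whose body contains an echo call
    echo_functions = [
        name for name, body in functions.items() if re.search(r"\becho\b", body)
    ]
    # Step 3: collect the calls, which would then be sent to the echo parser
    calls = []
    for line in injected_lines:
        if FUNCTION_DEF.search(line):
            continue  # skip the definition itself
        for name in echo_functions:
            if re.search(r"\b" + re.escape(name) + r"\s*\(", line):
                calls.append(line.strip())
                break
    return calls


injected_file = [
    '<?php',
    'function echothis($e, $c) { echo "$e: "; echo $c; echo "<br />"; }',
    'echothis("uname -a: ", @php_uname());',
    '?>',
]
print(find_echo_function_calls(injected_file))
```

For the injected file shown earlier, only the `echothis(...)` call is returned. The function name is picked up from the definition, so nothing about it has to be known beforehand.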

So as you see, I'm going to rewrite the PHP parser :) Just kidding, but I will use the next few days to replace the variable replacement based on pattern matching with a much more generic and powerful approach. I have committed a preliminary version of the new parser to SVN; currently you can choose between the old and clumsy parser and the new and brainy one.

Last but not least, I invite you all to play around with, comment on and rewrite my honeypot. Most parts are already finished and functional.
