Perl bits: Parsing Apache’s Combined log format

A handy dandy regular expression to parse out fields from Apache’s combined log format:

	/^(\S+)\s		# requestor
	(\S+)\s			# ?
	(\S+)\s			# ?
	\[([^\]]*)\]\s		# time
	"([^"]*)"\s		# URL
	(\d*)\s			# result
	(\d+|-)\s		# bytes ("-" when no body was sent)
	"([^"]*)"\s		# referrer
	"([^"]*)"$/x		# user agent
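
For anyone who wants to see the pattern in use, here is a minimal sketch that wraps an expression along these lines in a helper and pulls the captures into a hash. The sub name `parse_combined`, the hash keys, and the sample log line are made up for illustration. The bytes field is written to accept "-", which Apache logs when no response body was sent.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hashref of fields, or undef if the line
# does not look like combined log format.
sub parse_combined {
    my ($line) = @_;
    my @f = $line =~ /^(\S+)\s      # requestor
        (\S+)\s                     # identd identity (usually "-")
        (\S+)\s                     # http userid
        \[([^\]]*)\]\s              # time
        "([^"]*)"\s                 # request line
        (\d*)\s                     # result
        (\d+|-)\s                   # bytes, or "-" when no body was sent
        "([^"]*)"\s                 # referrer
        "([^"]*)"$/x                # user agent
        or return;
    my %hit;
    @hit{qw(host ident user time request status bytes referrer agent)} = @f;
    return \%hit;
}

# Sample line, invented for this example.
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         . '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
         . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

if (my $hit = parse_combined($line)) {
    print "$hit->{host} $hit->{status} $hit->{request}\n";
} else {
    warn "could not parse: $line\n";
}
```

One thing to watch with /x comments: the regex delimiter still ends the pattern, so a "/" inside a comment will cut the expression short.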

Posted by Adam.

3 thoughts on “Perl bits: Parsing Apache’s Combined log format”

  1. Thanks for this, very useful. I updated the code to fill in your question marks:

    /^
    (\S+)\s # requestor
    (\S+)\s # RFC 1413 identity of the client determined by identd (highly unreliable - do not use)
    (\S+)\s # http userid
    \[([^\]]*)\]\s # time
    "([^"]*)"\s # URL
    (\d*)\s # result
    (\d+|-)\s # bytes ("-" when no body was sent)
    "([^"]*)"\s # referrer
    "([^"]*)" # user agent
    $/x;

  2. user agent may (and sometimes does) contain escaped quotes, e.g. "\"Mozilla/4.0\"", so I’d suggest "(.*)" for that
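
As an alternative to a greedy "(.*)", here is a rough sketch of a quoted-field pattern that steps over backslash-escaped characters, so it could be used for the referrer as well as the user agent. The variable names and the sample text are invented for illustration, and it assumes quotes inside a field are always escaped with a backslash.

```perl
use strict;
use warnings;

# A quoted field: any run of characters that are neither a quote nor a
# backslash, or a backslash followed by any single character.
my $quoted = qr/"((?:[^"\\]|\\.)*)"/;

# Made-up tail of a log line: referrer, then a user agent with escaped quotes.
my $tail = '"http://example.com/" "\"Mozilla/4.0\" (compatible)"';

my ($referrer, $agent) = $tail =~ /^$quoted\s$quoted$/
    or die "tail did not match\n";

print "referrer: $referrer\n";
print "agent:    $agent\n";
```

The captured text keeps its backslashes. The greedy "(.*)" also works for the user agent because it is the last field and the pattern is anchored with $, but it would run past the closing quote if used on an earlier field.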
