Tuesday, February 7, 2017

AND THEN MODIFY THE REQUEST AND RESPONSE

mitmproxy has a powerful scripting API that allows you to modify flows on-the-fly or rewrite previously saved flows locally.
The mitmproxy scripting API is event driven - a script is simply a Python module that exposes a set of event methods. Here's a complete mitmproxy script that adds a new header to every HTTP response before it is returned to the client:
def response(context, flow):
    flow.response.headers["newheader"] = ["foo"]
(examples/add_header.py)
The first argument to each event method is an instance of ScriptContext that lets the script interact with the global mitmproxy state. The response event also gets an instance of Flow, which we can use to manipulate the response itself.
We can now run this script using mitmdump or mitmproxy as follows:
> mitmdump -s add_header.py
The new header will be added to all responses passing through the proxy.
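The same pattern works for requests. As a minimal sketch, assuming request headers accept a list of values in the same way the response headers above do, a request hook can tag every request after the proxy receives it. The header name x-example is a made-up placeholder, not something mitmproxy defines.

```python
def request(context, flow):
    # Header values are lists, mirroring the response example above.
    # "x-example" is a placeholder header name chosen for illustration.
    flow.request.headers["x-example"] = ["1"]
```

It would be run the same way, e.g. by saving it to a file and passing that file to mitmdump with -s.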

Example Scripts

mitmproxy comes with a variety of example inline scripts, which demonstrate many basic tasks. We encourage you to either browse them locally or in our GitHub repo.

Events

start(ScriptContext, argv)

Called once on startup, before any other events.

clientconnect(ScriptContext, ConnectionHandler)

Called when a client initiates a connection to the proxy. Note that a connection can correspond to multiple HTTP requests.

serverconnect(ScriptContext, ConnectionHandler)

Called when the proxy initiates a connection to the target server. Note that a connection can correspond to multiple HTTP requests.

request(ScriptContext, HTTPFlow)

Called when a client request has been received. The HTTPFlow object is guaranteed to have a non-None request attribute.

responseheaders(ScriptContext, HTTPFlow)

Called when the headers of a server response have been received. This will always be called before the response hook. The HTTPFlow object is guaranteed to have non-None request and response attributes. response.content will be None, as the response body has not been read yet.

response(ScriptContext, HTTPFlow)

Called when a server response has been received. The HTTPFlow object is guaranteed to have non-None request and response attributes. Note that if response streaming is enabled for this response, response.content will not contain the response body.

error(ScriptContext, HTTPFlow)

Called when a flow error has occurred, e.g. invalid server responses, or interrupted connections. This is distinct from a valid server HTTP error response, which is simply a response with an HTTP error code. The HTTPFlow object is guaranteed to have non-None request and error attributes.

clientdisconnect(ScriptContext, ConnectionHandler)

Called when a client disconnects from the proxy.

done(ScriptContext)

Called once on script shutdown, after any other events.
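To illustrate how these events fit together, here is a small sketch of a script that keeps counters on the context object between events. It assumes attributes can be stored on the context (the argument-handling example later in this document does the same); the counter names are made up for illustration.

```python
def start(context, argv):
    # Called once, before any other event: initialise hypothetical counters.
    context.seen_requests = 0
    context.seen_errors = 0
    context.last_body_length = 0


def request(context, flow):
    # flow.request is guaranteed to be set here.
    context.seen_requests += 1


def response(context, flow):
    # With streaming enabled the body is not available in this hook,
    # so skip responses that carry no content to inspect.
    if not flow.response.content:
        return
    context.last_body_length = len(flow.response.content)


def error(context, flow):
    # Fired for flow errors (invalid responses, interrupted connections),
    # not for ordinary responses with an HTTP error code.
    context.seen_errors += 1


def done(context):
    # Called once on shutdown: report what was counted.
    print("requests: %d, errors: %d" % (context.seen_requests, context.seen_errors))
```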

API

The main classes you will deal with in writing mitmproxy scripts are:
libmproxy.proxy.server.ConnectionHandler: Describes a proxy client connection session. Always has a client_conn attribute, might have a server_conn attribute.
libmproxy.proxy.connection.ClientConnection: Describes a client connection.
libmproxy.proxy.connection.ServerConnection: Describes a server connection.
libmproxy.protocol.http.HTTPFlow: A collection of objects representing a single HTTP transaction.
libmproxy.protocol.http.HTTPResponse: An HTTP response.
libmproxy.protocol.http.HTTPRequest: An HTTP request.
libmproxy.protocol.primitives.Error: A communications error.
libmproxy.script.ScriptContext: A handle for interacting with mitmproxy's global state from within scripts.
netlib.odict.ODict: A dictionary-like object for managing sets of key/value data. There is also a variant called ODictCaseless that ignores key case for some calls (used mainly for headers).
netlib.certutils.SSLCert: Exposes information about SSL certificates.
The canonical API documentation is the code, which you can browse locally or in our GitHub repo. You can view the API documentation using pydoc (which is installed with Python by default), like this:
> pydoc libmproxy.protocol.http.HTTPRequest
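The header assignment in the first example uses a list (["foo"]) because a single header name can carry several values. The toy class below is not netlib's ODict or ODictCaseless. It is a simplified stand-in, written under the assumption that a lookup returns every value stored for a key and that key comparison ignores case, to show why list values are a natural fit for headers.

```python
class CaselessPairs(object):
    """Toy stand-in for a caseless, multi-valued header container."""

    def __init__(self):
        self.pairs = []  # ordered (name, value) tuples

    def __getitem__(self, name):
        # Return every value stored under this name, ignoring case.
        wanted = name.lower()
        return [v for (k, v) in self.pairs if k.lower() == wanted]

    def __setitem__(self, name, values):
        # Replace all existing entries for this name with the new list.
        wanted = name.lower()
        self.pairs = [(k, v) for (k, v) in self.pairs if k.lower() != wanted]
        self.pairs.extend((name, v) for v in values)


headers = CaselessPairs()
headers["Set-Cookie"] = ["a=1", "b=2"]  # one name, two values
headers["newheader"] = ["foo"]
```

For the real behaviour, consult the pydoc output for netlib.odict.ODict.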

Running scripts in parallel

We have a single flow primitive, so when a script is handling something, other requests block. While that's a very desirable behaviour under some circumstances, scripts can be run threaded by using the libmproxy.script.concurrent decorator.
import time
from libmproxy.script import concurrent


@concurrent  # Remove this and see what happens
def request(context, flow):
    print "handle request: %s%s" % (flow.request.host, flow.request.path)
    time.sleep(5)
    print "start  request: %s%s" % (flow.request.host, flow.request.path)
(examples/nonblocking.py)
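To make the idea behind threaded handlers concrete, here is a rough sketch of a decorator that runs a handler in its own thread so the caller is not blocked. This is not libmproxy's implementation of concurrent: the name run_in_thread is hypothetical, and the sketch ignores how mitmproxy holds a flow until its handler has finished.

```python
import threading
import time


def run_in_thread(handler):
    """Hypothetical stand-in illustrating a threaded event handler."""
    def wrapper(context, flow):
        # Start the handler in a background thread and return immediately.
        worker = threading.Thread(target=handler, args=(context, flow))
        worker.daemon = True
        worker.start()
        return worker
    return wrapper


@run_in_thread
def request(context, flow):
    time.sleep(0.2)      # simulate slow work without blocking the caller
    flow.handled = True  # made-up attribute, used only to observe completion
```

Calling request(context, flow) returns the thread object right away; the slow work finishes in the background.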

Make scripts configurable with arguments

Sometimes you want to pass runtime arguments to the inline script. To do so, wrap the script call in quotes, e.g. mitmdump -s "script.py --foo 42". The arguments are then exposed in the start event:
# Usage: mitmdump -s "modify_response_body.py mitmproxy bananas"
# (this script works best with --anticache)
from libmproxy.protocol.http import decoded


def start(context, argv):
    if len(argv) != 3:
        raise ValueError('Usage: -s "modify_response_body.py old new"')
    # You may want to use Python's argparse for more sophisticated argument parsing.
    context.old, context.new = argv[1], argv[2]


def response(context, flow):
    with decoded(flow.response):  # automatically decode gzipped responses.
        flow.response.content = flow.response.content.replace(context.old, context.new)
(examples/modify_response_body.py)
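As the comment in the script suggests, argparse can replace the manual length check. The following is a sketch only: the --old and --new option names are invented for illustration, and it assumes argv[0] is the script name, as the example above implies.

```python
import argparse


def parse_args(argv):
    # argv[0] is assumed to be the script name, so parse from argv[1:].
    parser = argparse.ArgumentParser(prog=argv[0])
    parser.add_argument("--old", required=True, help="text to search for")
    parser.add_argument("--new", required=True, help="replacement text")
    return parser.parse_args(argv[1:])


def start(context, argv):
    args = parse_args(argv)
    # Store the parsed values on the context for later events.
    context.old, context.new = args.old, args.new
```

With this variant the invocation would look like mitmdump -s "modify_response_body.py --old mitmproxy --new bananas".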

Running scripts on saved flows

Sometimes, we want to run a script on Flow objects that are already complete. This happens when you start a script, and then load a saved set of flows from a file (see the "scripted data transformation" example on the mitmdump page). It also happens when you run a one-shot script on a single flow through the | (pipe) shortcut in mitmproxy.
In this case, there are no client connections, and the events are run in the following order: start, request, responseheaders, response, error, done. If the flow doesn't have a response or error associated with it, the matching events will be skipped.
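The ordering described above can be sketched as a small driver, which is also handy for exercising a script without a running proxy. This is not mitmproxy's own code: the function name replay is hypothetical, and it assumes start and done fire once overall while the remaining events fire once per flow.

```python
def replay(script, flows, context, argv):
    """Hypothetical driver: fire events on completed flows, in order."""
    def fire(name, *args):
        # Scripts only define the events they care about; skip the rest.
        handler = getattr(script, name, None)
        if handler is not None:
            handler(context, *args)

    fire("start", argv)
    for flow in flows:
        fire("request", flow)
        if getattr(flow, "response", None) is not None:
            fire("responseheaders", flow)
            fire("response", flow)
        if getattr(flow, "error", None) is not None:
            fire("error", flow)
    fire("done")
```

Here script is any object (for instance an imported module) that exposes the event functions.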

Spaces in the script path

By default, spaces are interpreted as a separator between the inline script and its arguments (e.g. -s "foo.py 42"). Consequently, the script path needs to be wrapped in a separate pair of quotes if it contains spaces: -s "'./foo bar/baz.py' 42".
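Python's standard shlex module applies the same shell-style quoting rules, so it is a convenient way to preview how a -s value would be split. Whether mitmproxy uses shlex internally is an assumption here; the sketch only illustrates the quoting behaviour.

```python
import shlex

# No spaces in the path: script and argument split cleanly.
plain = shlex.split("foo.py 42")

# Inner quotes keep a path containing spaces together as one token.
quoted = shlex.split("'./foo bar/baz.py' 42")

# Without the inner quotes the path is broken at the space.
unquoted = shlex.split("./foo bar/baz.py 42")
```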
