Analyzing Packet Captures with Python

For most situations involving analysis of packet captures, Wireshark is the tool of choice. And for good reason too - Wireshark provides an excellent GUI that not only displays the contents of individual packets, but also analysis and statistics tools that allow you to, for example, track individual TCP conversations within a pcap, and pull up related metrics.

There are situations, however, where the ability to process a pcap programmatically becomes extremely useful. Consider:

given a pcap that contains hundreds of thousands of packets, find the first connection to a particular server/service where the TCP SYN-ACK took more than 300ms to appear after the initial SYN
in a pcap that captures thousands of TCP connections between a client and several servers, find the connections that were prematurely terminated because of a RST sent by the client; at that point in time, determine how many other connections were in progress between that client and other servers
you are given two pcaps, one gathered on a SPAN port on an access switch, and another on an application server a few L3 hops away. At some point the application server sporadically becomes slow (retransmits on both sides, TCP windows shrinking etc.). Prove that it is (or is not) because of the network.
repeat the above exercises several times a week (or several times a day) with different sets of packet captures

In all these cases, it is immensely helpful to write a custom program to parse the pcaps and yield the data points you are looking for.

It is important to realize that we are not precluding the use of Wireshark; for example, after your program locates the proverbial needle(s) in the haystack, you can use that information (say a packet number or a timestamp) in Wireshark to look at a specific point inside the pcap and gain more insight.

So, this is the topic of this blog post: how to go about programmatically processing packet capture (pcap) files.

What programming language?

I will be using Python (3). Why Python? Apart from the well-known benefits of Python (open-source, relatively gentle learning curve, ubiquity, abundance of modules and so forth), it is also the case that Network Engineers are gaining expertise in this language and are using it in other areas of their work (device management and monitoring, workflow applications etc.).

What modules?

I will be using scapy, plus a few other modules that are not specific to packet processing or networking (argparse, pickle, pandas).

Note that there are other alternative Python modules that can be used to read and parse pcap files, like pyshark and pycapfile. Pyshark in particular is interesting because it simply leverages the underlying tshark installed on the system to do its work, so if you are in a situation where you need to leverage tshark’s powerful protocol decoding ability, pyshark is the way to go. In this blog however I am restricting myself to regular Ethernet/IPv4/TCP packets, and I can just use scapy.

The code

A few notes before we start

The code below was written and executed on Linux (Linux Mint 18.3 64-bit), but the code is OS-agnostic; it should work as well in other environments, with little or no modification.

In this post I use an example pcap file captured on my computer.

Step 1: Program skeleton

Build a skeleton for the program. This will also serve to check if your Python installation is OK.

Use the argparse module to get the pcap file name from the command line. If your argparse knowledge needs a little brushing up, you can look at my argparse recipe book, or at any other of the dozens of tutorials on the web.

Analyzing Packet Captures with Python Part 1 Figure 1

You will notice from the graph that the window size shows a sudden dip to some value between 400000 and 500000 shortly after timestamp 21.1. If you find this suspicious, you can again write more code to help you narrow down the exact packet number in the capture:

Analyzing Packet Captures with Python Part 1 Figure 2

Summary

With Python code, you can iterate over the packets in a pcap, extract relevant data, and process that data in ways that make sense to you. You can use code to go over the pcap and locate a specific sequence of packets (i.e. locate the needle in the haystack) for later analysis in a GUI tool like Wireshark. Or you can create customized graphical plots that can help you visualize the packet information. Further, since this is all code, you can do this repeatedly with multiple pcaps.