At Fox-IT and NCC Group, we are always looking to push our incident response capabilities to the next level. Because no adversary, no matter how high-end, should be beyond our reach. This led to the development of “dissect”, a proprietary enterprise investigation framework that we have now open sourced and shared with the world.
Dissect is the collective name of the many different projects that live in the
dissect.* namespace. Many of these
projects are parsers or implementations for various file formats, such as
dissect.ntfs for parsing NTFS
dissect.hypervisor for parsing many virtual disk formats. However, when we’re talking about
“dissect”, we usually refer to one project in particular:
dissect.target is a host investigation framework made for enterprise forensics. It works on
targets, which is basically any type of source data you may encounter in an
investigation. You don’t have to worry anymore about how you’re going to get something like a registry hive out of an
image, instead you’re able to immediately get usable artefacts and investigation information out of any source data.
This allows you to spend more time on doing the fun and interesting work of an investigation, and less time on the
boring stuff, like extracting files and running a bunch of different tools on them.
Instead of focussing on a single system,
dissect.target is designed to work on many targets at once.
A flexible architecture and plugin system allows our analysts to easily create new analysis capabilities that will
automatically work on all of the supported evidence formats.
Dissect is made up of many different libraries and multiple tools, which consume those libraries. An important decision in the architecture of Dissect was to put libraries front and center, and develop tools on top of those libraries. Each library has a single purpose and shouldn’t “know too much”. For example, an MBR/GPT/LVM2 implementation doesn’t need to know how a filesystem works, and an NTFS implementation doesn’t need to know about partitions or Windows registry files.
By essentially dog feeding ourselves with these libraries, it ensures that each library has a usable API and doesn’t get cluttered. It also allows for incredible flexibility with how you put your code to use. We don’t want to end up in a situation where some functionality is only available in a CLI script. Everything is reusable and flexible.
However, you can’t really do anything if you only have a bunch of libraries for parsing virtual disks and filesystems.
You’d end up in a situation where you’re writing custom Python scripts for each investigation, not scalable at all.
This is exactly the situation we ended up in. We had all these fancy and fast parsers for various formats, but every
time you wanted to use it, you’d have to write or edit a script. This is where
dissect.target comes into play.
dissect.target is the glue between all these different libraries. It leverages all the different parsers
to deliver an easy to use and extendable investigation framework. In practise, this means that
you to go straight from an
.E01 file to event logs in Splunk, without having to think about mounting, extracting or
dissect.target goes about this is a layered approach. Each layer is interchangeable for a compatible layer,
and you can start from any layer. Let’s go through each layer, taking an
Example.vmx file as a practical example to
explain each layer.
Targets (or target) is the terminology we use for the type of source data supported by
dissect.target. It includes
anything that can, one way or another, be used to describe a certain state of a system.
Some examples include:
Raw disk images, virtual disks (
.vhdx, etc) and evidence containers (
Virtual machine descriptor files (
Local live systems (
Forensic packages such as
.tararchives created by acquire or KAPE output directories
… and more
It’s important to be aware of the difference between virtual machine descriptor (like
.vmx) and virtual disk
.vmdk) files. For example, when using a
.vmdk with Dissect, only that VMDK will be loaded, but when
.vmx, all the
.vmdk files described in that VMX file will be loaded instead. This ensures that systems
with multiple virtual disks work as intended. A
.vmx file and
.vmdk file are thus both valid targets. Keep this
in mind when working with investigative data.
Dissect also supports the concept of targets within targets, referred to as child targets. For example, when a
target contains a
.vmdk file within itself, we can tell
dissect.target to load that target from within the
context of the first target. This can be useful when dealing with hypervisors.
Say, for example, we opened a Hyper-V target locally from
\\.\PhysicalDrive0, we can parse the metadata
ProgramData/Microsoft/Windows/Hyper-V/data.vmcx that tells us where all of the virtual machines are stored.
Then we can then use these paths and tell
dissect.target to load another target from there. Reading all of these
files will still happen from
\\.\PhysicalDrive0, passing through the various abstraction layers of
This allows Dissect to read the disks from running virtual machines, regardless of locks the operating has on these files.
On the technical side, everything eventually gets loaded into a
Target Python object. This is not required
knowledge for working with Dissect, but may help when reading the documentation.
For more information about how the
Target object works internally, or how targets within targets works, please
refer to Targets.
A loader in
dissect.target is responsible for mapping any kind of source data into something the Dissect framework
understands. Loaders can be incredibly complex or incredibly simple, depending on what you are trying to achieve.
In our example of the
Example.vmx file, we can parse the VMX file to retrieve the paths to all the VMDK files
associated with this VMX file. In this case, let’s say that is
D: drive respectively.
For more information about how loaders work internally, please refer to Loaders.
dissect.target can be seen as an abstraction layer for anything that looks like a raw disk.
It provides a unified API for all these different “containers”, which basically boils down to it acting like
any other file-like object.
Continuing where our loader left off, we now have two
.vmdk files we need to do something with. Since we have
a VMDK implementation in Dissect, we can tell the container layer to open each of these and add them as “disks”.
For more information about how containers works internally, please refer to Containers.
dissect.target, much like containers, are an abstraction layer for anything that looks like a raw volume.
Similar to containers, everything is a file-like object.
For each of our VMDKs,
dissect.target will try to detect a volume system and return all found partitions.
For this example, let’s say that each disk has just one partition, an NTFS volume.
For more information about how volumes and volume systems work internally, please refer to Volumes.
Filesystems is where it finally gets a little more exciting in
dissect.target. Again, they represent a filesystem
using a unified API. The API for this tries to mimic the Python standard library API as much as possible, to make working
with it as easy as possible for anyone with Python experience.
Anything that can be interpreted as a filesystem lives here. This, obviously, includes actual filesystems such as
ext4, but also anything else that you can interpret as a filesystem. For example, a directory containing extracted
files (comparable to a
chroot), a tar file or a remote EDR agent API can all be interpreted as a filesystem in some way.
Continuing with our example,
dissect.target will attempt to detect a filesystem on the two volumes we previously found.
It will succeed in detecting an
NTFS filesystem on both volumes.
One important thing to know is that
dissect.target relies heavily on virtual filesystems. These are fake filesystems
implementing the filesystem API, where we can map any file (real or virtual) to any path. This is used for e.g. creating
drive letters or mounting filesystems at specific paths.
For more information about how filesystems work internally, please refer to Filesystems.
Automatically loading containers, volumes and filesystems can be considered the core functionality of
In our example, we now have a target with two disks, two volumes and two filesystems, all done automatically. Nice, but still
not really that useful. We’d still have to write a ton of Python code ourselves to do anything useful with it. This is where
the plugin system of
dissect.target comes into play.
Plugins is where we further refine a loaded target and what we can do with it. It’s also the place where we can implement actual functionality for e.g. extracting forensic artefacts. If you were to stack all of the previous layers in order, with containers being at the bottom, plugins would be at the top. The idea is that you could swap any compatible component within a layer with another, and it will all continue to work just fine.
In practise, this means that a plugin that’s responsible for Windows event logs doesn’t care where those event logs actually
come from. They can be from an NTFS filesystem (that can in turn be parsed from a VMDK), or they can come from a
For more information about how plugins work internally, please refer to Plugins.
Plugins themselves are also layered. We start off with operating system plugins which are responsible for some OS specific plumbing, such as:
Further processing of operating systems that are normally loaded into memory (e.g. VyOS or ESXi)
Mounting volumes to the correct drive letter/path
Parsing OS specific information such as the version, IPs/network interfaces and users
Operating system plugins can themselves also be layered. For example, a Debian specific plugin can be layered on top of a Linux plugin.
In our VMX example, we had two disks for the
D: drives respectively. The Windows plugin will parse the
registry to find out which drive needs to be mounted where, and makes sure the respective filesystems are available on the
correct drive letters.
Next we have auxiliary and “artefact” plugins. An example of an auxiliary plugin is the Windows registry plugin, which provides an easy to use API for interacting with the Windows registry, which can then be used by all the other plugins. An example of an artefact plugin is the Windows event log plugin, or the Windows services plugin, which leverages the registry plugin!
Note that it’s a deliberate design decision in Dissect that artefacts are only parsed from their intended/configured locations. For example, when parsing Windows event logs we will only use the default and configured event log paths (from the registry). Event logs in any other location on the filesystem will not be parsed.
Having specified the target from a VMX file, however, gives us the benefit that we know about both the
drives. If event logs are configured to be stored at
D:\Logs, there are no additional steps we need to take to
parse these. The event log plugin will read the configured log location and parse the logs found on the filesystem
associated with the
Plugins can have four types of outputs:
none. The output type of a plugin
can be seen when listing them by running
default: Returns “basic types” such as strings, integers, lists
yield: Yields strings, useful for things like human readable timelines
none: Returns nothing, useful for miscellaneous plugins that print things on their own (such as the plugin lister plugin!)
Records are a simple typed key/value storage format created by Fox-IT. You can think of it like JSON, but strongly
typed. For example, we may define a record as having a timestamp field of type
datetime, or an IP address field
net.ipaddress. You can interact with specific attributes of these fields, too. For example, you can get the
filename attribute of an
uri field to just get the filename of that path.
Records and a range of utilities are implemented in flow.record.