Style guide#

This is a basic style guide for the Dissect projects. The goal of this guide is to increase the understandability and maintainability of both code and documentation.

Applicability#

This guide applies to both new and existing code. The Dissect build pipeline enforces the most important rules. Certain exceptions are made for older parts of the code, as they were written before this guide was created.

New code#

When submitting new code for inclusion in one of the Dissect projects, your code should adhere to this guide unless there is a valid reason not to do so. In that case, explain the reason in your Pull Request.

Older code#

When submitting changes to existing code that does not yet adhere to this style guide, a choice should be made whether to make the change conformant to the guidelines. You can use the rules below to help you decide what to do in these cases.

If the change in existing code is

  • large, it is best to refactor the function, method or class according to these guidelines.

  • small and rewriting the code for guideline conformance would not be proportional to the change itself, you may submit the code using the original styling.

Note

Regardless of conformance to this style guide, any change you make should be understandable and clear in its functioning.

Code style and formatting#

This section lists how to format code in a readable and consistent manner and which specifications and tools are used to enforce them.

PEP 8 and Black#

The code should adhere to the PEP 8 Python code style. The adherence to PEP 8 is checked using Flake8. Flake8 E203 errors can be ignored due to the ambiguous nature of these errors (see https://github.com/PyCQA/pycodestyle/issues/373).

The formatting of the code layout is further refined by using Black. Black provides functionality to automatically format code and enforces consistent coding style between files and projects regardless of the author. It also relieves authors of the burden of having to actively think about the formatting.

PEP 8 and Black styles are mandatory. This is configured in the project’s tox.ini files and tested for by our build pipeline.
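To illustrate why E203 is ignored: Black treats the colon in a slice with non-trivial bounds like a binary operator and puts spaces around it, which Flake8 reports as E203 (whitespace before ':'). The names below are hypothetical and only serve as a minimal sketch of that conflict:

```python
# Hypothetical data, only here to illustrate the formatting conflict.
data = list(range(10))
lower, upper, offset = 1, 4, 2

# Black formats slices with non-trivial bounds like this. Flake8 would flag
# the space before the colon as E203 unless that check is ignored.
window = data[lower + offset : upper + offset]
```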

Maximum line length#

Lines should be limited to 120 characters. On modern console sizes this gives more room than the standard 80-character limit without sacrificing readability, and it often improves readability because fewer statements need to be wrapped.

Type hinting#

New functions and classes should be fully type hinted. The combination of type hinting and docstrings helps in understanding what the function or class does and how it should be used.
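As a rough sketch of what "fully type hinted" could look like, the hypothetical helper below annotates every parameter and the return value, and leaves the types out of the docstring:

```python
from typing import Optional


def find_offsets(data: bytes, needle: bytes, limit: Optional[int] = None) -> list[int]:
    """Return the offsets at which ``needle`` occurs in ``data``.

    Args:
        data: The buffer to search in.
        needle: The byte sequence to look for.
        limit: Optional maximum number of offsets to return.

    Returns:
        A list of offsets, in ascending order.
    """
    offsets = []
    start = data.find(needle)
    # Keep searching until nothing more is found or the limit is reached.
    while start != -1 and (limit is None or len(offsets) < limit):
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets
```

The function name and its parameters are made up for this illustration and are not part of any Dissect project.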

Import order#

Import statements for files and modules are divided into three groups and should be ordered as indicated below:

  1. builtin modules, i.e. modules from the Python standard library

  2. modules from external projects, e.g. PyYAML, including other Dissect projects

  3. modules from the project itself.

The imports within each group should be in alphabetical order, as in the example below:

import builtins_a
from builtins_a import foo
import builtins_b

import externals_a
from externals_a import bar
import externals_b
import other_dissect_project

import this_dissect_project
from this_dissect_project import bla

Formatting tuples#

Care should be taken when formatting tuples and other comma-separated collections, such as argument lists, because Black attempts to reformat all elements into a single line when they fit. To prevent this, add a comma (,) after the last item, like this:

function(
    param1,
    param2,
)

Coincidentally, this also gives cleaner code diffs when adding or removing items from the tuple later.
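The same applies to an actual tuple. The sketch below uses made-up values to contrast the two layouts that Black leaves in place:

```python
# Without a trailing comma, Black joins the elements onto one line if they fit.
collapsed = ("read", "write", "seek")

# With a trailing comma after the last element, Black keeps one element per line.
expanded = (
    "read",
    "write",
    "seek",
)
```

Both forms define the same tuple; only the layout differs.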

Naming variables#

Naming variables can be challenging. When deciding on a variable name, take the following rules into account:

  • Avoid single-character variable names.

  • Don’t name variables after their type (list, dict etc.).
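The two rules above can be illustrated with a small, hypothetical before-and-after sketch. Both functions do the same thing, but the second one tells the reader what the values represent:

```python
# Harder to follow: single-character names and a name that repeats the type.
def total(byte_list: list[bytes]) -> int:
    n = 0
    for x in byte_list:
        n += len(x)
    return n


# Easier to follow: the names describe what the values represent.
def total_size(chunks: list[bytes]) -> int:
    size = 0
    for chunk in chunks:
        size += len(chunk)
    return size
```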

Incorporating dissect.cstruct definitions#

Writing structure definitions is an essential part of writing a new parser. The following rules show how to format them properly.

Split definition and loading#

When using dissect.cstruct to define and load C structures, split the definition of the structure and the loading of the structure:

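from dissect.cstruct import cstruct

# Keep the C definition in its own string.
c_def = """
#define SOME_C_DEF 1
"""

# Load the definition in a separate step.
c_obj = cstruct().load(c_def)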

This increases readability and allows you to add a # noqa: E501 after the string defining the C structure. This is useful if the definition comes from an external source which has lines that are too long, but you want to keep the original layout.
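As a sketch of the noqa placement, assume a hypothetical structure copied from an external header in which one line exceeds the line length limit. The loading step is left out here so the snippet only depends on the standard library:

```python
# Hypothetical definition, copied from an imagined external header in which
# one line exceeds the 120-character limit.
c_def = """
typedef struct _EXAMPLE_RECORD {
    uint32  Signature;
    uint32  Flags;      // A long comment from the original header that is kept verbatim to preserve the original layout of the source
} EXAMPLE_RECORD;
"""  # noqa: E501
```

The comment goes after the closing quotes of the string, so the original layout inside the definition stays untouched.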

Styling structure definitions#

The main rule for styling structure definitions is to keep the style similar to the original structures when this is possible.

Below follow more specific rules depending on the availability of the structures:

1. If open-source or openly documented structures are available, use them as much as possible. Changing field types or slightly altering structures for performance or compatibility reasons is encouraged. For example, use char[n] instead of int8[n] because it is faster, or change a GUID field_name to char field_name[16].

2. If no original structures are available, make an educated guess on what they could look like in the original source. For example, if during reverse engineering you see a debug log message that uses lowerCamelCase field names, use that style for your field names.

If no discernible style is visible, you can use the following general rules:

  • For a Microsoft file format, use UPPERCASE_NAME structure names and CamelCase field names.

    • One exception is that field prefixes like dw and cb should be removed, even when copy-pasting structures.

  • For other file formats, use lowercase_name structure and field names.
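The general rules above could look as follows. Both definitions are hypothetical and exist only to show the naming styles; they are kept as plain strings and are not loaded here:

```python
# Hypothetical Microsoft-style definition: UPPERCASE_NAME structure name and
# CamelCase field names, with field prefixes removed
# (for example dwSignature becomes Signature and cbSize becomes Size).
ms_style_def = """
typedef struct _EXAMPLE_HEADER {
    DWORD   Signature;
    DWORD   Size;
} EXAMPLE_HEADER;
"""

# Hypothetical definition for another format: lowercase_name structure and field names.
other_style_def = """
struct example_header {
    uint32  magic;
    uint32  header_size;
};
"""
```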

Documentation style and formatting#

New code needs to be documented properly using docstrings. To understand how documentation is organised and generated, check out the developing for Dissect page.

Use of docstrings#

Functions and classes should have docstrings detailing what that function or class does and/or how it should be used. They should be formatted as described in the Google docstring format.

The first line of a docstring should contain a short sentence describing the nature of the function/class, followed by an empty line and optionally a more verbose explanation of how the function/class works and/or how it should be used. Finally, add an indented list of arguments, return value(s) and exceptions that can be raised, according to the Google docstring format.

Typing of parameters should be done through type hinting.

Use the References: clause when referencing external resources such as URLs to websites.

Example docstrings#

An example of how to use the docstring to comment a function/method:

def comment_example(raw_comment: str) -> str:
    """This is an example comment.

    Args:
        raw_comment: A string containing the raw comment

    Returns:
        Data with a ``Comment:`` prefix
    """
    return f"Comment: {raw_comment}"


class CodestyleException(Exception):
    """An exception that gets raised to illustrate this example."""


def function_with_an_exception() -> None:
    """This function raises an exception.

    This function is only for illustratory purposes.

    References:
        - https://docs.dissect.tools/

    Raises:
        CodestyleException: Gets raised as an example
    """
    if True:
        raise CodestyleException("Hello")


#: This is a magic random string comment
some_random_variable: str = "hello world"

some_other_way_of_using_a_variable_comment: str = "Hello world 2"
"""Sphinx also recognizes this as a type of comment"""

private_member: str = "Private Member"
"""Using the ``meta private`` directive, you can tell sphinx not to include the variable

:meta private:"""

_public_member: str = "Public Member"
"""Using the ``meta public`` directive, you can tell sphinx to include the variable in the documentation.
When the variable name is prefixed with ``_``, it makes the variable private by default.

:meta public:"""

The examples above look like this:

exception codestyle.CodestyleException#

An exception that gets raised to illustrate this example.

codestyle.comment_example(raw_comment: str) → str#

This is an example comment.

Parameters:

raw_comment – A string containing the raw comment

Returns:

Data with a Comment: prefix

codestyle.function_with_an_exception() → None#

This function raises an exception.

This function is only for illustratory purposes.

References

  • https://docs.dissect.tools/

Raises:

CodestyleException – Gets raised as an example

codestyle._public_member: str = 'Public Member'#

Using the meta public directive, you can tell sphinx to include the variable in the documentation. When the variable name is prefixed with _, it makes the variable private by default.

codestyle.some_other_way_of_using_a_variable_comment: str = 'Hello world 2'#

Sphinx also recognizes this as a type of comment

codestyle.some_random_variable: str = 'hello world'#

This is a magic random string comment

The most important takeaways are:

  • Use type hints so type information gets automatically added to the documentation

  • Use Args: to document parameters

  • Use Returns: to document what the function specifically returns

  • Use Raises: to document which specific exceptions it raises and why

Commit message style and formatting#

Commit messages should adhere to the following points:

  • Separate subject from body with a blank line

  • Limit the subject line to 50 characters as much as possible

  • Capitalize the subject line

  • Do not end the subject line with a period

  • Use the imperative mood in the subject line

  • The verb should represent what was accomplished (Create, Add, Fix etc.)

  • Wrap the body at 72 characters

  • Use the body to explain the what and why vs. the how

Example commit message#

An example of a properly formatted commit message:

Fix parsing extra NULL bytes in the NTFS header

Sometimes extra null bytes can be present at the end of the NTFS
allocator table. This patch makes sure they are not included in the next
header structure.
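The mechanical parts of these rules can be checked with a few lines of Python. The function below is a hypothetical sketch and not part of the Dissect tooling. It treats the 50-character subject limit as a hard limit, although the guideline itself is softer, and it cannot judge the imperative mood or the quality of the explanation:

```python
def check_commit_message(message: str) -> list[str]:
    """Return a list of style problems found in a commit message.

    Args:
        message: The full commit message, subject line first.

    Returns:
        Human-readable descriptions of each problem, empty if none were found.
    """
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    if len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    if subject and not subject[0].isupper():
        problems.append("subject is not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1] != "":
        problems.append("subject and body are not separated by a blank line")
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("body is not wrapped at 72 characters")
    return problems
```

Calling it with a message that follows the rules returns an empty list; each violated rule adds one entry.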