:py:mod:`dissect.target.tools.dump`
===================================

.. py:module:: dissect.target.tools.dump


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   dissect.target.tools.dump.RecordStreamElement
   dissect.target.tools.dump.Sink
   dissect.target.tools.dump.DumpState
   dissect.target.tools.dump.Compression
   dissect.target.tools.dump.Serialization
   dissect.target.tools.dump.JsonLinesWriter
   dissect.target.tools.dump.SortedKeysJsonRecordPacker


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   dissect.target.tools.dump.get_targets
   dissect.target.tools.dump.execute_function
   dissect.target.tools.dump.produce_target_func_pairs
   dissect.target.tools.dump.execute_functions
   dissect.target.tools.dump.log_progress
   dissect.target.tools.dump.sink_records
   dissect.target.tools.dump.persist_processing_state
   dissect.target.tools.dump.configure_state
   dissect.target.tools.dump.create_state
   dissect.target.tools.dump.persisted_state
   dissect.target.tools.dump.load_state
   dissect.target.tools.dump.serialize_obj
   dissect.target.tools.dump.get_nested_attr
   dissect.target.tools.dump.get_sink_dir_by_target
   dissect.target.tools.dump.get_sink_dir_by_func
   dissect.target.tools.dump.slugify_descriptor_name
   dissect.target.tools.dump.get_sink_filename
   dissect.target.tools.dump.get_relative_sink_path
   dissect.target.tools.dump.open_path
   dissect.target.tools.dump.get_sink_writer
   dissect.target.tools.dump.cached_sink_writers
   dissect.target.tools.dump.get_current_utc_time
   dissect.target.tools.dump.parse_datetime_iso
   dissect.target.tools.dump.execute_pipeline
   dissect.target.tools.dump.parse_arguments
   dissect.target.tools.dump.main


Attributes
~~~~~~~~~~

.. autoapisummary::

   dissect.target.tools.dump.HAS_LZ4
   dissect.target.tools.dump.HAS_ZSTD
   dissect.target.tools.dump.log
   dissect.target.tools.dump.STATE_FILE_NAME
   dissect.target.tools.dump.PENDING_UPDATES_LIMIT
   dissect.target.tools.dump.COMPRESSION_TO_EXT
   dissect.target.tools.dump.DEST_DIR_CACHE_SIZE
   dissect.target.tools.dump.DEST_FILENAME_CACHE_SIZE
   dissect.target.tools.dump.OPEN_WRITERS_LIMIT
   dissect.target.tools.dump.SERIALIZERS


.. py:data:: HAS_LZ4
   :value: True


.. py:data:: HAS_ZSTD
   :value: True


.. py:data:: log


.. py:class:: RecordStreamElement

   .. py:attribute:: target
      :type: dissect.target.target.Target


   .. py:attribute:: func
      :type: dissect.target.plugin.FunctionDescriptor


   .. py:attribute:: record
      :type: flow.record.Record


   .. py:attribute:: end_pos
      :type: int | None
      :value: None


   .. py:attribute:: sink_path
      :type: pathlib.Path | None
      :value: None


.. py:function:: get_targets(targets: list[str]) -> collections.abc.Iterator[dissect.target.target.Target]

   Return a generator with :class:`Target` objects for provided paths.


.. py:function:: execute_function(target: dissect.target.target.Target, function: dissect.target.plugin.FunctionDescriptor, dry_run: bool, arguments: list[str]) -> collections.abc.Iterator[dissect.target.helpers.record.TargetRecordDescriptor]

   Execute function ``function`` on provided target ``target`` and return a generator
   with the records produced.

   Only output type ``record`` is supported for plugin functions.


.. py:function:: produce_target_func_pairs(targets: collections.abc.Iterable[dissect.target.target.Target], state: DumpState) -> collections.abc.Iterator[tuple[dissect.target.target.Target, dissect.target.plugin.FunctionDescriptor]]

   Return a generator with target and function pairs for execution.

   Target and function pairs that correspond to finished sinks in provided state ``state`` are skipped.


.. py:function:: execute_functions(target_func_stream: collections.abc.Iterable[tuple[dissect.target.target.Target, dissect.target.plugin.FunctionDescriptor]], dry_run: bool, arguments: list[str]) -> collections.abc.Iterator[RecordStreamElement]

   Execute a function on a target for target / function pairs in the stream.

   Returns a generator of ``RecordStreamElement`` objects.


.. py:function:: log_progress(stream: collections.abc.Iterable[Any], step_size: int = 1000) -> collections.abc.Iterator[Any]

   Log the number of items that went through the generator stream
   after every N elements (N is configured in ``step_size``).


.. py:function:: sink_records(record_stream: collections.abc.Iterable[RecordStreamElement], state: DumpState) -> collections.abc.Iterator[RecordStreamElement]

   Persist records from the stream into appropriate sinks, per serialization, compression and record type.


.. py:function:: persist_processing_state(record_stream: collections.abc.Iterable[RecordStreamElement], state: DumpState) -> collections.abc.Iterator[RecordStreamElement]

   Keep track of the pipeline state in a persistent state object.


.. py:function:: configure_state(args: argparse.Namespace) -> DumpState | None


.. py:data:: STATE_FILE_NAME
   :value: 'target-dump.state.json'


.. py:data:: PENDING_UPDATES_LIMIT
   :value: 10


.. py:class:: Sink

   .. py:attribute:: target_path
      :type: str


   .. py:attribute:: func
      :type: str


   .. py:attribute:: path
      :type: pathlib.Path


   .. py:attribute:: is_dirty
      :type: bool
      :value: True


   .. py:attribute:: record_count
      :type: int
      :value: 0


   .. py:attribute:: size_bytes
      :type: int
      :value: 0


   .. py:method:: __post_init__()


.. py:class:: DumpState

   .. py:attribute:: target_paths
      :type: list[str]


   .. py:attribute:: functions
      :type: str


   .. py:attribute:: excluded_functions
      :type: list[str]


   .. py:attribute:: serialization
      :type: str


   .. py:attribute:: compression
      :type: str


   .. py:attribute:: start_time
      :type: datetime.datetime


   .. py:attribute:: last_update_time
      :type: datetime.datetime


   .. py:attribute:: sinks
      :type: list[Sink]
      :value: []


   .. py:attribute:: output_dir
      :type: pathlib.Path | None
      :value: None


   .. py:attribute:: pending_updates_count
      :type: int | None
      :value: 0


   .. py:property:: record_count
      :type: int


   .. py:property:: finished_sinks
      :type: list[Sink]


   .. py:property:: path
      :type: pathlib.Path


   .. py:method:: get_state_path(output_dir: pathlib.Path) -> pathlib.Path
      :classmethod:


   .. py:method:: get_full_sink_path(sink: Sink) -> pathlib.Path


   .. py:method:: get_sink(path: pathlib.Path) -> Sink | None


   .. py:method:: serialize() -> str

      Serialize state instance into a JSON formatted string.


   .. py:method:: persist(fh: TextIO) -> None

      Write serialized state instance into provided ``fh`` byte stream,
      overwriting it from the beginning.


   .. py:method:: mark_as_finished(target: dissect.target.target.Target, func: str) -> None

      Mark sinks that match provided ``target`` and ``func`` pair as not dirty.


   .. py:method:: create_sink(sink_path: pathlib.Path, stream_element: RecordStreamElement) -> Sink

      Create a sink instance for provided ``sink_path`` and ``stream_element``
      (from which ``target`` and ``func`` properties are used).


   .. py:method:: update(stream_element: RecordStreamElement, fp_position: int) -> None

      Update a sink instance for provided ``stream_element``.


   .. py:method:: from_dict(state_dict: dict) -> Self
      :classmethod:

      Deserialize state instance from provided dictionary.


   .. py:method:: from_path(output_dir: pathlib.Path) -> Self | None
      :classmethod:

      Deserialize state instance from a file in the provided output directory path.


   .. py:method:: get_invalid_sinks() -> list[Sink]

      Return sinks that have a mismatch between recorded size and a real file size.


   .. py:method:: drop_invalid_sinks() -> None

      Remove sinks that have a mismatch between recorded size and a real file size
      from the list of sinks.


   .. py:method:: drop_dirty_sinks() -> None

      Drop sinks that are marked as "dirty" in the current state from the list of sinks.


.. py:function:: create_state(*, output_dir: pathlib.Path, target_paths: list[str], functions: str, excluded_functions: list[str], serialization: Serialization, compression: Compression = None) -> DumpState

   Create a ``DumpState`` instance with provided properties.


.. py:function:: persisted_state(state: DumpState) -> collections.abc.Iterator[collections.abc.Callable]

   Return a context manager for persisting ``DumpState`` instance.


.. py:function:: load_state(output_dir: pathlib.Path) -> DumpState | None

   Load persisted ``DumpState`` instance from provided ``output_dir`` path and perform sink validation.


.. py:function:: serialize_obj(obj: Any) -> str

   JSON serializer for object types not serializable by ``json`` library.


.. py:class:: Compression

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`

   Supported compression types.

   .. py:attribute:: BZIP2
      :value: 'bzip2'


   .. py:attribute:: GZIP
      :value: 'gzip'


   .. py:attribute:: LZ4
      :value: 'lz4'


   .. py:attribute:: ZSTD
      :value: 'zstandard'


   .. py:attribute:: NONE
      :value: None


.. py:class:: Serialization

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`

   Supported serialization methods.

   .. py:attribute:: JSONLINES
      :value: 'jsonlines'


   .. py:attribute:: MSGPACK
      :value: 'msgpack'


.. py:data:: COMPRESSION_TO_EXT


.. py:data:: DEST_DIR_CACHE_SIZE
   :value: 10


.. py:data:: DEST_FILENAME_CACHE_SIZE
   :value: 10


.. py:data:: OPEN_WRITERS_LIMIT
   :value: 10


.. py:function:: get_nested_attr(obj: Any, nested_attr: str) -> Any


.. py:function:: get_sink_dir_by_target(target: dissect.target.target.Target, function: dissect.target.plugin.FunctionDescriptor) -> pathlib.Path


.. py:function:: get_sink_dir_by_func(target: dissect.target.target.Target, function: dissect.target.plugin.FunctionDescriptor) -> pathlib.Path


.. py:function:: slugify_descriptor_name(descriptor_name: str) -> str


.. py:function:: get_sink_filename(record_descriptor: flow.record.RecordDescriptor, serialization: Serialization, compression: Compression | None = None) -> str

   Return a sink filename for provided record descriptor, serialization and compression.


.. py:function:: get_relative_sink_path(element: RecordStreamElement, serialization: str, compression: Compression | None = None) -> pathlib.Path

   Return a sink path relative to an output directory.


.. py:function:: open_path(path: pathlib.Path, mode: str, compression: Compression | None = None) -> BinaryIO

   Open ``path`` using ``mode``, with specified ``compression`` and return a file object.


.. py:class:: JsonLinesWriter(fp: TextIO, **kwargs)

   Bases: :py:obj:`flow.record.adapter.jsonfile.JsonfileWriter`

   .. py:attribute:: fp


   .. py:attribute:: packer


   .. py:method:: flush() -> None

      Flush any buffered writes.


   .. py:method:: close() -> None

      Close the Writer, no more writes will be possible.


.. py:class:: SortedKeysJsonRecordPacker(indent: int | None = None, pack_descriptors: bool = True)

   Bases: :py:obj:`flow.record.jsonpacker.JsonRecordPacker`

   .. py:method:: pack(obj: flow.record.Record | flow.record.RecordDescriptor) -> str


.. py:data:: SERIALIZERS


.. py:function:: get_sink_writer(full_sink_path: pathlib.Path, serialization: Serialization, compression: Compression | None = None, new_sink: bool = True) -> flow.record.adapter.jsonfile.JsonfileWriter | flow.record.RecordStreamWriter


.. py:function:: cached_sink_writers(state: DumpState) -> collections.abc.Iterator[collections.abc.Callable]


.. py:function:: get_current_utc_time() -> datetime.datetime


.. py:function:: parse_datetime_iso(datetime_str: str) -> datetime.datetime


.. py:function:: execute_pipeline(state: DumpState, targets: collections.abc.Iterator[dissect.target.target.Target], dry_run: bool, arguments: list[str], limit: int | None = None) -> None

   Run the record generation, processing and sinking pipeline.


.. py:function:: parse_arguments() -> tuple[argparse.Namespace, list[str]]


.. py:function:: main() -> None
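

Several functions in this module (``execute_functions``, ``log_progress``, ``sink_records``, ``persist_processing_state``) take an iterable and return an iterator, which suggests they can be chained as lazy generator stages. The following is a minimal sketch of that pattern only, not the module's actual implementation. The names ``log_progress_sketch`` and ``double`` are hypothetical and are not part of ``dissect.target``; the log messages are likewise assumptions.

.. code-block:: python

   import logging
   from collections.abc import Iterable, Iterator
   from typing import Any

   log = logging.getLogger(__name__)


   def log_progress_sketch(stream: Iterable[Any], step_size: int = 1000) -> Iterator[Any]:
       """Yield items unchanged, logging a running count every ``step_size`` items.

       Hypothetical stand-in for a pass-through progress stage.
       """
       count = 0
       for count, item in enumerate(stream, start=1):
           if count % step_size == 0:
               log.info("Processed %d items", count)
           yield item
       log.info("Finished, %d items in total", count)


   def double(stream: Iterable[int]) -> Iterator[int]:
       """Hypothetical upstream stage that transforms each item."""
       for item in stream:
           yield item * 2


   # Stages compose lazily: nothing runs until the final iterator is consumed.
   pipeline = log_progress_sketch(double(range(5)), step_size=2)
   result = list(pipeline)

Because each stage yields items as it receives them, a pass-through stage such as the progress logger can be inserted anywhere in the chain without changing what downstream stages see.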