corrupy.picklemagic — Pickle data extraction¶
The picklemagic module implements tools for extracting data stored in
Python’s pickle object serialization format.
Technical Background
The Python pickle module is a convenient tool for storing data structures: as long as the Python environment stays the same, almost anything can be pickled. But when the original environment in which a pickle was made is unknown, finding out what data was serialized becomes very cumbersome, and unpickling the data is even a security risk.
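To illustrate why an unknown pickle is hard to judge, the following sketch lists the globals a pickle asks for without loading it. It uses only the standard library pickletools module. The helper referenced_globals is a hypothetical name, not part of picklemagic, and it only looks at the GLOBAL opcode (protocol 4 pickles use STACK_GLOBAL instead, which this sketch ignores).

```python
import pickletools


def referenced_globals(data):
    """List (module, name) pairs named by GLOBAL opcodes, without unpickling."""
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the GLOBAL argument as "module name".
            module, name = arg.split(" ", 1)
            found.append((module, name))
    return found


# A protocol-0 pickle that would call os.system("echo hi") if it were loaded.
payload = b"cos\nsystem\n(S'echo hi'\ntR."
print(referenced_globals(payload))  # prints [('os', 'system')]
```

Nothing is executed here: the opcodes are only parsed, so the dangerous call is visible but never made.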
The picklemagic module aims to solve this problem. Taking advantage of
Python’s dynamic nature, it extracts as much information as possible from
pickles and creates a data structure as close as possible to the original.
This is accomplished by generating any missing class and module definitions at runtime. These objects, however, only hold data which could be recovered from the unpickling process. They lack any implementation, and will therefore be referred to as fake classes and modules.
This module uses this behaviour in two ways. FakeUnpickler uses
it simply to extend the normal unpickler behaviour, creating fake modules
and classes when it encounters a definition it cannot find in the available
modules. This ensures that the resulting data structure is as close to the
original as possible. SafeUnpickler, however, uses it to replace
the default Unpickler behaviour, substituting a fake class for every class
definition requested by the pickle. This ensures safety during unpickling,
since the pickle cannot instantiate dangerous objects or call
dangerous functions.
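The difference between the two strategies can be sketched with the standard library alone by overriding pickle.Unpickler.find_class. This is a minimal illustration under stated assumptions, not the picklemagic implementation: ForgivingUnpickler, ReplacingUnpickler and make_placeholder are hypothetical names, and the placeholder classes carry no behaviour at all.

```python
import io
import pickle


def make_placeholder(module, name):
    # A class without behaviour that only remembers where it claims to live.
    return type(name, (object,), {"__module__": module})


class ForgivingUnpickler(pickle.Unpickler):
    """Hypothetical stand-in: use the real definition when it can be found."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            return make_placeholder(module, name)


class ReplacingUnpickler(pickle.Unpickler):
    """Hypothetical stand-in: never import anything the pickle asks for."""

    def find_class(self, module, name):
        return make_placeholder(module, name)


# Protocol-0 pickle that calls collections.OrderedDict with no arguments.
data = b"ccollections\nOrderedDict\n(tR."
forgiving = ForgivingUnpickler(io.BytesIO(data)).load()
replacing = ReplacingUnpickler(io.BytesIO(data)).load()

# The same call, but aimed at a module that does not exist.
unknown = b"cno_such_module_for_sketch\nbar\n(tR."
missing = ForgivingUnpickler(io.BytesIO(unknown)).load()
```

The forgiving variant returns a real OrderedDict and only falls back to a placeholder for the unknown module, while the replacing variant returns a placeholder instance in every case.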
Fake classes and modules
The mechanics of fake classes and modules are an important part of this module.
Fake classes get created when the unpickling machinery encounters a request
to load a top-level object from a module. In a normal pickle this object should
either be a function or a class object. When the module cannot be found or the
object cannot be found in the module, a fake class has to be inserted. This fake
class is then created using the settings of the FakeClassFactory in use,
during which the class’s __module__ attribute will be set to the module
the class would have resided in, and the __name__ attribute will be set
to the name the class would have had.
A similar process happens for the generation of fake modules. These modules will
be generated when a FakeUnpickler encounters a reference to an object
in an unknown module. When this happens, a fake module will be generated to house
the fake classes which would be contained by that module. When such a module is
created, it will automatically generate all necessary parent modules and add
itself to sys.modules so it can be imported properly. It should be noted
though that SafeUnpickler does not generate fake modules while importing
since it is forbidden from importing modules.
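The registration of a fake module together with its missing parents can be sketched with types.ModuleType and sys.modules. This is an illustration only: ensure_fake_module and the module name demo_fake_pkg are hypothetical, and the real FakeModule class does more than this (comparison behaviour, removal).

```python
import sys
import types


def ensure_fake_module(name):
    """Create `name` and any missing parent packages as empty module objects."""
    module = sys.modules.get(name)
    if module is not None:
        return module
    module = types.ModuleType(name)
    module.__path__ = []  # an empty search path marks the module as a package
    sys.modules[name] = module
    parent_name, _, child = name.rpartition(".")
    if parent_name:
        # Recurse upwards and hang the new module on its parent.
        parent = ensure_fake_module(parent_name)
        setattr(parent, child, module)
    return module


ensure_fake_module("demo_fake_pkg.sub.leaf")

# The import statement now succeeds because every level is in sys.modules.
import demo_fake_pkg.sub.leaf
```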
A problem with this approach is that it’s hard to write code to analyze the
created data structures when the fake modules and classes are only created at
unpickling time. Therefore the user can create the necessary
fake modules beforehand, either by creating FakeModule instances directly
or by using fake_package(). This function lets the user declare that
every module in the given package exists, which works recursively. For example:
import picklemagic
picklemagic.fake_package("foo")
import foo.bar.baz
print(foo.bar.baz)
>>> <module 'foo.bar.baz' (fake)>
Code can then be written against these modules thanks to the special comparison behaviour of
fake modules and classes. This behaviour works as follows: a fake class is equal
to a fake module if its qualified name matches the qualified name of the fake
module. This means that a fake class which says it has name bar in module foo
compares equal to a fake module which identifies as foo.bar (this behaviour extends
to hashing and isinstance/issubclass checking). This can then be
used as follows:
import picklemagic
picklemagic.fake_package("foo")
import foo
def is_foo_bar(obj):
    if isinstance(obj, foo.bar):
        print("yes")
result = picklemagic.safe_loads(b"cfoo\nbar\n(tR.")
# This pickle results in a foo.bar instance
is_foo_bar(result)
>>> yes
This means that you don’t have to worry about definitions not existing in certain
pickles. You can call fake_package() and then just code as if everything in
the module actually existed.
Security Risks
While SafeUnpickler secures the unpickling process by denying the pickle
access to globals and objects in modules, replacing the wanted definitions with
fake classes which cannot do any harm, there are other possible security risks in
the pickle protocol. These are persistent IDs and the pickle
extension registry. Although SafeUnpickler allows overriding
SafeUnpickler.persistent_id(), care should be taken that the objects returned
by it cannot be used for anything harmful. The same goes for the pickle extension
registry if enabled (documented in the Python copyreg module).
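As a hedged sketch of what a harmless persistent ID handler could look like, the following uses the standard library hook that pickle.Unpickler calls for persistent IDs, which is named persistent_load. InertPersistentUnpickler is a hypothetical class, and how picklemagic itself wires this hook is not shown here.

```python
import io
import pickle


class InertPersistentUnpickler(pickle.Unpickler):
    """Hypothetical example: resolve every persistent ID to a harmless record."""

    def persistent_load(self, pid):
        # Never look the ID up in a database, file system or object registry;
        # just hand back a plain tuple describing what was requested.
        return ("persistent-id", pid)


# Protocol-0 pickle consisting of a single persistent ID reference.
data = b"Pexample-id\n."
result = InertPersistentUnpickler(io.BytesIO(data)).load()
print(result)  # prints ('persistent-id', 'example-id')
```

The point of the design is that the returned object is plain data, so nothing the pickle does with it afterwards can reach outside the unpickler.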
Module Interface
To analyze a pickle data stream, you can simply call load() or
safe_load(). Similarly, if you want to analyze a pickle string, you
can call the loads() and safe_loads() functions. However, if you
want more control over the missing class faking process, you can control
FakeClass creation directly using FakeClassFactory and by subclassing
FakeClassType. For more control over the unpickling process itself, the
classes FakeUnpickler and SafeUnpickler can be used directly.
The picklemagic module provides the following functions to make simple
use more convenient:
- corrupy.picklemagic.load(file, class_factory=None, encoding='bytes', errors='errors')
Read a pickled object representation from the open binary file object file and return the reconstituted object hierarchy specified therein, generating any missing class definitions at runtime. This is equivalent to FakeUnpickler(file).load().
The optional keyword arguments are class_factory, encoding and errors. class_factory can be used to control how the missing class definitions are created. If set to None, FakeClassFactory((), FakeStrict) will be used.
In Python 3, the optional keyword arguments encoding and errors can be used to indicate how the unpickler should deal with pickle streams generated in Python 2, specifically how to deal with 8-bit string instances. If encoding is set to “bytes” it will load them as bytes objects, otherwise it will attempt to decode them into unicode using the given encoding and errors arguments.
This function should only be used to unpickle trusted data.
- corrupy.picklemagic.safe_load(file, class_factory=None, safe_modules=(), use_copyreg=False, encoding='bytes', errors='errors')
Read a pickled object representation from the open binary file object file and return the reconstituted object hierarchy specified therein, substituting any class definitions by fake classes, ensuring safety in the unpickling process. This is equivalent to SafeUnpickler(file).load().
The optional keyword arguments are class_factory, safe_modules, use_copyreg, encoding and errors. class_factory can be used to control how the missing class definitions are created. If set to None, FakeClassFactory((), FakeStrict) will be used. safe_modules can be set to a set of strings of module names, which will be regarded as safe by the unpickling process, meaning that it will import objects from that module instead of generating fake classes (this does not apply to objects in submodules). use_copyreg is a boolean value indicating if it’s allowed to use extensions from the pickle extension registry (documented in the copyreg module).
In Python 3, the optional keyword arguments encoding and errors can be used to indicate how the unpickler should deal with pickle streams generated in Python 2, specifically how to deal with 8-bit string instances. If encoding is set to “bytes” it will load them as bytes objects, otherwise it will attempt to decode them into unicode using the given encoding and errors arguments.
This function can be used to unpickle untrusted data safely with the default class_factory when safe_modules is empty and use_copyreg is False.
- corrupy.picklemagic.loads(string, class_factory=None, encoding='bytes', errors='errors')
Similar to load(), but takes an 8-bit string (bytes in Python 3, str in Python 2) as its first argument instead of a binary file object.
- corrupy.picklemagic.safe_loads(string, class_factory=None, safe_modules=(), use_copyreg=False, encoding='bytes', errors='errors')
Similar to safe_load(), but takes an 8-bit string (bytes in Python 3, str in Python 2) as its first argument instead of a binary file object.
To ease automatic analysis, the picklemagic module provides the
following functions.
- corrupy.picklemagic.fake_package(name)
Mounts a fake package tree with the name name. This causes any attempt to import module name, attributes of the module or submodules to return a FakePackage instance which implements the same behaviour. These FakePackage instances compare properly with FakeClassType instances, allowing you to code using FakePackages as if the modules and their attributes actually existed.
This is implemented by creating a FakePackageLoader instance with root name and inserting it in the first spot in sys.meta_path. This ensures that importing the module and submodules will work properly. Further, the FakePackage instances take care of generating submodules as attributes on request.
If a fake package tree with the same name is already registered, no new fake package tree will be mounted.
This returns the FakePackage instance name.
- corrupy.picklemagic.remove_fake_package(name)
Removes the fake package tree mounted at name.
This works by first looking for any FakePackageLoaders in sys.meta_path with their root set to name and removing them from sys.meta_path. Next it will find the top-level FakePackage instance name and from this point traverse the tree of created submodules, removing them from sys.modules and removing their attributes. After this the modules are not registered anymore, and if they are not referenced from user code anymore they will be garbage collected.
If no fake package tree name exists, a ValueError will be raised.
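The unregistering step can be sketched as follows, assuming that the loaders live in sys.meta_path (where fake_package() is described as inserting them) and that the modules are registered in sys.modules. remove_tree is a hypothetical helper that matches modules by name prefix and finders by a root attribute; the actual function is described as traversing the created submodules instead.

```python
import sys
import types


def remove_tree(root):
    """Unregister `root`, its submodules, and any finder whose .root matches."""
    # Matching finders on a `root` attribute is an assumption of this sketch.
    sys.meta_path[:] = [finder for finder in sys.meta_path
                        if getattr(finder, "root", None) != root]
    names = [name for name in list(sys.modules)
             if name == root or name.startswith(root + ".")]
    if not names:
        raise ValueError("no module tree named %r" % root)
    for name in names:
        del sys.modules[name]
    return sorted(names)


# Register two throwaway modules, then remove the whole tree again.
for name in ("sketch_tree", "sketch_tree.a"):
    sys.modules[name] = types.ModuleType(name)
removed = remove_tree("sketch_tree")
```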
The picklemagic module defines this exception:
- exception corrupy.picklemagic.FakeUnpicklingError
Error raised when there is not enough information to perform the fake unpickling process completely. It inherits from pickle.UnpicklingError.
Fake Classes
The picklemagic module uses the following classes to provide the necessary
fake class definitions required by the fake unpickling process.
- class corrupy.picklemagic.FakeClassType(name, bases, attributes, module=None)
The metaclass used to create fake classes. To support comparisons between fake classes and FakeModule instances, custom behaviour is defined here which follows this logic:
If the other object does not have other.__name__ set, they are not equal.
Else if it does not have other.__module__ set, they are equal if self.__module__ + "." + self.__name__ == other.__name__.
Else, they are equal if self.__module__ == other.__module__ and self.__name__ == other.__name__.
Using this behaviour, ==, !=, hash(), isinstance() and issubclass() are implemented, allowing comparison between FakeClassType instances and FakeModule instances to succeed if they are pretending to be in the same place in the Python module hierarchy.
To create a fake class using this metaclass, you can either use this metaclass directly or inherit from the fake class base instances given below. When doing this, the module that this fake class is pretending to be in should be specified using the module argument when the metaclass is called directly, or a __module__ class attribute in a class statement.
This is a subclass of type.
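The three-step comparison can be sketched as a small metaclass. This is an illustration of the stated logic only: FakeMeta is a hypothetical name, the hash is an assumption chosen to be consistent with the equality rule, and a plain types.ModuleType object stands in for a fake module (plain module objects have no __module__ attribute, so they take the second branch).

```python
import types


class FakeMeta(type):
    """Hypothetical metaclass implementing the three-step comparison."""

    def __eq__(cls, other):
        if not hasattr(other, "__name__"):
            return False
        if not hasattr(other, "__module__"):
            # Module-like object: compare against its dotted name.
            return cls.__module__ + "." + cls.__name__ == other.__name__
        return (cls.__module__ == other.__module__
                and cls.__name__ == other.__name__)

    def __hash__(cls):
        # Equal objects must hash equally, so hash the qualified name.
        return hash(cls.__module__ + "." + cls.__name__)

    def __instancecheck__(cls, instance):
        return cls.__subclasscheck__(type(instance))

    def __subclasscheck__(cls, subclass):
        return any(cls == base for base in subclass.__mro__)


bar = FakeMeta("bar", (object,), {"__module__": "foo"})
other_bar = FakeMeta("bar", (object,), {"__module__": "foo"})
module_like = types.ModuleType("foo.bar")
```

Here bar and other_bar are distinct class objects, yet they compare equal to each other and to module_like, and an instance of one passes an isinstance() check against the other.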
- class corrupy.picklemagic.FakeClassFactory(special_cases=(), default=FakeStrict)
Factory of fake classes. It will create fake class definitions on demand based on the passed arguments.
special_cases should be an iterable containing fake classes which should be treated as special cases during the fake unpickling process. This way you can specify custom methods and attributes on these classes as they’re used during unpickling.
default_class should be a FakeClassType instance which will be subclassed to create the necessary non-special case fake classes during unpickling. This should usually be set to FakeStrict, FakeWarning or FakeIgnore. These classes have __new__() and __setstate__() methods which extract data from the pickle stream and provide means of inspecting the stream when it is not clear how the data should be interpreted.
As an example, we can define the fake class generated for definition bar in module foo, which has a __str__() method which returns "baz":
class bar(FakeStrict, object):
    def __str__(self):
        return "baz"
special_cases = [bar]
Alternatively they can also be instantiated using FakeClassType directly:
special_cases = [FakeClassType(c.__name__, c.__bases__, c.__dict__, c.__module__)]
- __call__(name, module)
Return the right class for the specified module and name.
This class will either be one of the special cases in case the name and module match, or a subclass of default_class will be created with the correct name and module.
Created class definitions are cached per factory instance.
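The lookup order described for __call__() (special cases first, then a cached generated class) can be sketched as below. SketchFactory is a hypothetical stand-in that subclasses object by default, since the real default classes are not available in a standalone example.

```python
class SketchFactory:
    """Hypothetical factory: special cases first, then cached generated classes."""

    def __init__(self, special_cases=(), default_class=object):
        # Index the special cases by the place they pretend to live in.
        self.special_cases = {(c.__module__, c.__name__): c for c in special_cases}
        self.default_class = default_class
        self.cache = {}

    def __call__(self, name, module):
        key = (module, name)
        if key in self.special_cases:
            return self.special_cases[key]
        if key not in self.cache:
            # Generate a subclass of the default class on first request.
            self.cache[key] = type(name, (self.default_class,), {"__module__": module})
        return self.cache[key]


class bar:
    __module__ = "foo"  # pretend to live in module foo

    def __str__(self):
        return "baz"


factory = SketchFactory(special_cases=[bar])
generated = factory("qux", "foo")
```

Because the cache belongs to the instance, two factories produce distinct classes for the same name and module.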
- class corrupy.picklemagic.FakeClass
- class corrupy.picklemagic.FakeStrict(*args, **kwargs)
- class corrupy.picklemagic.FakeWarning(*args, **kwargs)
- class corrupy.picklemagic.FakeIgnore(*args, **kwargs)
These are FakeClassType instances which can easily be subclassed to get
the wanted behaviour. FakeClass is a featureless instance for the rest
to inherit from. FakeStrict, FakeWarning and FakeIgnore
all define __new__() and __setstate__() methods to support the fake
unpickling process. If FakeStrict is used, a FakeUnpicklingError
will be raised if special arguments were passed into the methods during unpickling.
If FakeWarning is used, a warning detailing the arguments will be printed
and the arguments will be stored inside an attribute of the object
(_setstate_args or _new_args). Finally if FakeIgnore is
used, any unknown arguments will be stored inside an attribute of the object but
no warning will be printed.
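The three policies can be sketched as one base class with a single overridable hook. This is a rough illustration under assumptions: the Sketch* names are hypothetical, and "special arguments" is interpreted here as any constructor arguments and any __setstate__() argument that is not a plain dict, which may differ from what the real classes treat as special.

```python
import warnings


class SketchUnpicklingError(Exception):
    """Stand-in for FakeUnpicklingError in this sketch."""


class SketchIgnore:
    def __new__(cls, *args, **kwargs):
        self = object.__new__(cls)
        if args or kwargs:
            self._unexpected("_new_args", (args, kwargs))
        return self

    def __setstate__(self, state):
        if isinstance(state, dict):
            self.__dict__.update(state)  # the ordinary case
        else:
            self._unexpected("_setstate_args", state)

    def _unexpected(self, attribute, value):
        setattr(self, attribute, value)  # keep the data, say nothing


class SketchWarning(SketchIgnore):
    def _unexpected(self, attribute, value):
        warnings.warn("%s received %r" % (type(self).__name__, value))
        super()._unexpected(attribute, value)


class SketchStrict(SketchIgnore):
    def _unexpected(self, attribute, value):
        raise SketchUnpicklingError("%s received %r" % (type(self).__name__, value))


ignored = SketchIgnore(1, 2)
ignored.__setstate__({"x": 1})
ignored.__setstate__([3])
```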
Fake Modules
The picklemagic module uses the following classes to implement the fake
modules generated by fake_package() and the fake unpickling process.
- class corrupy.picklemagic.FakeModule(name)
An object which pretends to be a module.
name is the name of the module and should be a "."-separated alphanumeric string.
On initialization the module is added to sys.modules so it can be imported properly. Further, if name is a submodule and its parent does not exist, it will automatically create a parent FakeModule. This operates recursively until the parent is a top-level module or an existing module.
If any fake submodules are removed from this module, they will automatically be removed from sys.modules.
Just like FakeClassType, it supports comparison with FakeClassType instances, using the following logic:
If the other object does not have other.__name__ set, they are not equal.
Else if the other object does not have other.__module__ set, they are equal if self.__name__ == other.__name__.
Else, they are equal if self.__name__ == other.__module__ + "." + other.__name__.
Using this behaviour, ==, !=, hash(), isinstance() and issubclass() are implemented, allowing comparison between FakeClassType instances and FakeModule instances to succeed if they are pretending to be in the same place in the Python module hierarchy.
It inherits from types.ModuleType.
- _remove()
Removes this module from sys.modules and calls _remove() on any sub-FakeModules.
- class corrupy.picklemagic.FakePackage(name)
A FakeModule subclass which lazily creates FakePackage instances on its attributes when they’re requested.
This ensures that any attribute of this module is a valid FakeModule which can be used to compare against fake classes.
- class corrupy.picklemagic.FakePackageLoader(root)
A loader of FakePackage modules. When added to sys.meta_path it will ensure that any attempt to import module root or its submodules results in a FakePackage.
Together with the attribute creation from FakePackage, this ensures that any attempt to get a submodule from module root results in a FakePackage, creating the illusion that root is an actual package tree.
This class is both a finder and a loader.
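A combined finder and loader of this kind can be sketched with the standard import protocol (find_spec, create_module, exec_module). SketchPackage, SketchLoader and the package name sketch_root are hypothetical, and the sketch leaves out the comparison behaviour of the real classes.

```python
import importlib
import importlib.machinery
import sys
import types


class SketchPackage(types.ModuleType):
    """Module whose missing attributes are created lazily as submodules."""

    def __getattr__(self, attr):
        if attr.startswith("__"):
            # Leave dunder probes (such as __path__ lookups) alone.
            raise AttributeError(attr)
        return importlib.import_module(self.__name__ + "." + attr)


class SketchLoader:
    """Acts as both meta path finder and loader for one package tree."""

    def __init__(self, root):
        self.root = root

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.root or fullname.startswith(self.root + "."):
            # Every module in the tree is a package, so it can have children.
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        return None

    def create_module(self, spec):
        return SketchPackage(spec.name)

    def exec_module(self, module):
        pass  # nothing to execute: the module is an empty shell


sys.meta_path.insert(0, SketchLoader("sketch_root"))

import sketch_root.a.b
```

After the loader is installed, both import statements and plain attribute access such as sketch_root.x.y produce modules on demand.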
Fake Unpicklers
These two classes do the actual work behind the fake unpickling process.
- class corrupy.picklemagic.FakeUnpickler(file, class_factory=None, encoding='bytes', errors='strict')
A forgiving unpickler. On encountering references to class definitions in the pickle stream which it cannot locate, it will create fake classes and, if necessary, fake modules to house them in. Since it still allows access to all modules and builtins, it should only be used to unpickle trusted data.
file is the binary file to unserialize.
The optional keyword arguments are class_factory, encoding and errors. class_factory can be used to control how the missing class definitions are created. If set to None, FakeClassFactory((), FakeStrict) will be used.
In Python 3, the optional keyword arguments encoding and errors can be used to indicate how the unpickler should deal with pickle streams generated in Python 2, specifically how to deal with 8-bit string instances. If encoding is set to “bytes” it will load them as bytes objects, otherwise it will attempt to decode them into unicode using the given encoding and errors arguments.
It inherits from pickle.Unpickler. (In Python 3 this is actually pickle._Unpickler.)
This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no proto argument is needed.
The argument file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return bytes. Thus file can be a binary file object opened for reading, an io.BytesIO object, or any other custom object that meets this interface.
If buffers is not None, it should be an iterable of buffer-enabled objects that is consumed each time the pickle stream references an out-of-band buffer view. Such buffers have been given in order to the buffer_callback of a Pickler object.
If buffers is None (the default), then the buffers are taken from the pickle stream, assuming they are serialized there. It is an error for buffers to be None if the pickle stream was produced with a non-None buffer_callback.
Other optional arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
- class corrupy.picklemagic.SafeUnpickler(file, class_factory=None, safe_modules=(), use_copyreg=False, encoding='bytes', errors='strict')
A safe unpickler. It will create fake classes for any references to class definitions in the pickle stream. Further, it can block access to the extension registry, making this unpickler safe to use on untrusted data.
file is the binary file to unserialize.
The optional keyword arguments are class_factory, safe_modules, use_copyreg, encoding and errors. class_factory can be used to control how the missing class definitions are created. If set to None, FakeClassFactory((), FakeStrict) will be used. safe_modules can be set to a set of strings of module names, which will be regarded as safe by the unpickling process, meaning that it will import objects from that module instead of generating fake classes (this does not apply to objects in submodules). use_copyreg is a boolean value indicating if it’s allowed to use extensions from the pickle extension registry (documented in the copyreg module).
In Python 3, the optional keyword arguments encoding and errors can be used to indicate how the unpickler should deal with pickle streams generated in Python 2, specifically how to deal with 8-bit string instances. If encoding is set to “bytes” it will load them as bytes objects, otherwise it will attempt to decode them into unicode using the given encoding and errors arguments.
This class can be used to unpickle untrusted data safely with the default class_factory when safe_modules is empty and use_copyreg is False.
It should be noted though that when the unpickler tries to get a nonexistent attribute of a safe module, an AttributeError will be raised.
This inherits from FakeUnpickler, which in turn inherits from pickle.Unpickler. (In Python 3 this is actually pickle._Unpickler.)
This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no proto argument is needed.
The argument file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return bytes. Thus file can be a binary file object opened for reading, an io.BytesIO object, or any other custom object that meets this interface.
If buffers is not None, it should be an iterable of buffer-enabled objects that is consumed each time the pickle stream references an out-of-band buffer view. Such buffers have been given in order to the buffer_callback of a Pickler object.
If buffers is None (the default), then the buffers are taken from the pickle stream, assuming they are serialized there. It is an error for buffers to be None if the pickle stream was produced with a non-None buffer_callback.
Other optional arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
Utility
Sometimes it is necessary to be able to pickle the data structures created by the
fake unpicklers. While this can be performed using the normal pickle routines from
the Python standard library for objects created by FakeUnpickler, this is
not true for objects created by SafeUnpickler. Therefore, the following
class is made available, which allows objects created by SafeUnpickler to be
pickled.
- class corrupy.picklemagic.SafePickler(file, protocol=None, *, fix_imports=True, buffer_callback=None)
A pickler which can repickle object hierarchies containing objects created by SafeUnpickler. For reasons unknown, Python’s pickle implementation normally checks whether a given class actually matches the object found at the __module__ and __name__ of the class. Since this check is performed with object identity instead of object equality, we cannot fake this from the classes themselves, and we need to override the method normally used for saving classes.
This takes a binary file for writing a pickle data stream.
The optional protocol argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3, 4 and 5. The default protocol is 4. It was introduced in Python 3.4, and is incompatible with previous versions.
Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.
The file argument must have a write() method that accepts a single bytes argument. It can thus be a file object opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.
If fix_imports is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.
If buffer_callback is None (the default), buffer views are serialized into file as part of the pickle stream.
If buffer_callback is not None, then it can be called any number of times with a buffer view. If the callback returns a false value (such as None), the given buffer is out-of-band; otherwise the buffer is serialized in-band, i.e. inside the pickle stream.
It is an error if buffer_callback is not None and protocol is None or smaller than 5.
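The identity check that makes SafePickler necessary can be reproduced with the standard library alone. In this sketch, sketch_mod and both bar classes are hypothetical: one class merely claims to be sketch_mod.bar, while the module holds a different object under that name.

```python
import pickle
import sys
import types

# A class that claims to be `bar` in module `sketch_mod`...
claimed = type("bar", (object,), {"__module__": "sketch_mod"})

# ...while the module actually holds a different object under that name.
module = types.ModuleType("sketch_mod")
module.bar = type("bar", (object,), {"__module__": "sketch_mod"})
sys.modules["sketch_mod"] = module

try:
    pickle.dumps(claimed)
    outcome = "pickled"
except pickle.PicklingError:
    # pickle looked up sketch_mod.bar and found it is not the same object.
    outcome = "refused"

print(outcome)  # prints refused
```

The class that really is stored on the module pickles without complaint, which shows that the refusal depends on identity rather than on the names matching.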