corrupy.pickleast — Special pickle construction

The pickleast module provides tools for constructing special pickles that execute code and perform other operations while being deserialized from Python’s pickle object serialization format.

Technical Background

The deserialization machinery of the pickle format is powerful enough to construct arbitrary graphs of dynamically created Python objects. Because this machinery is so dynamic, it is also capable of a lot more than just constructing objects.

The pickle format can be conceptually seen as a programming language that targets a very simple stack machine. It has instructions for loading types, constructing them, and memoizing them. Crucially, it has an instruction intended for object creation that simply calls a value on the stack with other values as arguments. This means we can not only construct objects, but also call functions. And since we can load functions like operator.getitem(), this instruction can be used to call object methods, do slice indexing, perform math, etc. The main limitation is that there is no control flow: the pickle bytecode is executed linearly, so it is impossible to encode looping constructs in native pickle bytecode.
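As a minimal illustration of the call instruction described above, the following sketch uses only the standard library, not pickleast. The class name GetItemAtLoad is hypothetical. Its __reduce__ method makes the standard pickler emit a REDUCE opcode, which calls operator.getitem() at unpickling time.

```python
import operator
import pickle
import pickletools


class GetItemAtLoad:
    """Hypothetical helper: unpickles to the result of a function call."""

    def __reduce__(self):
        # (callable, args): the unpickler pushes both and applies REDUCE.
        return operator.getitem, (["a", "b", "c"], 1)


payload = pickle.dumps(GetItemAtLoad(), protocol=2)

# Loading evaluates operator.getitem(["a", "b", "c"], 1).
result = pickle.loads(payload)

# List the opcode names the stack machine executes, in order.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
```

The loaded value is the return value of the call, not an instance of GetItemAtLoad.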

This can be worked around by calling builtins like eval() and storing the Python code as a string, but such functions are likely to be blacklisted during unpickling. Therefore, this module implements as much Python functionality as possible in pure pickle bytecode.
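The kind of blacklist mentioned above could look like the following sketch, which overrides the standard library's Unpickler.find_class(). The class names EvalAtLoad and NoEvalUnpickler and the contents of the BLOCKED set are assumptions made for illustration. Real deployments may filter differently.

```python
import io
import pickle


class EvalAtLoad:
    """Hypothetical payload that asks the unpickler to call eval()."""

    def __reduce__(self):
        return eval, ("1 + 1",)


class NoEvalUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that refuses to load eval() and exec()."""

    BLOCKED = {("builtins", "eval"), ("builtins", "exec")}

    def find_class(self, module, name):
        if (module, name) in self.BLOCKED:
            raise pickle.UnpicklingError(f"{module}.{name} is not allowed")
        return super().find_class(module, name)


# fix_imports=False keeps the Python 3 module name "builtins" in the stream.
payload = pickle.dumps(EvalAtLoad(), protocol=2, fix_imports=False)

# The stock unpickler runs eval("1 + 1") while loading.
unrestricted = pickle.loads(payload)

try:
    NoEvalUnpickler(io.BytesIO(payload)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```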

Interface

To embed special behaviour in a pickle bytestream, this module provides a set of types based on the PickleBase type which can be placed anywhere in a normal Python data structure. This data structure can then be serialized using the special AstPickler implementation, which will embed the special instructions into the bytestream.

corrupy.pickleast.dumps(obj, protocol=2)

Create a pickle from an object with special behaviour for PickleBase nodes, writing the result to a bytes object.

corrupy.pickleast.dump(obj, file=None, protocol=2)

Like dumps(), but writes the pickle to a file-like object.

class corrupy.pickleast.AstPickler(file, protocol=None, *, fix_imports=True, buffer_callback=None)

A pickle.Pickler subclass with special behaviour for PickleBase instances.

This takes a binary file for writing a pickle data stream.

The optional protocol argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3, 4 and 5. The default protocol is 4. It was introduced in Python 3.4, and is incompatible with previous versions.

Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

The file argument must have a write() method that accepts a single bytes argument. It can thus be a file object opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.
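The file interface described above can be sketched with the standard pickle.Pickler, which AstPickler subclasses. The ChunkCollector class is a hypothetical example of a custom object that meets the interface. That AstPickler behaves the same way for its file argument is assumed from its documented signature.

```python
import io
import pickle


class ChunkCollector:
    """Hypothetical file-like object: only write() is required."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


# An io.BytesIO instance works out of the box.
buffer = io.BytesIO()
pickle.Pickler(buffer, protocol=2).dump({"a": 1})

# So does any object that provides write(bytes).
collector = ChunkCollector()
pickle.Pickler(collector, protocol=2).dump({"a": 1})
collected = b"".join(collector.chunks)
```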

If fix_imports is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.

If buffer_callback is None (the default), buffer views are serialized into file as part of the pickle stream.

If buffer_callback is not None, then it can be called any number of times with a buffer view. If the callback returns a false value (such as None), the given buffer is out-of-band; otherwise the buffer is serialized in-band, i.e. inside the pickle stream.

It is an error if buffer_callback is not None and protocol is None or smaller than 5.
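The buffer_callback behaviour can be tried with the standard pickle module, from which these parameters are inherited. The sketch below assumes Python 3.8 or later, where protocol 5 and pickle.PickleBuffer are available.

```python
import pickle

buffers = []
view = pickle.PickleBuffer(bytearray(b"payload"))

# list.append returns None (a false value), so the buffer goes out-of-band.
out_of_band = pickle.dumps(view, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(out_of_band, buffers=buffers)

# Without a callback the same buffer is serialized inside the stream.
in_band = pickle.dumps(view, protocol=5)

# A callback combined with a protocol below 5 is rejected.
try:
    pickle.dumps(view, protocol=4, buffer_callback=buffers.append)
    rejected = False
except ValueError:
    rejected = True
```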

AST types

class corrupy.pickleast.PickleBase

This is the abstract base class that all pickleast AST types derive from. When AstPickler encounters an instance of this class during serialization, its functionality will be serialized into the bytestream.

corrupy.pickleast.__call__(*args, **kwargs)

Shorthand method for creating a Call AST node. PickleBase()(*args) is identical to Call(PickleBase(), *args).

Basic operations

These AST nodes all correspond to individual pickle bytecodes:

class corrupy.pickleast.Wrap(obj)

A simple wrapper class which transforms obj into a PickleBase so the magic methods of PickleBase can be used.

class corrupy.pickleast.Call(callable, *args)

This operation represents calling an object on the pickle VM stack.

It takes a callable and a set of positional arguments (possibly none). The callable will be called with these arguments at unpickling time.
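The idea of a call node can be sketched with the standard library alone. MiniCall below is a hypothetical stand-in written for this example. It is not the pickleast Call class and relies on __reduce__ rather than a custom pickler, but it shows how nested calls and the call shorthand compose.

```python
import operator
import pickle


class MiniCall:
    """Hypothetical stand-in for a call node; not the pickleast class."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self, *args):
        # Calling a node builds a new node that calls the first one's result.
        return MiniCall(self, *args)

    def __reduce__(self):
        # The pickler serializes func and args, then emits REDUCE.
        return self.func, self.args


# Equivalent to "a,b,c".split(",") evaluated at unpickling time.
split_node = MiniCall(getattr, "a,b,c", "split")(",")
parts = pickle.loads(pickle.dumps(split_node, protocol=2))

# Nested nodes in the arguments are evaluated first: (2 + 3) * 4.
math_node = MiniCall(operator.mul, MiniCall(operator.add, 2, 3), 4)
product = pickle.loads(pickle.dumps(math_node, protocol=2))
```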

class corrupy.pickleast.SetAttributes(obj, **kwargs)

This operation represents calling __setattr__ or __dict__.update on the object.

It will call __dict__.update with the given keyword arguments and return the object.
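In the standard pickle format, a __dict__ update of this kind is what the BUILD opcode performs for objects without a __setstate__ method. The sketch below triggers it through __reduce__. The class name NamespaceWithState is hypothetical, and whether SetAttributes emits exactly this opcode is an assumption.

```python
import pickle
import pickletools
import types


class NamespaceWithState:
    """Hypothetical helper: builds a SimpleNamespace, then sets attributes."""

    def __reduce__(self):
        # The third item is the state. For objects without __setstate__,
        # the BUILD opcode merges it into the object's __dict__.
        return types.SimpleNamespace, (), {"x": 1, "y": 2}


payload = pickle.dumps(NamespaceWithState(), protocol=2)
obj = pickle.loads(payload)
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
```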

class corrupy.pickleast.Imports(module, name, cache=True)

This class will return the object called name from the module called module at unpickling time.

class corrupy.pickleast.Import(obj, cache=True)

This wrapper class will return obj at unpickling time.

Requirements: obj is a top level object in a module.

class corrupy.pickleast.Sequence(*objects, **kwargs)

This class represents a series of objects, where only the return value of the last object in the sequence will be returned at unpickling time. If the keyword argument reversed is True, then the first object will be returned instead of the last object.
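One way such a sequence can be encoded in pickle bytecode is to discard every intermediate value with POP, so that only the last value remains on the stack when STOP is reached. The hand-assembled stream below uses opcode constants from the standard pickle module. That Sequence emits exactly these opcodes is an assumption.

```python
import pickle

# INT pushes an integer, POP discards the top of the stack,
# and STOP returns whatever is left on top.
payload = (
    pickle.INT + b"1\n" + pickle.POP
    + pickle.INT + b"2\n" + pickle.POP
    + pickle.INT + b"3\n" + pickle.STOP
)
last = pickle.loads(payload)
```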

class corrupy.pickleast.SetItem(object, key, value)

This class provides the equivalent of object[key] = value. This returns object.

class corrupy.pickleast.Assign(varname, value)

This class stores value in varname. This is implemented as pushing the value on to the memo. This returns value.

class corrupy.pickleast.Load(varname)

This class loads the value from varname. This is implemented by getting the value from the memo. This returns the loaded value.
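The memo-based assignment and loading described above can be sketched with hand-assembled bytecode and the standard unpickler. The numeric slot 0 plays the role of varname here. How pickleast maps variable names to memo slots is not shown and is left out of this sketch.

```python
import pickle

payload = (
    pickle.EMPTY_LIST        # push a new list
    + pickle.PUT + b"0\n"    # "assign": store the top of the stack in memo slot 0
    + pickle.POP             # drop it from the stack; the memo keeps it alive
    + pickle.GET + b"0\n"    # "load": push memo slot 0
    + pickle.GET + b"0\n"    # load it a second time
    + pickle.TUPLE2          # pack the two loaded values into a tuple
    + pickle.STOP
)
first, second = pickle.loads(payload)
```

Both loads yield the very same list object, which is what makes the memo usable as variable storage.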

Useful functions

These AST nodes are all Import instances of the relevant builtin function:

corrupy.pickleast.List
corrupy.pickleast.Dict
corrupy.pickleast.Set
corrupy.pickleast.Tuple
corrupy.pickleast.Frozenset
corrupy.pickleast.Str
corrupy.pickleast.Int
corrupy.pickleast.Bool
corrupy.pickleast.Any
corrupy.pickleast.All
corrupy.pickleast.Map
corrupy.pickleast.Zip
corrupy.pickleast.HasAttr
corrupy.pickleast.GetAttr
corrupy.pickleast.SetAttr
corrupy.pickleast.DelAttr
corrupy.pickleast.IsInstance
corrupy.pickleast.IsSubclass
corrupy.pickleast.Iter
corrupy.pickleast.Next
corrupy.pickleast.Range
corrupy.pickleast.Globals
corrupy.pickleast.Locals
corrupy.pickleast.Compile

Operation analogues

These AST nodes represent basic Python operations that aren’t built into the pickle machinery and are therefore constructed using builtin functions:

corrupy.pickleast.CallMethod(obj, attr, *args)

A convenience function for calling methods.

corrupy.pickleast.GetItem(obj, attr)

The equivalent of obj[attr].

corrupy.pickleast.DelItem(obj, attr)

The equivalent of del obj[attr].

corrupy.pickleast.Ternary(conditional, true_value, false_value)

A simple ternary expression. Due to the limitations of pickling, both branches will be executed, but the final result is selected by the conditional.
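A ternary without control flow can be built by evaluating both values and then indexing a tuple with the boolean condition. The sketch below shows this technique with a hypothetical MiniCall helper and the standard pickler. It is one plausible construction and not necessarily how Ternary is implemented.

```python
import operator
import pickle


class MiniCall:
    """Hypothetical call node: unpickles to func(*args)."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __reduce__(self):
        return self.func, self.args


def mini_ternary(conditional, true_value, false_value):
    # Both values are built during unpickling. bool() maps the condition
    # to False or True, which index the tuple as 0 or 1.
    return MiniCall(
        operator.getitem,
        (false_value, true_value),
        MiniCall(bool, conditional),
    )


yes_node = mini_ternary(MiniCall(operator.gt, 3, 2), "yes", "no")
no_node = mini_ternary(MiniCall(operator.lt, 3, 2), "yes", "no")
chosen_yes = pickle.loads(pickle.dumps(yes_node, protocol=2))
chosen_no = pickle.loads(pickle.dumps(no_node, protocol=2))
```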

corrupy.pickleast.AssignGlobal(varname, value, module=None)

Assigns value to varname in the global namespace (to interact with exec and eval blocks). This is implemented as globals()[varname] = value.

This returns the global namespace.

corrupy.pickleast.LoadGlobal(varname, module=None)

Loads varname from the global namespace.

This is implemented as globals()[varname].

Code execution

These AST nodes allow arbitrary Python code to be executed during the unpickling process:

corrupy.pickleast.Eval(code, globals=<corrupy.pickleast.Call object>, locals=None)

This node evaluates code in the global (pickle module) namespace and returns the result.

corrupy.pickleast.Exec(string, globals=<corrupy.pickleast.Call object>, locals=None, filename='<pickle>')

This node executes string in the global namespace (this will usually be the pickle module namespace).

It returns None.

corrupy.pickleast.ExecTranspile(string, foreign=())

This node takes a string of Python code as input and transpiles it to pickle code using TransPickler. See the documentation of TransPickler for details.

corrupy.pickleast.ExecAst(string, globals=<corrupy.pickleast.Call object>, locals=None, filename='<pickle>')

Takes a string of Python code and compiles it into an object that, after being serialized with the AstPickler, will execute the Python code when deserialized.

This works by compiling the code to an AST, serializing this AST, and then calling eval(compile()) on the AST.
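The mechanism can be approximated with the standard library: parse the source into an AST, pickle the AST, and run eval(compile()) on the restored tree. In this sketch the last step is performed by hand after loading, whereas ExecAst arranges for it to happen during unpickling. The variable names are chosen for illustration.

```python
import ast
import pickle

source = "result = 6 * 7"
tree = ast.parse(source, filename="<pickle>", mode="exec")

# AST nodes are ordinary picklable objects, so the tree survives a round trip.
restored_tree = pickle.loads(pickle.dumps(tree, protocol=2))

namespace = {}
# eval() accepts a code object. For code compiled in "exec" mode it returns None.
return_value = eval(compile(restored_tree, "<pickle>", "exec"), namespace)
```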

Shell execution

The quickest proof for why you should not unpickle untrusted data.

corrupy.pickleast.System(string)

This will execute string as a shell command.
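The classic standard-library form of such a payload is sketched below. The class name ShellAtLoad and the echo command are placeholders. The stream is only inspected and never loaded, because loading it would run the command.

```python
import os
import pickle
import pickletools


class ShellAtLoad:
    """Hypothetical payload: unpickling would pass a command to os.system()."""

    def __reduce__(self):
        return os.system, ("echo unpickled",)


payload = pickle.dumps(ShellAtLoad(), protocol=2)

# Inspect the stream instead of loading it. Calling pickle.loads(payload)
# would execute the shell command.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
```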

Module manipulation

corrupy.pickleast.DeclareModule(name, retval=True)

Declares a module. This creates an empty module and inserts it into sys.modules. If retval is True the module is returned, otherwise sys.modules is returned.

corrupy.pickleast.DefineModule(name, code, executor=<function Exec>)

This ‘defines’ a module by executing a block of code in the namespace of said module. Returns None.

corrupy.pickleast.GetModule(name)

This imports module name. Note that, if you ever need something contained in a module, it is more efficient to just use the native Import or Imports.

corrupy.pickleast.Module(name, code, retval=True, executor=<function Exec>)

This node creates a module at importing time. It simply takes the name of the module and the code of the module as a string. This is done by first declaring the module and then defining it. If circular references between modules are problematic, the declaring and defining steps have to be ordered manually.

It returns the module if retval is set to True, else it returns sys.modules.
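In plain Python, the declare-then-define sequence corresponds roughly to the sketch below. The function names declare_module and define_module and the module name pickled_demo are hypothetical. The pickleast nodes perform the equivalent steps during unpickling.

```python
import sys
import types


def declare_module(name):
    """Create an empty module and register it in sys.modules."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    return module


def define_module(name, code):
    """Execute code in the namespace of an already declared module."""
    exec(code, sys.modules[name].__dict__)


declare_module("pickled_demo")
define_module("pickled_demo", "def greet():\n    return 'hello'\n")

# The import is resolved from sys.modules, so no file is needed.
import pickled_demo

greeting = pickled_demo.greet()
```

Declaring all modules before defining any of them is what allows modules with circular imports to be set up in a fixed order.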

Utilities

corrupy.pickleast.pprint(ast, file=None)

Pretty print a Pickle AST to a file or stdout.

This is shorthand for AstPrinter(file).dump(ast).

class corrupy.pickleast.AstPrinter(out_file=None, indentation='    ')

The internal implementation of pprint().

class corrupy.pickleast.TransPickler(foreign)

A somewhat experimental way of directly transpiling a Python AST to a pickle AST. This is a subclass of ast.NodeVisitor.

Not all Python constructs are supported (no loops, conditionals, etc.). Semantics of some operations may differ. External data can be passed in through the foreign argument and accessed in the Python code through the variable names _0, _1, etc., where the number is the index of the value in the foreign list.

class corrupy.pickleast.PyAstCompiler

This is a more efficient way of embedding Python ASTs in pickles.

This ast.NodeTransformer takes a Python AST and returns an object hierarchy that, when pickled using the AstPickler, is serialized in a more compact format because it calls the AST constructors directly.

Use it by calling PyAstCompiler().visit(ast_node).

corrupy.pickleast.optimize(origpickle, protocol=2)

Optimizes a pickle by stripping extraneous memoizing instructions and embedding a zlib compressed pickle inside the pickle.
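Both steps have rough standard-library analogues, sketched below. pickletools.optimize() removes memo writes that are never read, and a small outer pickle can decompress and load an inner one while it is itself being loaded. MiniCall is a hypothetical helper, and that optimize() produces this exact structure is an assumption.

```python
import pickle
import pickletools
import zlib


class MiniCall:
    """Hypothetical call node: unpickles to func(*args)."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __reduce__(self):
        return self.func, self.args


data = list(range(1000))
original = pickle.dumps(data, protocol=2)

# pickletools.optimize removes PUT opcodes whose memo slot is never read.
stripped = pickletools.optimize(original)

# Outer pickle: decompress the inner pickle and load it during loading.
wrapped = pickle.dumps(
    MiniCall(pickle.loads, MiniCall(zlib.decompress, zlib.compress(stripped))),
    protocol=2,
)
roundtrip = pickle.loads(wrapped)
```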

corrupy.pickleast.optimize_puts(p)

Optimizes the pickle bytecode given in p by assigning the 256 lowest memo indices, which fit the one-byte BINPUT opcode, to the memo entries most often read by GET opcodes.

Should only be used for pickle protocol 1 - 3, as it does not handle the MEMOIZE opcode.

Returns the modified pickle bytecode.