teudng

a Python XML documentation generator

Forest Bond


Table of Contents

Introduction
XML Document Type
  Introduction
  Element Reference
Command-Line Usage
Implementation
  Overview
  Module Reference

Introduction

teudng is a package for generating documentation for Python modules. It does not generate finished documentation directly; instead, it generates an XML description of the objects defined by a Python module. This XML can then be transformed into any of a number of standard documentation formats using XSLT processing.

The teudng package contains the teudng command-line utility (the program that generates the XML description), an XML DTD (document type definition), and some XSLT stylesheets that can serve as a base for your own XSLT customizations. The suggested workflow, however, is simply to generate DocBook XML from the teudng output and then transform the DocBook XML into other formats, such as HTML and PDF.

XML Document Type

Introduction

The teudng document type should seem intuitive to Python developers familiar with typical Python package and module structure. Here is a simple example document describing a fictional package called "mypackage" with a single sub-module, "mymodule":


<?xml version="1.0"?>
<module file="/home/fab/lib/python/mypackage/__init__.pyc" module="mypackage" name="mypackage" public="true">
  <description>an example Python package</description>
  <modules>
    <module file="/home/fab/lib/python/mypackage/mymodule.pyc" imported="false" module="mypackage" name="mymodule" public="true">
      <description>an example Python module</description>
      <functions>
        <function imported="false" module="mypackage.mymodule" name="myfun" public="true">
          <parameters>
            <parameter name="x"/>
          </parameters>
        </function>
      </functions>
    </module>
  </modules>
</module>

      

The XML document type used by teudng is intended to encapsulate as much descriptive information about the module being documented as possible. The result is that there is a high degree of redundancy in an XML document generated by teudng. The intention is that, by including as much information as possible in an accessible document structure, it should be easy to customize the resulting documentation by implementing trivial variations on otherwise standard XSLT stylesheets.

In describing a module, teudng makes every effort to accurately portray the true structure of the Python objects being represented. It is not uncommon for a Python package to define objects in sub-modules, and then import them into the package __init__.py in order to improve accessibility. Thus, the object, while only defined in one place, is visible and accessible in multiple locations. teudng's approach to handling this is to fully describe the object in both locations, marking all but one of those locations with the attribute imported set to true. In other words, the imported attribute indicates whether an object appears at the location being described because it was imported, or because it was actually defined there.
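As a rough illustration of how a consumer of the XML might use the imported attribute, the sketch below parses a small document in the format shown above with Python's standard xml.etree.ElementTree and separates functions defined at a location from those merely imported there. The document content (including the function name "helper") is made up for the example, and teudng itself is not needed to run it.

```python
import xml.etree.ElementTree as ET

# Hypothetical teudng-style output; the names are invented for illustration.
DOC = """<?xml version="1.0"?>
<module module="mypackage" name="mypackage" public="true">
  <functions>
    <function imported="true" module="mypackage.mymodule" name="myfun" public="true"/>
    <function imported="false" module="mypackage" name="helper" public="true"/>
  </functions>
</module>
"""


def split_by_imported(xml_text):
    """Return (defined_here, imported_here) lists of function names."""
    root = ET.fromstring(xml_text)
    defined, imported = [], []
    for fn in root.iter("function"):
        # Assumption: a missing attribute is treated as "defined here".
        if fn.get("imported") == "true":
            imported.append(fn.get("name"))
        else:
            defined.append(fn.get("name"))
    return defined, imported


defined_here, imported_here = split_by_imported(DOC)
print(defined_here, imported_here)
```

A stylesheet could make the same distinction with an XPath predicate on the imported attribute; the Python version is shown only because it is easy to run.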

Element Reference

module
a Python module

Attributes

file
the absolute path of the Python source file that defines this module
imported
boolean indicating whether the object being described was defined at this particular location (false), or imported here (true)
public
boolean indicating whether the object is considered public (part of a public API)
name
name of the attribute in the context of the parent location
module
name of the module where this object is defined (if known)
exception
a Python exception

Attributes

imported
boolean indicating whether the object being described was defined at this particular location (false), or imported here (true)
public
boolean indicating whether the object is considered public (part of a public API)
name
name of the attribute in the context of the parent location
module
name of the module where this object is defined (if known)
class
a Python class

Attributes

imported
boolean indicating whether the object being described was defined at this particular location (false), or imported here (true)
public
boolean indicating whether the object is considered public (part of a public API)
name
name of the attribute in the context of the parent location
module
name of the module where this object is defined (if known)
function
a Python function

Attributes

imported
boolean indicating whether the object being described was defined at this particular location (false), or imported here (true)
public
boolean indicating whether the object is considered public (part of a public API)
name
name of the attribute in the context of the parent location
module
name of the module where this object is defined (if known)
data
a Python object that is not a module, exception, class, or function

Attributes

imported
boolean indicating whether the object being described was defined at this particular location (false), or imported here (true)
public
boolean indicating whether the object is considered public (part of a public API)
name
name of the attribute in the context of the parent location
module
name of the module where this object is defined (if known)
modules
the set of modules that are children of the parent node
exceptions
the set of exceptions that are children of the parent node
classes
the set of classes that are children of the parent node
functions
the set of functions that are children of the parent node
datum
the set of data objects (objects meeting the criteria for a data element, as described above) that are children of the parent node
base
a base class for a class being described
bases
the set of base classes for the class being described
default
the default value for a function parameter
description
a textual description of an object, usually taken from the object's docstring
inherit
specifies the object that the object attribute being described was inherited from
override
specifies the object whose attribute is being overridden by the object attribute being described
parameter
a function parameter
parameters
the set of parameters for the function being described
value
the value of a data object

Command-Line Usage

Using the teudng utility is fairly straightforward. It accepts a single argument: the name of a Python module to describe. This should be the name used when importing the module in Python code. The XML module description is printed on standard output; use shell redirection to create an XML file (for example, teudng mypackage > mypackage.xml).

Implementation

Overview

This section is a work in progress.

The approach I've taken in implementing teudng is, to a small degree, a novel one. In general, the XML document for a given Python module has close to a 1:1 correspondence between actual Python objects and XML elements in the document. I've exploited this to some extent in order to maximize the flexibility and generality of the implementation.

To be more specific, here is a simplified version of what teudng does when it processes a module and generates an XML tree from it:

  • Gather some basic information about the module itself, and create a 'module' element to represent the module.
  • Take all of the child objects in the module and create an "element set" from them. An element set is an iterable of 3-tuples, each of which holds most of the relevant information about a specific child object, including the DOM Element object that will be the XML representation of that Python object.
  • The element set is processed using a number of "transform functions" each of which iterates over the element set and returns a new element set. Transform functions are permitted to perform any desired action on the element set that is passed to them; the only requirement of a transform function is that it must return a valid element set. (There is one exception to this rule — terminal transform functions — which will be explained in further detail below). Transform functions are applied serially, and in a specific order; the element set returned from a given transform function is passed as input to the following transform function. Note that a transform function may process child objects as well.
  • After all transform functions have been run, each of the elements in the resulting element set is added as a child element of the parent 'module' element.
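
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not teudng's actual code: the snake_case helper names, the two toy transforms, and the sample members are all invented here, and only the standard xml.dom.minidom module is used.

```python
from xml.dom.minidom import getDOMImplementation


def create_element_set(document, names, objs, tag):
    """Build a list of (element, name, obj) 3-tuples."""
    return [(document.createElement(tag), n, o) for n, o in zip(names, objs)]


def drop_private(document, parent, element_set):
    """Toy transform: return a new element set without underscore names."""
    return [t for t in element_set if not t[1].startswith("_")]


def name_transform(document, parent, element_set):
    """Toy transform: set a 'name' attribute from each tuple's second member."""
    for element, name, obj in element_set:
        element.setAttribute("name", name)
    return element_set


def apply_transforms(document, parent, element_set, transforms):
    """Run the transforms serially; each output is the next one's input."""
    for transform in transforms:
        element_set = transform(document, parent, element_set)
    return element_set


# Step 1: create the 'module' element.
document = getDOMImplementation().createDocument(None, "module", None)
module_element = document.documentElement

# Step 2: build an element set from some stand-in child objects.
members = {"myfun": len, "_hidden": abs}
element_set = create_element_set(
    document, list(members), list(members.values()), "function"
)

# Steps 3 and 4: transform the set, then attach the surviving elements.
transforms = [drop_private, name_transform]
for element, name, obj in apply_transforms(document, None, element_set, transforms):
    module_element.appendChild(element)

xml_text = module_element.toxml()
print(xml_text)
```

Because every transform takes and returns an element set, filters (which drop tuples) and decorators (which add attributes or sub-elements) can be mixed freely in one list.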

The transform functions that are applied to each element set are determined by the type of the objects in the element set. The dict TRANSFORMS is keyed by type strings, and contains a list of transform functions for each type, such that TRANSFORMS['module'] is the list of transform functions that will be applied to objects for which typeOf(obj) == 'module'. In other words, transforms are specified declaratively by type.
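
A minimal sketch of this type-keyed dispatch follows. To keep it short, the "elements" are plain dicts rather than DOM Elements, the transforms take only the element set, and the table is a toy with two entries; none of this is teudng's real TRANSFORMS content.

```python
def add_name(element_set):
    """Toy transform: record each object's name on its element."""
    for element, name, obj in element_set:
        element["name"] = name
    return element_set


def add_value(element_set):
    """Toy transform: record a string representation of each object."""
    for element, name, obj in element_set:
        element["value"] = repr(obj)
    return element_set


# Toy table: transform lists keyed by type string, in the order they run.
TRANSFORMS = {
    "function": [add_name],
    "data": [add_name, add_value],
}


def apply_transforms(element_set, child_type):
    """Look up the transform list for child_type and apply it in order."""
    for transform in TRANSFORMS[child_type]:
        element_set = transform(element_set)
    return element_set


data_set = apply_transforms([({}, "answer", 42)], "data")
function_set = apply_transforms([({}, "length", len)], "function")
print(data_set[0][0], function_set[0][0])
```

Changing what is emitted for a given type then amounts to editing one list in the table rather than touching the traversal code.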

It is sometimes desirable for the last transform function to return an element set that does not represent actual Python objects, but, rather, contains elements whose children represent Python objects. For instance, teudng's current behavior is to group function objects under a 'functions' element. This element is returned as part of a terminal element set. Terminal element sets cannot be passed into (most) transform functions, since transform functions generally expect that incoming element sets contain legitimate Python objects that are being described.
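
The following sketch shows what a terminal transform of this kind might look like. It is written from the description here and in the module reference, not from teudng's source; the name group_as and the handling of an empty element set are assumptions.

```python
from xml.dom.minidom import getDOMImplementation


def group_as(group):
    """Return a terminal transform that wraps all elements in one element."""
    def group_as_fn(document, parent, element_set):
        element_set = list(element_set)
        if not element_set:
            # Assumption: emit nothing at all for an empty set.
            return []
        wrapper = document.createElement(group)
        for element, name, obj in element_set:
            wrapper.appendChild(element)
        # The wrapper describes no Python object, so name and obj are None.
        return [(wrapper, None, None)]
    return group_as_fn


document = getDOMImplementation().createDocument(None, "module", None)
functions = []
for fn_name in ("myfun", "otherfun"):
    element = document.createElement("function")
    element.setAttribute("name", fn_name)
    functions.append((element, fn_name, None))

terminal_set = group_as("functions")(document, None, functions)
grouped_xml = terminal_set[0][0].toxml()
print(grouped_xml)
```

The None members are what make the result terminal: a later transform that expects a real object in the third position would have nothing to inspect.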

Module Reference

Note: this section is automatically generated from source code.

Module teudng.genxml

Functions
applyTransforms(document, parent, element_set, child_type)

Applies the appropriate transform functions (as defined in TRANSFORMS) to element_set. document is the DOM document object; parent is the parent object of the objects represented by element_set. child_type is the type (as returned by typeOf) of the objects.

attributeBase(obj, name)

Returns the base class from which obj inherits the attribute referred to by name, or None if the attribute is not defined in any of the base classes of obj.
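
One plausible way to implement such a lookup in current Python is to walk the method resolution order. The sketch below assumes that approach; teudng's actual traversal may differ, and the example classes are invented.

```python
def attribute_base(cls, name):
    """Return the nearest base class of cls that defines name, or None."""
    # Skip cls itself (the first MRO entry) and look only at its bases.
    for base in cls.__mro__[1:]:
        if name in vars(base):
            return base
    return None


class Animal:
    def speak(self):
        return "..."

    def eat(self):
        return "nom"


class Dog(Animal):
    def speak(self):
        return "woof"


print(attribute_base(Dog, "eat"))      # inherited from Animal
print(attribute_base(Dog, "speak"))    # Dog overrides Animal's version
print(attribute_base(Dog, "missing"))  # not defined in any base class
```

Combined with a check of whether the name also appears in the class's own namespace, the same lookup can tell an inherited attribute apart from an overriding one.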

bases(document, parent, element_set)

Adds a 'bases' sub-element to each element in element_set. This element has children that accurately reflect the base classes for the object being described, if it has any. If the object doesn't have any base classes, no 'bases' element is added to the element.

compoundTransform(*transforms)

Returns an element_set transform function that is a serial combination of each of the arguments to compoundTransform, which should be any number of element set transform functions. (Each transform function will be called in the order specified, with its output given to the next transform as input).
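
A minimal sketch of such a combinator, assuming every transform takes (document, parent, element_set) as the other entries in this reference do. The two toy filters are invented for the example.

```python
def compound_transform(*transforms):
    """Combine transforms into one that applies them in the given order."""
    def compound_transform_fn(document, parent, element_set):
        for transform in transforms:
            element_set = transform(document, parent, element_set)
        return element_set
    return compound_transform_fn


def drop_same(document, parent, element_set):
    """Toy filter: drop entries whose object is the parent itself."""
    return [t for t in element_set if t[2] is not parent]


def drop_doc(document, parent, element_set):
    """Toy filter: drop the entry named '__doc__'."""
    return [t for t in element_set if t[1] != "__doc__"]


standard_filter = compound_transform(drop_same, drop_doc)

parent = object()
element_set = [
    (None, "__doc__", "text"),
    (None, "self_ref", parent),
    (None, "myfun", len),
]
kept = [name for _, name, _ in standard_filter(None, parent, element_set)]
print(kept)
```

Since the result has the same signature as its parts, a compound transform can itself appear in a transform list or inside another compound transform.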

createElementSet(document, parent, names, objs)

Creates a so-called "element set".

An element set is an iterable of 3-tuples, one per object, each containing the element that will represent the object, the name of the object in the context in which it was found, and the object being described.

Here, document is the DOM document object, parent is the parent object of the objects that the element set represents, names is an iterable specifying the names of the objects, and objs is an iterable containing the actual objects.

descendInto(*types)

A function generator that produces an element_set filter that descends into child objects for which typeOf returns a string given as an argument. For instance, descendInto('module') returns an element set filter that causes the XML generator to process all children for which typeOf(child_obj) is 'module'.

descendsFrom(cls, possible_ancestor)

Determines if class cls is a sub-class of possible_ancestor. If so, returns True; otherwise, returns False.

description(document, parent, element_set)

Adds a 'description' sub-element to the Elements in element_set for all objects represented by element_set that have a docstring. The content of the 'description' element is the object's docstring.

file(document, parent, element_set)

Adds a 'file' attribute to the Elements in element_set for all objects represented by element_set that have a __file__ attribute.

filter(document, parent, element_set)

A compound element set transform function that consists of the functions filterSame, filterBuiltin, filterForbidden, filterForeign. This should be considered the standard object filter.

filterBuiltin(document, parent, element_set)

Filters builtin objects from element_set.

filterForbidden(document, parent, element_set)

Filters objects in element_set whose name is in FORBIDDEN_NAMES.

filterForeign(document, parent, element_set)

Filters objects in element_set not defined in the module being described. Note that, for some types of objects, it is impossible to determine the module in which the object is defined. These objects will never be filtered.

filterSame(document, parent, element_set)

Filters objects in element_set that are the same as the parent.

getDocString(obj)

Get the documentation string for an object.

All tabs are expanded to spaces. To clean up docstrings that are indented to line up with blocks of code, any whitespace that can be uniformly removed from the second line onwards is removed.
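
The behavior described matches that of inspect.getdoc in the Python standard library, so that function is used below as a stand-in. Whether getDocString simply delegates to it is an assumption; the sample function is invented.

```python
import inspect


def myfun(x):
    """Return x unchanged.

    The second paragraph is indented to line up with the code,
        and this line is indented a little further.
    """
    return x


# getdoc removes the indentation shared by the second and later lines,
# and strips leading and trailing blank lines.
cleaned = inspect.getdoc(myfun)
print(cleaned)
```

Only the shared indentation is removed, so the extra indentation on the last line of the docstring survives.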

getMembers(obj)

Returns the child objects of obj.

getModule(obj)

Return the module an object was defined in, or None if not found.

groupAs(group)

A function generator that produces an element_set filter that groups all of the elements in the element set under a single element whose type is specified by group. The return value of the returned function is a new element set containing only the newly created grouping element (a 3-tuple with the new element as its first member, and None as the other two). This is a terminal element set transform function.

importName(name)

Imports a Python module referred to by name, and returns that module. Raises an ImportError if the import fails.

imported(document, parent, element_set)

Adds an 'imported' attribute to each element in element_set. This attribute specifies whether the object being represented exists by the name being documented because it was imported here ('true'), or because it was actually defined here ('false'). If this cannot be discerned, no 'imported' attribute is set.

inherit(document, parent, element_set)

Adds an 'inherit' sub-element to each element in element_set. The 'inherit' sub-element specifies that the object being described is actually an attribute inherited from some base class of parent. The 'inherit' element has two attributes: 'module', which names the module in which the base class is defined, and 'name', which is the name of the attribute that is being inherited. If the object being described is not inherited from a base class of parent (it is defined in parent itself) no 'inherit' sub-element is added.

isBuiltin(obj)

Return true if the object is a built-in function or method.

Built-in functions and methods provide these attributes:

  • __doc__: documentation string
  • __name__: original name of this function or method
  • __self__: instance to which a method is bound, or None

isCallable(obj)

Returns True if obj can be called like a function; returns False otherwise.

isClass(obj)

Returns True if obj is a Python class; returns False otherwise.

isClassExclusively(obj)

Returns True if obj is a class, and isn't anything more than that; returns False otherwise.

isData(obj)

Returns True if obj is a Python data object; returns False otherwise.

isDataExclusively(obj)

Returns True if obj is a data object, and isn't anything more than that; returns False otherwise.

isException(obj)

Returns True if obj is a Python exception; returns False otherwise.

isExceptionExclusively(obj)

Returns True if obj is an exception, and isn't anything more than that; returns False otherwise.

isFunction(obj)

Returns True if obj is a Python function; returns False otherwise.

isFunctionExclusively(obj)

Returns True if obj is a function, and isn't anything more than that; returns False otherwise.

isModule(obj)

Returns True if obj is a Python module; returns False otherwise.

isModuleExclusively(obj)

Returns True if obj is a module, and isn't anything more than that; returns False otherwise.

isPackage(obj)

Returns True if obj is a Python package; returns False otherwise.

isPackageExclusively(obj)

Returns True if obj is a package, and isn't anything more than that; returns False otherwise.

makeBaseElement(document, base)

Returns a 'base' DOM Element corresponding with class base.

makeBasesElement(document, obj)

Returns a new 'bases' DOM Element whose children accurately reflect the base classes of obj, which should be an object that has bases (a Python class, for example).

makeDocument()

Creates an empty DOM document and returns it.

makeParameterElement(document, name, has_default = False, default = None, vararg = False, varkw = False)

Returns a 'parameter' DOM Element with name. If the parameter has a default value, has_default should be True, and the default value should be specified with argument default.

makeParametersElement(document, obj)

Returns a new 'parameters' DOM Element which has as its children 'parameter' Elements that accurately reflect the parameters accepted by obj, which should be a callable.

module(document, parent, element_set)

Adds a 'module' attribute to each element in element_set. This attribute's value is the name of the module in which the object being represented was defined, if it is known; otherwise, no 'module' attribute is set.

moduleToDomDocument(module)

Returns a DOM document that represents XML that describes module.

moduleToElement(document, module)

Creates the 'module' element that represents module. This is primarily useful as the function that does the real work for function moduleToDomDocument().

moduleToXml(module)

Returns a complete XML document (that describes module) as a string.

name(document, parent, element_set)

Adds a 'name' attribute to the Elements in element_set for all objects represented by element_set (names are based on the second member of each tuple in element_set).

override(document, parent, element_set)

Adds an 'override' sub-element to each element in element_set. The 'override' sub-element specifies that the object being described overrides an attribute of some base class of parent. The 'override' element has two attributes: 'module', which names the module in which the base class is defined, and 'name', which is the name of the attribute that is being overridden. If the object being described does not override any attributes of the parent's base classes, no 'override' sub-element is added.

parameters(document, parent, element_set)

Adds a 'parameters' sub-element to each element in the element_set. The 'parameters' element's children accurately reflect the parameters accepted by each object when it is called like a function.

printReject(name, reason)

Calls sclapp's printDebug to print a prioritized error message to stderr indicating that name was rejected for description because of reason.

public(document, parent, element_set)

Adds a 'public' attribute to the Elements in element_set for all objects represented by element_set. The 'public' attribute is set to 'true' if the object is considered public by teudng, or 'false' if the object is not considered a public object. Currently, an object is considered public if its parent has an __all__ attribute and that object's name is listed there, or, if the parent has no __all__ attribute, it is considered public if its name does not begin with an underscore.
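
The rule in the last sentence can be written as a small predicate. This is a sketch of the rule as stated, not teudng's code; is_public and the two throwaway modules are names invented for the example.

```python
import types


def is_public(parent, name):
    """Apply the __all__ rule if parent has one, else the underscore rule."""
    exported = getattr(parent, "__all__", None)
    if exported is not None:
        return name in exported
    return not name.startswith("_")


# A parent with __all__: only listed names are public.
with_all = types.ModuleType("with_all")
with_all.__all__ = ["_special"]

# A parent without __all__: the leading underscore decides.
without_all = types.ModuleType("without_all")

print(is_public(with_all, "_special"), is_public(with_all, "other"))
print(is_public(without_all, "other"), is_public(without_all, "_hidden"))
```

Note that when __all__ is present it takes precedence completely: an underscore-prefixed name listed there is public, and an unlisted name without an underscore is not.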

represent(obj)

Returns a string representation of obj, suitable for insertion into documentation.

typeOf(obj)

Returns a string indicating the type of obj, as teudng sees things. In other words, teudng recognizes subtle differences between objects that are instances of the same Python type, like exceptions and classes (which are both Python classes), and packages and modules (which are both Python modules). To determine the type of an object, typeOf tests each of the PREDICATES in order until one returns True.
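
A sketch of ordered predicate testing is given below. The predicate bodies are simplified guesses, and the order (taken from the TYPES tuple listed under Data) is an assumption; the point is only that earlier, more specific tests win over later, more general ones.

```python
import types


def is_module(obj):
    return isinstance(obj, types.ModuleType)


def is_exception(obj):
    return isinstance(obj, type) and issubclass(obj, BaseException)


def is_class(obj):
    return isinstance(obj, type)


def is_function(obj):
    return callable(obj)


def is_data(obj):
    # Fallback: anything not matched earlier counts as data.
    return True


# Assumed order: exceptions must be tested before classes, because an
# exception class also satisfies is_class.
PREDICATES_IN_ORDER = [
    ("module", is_module),
    ("exception", is_exception),
    ("class", is_class),
    ("function", is_function),
    ("data", is_data),
]


def type_of(obj):
    """Return the type string of the first predicate that accepts obj."""
    for type_name, predicate in PREDICATES_IN_ORDER:
        if predicate(obj):
            return type_name
    return None


results = [type_of(o) for o in (types, ValueError, dict, len, 42)]
print(results)
```

With a different order the result changes: testing "class" before "exception" would classify ValueError as a class.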

value(document, parent, element_set)

Adds a 'value' sub-element to Elements in element_set. The content of the 'value' element is the string representation of the object's value.

Data

FORBIDDEN_NAMES = ('__builtins__', '__doc__', '__all__', '__bases__', '__base__', '__class__', '__dict__', '__hash__', '__weakref__', '__name__', '__file__')

PREDICATES = {'function': <function isFunction at 0x402b541c>, 'exception': <function isException at 0x402b533c>, 'data': <function isData at 0x402b548c>, 'class': <function isClass at 0x402b53ac>, 'module': <function isModule at 0x4028748c>}

TRANSFORMS = {
  'function': [<function compoundTransformFn at 0x402b5b1c>, <function name at 0x402b58b4>, <function module at 0x402b5c6c>, <function public at 0x402b58ec>, <function imported at 0x402b5c34>, <function description at 0x402b5924>, <function parameters at 0x402b5b54>, <function override at 0x402b5bc4>, <function inherit at 0x402b5bfc>, <function groupAsFn at 0x402b5e9c>],
  'exception': [<function compoundTransformFn at 0x402b5b1c>, <function name at 0x402b58b4>, <function module at 0x402b5c6c>, <function public at 0x402b58ec>, <function imported at 0x402b5c34>, <function description at 0x402b5924>, <function bases at 0x402b5b8c>, <function groupAsFn at 0x402b5df4>],
  'data': [<function compoundTransformFn at 0x402b5b1c>, <function name at 0x402b58b4>, <function module at 0x402b5c6c>, <function public at 0x402b58ec>, <function imported at 0x402b5c34>, <function value at 0x402b595c>, <function groupAsFn at 0x402b5ed4>],
  'class': [<function compoundTransformFn at 0x402b5b1c>, <function name at 0x402b58b4>, <function module at 0x402b5c6c>, <function public at 0x402b58ec>, <function imported at 0x402b5c34>, <function description at 0x402b5924>, <function bases at 0x402b5b8c>, <function descendFn at 0x402b5e2c>, <function groupAsFn at 0x402b5e64>],
  'module': [<function compoundTransformFn at 0x402b5b1c>, <function name at 0x402b58b4>, <function module at 0x402b5c6c>, <function public at 0x402b58ec>, <function file at 0x402b587c>, <function imported at 0x402b5c34>, <function description at 0x402b5924>, <function descendFn at 0x402b5d84>, <function groupAsFn at 0x402b5dbc>]
}

TYPES = ('module', 'exception', 'class', 'function', 'data')