Monday, October 15, 2012

Functional Decomposition: Dynamic Task Execution Ordering, Based on Dependencies

Task Execution Synchronization

Even processes that spawn thousands of threads start from a single entry point that handles the preparations for parallel execution. If ordering is sometimes mandatory for sequential execution, it has even higher level of importance when parallelism is to be used.

To achieve synchronization when dividing a task in multiple sub-tasks using functional decomposition we must know exactly how these sub-tasks are related to each other - in other words to build the dependency chain. Basically independent tasks should always be completed first, and then the other tasks that rely on them to complete some initial work.

In advanced situations hard-coding this dependency chain is not a maintainable solution, and in some complex cases having a static description of the relations between the tasks is impossible.

For maximum maintainability the only thing we would want to set for each sub-task is the list other sub-tasks it depends on, and implement a high-level framework to arrange the execution order.

Defining the sub-tasks (Example)

Consider a scenario with four operations A, B, C and D:
A is an independent routine
B relies on the completion of C
C needs the result from A
D cannot start before B and C have finished.

We`ll define a worker class for each of them, and set the dependencies using attributes:

class WorkerA : Worker { }

class WorkerB : Worker { }

class WorkerC : Worker { }

[DependsOn(typeof(WorkerC), typeof(WorkerB))]
class WorkerD : Worker { }

The attribute in this case is defined as:

class DependsOnAttribute : Attribute
    public Type[] Dependencies { get; set; }

    public DependsOnAttribute(params Type[] t)
        Dependencies = t;

In a real scenario those would inherit from a base worker class a virtual method for loading the data each derived class needs to work with and a method to start the task.

Implementing the routines

Having an easy way to make changes to the dependency chain, in which every worker class knows its dependencies, we need some basic logic to dynamically build the order of these operations at runtime. To implement the priority manager class, the first thing we need is a method to return the dependencies by a given worker class:

private static IEnumerable<Type> GetDependenciesOf(Type t, IEnumerable<Type> remaining)
    var dependsOn = t.GetCustomAttributes(typeof(DependsOnAttribute), false);

    var types = from attr in dependsOn.Cast<DependsOnAttribute>()
                from d in attr.Dependencies
                where remaining.Contains(d) & (d != t)
                select d;
    return types;

To decide where to start from we need a method to return a random (the first) worker that is not dependent on any other workers from a given list (the pending tasks):

private static Type GetFirstIndependent(IEnumerable<Type> remaining)
    var first = (from t in remaining
                 let d = GetDependenciesOf(t, remaining)
                 where d.FirstOrDefault() == null
                 select t).FirstOrDefault();
    return first;

At this point we have everything we need to build the list of tasks arranged dynamically by priority:

public static IEnumerable<Type> BuildPriorityList(IEnumerable<Type> types)
    List<Type> remaining = new List<Type>(types);
    while (remaining.Count > 0)
        Type t = GetFirstIndependent(remaining);
        yield return t;

Putting it all together, we can get the ordered list of tasks by passing all worker types in a random order to the BuildPriorityList method.

The next steps

In general there are two main directions for improvements from here:
- Parallel execution built on the top of the priority list
- Soft coded worker definitions and common compiled logic
We`ll cover both topics in the near future, and build a command framework that processes operational soft-coded sequences, that could be defined in XML.