Skip to content

Architecture

This page explains how DASA works internally.

System Overview

┌────────────────────────────────────────────────────────────────────┐
│                         DASA Pipeline                              │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────────────┐   │
│  │   Java      │     │   GraalVM   │     │   seafoam           │   │
│  │   Source    │────▶│   native-   │────▶│   bgv2json          │   │
│  │   (.java)   │     │   image     │     │                     │   │
│  └─────────────┘     └─────────────┘     └─────────────────────┘   │
│                             │                      │               │
│                             ▼                      ▼               │
│                      graal_dumps/           Method.main.json       │
│                      (BGV files)            (Graph JSON)           │
│                                                    │               │
│  ┌─────────────────────────────────────────────────┼───────────┐   │
│  │                  Python Analysis                │           │   │
│  │                                                 ▼           │   │
│  │  ┌─────────────┐     ┌─────────────┐     ┌───────────────┐  │   │
│  │  │ GraphBuilder│────▶│  PyTorch    │────▶│ run_          │  │   │
│  │  │ (load JSON) │     │  Graph      │     │ optimization  │  │   │
│  │  └─────────────┘     └─────────────┘     └───────────────┘  │   │
│  │                                                 │           │   │
│  └─────────────────────────────────────────────────┼───────────┘   │
│                                                    ▼               │
│                                          Found Input Values        │
│                                                    │               │
│  ┌─────────────────────────────────────────────────┼───────────┐   │
│  │                  Validation                     ▼           │   │
│  │            ┌─────────────────────────────────────────┐      │   │
│  │            │  java -ea Main < generated_inputs       │      │   │
│  │            │  → AssertionError (if successful)       │      │   │
│  │            └─────────────────────────────────────────┘      │   │
│  └─────────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────────┘

Stage 1: Graph Extraction

GraalVM Compilation

GraalVM's native-image compiles Java to native code while dumping compiler intermediate representation:

native-image -H:Dump=:1 -H:MaximumInliningSize=0 Main

This creates BGV (Binary Graph Visualization) files in graal_dumps/.

JSON Conversion

The seafoam tool converts BGV to readable JSON:

bgv2json "graal_dumps/.../[Main.main].bgv" > Main.main.json

JSON Structure

{
  "nodes": [
    {
      "id": 33,
      "props": {
        "node_class": {"node_class": "jdk.graal.compiler.nodes.InvokeNode"},
        "targetMethod": "Verifier.nondetInt",
        ...
      }
    },
    ...
  ],
  "edges": [
    {"from": 33, "to": 45, "props": {"name": "next", "type": "Control"}},
    ...
  ]
}

Stage 2: Graph Building

GraphBuilder Class

GraalWrapper/GraphBuilder.py converts JSON to Python objects:

graph_builder = GraphBuilder('Main.main.json')
graph = graph_builder.get_graph(start_node=0, end_node=target)

Node Type Mapping

GraalVM node classes are mapped to Python classes:

GraalVM Node Python Class Purpose
ConstantNode nodes.ConstantNode Literal values
InvokeNode nodes.InvokeNode Method calls
IfNode nodes.IfNode Conditional branches
AddNode nodes.calc.AddNode Integer addition
IntegerLessThanNode nodes.calc.IntegerLessThanNode Comparison
ThrowBytecodeExceptionNode Target node Assertion/exception

Backward Slicing

Only nodes reachable from the target are included:

def do_backward_slicing(self, graph, end_node):
    # BFS from end_node to find all contributing nodes
    visited = set()
    queue = [end_node]
    while queue:
        node = queue.pop(0)
        visited.add(node)
        for edge in graph['edges']:
            if edge['to'] == node and edge['from'] not in visited:
                queue.append(edge['from'])
    return visited

Stage 3: Optimization

Control Flow Modeling

Each node has a controlFlowMultiplicative score (0.0 to 1.0) indicating path reachability:

class BaseNode:
    def __init__(self, node):
        self.controlFlowMultiplicative = torch.tensor(1.0)

At merge points, the minimum is taken:

self.controlFlowMultiplicative = torch.min(
    incoming_flow,
    self.controlFlowMultiplicative
)

Differentiable Conditionals

IfNode uses a sigmoid to make branching differentiable:

class IfNode(BaseNode):
    def exec(self):
        condition = self.inputs['condition']
        # Sigmoid makes this differentiable
        prob_true = torch.sigmoid(condition * annealing_factor)
        prob_false = 1 - prob_true

        self.children['trueSuccessor'].controlFlowMultiplicative *= prob_true
        self.children['falseSuccessor'].controlFlowMultiplicative *= prob_false

Loss Function

The loss maximizes reachability to the target:

loss = -graph[target_node].controlFlowMultiplicative

# Add penalties for complex operations
for node in graph.values():
    if node.node_penalty:
        loss += node.node_penalty

Adam Optimizer

optimizer = optim.Adam(input_parameters, lr=0.1)

for i in range(num_iterations):
    optimizer.zero_grad()

    # Forward pass: execute graph
    execute_graph(graph, inputs)

    # Compute loss
    loss = -graph[target].controlFlowMultiplicative

    # Backward pass: compute gradients
    loss.backward()

    # Update inputs
    optimizer.step()

Stage 4: Validation

Generated inputs are fed to the actual Java program:

inputs = "INPUT_000 42\nINPUT_001 17"
result = subprocess.run(
    ["java", "-ea", "Main"],
    input=inputs.encode(),
    capture_output=True
)

if "AssertionError" in result.stderr.decode():
    return STATE_CORRECT

Key Design Decisions

Why GraalVM?

  • Produces detailed compiler IR with semantic information
  • BGV format captures control flow, data flow, and type information
  • native-image enables ahead-of-time analysis

Why Gradient Descent?

  • Scales to complex constraint combinations
  • Handles non-linear relationships naturally
  • Annealing allows exploration then exploitation

Why Differentiable Execution?

  • Gradients guide search toward targets
  • Soft control flow enables gradient flow through branches
  • Temperature annealing sharpens decisions over time

File Structure

dasa-program-analysis/
├── test.py                 # Main optimization loop
├── run_sv-comp.py          # SV-COMP competition entry
├── GraalWrapper/
│   ├── GraphBuilder.py     # JSON → Python graph
│   ├── MethodRegister.py   # Inlined method tracking
│   └── InputNodeTypes.py   # Type conversion utilities
├── nodes/
│   ├── BaseNode.py         # Base class
│   ├── IfNode.py           # Conditional branching
│   ├── calc/               # Arithmetic operations
│   ├── java/               # Java operations
│   ├── types/              # Complex types
│   └── custom/             # Custom operations
├── scripts/
│   └── entrypoint.sh       # Docker entry point
└── Dockerfile              # GraalVM container