Predictive Analysis for Detecting Serializability Violations through Trace Segmentation

Arnab Sinha, Sharad Malik
Princeton University

Chao Wang, Aarti Gupta
NEC Laboratories America

Abstract—We address the problem of detecting serializability violations in a concurrent program using predictive analysis, where a violation is detected either in an observed trace or in an alternate interleaving of events in that trace. Under the widely used notion of conflict-serializability, checking whether a given execution is serializable can be done in polynomial time. However, when all possible interleavings are considered, the problem becomes intractable. We address this in practice through a graph-based method, which for a given atomic block and trace, derives a smaller segment of the trace, referred to as the Trace Atomicity Segment (TAS), for further systematic exploration. We use the observed write-read pairs of events in the given trace to consider a set of events that guarantee feasibility, i.e., each interleaving of these events corresponds to some real execution of the program. We call this set of interleavings the almost view-preserving (AVP) interleavings. We show that the TAS is sufficient for finding serializability violations among all AVP interleavings. Further, the TAS enables a simple static check that can prove the absence of a violation. This check often succeeds in practice. If it fails, we perform a systematic exploration over events in the TAS, where we use dynamic partial order reduction with additional pruning to reduce the number of interleavings considered. Unlike previous efforts that are less precise, when our method reports a serializability violation, the reported interleaving is guaranteed to correspond to an actual execution of the program. We report experimental results that demonstrate the effectiveness of our method in detecting serializability violations for Java and C/C++ benchmark programs.

I. INTRODUCTION

The atomicity of a set of operations is a desired correctness condition for concurrent programs. Informally, atomicity refers to the non-interference between shared accesses residing outside and inside an atomic region. A recent study shows that 69% of concurrency bugs are atomicity violations [1] and there is significant interest in the research community to address this problem [2–5].

There are different correctness notions associated with atomicity such as conflict-serializability and linearizability. Informally, an execution is conflict-serializable if it is equivalent, in some sense, to a serial execution where the individual atomic regions are executed sequentially [6, 7]. Consider the program execution shown in Figure 1. It has two concurrent threads \( T_1 \) and \( T_2 \), a globally visible pointer variable \( p \) and two thread-local variables \( a \) and \( b \). The relevant events of interest are the read/writes to the shared variable \( p \). Let \( p=0 \) initially. The three statements in thread \( T_1 \) should be executed atomically as they belong to a user transaction indicated by atomic\{\ldots\}. Note that this is a serial execution, since the shared pointer \( p \) is accessed atomically by \( T_1 \) between events \( e_1 \) and \( e_4 \). However, Figure 2(a) shows another execution order where event \( e_4 \) is interfering with the atomic block and hence breaks the conflict-serializability. Henceforth, we interchangeably refer to this kind of violation as a conflict-serializability violation or an atomicity violation.

Checking conflict-serializability of a given trace can be done in polynomial-time [5, 7, 8]. This problem is referred to as the monitoring problem [4, 8]. For concurrent programs, one may also be interested in checking conflict-serializability for all possible interleavings of a given trace. This is referred to as the prediction problem. In predictive analysis, a violation is detected either in the observed trace or in an
alternate interleaving of events in that trace.

Broadly speaking, existing predictive analysis methods can be classified into two categories based on their precision. Methods in the first category detect must-violations, i.e. the reported violation must be a real violation. Methods in the second category detect may-violations, i.e. the reported violation may be a real violation. They also differ in the coverage they target when exploring alternate interleavings, as shown in Figure 3.

Methods in the first category often use under-approximated analysis, e.g. based on Lamport’s happens-before causality relation and its extensions [9–11], the maximal causal model [12], SideTrack [13], and our proposed method TAS. A notable exception, which uses precise rather than under-approximated analysis, is the concurrent trace program (CTP) model [14]. However, achieving the best precision and high coverage is costly. Furthermore, in practical situations, it is not always possible to heavily instrument the program to extract a precise predictive model.

Methods in the second category often use over-approximated analysis, e.g. the Eraser-style lockset analysis [15], the meta-analysis based on lock acquisition histories [16], and the UCG analysis based on universal causality graph [17]. Note that in both the meta-analysis and the UCG-analysis, a reported erroneous schedule is not guaranteed to appear in some program execution, i.e. the reported violation may be bogus. The reason is that these methods track only the control flow and synchronizations while ignoring the data flow. This is in sharp contrast to our proposed method, which tracks data flow.

![Figure 3. Classification of predictive analysis techniques based on precision and coverage.](image)

**A. Motivation**

The motivation for our current work is to explore a different trade-off point in this space of coverage and precision. We want to design a predictive model that admits interleavings likely to exhibit errors, while sticking with feasible interleavings and using scalable analysis. In some sense, we want to explore the limits of coverage one can achieve with feasible traces extracted from the observed trace. In particular, we want to avoid the post-analysis runtime check for feasibility. While other methods [12, 13] also have similar goals, the key ingredients of our method are different – the predictive model, the target set of interleavings, and the reductions we use.

We target general conflict-serializability violations, since they are very challenging and involve reasoning over all possible shared variables over all threads. Most previous efforts do not address such violations for more than two threads. Indeed, our predictive model and techniques can be easily adapted for detection of simpler concurrency violations (e.g. serializability violations for a single variables in two threads, or dataarces, etc.).

Consider the example shown in Figure 2 which motivates our approach. Although both executions (a) and (b) are unserializable interleavings of events in the trace shown in Figure 1, the violation in (a) is real, while (b) is not guaranteed to be real. In execution (b), event \( e_2 \) precedes \( e_1 \), assigning a different value to the shared variable \( p \) than in the original trace. This leads to the local variable \( b \) in \( T_1 \) being assigned a different value, thereby affecting its control flow. Specifically, when \( b \) equals zero, the program cannot execute event \( e_3 \), and thus there is no real atomicity violation.

**B. Overview of our Approach**

We use a graph-based predictive model with clock vectors to track events in the observed trace, where events correspond to visible operations in the program, i.e. read/write accesses on shared variables and synchronization operations. The clock vectors capture the causality (happen-before) constraints in the usual way [9]. In particular, we couple each read event on a shared variable with the last write event on the same variable that happened before the trace. Note that this coupling may be inter-thread or intra-thread. Assume that during our exploration of alternate interleavings, all synchronizing constraints are respected (details provided later). Now, consider an interleaving of thread events, where the read events respect their coupling. Clearly this interleaving is guaranteed to correspond to a real execution. This is related to the well-known notions of view equivalence and view serializability [7].

However, if we explore complete view-preserving interleavings only, then we cannot hope to find view-serializability violations, and hence no conflict-serializability violations either. The idea is to consider interleavings where any thread is allowed to break its read-coupling, i.e. read from a different write event, but then to skip all subsequent events in this thread and its dependent events in other threads since they can no longer be guaranteed. We call this set of interleavings the *almost view-preserving (AVP)* set, since within each thread we preserve the view upto the broken read-coupling i.e. maintain the *view preserving prefix*.

In most practical cases, the program traces are very large with an even larger number of interleavings. We address this by designing a graph-based algorithm that focuses on a given
atomic block (at a time), to derive a smaller segment of the trace for exploration. We show that the segment we derive is sufficient to detect serializability violations for the given atomic block under all AVP interleaveings. We refer to this algorithm as Trace Segmentation and the segment as a Trace Atomicity Segment (TAS). We can further refine the TAS by shrinking it iteratively, depending on the access patterns in the given atomic block. The TAS also allows us to perform a simple static check before performing a search over all AVP interleaveings. If the static check passes, then no violation is possible among the AVP set. If the static check fails, i.e. a violation is possible, we perform systematic exploration on events in the TAS, where we use known techniques such as dynamic partial order reduction (DPOR) [18, 19], with additional pruning and heuristics optimized for our setting, to reduce the number of interleaveings to be checked. For each interleaving, the final step involves checking whether it violates conflict-serializability. We adopt the approach of [5], which builds a conflict (DSR) graph [7] for the interleaving and checks for a cycle in the graph.

We have implemented our method to work on traces generated from C/C++ and Java benchmark programs. The results show that our TAS-based method is effective at finding real violations in practice, while the TAS reduction and static check greatly help to improve performance (in comparison to a systematic search over all feasible traces).

This work makes the following contributions.

- We focus on finding real violations. Therefore, we explore a set of almost view-preserving (AVP) interleaveings, where violations consist only of those thread-local prefixes that are guaranteed to correspond to some real program execution.
- We use a graph-based model for predictive analysis, where we derive a smaller segment of the trace called a trace atomicity segment (TAS). We prove that the TAS we derive is sufficient for finding conflict-serializability violations among all AVP interleaveings.
- The TAS also enables a quick static check, to prove easy cases where no violations are possible among the target set of interleaveings. This check often succeeds in practice.
- We propose a systematic exploration of the TAS when the static check fails. This utilizes dynamic partial order reduction and additional search pruning to further reduce the number of interleaveings to be checked.
- We have implemented our TAS-based method and show promising results on C/C++ and Java benchmark programs.

II. PRELIMINARIES AND PROBLEM FORMULATION

We consider a concurrent program as consisting of a set of threads $T_1, \ldots, T_k$ and a set of shared variables. Let $tid = \{1, \ldots, k\}$ be the set of thread indices. The remaining aspects of the program, including the control flow and the expression syntax, are intentionally left unspecified for generality.

A. Program Trace Model

An execution trace $\rho = e_1, e_2, \ldots, e_n$ is a sequence of events, each of which is an instance of a visible operation during the execution of the concurrent program. For Java programs as well as C/C++ programs using POSIX Threads, the read/write accesses to shared variables and the synchronization operations are regarded as visible operations. All other operations are regarded as thread-local and are omitted in the execution trace. An event is represented as a 5-tuple $(tid, eid, type, var, child)$, where $tid$ is the thread index, $eid$ is the event index (that starts from 1 and increases sequentially within a thread), $type$ is the event type, $var$ is either a shared variable (in read/write) or a synchronization object, and child is the child thread index (in thread creation/join). The event type is one of $\{read, write, fork, join, acquire, release, wait, notify, notifyall, atomic_begin, atomic_end\}$. They can be classified into four categories:

1) read and write denote the read and write accesses to a shared variable var;
2) fork and join denote the creation and termination of a child thread, where event $(tid, eid, fork, -, child)$ creates a child thread whose index is child, and event $(tid, eid, join, -, child)$ joins the child thread back;
3) atomic_begin and atomic_end denote the beginning and end of an atomic block, respectively. The marker events (atomic_begin and atomic_end) may come from manual annotation of the program source code, or may be mined automatically [2, 20].
4) The rest correspond to synchronization operations over locks and condition variables. The synchronized keyword in Java is translated into a pair of acquire and release events over the lock implicitly associated with an object.

An execution trace $\rho$ provides a total order on the events appearing in $\rho$. We derive a partial order by retaining only the set of must-happen-before constraints, which collectively are sufficient to guarantee feasibility of the serializations of this partial order.

B. The Partial Order Graph

Let $G(V, E)$ be the partial order graph which is constructed from an execution trace as follows. $V(G)$ is the set of vertices, each of which represents an event in the trace (we use vertices and events interchangeably when the context is clear). The directed edge $(u, v) \in E(G)$, the set of edges, if and only if at least one of the following conditions is true:

1) Program Order edge: when $u.tid = v.tid$ and $u.eid = v.eid - 1$, i.e. event $u$ immediately precedes event $v$ in program order.
2) Read-After-Write edge: when $v$ reads the value of a shared variable written by $u$ in the given trace $\rho$. This may be an intra- or an inter-thread edge.
3) Sync. edge: We consider the following sync. events.
• \( u \) is a \textit{fork} event and \( v \) is the first event in the child thread.
• \( u \) is a terminating event for a thread and \( v \) is the corresponding \textit{join} event.
• \( u \) is a \textit{wait} event and \( v \) is the corresponding \textit{notify} event.

The PO (program order) edges enforce the program order within each thread. (For ease of exposition, we assume a sequential consistency memory model; other weaker memory models can be handled by relaxing these constraints.) The RAW (read-after-write) edges are needed to explore the AVP interleavings, which guarantee that the interleavings are feasible. Note that there are no WAW (write-after-write) and WAR (write-after-read) edges in \( G \) itself, but these edges are tracked while detecting atomicity violations in the interleavings. Sync. edges are also used for generating feasible interleavings. Note there are no edges for lock events - lock semantics are guaranteed during the exploration of interleavings.

\textbf{Definition of a read-couple}: If there is a RAW edge from \( a \) to \( b \) in \( G \), then the pair \((a, b)\) is a \textit{read-couple}. If in a different interleaving, \( b \) reads from a different event \( c \), we say that the read-couple for \( b \) in \( G \) is broken.

\textbf{Definition of vector}: Each vertex \( v \) in \( G(V, E) \) has a vector \( \text{vec}_v \) of \( k \) integers, where the \( i^{th} \) integer denotes the \( \text{eid} \) of the latest event in thread \( i \) that must occur before \( v \) \[9\]. (We claim no novelty in defining the vector but provide our definition for clarity.) The vectors are computed in a topological order on the vertices. The vector of a vertex is defined as follows:

\[
\text{vec}_v[i] = \begin{cases} 
 v.eid - 1 & \text{if } (i = v.tid) \\
 \max(\text{set}_1 \cup \text{set}_2 \cup \{0\}) & \text{otherwise}
\end{cases} 
\]  

(1)

where,

\[
\text{set}_1 = \{ \text{vec}_u[i] | (u.tid \neq i) \land (u, v) \in E(G) \} 
\]

and,

\[
\text{set}_2 = \{ u.eid | (u.tid = i) \land (u, v) \in E(G) \}
\]

\textbf{Example}: An example of a partial order graph with 3 threads is shown in Figure 4. The rectangular block in this figure represents an atomic block. The number inside each vertex is the \( \text{eid} \). The vectors are shown in square brackets next to the vertices. For convenience, we shall refer to vertex 1 in the \( 2^{nd} \) thread as vertex 2.1.

Whether two vertices \( u \) and \( v \) in graph \( G(V, E) \) may happen in parallel (MHP) can be conveniently checked by comparing their vectors. Formally, we define the following \textit{event precedence relations} based on the vectors. For two vertices \( u \) and \( v \),

- \textbf{MHB}: \( u \) must happen before \( v \), denoted \( u < v \), iff \( \text{vec}_v[u.tid] \geq u.eid \).
- \textbf{MHP}: \( u \) may happen in parallel with \( v \), denoted \( u \parallel v \), iff \( \text{vec}_v[u.tid] < u.eid \), and \( \text{vec}_u[v.tid] < v.eid \).

In our example (Figure 4), note that vertex 2.4 may happen in parallel with vertex 1.4 because, based on the two vectors \( \text{vec}_{1.4} = [3,3,0] \) and \( \text{vec}_{2.4} = [0,3,4] \), neither

\[\begin{align*}
\text{(vec}_{1.4} & = [3,3,0] \\
\text{vec}_{2.4} & = [0,3,4]
\end{align*}\]

Figure 4. The partial order graph with vectors (computed according to Eq. 1). Each vertex is an event from the execution trace. The dashed edges are PO edges, and the solid edges are RAW/Sync. edges. Note that RAW edges can both be inter- and intra-thread edges.

Figure 5. The TAS in a partial order graph \( G \), with respect to the atomic block (shaded rectangular region). The upper and lower frontiers of the TAS are given by \{8,5,6\} and \{17,11,11\}, respectively.
vertex must happen before the other. In contrast, vertex 2.2 (with vector [0,1,0]) must happen before vertex 1.4 (with vector [3,3,0]).

Definition of atomic block: An atomic block $A$ is a subgraph of $G$ such that $V(A)$ is the set of events between the special marker events atomic\_begin and atomic\_end, all within the same thread that includes the marker events. The edge set is defined as $E(A) = \{(u, v) \mid u, v \in V(A) \& u.eid = v.eid - 1\}$. Let, $T_A$ be thread containing the atomic block and $T_A.tid$ denote the thread index of vertices in $V(A)$. In a concurrent execution trace, there may be multiple atomic blocks in different threads. Our method performs violation detection on a single atomic block at a time.

C. Almost View Preserving (AVP) Interleavings

Let $\rho$ be the given trace. We are targeting $AVP(\rho)$ as the maximal set of interleavings that are guaranteed feasible without interpreting the data values, and without a post-analysis/runtime check. This makes it an interesting trade-off point between search complexity and coverage. It is derived as follows. Let $G'$ be a partial order graph derived from $\rho$ similar to $G$ except it contains only the program order and sync. edges. Let $t \in T$, the set of all interleavings consistent with $G'$. Let $v$ be a read event in $t$. For each read event $v$ in $t$ if the read couple for $v$ in $\rho$ is broken in $t$ then all vertices $w$ for which $v$ must-happens-before $w$ in $G$ are deleted from $t$ resulting in $t'$. $AVP(\rho)$ is the set of all $t'$ s.t. $t \in T$.

In Figure 1, $\rho = e_1, e_2, e_3, e_4$. Therefore, $AVP(\rho) = \{(e_1, e_2, e_3, e_4), (e_1, e_2, e_4, e_3), (e_1, e_4, e_2), (e_2, e_1, e_4)\}$. As shown in our experimental results, the AVP set constitutes a large number of interleavings in practice. Serbănuță et al. [12] also proposed a maximal model based on actual observed values, but they allow only one value mismatch. Thus their set of interleavings and the AVP set are incomparable.

D. A Violation Path

It is well-known that there is a conflict-serializability violation if there exists a cycle in the D-serializable (DSR) graph [5, 7]. In our setting, for any alternate interleaving we conceptually construct a conflict graph $G_C$, (similar to DSR graph) where vertices are the read/write events and edges represent conflicting accesses and program order. There is an atomicity violation if we can find a path that starts and terminates within the atomic block, and visits at least one vertex outside the atomic block. For example, for the trace in Figure 1, $e_2 \rightarrow e_4 \rightarrow e_3$ is such a path in the alternate interleaving $\{e_1, e_2, e_4, e_3\}$. Next we formalize this notion.

Let, $V_{RW}(G) \subseteq V(G)$ be the set of vertices which read/write global variables. Similarly, $V_{RW}(A) = V(A) \cap V_{RW}(G)$. Let, $\rho' \in AVP(\rho)$. In the conflict-graph $G_C(V(G_C), E(G_C))$, $V(G_C)$ is the set of read/write events in $\rho'$ and $E(G_C)$ consists of program order edges between the vertices in $V(G_C)$ and conflicting accesses: read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW).

A violation path $P(\rho')$ is a sequence of alternating vertices and edges in $G_C$ such that it originates from $A$, visits at least one vertex outside $A$ and terminates within $A$, with no repetition of vertices. The vertices and edges in $P(\rho')$ are denoted by $V(P(\rho')) \subseteq V(G_C)$ and $E(P(\rho')) \subseteq E(G_C)$, respectively. There exists an atomicity violation for the atomic block $A$ in $\rho$ iff there exist a violation path $P(\rho')$. The vertex at which the path leaves (enters) the atomic block is referred to as $s(P(\rho')) (t(P(\rho')))$. A violation path $P(\rho')$, if it exists, is characterized as follows:

- $s(P(\rho')).eid < t(P(\rho')).eid$ where, $s(P(\rho'))$, $t(P(\rho')) \in V_{RW}(A)$
- $V(P(\rho')) \cap V_{RW}(A) = \{\}$, where $V_{RW}(A)$ is the set of read/write vertices outside $A$, i.e., $V_{RW}(A) = V(G_C)\setminus V_{RW}(A)$.

The predictive analysis problem considered here is to detect the existence of a violation path $P(\rho')$ for all $\rho' \in AVP(\rho)$.

III. THE TRACE ATOMICITY SEGMENT (TAS)

For an atomic block $A$, we identify the TAS, i.e. a subgraph $Z_A \subseteq G$ that is sufficient for the purpose of detecting the serializability violations among $AVP(\rho)$. Intuitively, the TAS captures the events that may happen in parallel with events in $A$ till any broken reads are encountered. Next, we define the boundaries of this TAS referred to as the frontiers.

A. The Frontiers

Let vertices $u_0$ and $u_\infty$ be the ‘first’ and ‘last’ read/write vertices in $V_{RW}(A)$ respectively. Therefore, $u_0.eid = \min\{v.eid \mid v \in V_{RW}(A)\}$ and $u_\infty.eid = \max\{v.eid \mid v \in V_{RW}(A)\}$. A frontier is a $k$-tuple, i.e. a vector, where the $i^{th}$ integer represents the $eid$ of some event in the $i^{th}$ thread. A TAS is bounded by two frontiers (upper and lower with respect to $G$) defined as follows (see Figure 5):

- Upper Frontier: $UF_A$ is a frontier such that $UF_A = vec_{u_0}$. Any coupled read that has its writer ($wr$) below the upper frontier (i.e. $wr.eid > UF_A[wr.tid]$) is referred to as read coupled below $UF_A$.
- Lower Frontier: $LF_A$ is a frontier such that

$$LF_A[i] = \begin{cases} u_\infty.eid + 1 & \text{if } i = T_A.tid \\ \min\{v.eid \mid v \in F\} & \text{if } i \neq T_A.tid \text{ and } F \neq \{\} \\ v.tid + i, vec_v[T_A.tid] & \text{if } i \neq T_A.tid \text{ and } F = \{\} \end{cases}$$

where $\phi$ denotes a special terminating event, added to every thread at the end.

The subgraph of $G$ between the upper frontier and the lower frontier is called the TAS $Z_A$. Formally,

- $V(Z_A) = \{u \mid u \in V(G) \land (u.eid > UF_A[u.tid]) \land (u.eid < LF_A[u.tid])\}$

- $E(Z_A) = \{(u,v) \mid u,v \in V(Z_A) \land (u,v) \in E(G)\}$

A vertex $v$ is said to be ‘on the $LF_A’ (‘below the $LF_A’
iff $LF_A[v.tid] = v.eid$ ($LF_A[v.tid] < v.eid$). Similarly, a vertex $v$ is said to be ‘on the UF’ (‘above the UF’) iff $UF_A[v.tid] = v.eid$ ($UF_A[v.tid] < v.eid$). The intuition behind the frontiers is that no vertex above the upper frontier (below lower frontier) may appear after (before) any vertex $v \in V_{RW}(A) \subseteq V(Z_A)$ in any interleaving in AVP($\rho$).

Example: Figure 5 shows an example of a TAS for the atomic region in a partial order graph with three threads. The upper frontier is $UF_A$={8,5,6}. The lower frontier is $LF_A$={17,11,11}. Note that the vertices 1.8, 2.5 and 3.6 are on the $UF_A$. Similarly, vertices 1.17, 2.11 and 3.11 are on the $LF_A$. The subgraph in between the frontiers is the TAS $Z_A$. Although, the frontiers are simple vectors, they are represented as cuts in Figure 5 as the frontiers demarcate the boundaries of the TAS. Next, we show the sufficiency of TAS for detecting the existence of a violation path for a given atomic block.

B. Sufficiency of TAS

Theorem 1: [TAS THEOREM] Let, $v,v' \in V_{RW}(G)$ such that $v.eid \leq UF_A[v.tid], v'.eid \geq LF_A[v'.tid]$, $\forall \rho' \in AVP(\rho)$, there does not exist a violation path $P(\rho')$ s.t. $P(\rho')$ goes through $v$ or $v'$.

Figure 6. Case 1 of Theorem 1: $v$ lies above $UF_A$. Let, $(x,y)$ be the broken read-couple on path connecting $v$ and $u_0$ in $G$, such that $s(P(\rho'))$ precedes $v$ in $P(\rho')$.

Proof Sketch: If a violation path goes through $v$ or $v'$ then (a) $s(P(\rho'))$ precedes $v$ in $P(\rho')$, or (b) $t(P(\rho'))$ follows $v'$ in $P(\rho')$. We consider both cases separately.

Case 1: There exist a violation path $P(\rho')$ such that $s(P(\rho'))$ precedes $v$ in $P(\rho')$. Since $v$ lies above $UF_A$, there must exist at least one path in $G$ connecting $u_0$ and $v_0$ in $G$ such that $s(P(\rho'))$ precedes $v$ in $P(\rho')$.

Case 2: Let there be a violation path $P(\rho')$ such that $t(P(\rho'))$ follows $v'$ in $P(\rho')$. Since $v'$ lies below $LF_A$, there must exist at least one path in $G$ connecting $u_\infty$ and $v'$ and in all those paths $u_\infty$ precedes $v'$ (see Figure 7). Moreover, $u_\infty$ follows $t(P(\rho'))$ in $G$. Therefore, at least one RAW edge in all paths in $G$ connecting $u_\infty$ and $v'$ must be broken in $\rho'$, otherwise we would have a cycle connecting $t(P(\rho'))$, $u_\infty$ and $v'$ in $\rho'$. Let, $(x',y')$ be the broken read-couple in $G$ which lies on a path (say $\tau'$) connecting $u_\infty$ and $v'$ in $G$. Considering the cases on $y'$.

Case 2.1: $y' = v'$. From the premise, $v'.eid \geq LF_A[v'.tid]$. There are two possibilities.

Case 2.1.1: $v'.eid = LF_A[v'.tid]$. The RAW edge $(x',v')$ in $E(G)$ implies $x' < v'$. If $v'$ is on the $LF_A$, then $x'$ must be within $Z_A$. But, then $v'$ is a read-coupled below $UF_A$. Therefore, by Eq. 2, $v'$ cannot be on the $LF_A$.

Case 2.1.2: $v'.eid > LF_A[v'.tid]$. There must exist $w$ such that $w.eid = LF_A[v'.tid]$ (i.e. $w$ is on $LF_A$). Therefore, there must exist at least one path connecting $u_\infty$ and $w$ in $G$ and in all those paths $u_\infty$ precedes $w$. Moreover, $w$ precedes $v'$ in all interleavings in $AVP(\rho)$ since both events belong to the same thread. Therefore, in all interleavings in $AVP(\rho)$, $u_\infty$ must precede $v'$ and this precludes a path from $v'$ to $t(P(\rho'))$.

Case 2.2: $y' \neq v'$ (shown in Figure 7). Let, $(y',w')$ be the program order edge which follows $(x',y')$ in $\tau'$. Therefore, if $(x',y')$ is the broken read-couple in $\rho'$, then $w'$ must be skipped. Thus, $\tau'$ discontinues after $y'$ in $\rho'$.
and \( v' \) is not present in \( P(\rho') \). Hence the contradiction.

**Corollary 1:** [LOCALITY COROLLARY] The serializability of atomic block \( A \) over all AVP interleavings cannot be violated due to any vertex \( v \notin V(Z_A) \).

From Theorem 1, we infer that, for any vertices \( v \) and \( v' \) outside \( Z_A \) (defined in the context of Theorem 1), no violation path is possible where \( v \) follows \( \text{source}(P(\rho')) \) or \( v' \) precedes \( \text{target}(P(\rho')) \), where \( \rho' \in AVP(\rho) \). Therefore, a violation path \( P \) originating from and terminating into \( A \) can neither cross \( UF_A \) nor cross \( LF_A \). Hence, there exists no violation path that includes vertices outside \( Z_A \).

### C. Static Checking of TAS

Once the TAS \( Z_A \) is identified, it is often possible to statically determine the absence of any AVP interleaving that violates atomicity. For a violation path, there must exist at least two events within the atomic block that conflict with other access(es) outside the atomic block. Formally, the necessary condition for existence of a violation path \( P \) is: there exist \( e_1, e_2 \in V_{\text{RW}}(A) \) such that \( 3e_1', e_2' \in V_{\text{RW}}(A) \), and \( e_1 (e_2) \) conflicts (RAW, WAR, WAW) with \( e_1' (e_2') \).

If such events \( e_1 \) and \( e_2 \) do not exist, then no violation is possible. Note that the conflicting accesses \( \{e_1', e_2'\} \) outside the atomic block may or may not correspond to the same event. It is important to note that the TAS frequently makes this check succeed, whereas it would typically fail for the full trace since it is likely that a shared variable (that is accessed within \( A \)) is accessed outside the TAS. While the static check is straight-forward, it is this beneficial result of the TAS that we would like to highlight.

### D. Exploration of TAS

If the static check fails for a TAS, i.e. an atomicity violation is possible, we systematically explore events in the TAS to generate the AVP interleavings. These interleavings are then checked for existence of a violation path. The exploration of TAS can be performed by any systematic search method (explicit or symbolic). In our current implementation, we employ explicit search inspired by dynamic partial order reduction [18, 19]. Interleavings with the same relative order between the conflicting events are defined as conflict-equivalent interleavings. The DPOR algorithm avoids generation of conflict-equivalent interleavings. However, off-the-shelf application of DPOR cannot generate AVP interleavings efficiently and several heuristics need to be used in our context. These have been omitted here for brevity.

### E. Complexity

Let, \( |V(G)| = N, |E(G)| = M \). The vertices in \( G \) need to be topologically sorted for the task of vector assignment \( O(N + M) \). Once the vectors are assigned, derivation of TAS requires identification of upper \( O(1) \) and lower \( O(N) \) frontiers. Next, the static check requires a scanning of the vertices in the TAS \( O(N) \). Therefore, the total time complexity of deriving the TAS is \( O(N + M) \).

The complexity of search-space exploration depends on the method chosen, in our case this is the complexity of DPOR based search.

### IV. Results

We have implemented our AVP predictive analysis technique in a prototype tool. This tool is capable of logging/analyzing execution traces generated by both Java programs and multithreaded C/C++ programs using Pthreads. The C++ benchmark used is available online [21]. All the Java benchmarks are also publicly available [22–26].

Next, we describe how the tool logs execution traces and analyzes those traces. The tool logs execution traces at runtime from C++ source code instrumented using the commercial front end from Edison Design Group (EDG). For Java programs, we use execution traces logged at runtime by a modified Java Virtual Machine (JVM). For each test case, we first execute the program using the default OS thread scheduling and log the execution trace. Next we apply our algorithm to detect the serializability violations. For Java traces, we assume that all synchronized blocks are intended to be atomic, unless the synchronized block has a wait. For the C++ application, we assume that all blocks using scoped locks (monitors implemented using Pthreads locks and condition variables) are intended to be atomic.

All our experiments were conducted on an Intel Xeon machine with a 2.8 GHz Intel processor and 1GB memory running Linux. We want to answer the following questions through experimentation.

- How large is the TAS relative to the trace?
- How often does the static check on the TAS succeed?
- What is the computation time needed to detect the violations?

We considered the following scenarios.

- “no TAS”: This shows the results of experiments where plain DPOR (without our heuristics) is applied to the entire trace for checking each atomic block.

![Image](image-url)
“TAS, no SP”: TAS and search heuristics but no static check.

“TAS, SP, $CS_{max} = 5$”: Along with TAS, static check and our optimized DPOR search technique we have considered context switch bounding [27] to 5 context switches ($CS_{max} = 5$) to reduce the number of interleavings considered.

“TAS, SP, $CS_{max} = \infty$”: TAS, static check, search heuristics with no bound on context switches.

Note that since the previous research on these Java benchmarks, including Fusion [14], are limited by their ability to check serializability violations involving at most 1 variable and 2 threads, we cannot provide an experimental comparison against them.

For ease of representation, we present the experimental results for a sample of traces in Figures 8, 9 and 10 for ‘no TAS’ and ‘TAS, SP, $CS_{max} = \infty$’ experiments. Henceforth, for brevity we refer to the ‘TAS, SP, $CS_{max} = \infty$’ case as the ‘TAS’ case. Additional details for these traces are presented in the Table I. The remaining traces in the benchmark set show similar characteristics. These figures support the claims we make next.

In Figure 8, the average TAS size (X-axis) is shown as a percentage of the total number of events. The Y-axis depicts the traces with the number of events shown within parenthesis (in increasing order of the number of events). Note that, the average TAS size of Tsp is 0.2% of the total events and hence the corresponding bar is virtually absent in the figure. Moreover, observe that the relative TAS size is significantly smaller than the trace size, and tends to be more so for the larger traces for this set of benchmarks.

For the same traces, Figure 9 shows that the static checks using the TAS often succeed which results in fewer timeouts. A timeout of 10 minutes per atomic block is chosen for the search phase. There are two horizontal bars per trace. The upper and the lower bar show the results of ‘no TAS’ and ‘TAS’ cases respectively. In the ‘no TAS’ case, there are three possible outcomes for each atomic block - ‘no violation possible’ (NV), ‘violation found’ (V) or ‘timeout’ (TO). We show the distribution of NV-V-TO for the ‘no TAS’ case in the upper bar. Similarly, in the ‘TAS’ case, the possible outcomes are: ‘static check passed’ (SP), NV, V, TO. The distribution of SP-NV-V-TO is shown for the ‘TAS’ case in the lower bar. Observe that in traces philo, SynchBench, Elevator2, ThriftTrace11 the timeouts in the ‘TAS’ case disappear. Moreover, in traces account1 and Barrier1 the number of timeouts decreased significantly. This is due to the large number of static checks that succeed in the ‘TAS’ case, which effectively bring down the number of timeouts.

Figure 10 compares the execution time and the number of interleavings explored in the ‘TAS’ case normalized with respect to the ‘no TAS’ case. In all the traces, ‘TAS’ case outperformed ‘no TAS’ case in terms of number of interleavings explored and time taken. For instance in traces SynchBench and Tsp, all the TAS-es passed the quick static check in negligible time, demonstrating the efficiency of our static check. Moreover, we also found that for all these experiments, the time needed to compute the TAS was insignificant (less than 1 sec). Hence, we find that our method catches violations in significantly less time by exploring much fewer interleavings.

Discussion:

- In our ‘TAS Search w/ Static Check’ experiments, we detect 23 atomicity violations in 26 traces. This underscores our intuition that the predictive search-space of AVP interleavings has value in detecting atomicity violations.

- We find that the static check on TAS is very effective. For example, in the ThriftTrace benchmarks, more than 90% of the atomic blocks are statically determined not to have a violation. This drastically reduces the total time, which is dominated by the time spent in search. On average, 81.75% of the atomic blocks pass this test.
• Usually the ‘no TAS’ experiments run longer compared to experiments involving TAS. This shows that the trace reduction enabled by TAS is effective. On average only 23.88% (best case 0.2%) of the entire trace is included in a TAS. Moreover, the TAS enables us to handle non-terminating/streaming applications, which cannot be analyzed by any technique that needs a complete trace.

• The results indicate less than 1% timeouts over all checks in ‘TAS, SP, CS\textsubscript{max} = \infty’. This is due to an effective combination of TAS with DPOR. Specifically, we observe that TAS size of traces with mostly decoupled threads (i.e., insignificant inter-thread interaction) are bigger. However, fortunately, in such cases, the DPOR-based search explores only a very few interleavings e.g., in almost all the Thrift traces, although the absolute TAS sizes are fairly large, less than 20 interleavings were explored. Similar observation can be made regarding Elevator traces. Thus, combining TAS with DPOR works well for both tightly as well as loosely coupled threads on this set of benchmarks.

V. RELATED WORK

We have already discussed the broadly related efforts on predictive analysis. More specifically, the maximal causal model proposed by Serbanuta et al. [12] captures a related but incomparable set of interleavings. They record the actual values read and written in the trace, and allow only a single mismatch among the values read. In contrast, our predictive model does not interpret the values (and could handle nondeterministic values, e.g., in the initial state), and we allow multiple mismatches (in different threads) among the AVP interleavings explored. Farzan and Madhusudan use the notion of causal atomicity [6], which also relies on conflict-equivalence and hence is closely related to conflict-serializability. (A causal atomicity violation is a violation of conflict-serializability; but the reverse is not always true.) Besides this difference, as mentioned earlier, their predictive model does not guarantee feasible traces, i.e., may have false positives [16]. Their predictive analysis has been recently used to guide runtime testing [28].

Among other related techniques, Lu et al. [2] used access interleaving invariants to capture patterns of test runs and then monitor production runs for detecting three-access atomicity violations. Xu et al. [20] used a variant of the two-phase locking algorithm to monitor and detect serializability violations. Both methods were aimed at detecting, not predicting, violations in the given trace. Wang and Stoller [29] also studied the prediction of serializability violations under the assumptions of deadlock-freedom and nested locking; their algorithms are precise for checking violations involving one or two transactions but incomplete for checking arbitrary runs.

VI. CONCLUSIONS

We have proposed a graph-based predictive analysis method for detecting serializability violations in concurrent programs. Our method is precise in that when it reports a serializability violation, the reported interleaving is guaranteed to be feasible in the actual program execution (hence no false positives). We directly address the performance issues inherent in exploring a large number of interleavings by first deriving a smaller segment (TAS) of the given trace for checking a given atomic block. We prove that it is sufficient for exploring all interleavings in our target set (AVP interleavings). We also use the TAS to provide a quick static check for proving the absence of violations, which very often succeeds in practice. For the systematic search over all interleavings, we employ dynamic partial order reduction and other pruning techniques to reduce the number of interleavings to be checked. Our experimental results demonstrate the effectiveness of our TAS-based approach on several C and Java benchmark programs.

REFERENCES


<table>
<thead>
<tr>
<th>1.</th>
<th>2.</th>
<th>3.</th>
<th>4.</th>
<th>5.</th>
<th>6.</th>
<th>7.</th>
<th>8.</th>
<th>9.</th>
<th>10.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Traces</td>
<td>All</td>
<td>Avg. TAS size (% of all events)</td>
<td>Scenarios</td>
<td>NT</td>
<td>SP (%)</td>
<td>NV</td>
<td>V</td>
<td>TO</td>
<td>Int.</td>
</tr>
<tr>
<td><strong>compoz</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 4, evs: 97</td>
<td>4</td>
<td>34.5</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 16, l-var: 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 53, rw-var: 5</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Barre</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 10, evs: 653</td>
<td>11</td>
<td>262.6</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 108, l-var: 2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 262, rw-var: 12</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 7</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>account</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 11, evs: 902</td>
<td>61</td>
<td>307.6</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 146, l-var: 21</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 430, rw-var: 42</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 10</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>phase</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 6, evs: 1141</td>
<td>28</td>
<td>273.5</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 126, l-var: 6</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 857, rw-var: 23</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 22</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Synchro</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 16, evs: 1510</td>
<td>124</td>
<td>64.68</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 306, l-var: 2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 533, rw-var: 15</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Daisy</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 3, evs: 2998</td>
<td>142</td>
<td>88.5</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 422, l-var: 10</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 2003, rw-var: 45</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 15</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Elevated</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 4, evs: 5001</td>
<td>231</td>
<td>1873.4</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 810, l-var: 11</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 3698, rw-var: 117</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Thriff</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 4, evs: 11357</td>
<td>363</td>
<td>1191.36</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 1384, l-var: 48</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 3184, rw-var: 171</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 324</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Thriffre1</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 4, evs: 25237</td>
<td>703</td>
<td>3272.5</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 2522, l-var: 15</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 9218, rw-var: 319</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 549</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Top</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>thds: 4, evs: 45653</td>
<td>5</td>
<td>97</td>
<td>100% TAS</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>l-evs: 20, l-var: 5</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 25366, rw-var: 42</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>rw-evs: 3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Table I**

Experimental data of the serializability violation detection. (AB=atomic blocks, SP=static proofs, NV=no violation possible, V=violations found, TO=Timeouts, Int.=No. of interleavings generated)

---


