src/paper.tex: 27 additions and 13 deletions
@@ -73,14 +73,19 @@
73
73
Single-core processors were widely replaced by multi-core architectures optimized to run processes and threads concurrently.
74
74
To utilize the full performance of the given hardware, programmers need to write their software with a high degree of parallelism.
75
75
This introduces the potential for bugs that would not occur in sequential programs due to non-determinism in the order of execution.
76
-
For this purpose tools and languages have been developed to ease the creation of concurrency-aware programs and help programmers find concurrency bugs effectively.
77
-
This paper will evaluate the different tools and techniques that are available to find, reproduce and fix concurrency bugs from the viewpoint of how usable they are in a production environment.
76
+
Such bugs can be very challenging to fix because they are not easily reproducible.
77
+
For this purpose tools and languages have been developed to ease the creation of concurrency-aware programs and to help programmers find concurrency bugs effectively.
78
+
This paper evaluates three techniques, together with concrete tools that implement them, for finding, reproducing and fixing concurrency bugs, with a focus on how usable they are in a production environment.
78
79
The requirements for these tools are that they are easy to deploy, add only minimal computational and storage overhead, achieve high coverage with few false-positive reports, and enable the developer to quickly find the cause of a bug.
80
+
The three techniques evaluated are dynamic code analysis, concurrency-aware testing and static code analysis. Dynamic code analysis is covered with the record and replay tool rr and the data race detector ThreadSanitizer.
81
+
Concurrency-aware testing is covered with a combined approach that evaluates thread schedules with delta debugging to pinpoint concurrency bugs automatically.
82
+
Static code analysis is covered with the methods of sequentialization and model-checking.
79
83
All tools are evaluated for the Go programming language, but the concepts largely apply to other languages as well.
80
84
\end{abstract}
81
85
82
86
\begin{IEEEkeywords}
83
-
[Software Engineering]: Testing and Debugging
87
+
[Software Engineering]: Software testing and Debugging
88
+
[Computing methodologies]: Concurrent programming languages
84
89
\end{IEEEkeywords}
85
90
86
91
@@ -139,11 +144,11 @@ \subsection{The Go Programming Language}
139
144
140
145
\subsection{Table of Contents}
141
146
142
-
\Cref{sct:taxonomy} gives a brief introduction on the different types of concurrency bugs and their main causes.
143
-
\Cref{sct:dynamic} covers some techniques to dynamically detect concurrency bugs and reliably reproduce them.
144
-
\Cref{sct:testing} shows different methods of concurrency-aware testing to detect concurrency bugs automatically by manipulating the thread scheduler.
145
-
And \Cref{sct:static} finally covers how to detect concurrency bugs with static code analysis.
146
-
In the end there is a conclusion with a short outlook on the possible future of multi-threaded debugging.
147
+
\Cref{sct:taxonomy} gives a brief introduction to the different types of concurrency bugs and their main causes: deadlocks, data races, and atomicity and order violations.
148
+
\Cref{sct:dynamic} covers techniques to detect concurrency bugs dynamically: reliably reproducing them through record and replay with the tool \emph{rr}, and detecting data races at runtime with the tool \emph{ThreadSanitizer}.
149
+
\Cref{sct:testing} shows different methods of concurrency-aware testing that detect concurrency bugs automatically by manipulating the thread scheduler and applying delta debugging.
150
+
Finally, \Cref{sct:static} covers how to detect concurrency bugs with static code analysis through sequentialization and model-checking.
151
+
The paper concludes with a short comparison of all techniques and a brief outlook on the possible future of multi-threaded debugging.
147
152
148
153
149
154
% ------------------------------------ %
@@ -166,9 +171,9 @@ \section{Taxonomy of Concurrency Bugs}
166
171
Non-blocking bugs are often harder to find because the program can terminate successfully and still produce a wrong result.
167
172
This can also lead to a cascade of bugs where the root cause is a non-blocking concurrency bug which might not be obvious.
168
173
169
-
For example: A method concurrently creates a list of numbers that is expected to be ordered but due to a non-blocking concurrency bug the list contains unordered elements.
170
-
This failure might only occur once in a thousand executions due to the exponentially growing number of possible thread interleavings.
171
-
But if other methods depend on the correct order of elements in this list, the program might crash or generate wrong results and the reason can be very hard to find.
174
+
For example: A method concurrently creates a list of numbers that is expected to be ordered but due to a non-blocking concurrency bug the list suddenly contains unordered elements.
175
+
This failure might occur only once in a thousand executions, or even less often, due to the exponentially growing number of possible thread interleavings.
176
+
But if other methods depend on the correct order of the elements in this list, the program might crash or generate wrong results and the reason can be very hard to find.
172
177
173
178
\subsection{Deadlocks}
174
179
\begin{lstlisting}[float=h, language=Go, label=lst:deadlockWG, caption=Deadlock caused by waiting for the \emph{WaitGroup} at a wrong location -- based on \cite{tu2019go}]
@@ -188,6 +193,7 @@ \subsection{Deadlocks}
188
193
The most common manifestation of blocking bugs are \emph{deadlocks}, where circular dependencies between resources block the flow of a program.
189
194
\Cref{lst:deadlockWG} shows one example of such a deadlock in a Go program.
190
195
The problem is a \emph{blocking synchronization} where the \lstinline{group.Wait()} inside the for-loop is causing the block.
196
+
This statement has to be moved outside the for loop to resolve the unintentional blocking and fix the concurrency bug.
191
197
Although the error seems obvious in this case, such small mistakes happen quickly and can slip unnoticed into the production environment if the code is not tested well enough.
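The fix described for \Cref{lst:deadlockWG} can be sketched as follows. This is a minimal illustration, not the code from \cite{tu2019go}: the function \lstinline{runWorkers} and its slice of results are hypothetical names chosen for this example, and the loop shape is assumed from the description above. The only point it makes is that \lstinline{group.Wait()} is called once, after the loop has launched all goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers (hypothetical helper) starts n goroutines and waits for all of
// them exactly once, after the loop has finished launching them.
func runWorkers(n int) []int {
	var group sync.WaitGroup
	results := make([]int, n)

	group.Add(n)
	for i := 0; i < n; i++ {
		go func(id int) {
			defer group.Done()
			results[id] = id * id // each goroutine writes only its own slot
		}(i)
		// A group.Wait() placed here would block in the first iteration,
		// because the remaining n-1 goroutines are never started.
	}
	// Waiting outside the for loop lets every goroutine start and finish.
	group.Wait()
	return results
}

func main() {
	fmt.Println(runWorkers(4)) // prints [0 1 4 9]
}
```

Calling \lstinline{Add} once before the loop and \lstinline{Done} in a \lstinline{defer} keeps the counter balanced even if a worker returns early.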
192
198
193
199
\begin{lstlisting}[float=h, language=Go, label=lst:deadlockCh, caption={Deadlock caused by misuse of an \emph{unbuffered Channel}}]
@@ -206,6 +212,8 @@ \subsection{Deadlocks}
206
212
A second example of a deadlock that might not be obvious is \Cref{lst:deadlockCh} which uses two unbuffered channels to transfer information between threads.
207
213
The problem here is that a send on an unbuffered channel blocks until another thread is ready to receive from that channel.
208
214
To fix this, one could replace the unbuffered channel with a buffered one so that the execution flow of the program can continue without blocking.
215
+
This shows how important it is to know the concrete implementation of a concurrency abstraction.
216
+
Even though such abstractions are intended to ease the synchronization between threads and make inter-thread communication safer, a developer who does not know how an abstraction is implemented can unknowingly create hard-to-find concurrency bugs.
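The difference between an unbuffered and a buffered channel can be sketched in a few lines. This is an illustrative example under assumptions, not the code of \Cref{lst:deadlockCh}: the function \lstinline{sendThenReceive} is a hypothetical name, and it uses a single channel to keep the sketch short.

```go
package main

import "fmt"

// sendThenReceive (hypothetical helper) sends a value and then receives it
// again in the same goroutine. With capacity 0 the channel is unbuffered and
// the send would block forever, because no receiver is ready.
func sendThenReceive(capacity int, value string) string {
	ch := make(chan string, capacity)
	ch <- value // does not block as long as the buffer has free space
	return <-ch
}

func main() {
	fmt.Println(sendThenReceive(1, "done")) // prints done
}
```

A buffer only postpones the blocking: once it is full, a send blocks again until a receiver takes a value, so the capacity has to match the communication pattern of the program.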
209
217
210
218
Another common problem in event-driven concurrent programs is the use of \emph{blocking operations}, such as filesystem operations, inside an event handler.
211
219
These ``can penalize and even paralyze the entire program execution.''~\cite{tchamgoue2012testing}
@@ -229,7 +237,8 @@ \subsection{Data Races}
229
237
230
238
\Cref{lst:race} shows an example of a data race that is frequently found.~\cite{serebry2009threadsanitizer}
231
239
The data race happens when ``two threads access a non-thread-safe complex object [e.g. a map] without synchronization.''~\cite{serebry2009threadsanitizer}
232
-
Even though the two threads in this example write to different keys, this might cause a corruption of data or even crash the program because the default Go map is not concurrency-aware.
240
+
Even though the two threads in this example write to different keys of the map \lstinline{m}, this might cause a corruption of data or even crash the program because the default Go map implementation is not concurrency-aware.
241
+
To fix this, access to the map needs to be synchronized, for example with a lock.
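A lock-based fix could look like the following sketch. The type \lstinline{safeMap} and the function \lstinline{fillConcurrently} are hypothetical names introduced for illustration and are not taken from \Cref{lst:race}. Every access to the underlying map goes through a \lstinline{sync.Mutex}.

```go
package main

import (
	"fmt"
	"sync"
)

// safeMap (hypothetical type) guards a plain Go map with a mutex.
type safeMap struct {
	mu sync.Mutex
	m  map[string]int
}

// set writes a key while holding the lock.
func (s *safeMap) set(key string, value int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// get reads a key while holding the lock.
func (s *safeMap) get(key string) (int, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	return v, ok
}

// fillConcurrently lets two goroutines write different keys of the same map.
func fillConcurrently() *safeMap {
	s := &safeMap{m: make(map[string]int)}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.set("a", 1) }()
	go func() { defer wg.Done(); s.set("b", 2) }()
	wg.Wait()
	return s
}

func main() {
	s := fillConcurrently()
	a, _ := s.get("a")
	b, _ := s.get("b")
	fmt.Println(a, b) // prints 1 2
}
```

The standard library also offers \lstinline{sync.Map} as a concurrency-aware alternative; which of the two fits better depends on the access pattern.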
233
242
234
243
% TODO: Is it a data race or an atomicity violation?
235
244
A special case of data races are multi-variable data races.
@@ -258,7 +267,8 @@ \subsection{Atomicity and Order Violations}
258
267
259
268
\Cref{lst:order} shows a common order violation bug pattern called ``Test-and-Use''.
260
269
The programmer's intention is to check if a variable is not \lstinline{nil} and then use this variable.
261
-
However, due to the thread that was launched before, it could happen that after the check in line 7, the thread of the goroutine gets scheduled and the data variable is set to \lstinline{nil}.
270
+
However, due to the thread that was launched before, it could happen that after the \lstinline{if} check in line 7, the thread of the goroutine gets scheduled and the data variable is set to \lstinline{nil}.
271
+
To fix this bug, the check and the use of the variable need to be combined into one atomic operation, so that no other thread can modify the variable between the two steps.
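One way to make the check and the use atomic is to hold the same lock for both steps. The sketch below is an assumption about how such a fix could look, not the code of \Cref{lst:order}: the type \lstinline{guarded} and its methods \lstinline{reset} and \lstinline{use} are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// guarded (hypothetical type) holds a pointer that another goroutine may
// reset to nil at any time.
type guarded struct {
	mu   sync.Mutex
	data *string
}

// reset sets data to nil while holding the lock.
func (g *guarded) reset() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.data = nil
}

// use performs the nil check and the dereference under the same lock, so
// reset cannot run between the two steps.
func (g *guarded) use() (string, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.data == nil {
		return "", false
	}
	return *g.data, true
}

func main() {
	s := "payload"
	g := &guarded{data: &s}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { defer wg.Done(); g.reset() }()

	// Depending on the schedule, use sees the value or nil, but it never
	// dereferences a nil pointer.
	v, ok := g.use()
	wg.Wait()
	fmt.Println(v, ok)
}
```

The lock does not decide which goroutine runs first. It only guarantees that the check and the use are not interleaved with the write.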
@@ -279,11 +289,15 @@ \subsection{Atomicity and Order Violations}
279
289
The programmer assumes that \lstinline{++} is an atomic operation because it is a single statement in Go.
280
290
However, after compilation this is expanded to 3 instructions: LOAD, INCREMENT and finally STORE.
281
291
The thread scheduler could switch the context after any of these instructions, which leads to undefined behavior when multiple threads try to increment the same variable.
292
+
To fix this bug pattern, the \lstinline{++} operation also needs to be replaced by an atomic operation that does not allow other threads to access the \lstinline{sum} variable while incrementing.
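A minimal sketch of this fix uses the \lstinline{sync/atomic} package from the Go standard library. The function \lstinline{countAtomically} and its parameters are hypothetical names chosen for this example; only the variable name \lstinline{sum} follows the text above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomically (hypothetical helper) lets several goroutines increment
// the same counter. atomic.AddInt64 performs load, increment and store as
// one indivisible step.
func countAtomically(workers, perWorker int) int64 {
	var sum int64
	var wg sync.WaitGroup

	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				atomic.AddInt64(&sum, 1) // replaces the racy sum++
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&sum)
}

func main() {
	fmt.Println(countAtomically(8, 1000)) // prints 8000
}
```

With a plain \lstinline{sum++} in the inner loop, increments can be lost whenever two goroutines interleave between the load and the store.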
282
293
283
294
Lu, Park, Seo and Zhou conducted a study in 2008 where they analyzed the characteristics of real-world concurrency bugs.~\cite{lu2008mistakes}
284
295
One key finding was that:
285
296
``Most of the examined non-deadlock concurrency bugs are covered by two simple patterns: atomicity-violation and order-violation''~\cite{lu2008mistakes}
286
297
298
+
A promising solution to atomicity and order violation bugs is \emph{software transactional memory} (STM) as described by Peyton Jones.~\cite{peytonjones2007beautiful}
299
+
It is an alternative to traditional lock-based synchronization where atomic regions get declared explicitly.
300
+
Languages like Haskell provide STM as part of their language design, but Go can also use this mechanism through external libraries.