TY - JOUR
T1 - Logging inter-thread data dependencies in Linux kernel
AU - Kubota, Takafumi
AU - Aota, Naohiro
AU - Kono, Kenji
N1 - Funding Information:
This work was supported in part by the Japan Science and Technology Agency (JST CREST JPMJCR19F3) and the Japan Society for the Promotion of Science (JSPS KAKENHI JP16J03272 and JP16K00104). We thank the members of the HITACHI Yokohama Research Laboratory for their insightful comments. They gave us invaluable feedback on our proposal that greatly improved this paper.
PY - 2020/7/1
Y1 - 2020/7/1
N2 - Logging is a practical and useful way of diagnosing failures in software systems. The logged events are crucially important to learning what happened during a failure. If key events are not logged, it is almost impossible to track error propagations in the diagnosis. Tracking an error propagation becomes utterly complicated if inter-thread data dependency is involved. An inter-thread data dependency arises when one thread accesses shared data corrupted by another thread. Since the erroneous state propagates from a buggy thread to a failing thread through the corrupt shared data, the root cause cannot be tracked back solely by investigating the failing thread. This paper presents the design and implementation of K9, a tool that inserts logging code automatically to trace inter-thread data dependencies. K9 is designed to be "practical"; it scales to one million lines of code in C, causes negligible runtime overheads, and provides clues to tracking inter-thread dependencies in real-world bugs. To scale to one million lines of code, K9 ditches rigorous static analysis of pointers to detect code locations where inter-thread data dependency can occur. Instead, K9 takes the best-effort approach and finds "most" of those code locations by making use of coding conventions. This paper demonstrates that K9 is applicable to Linux and captures relevant code locations, in spite of the best-effort approach, enough to provide useful clues to root causes in real-world bugs, including a previously unknown bug in Linux. The paper also shows that K9's runtime overhead is negligible. K9 incurs 1.25% throughput degradation and 0.18% CPU usage increase, on average, in our evaluation.
AB - Logging is a practical and useful way of diagnosing failures in software systems. The logged events are crucially important to learning what happened during a failure. If key events are not logged, it is almost impossible to track error propagations in the diagnosis. Tracking an error propagation becomes utterly complicated if inter-thread data dependency is involved. An inter-thread data dependency arises when one thread accesses shared data corrupted by another thread. Since the erroneous state propagates from a buggy thread to a failing thread through the corrupt shared data, the root cause cannot be tracked back solely by investigating the failing thread. This paper presents the design and implementation of K9, a tool that inserts logging code automatically to trace inter-thread data dependencies. K9 is designed to be "practical"; it scales to one million lines of code in C, causes negligible runtime overheads, and provides clues to tracking inter-thread dependencies in real-world bugs. To scale to one million lines of code, K9 ditches rigorous static analysis of pointers to detect code locations where inter-thread data dependency can occur. Instead, K9 takes the best-effort approach and finds "most" of those code locations by making use of coding conventions. This paper demonstrates that K9 is applicable to Linux and captures relevant code locations, in spite of the best-effort approach, enough to provide useful clues to root causes in real-world bugs, including a previously unknown bug in Linux. The paper also shows that K9's runtime overhead is negligible. K9 incurs 1.25% throughput degradation and 0.18% CPU usage increase, on average, in our evaluation.
KW - Debugging
KW - Inter-thread dependency
KW - Logging automation
KW - Operating systems
UR - http://www.scopus.com/inward/record.url?scp=85089369940&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089369940&partnerID=8YFLogxK
U2 - 10.1587/transinf.2019EDP7255
DO - 10.1587/transinf.2019EDP7255
M3 - Article
AN - SCOPUS:85089369940
SN - 0916-8532
VL - E103D
SP - 1633
EP - 1646
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 7
ER -