What is Working

I still believe that tracking how exceptions and errors propagate is a good idea. However, automated taint analysis may not be the best way to do it. Tracing frameworks like OpenTelemetry offer a more practical solution for this purpose.

What is not Working

Taint Analysis?

There is no practical taint analysis framework that can be used in production. In fact, I spent most of my time on this project not implementing ExChain, but fixing bugs in the underlying taint analysis framework.

Static taint analysis is extremely slow and resource-hungry on large code bases. We did not optimize for this at all and simply used whatever Soot provides.

Only for Java

ExChain only works with Java’s exception model. Languages that use return codes for error handling (like C) lack the explicit exception mechanism ExChain relies on, which makes it difficult to extend ExChain to them.

The Unknown

Dataset

Building a bug dataset was more challenging than I expected. In this project, we looked for failures that involve at least two exceptions. The first exception puts the system into a faulty state without causing an immediate failure, while the second exception triggers the actual crash that developers observe.

However, most bug reports either describe the root-cause exception along with a potential fix, or document only the final exception that caused the failure. It is rare to see a report that describes the entire chain of exceptions.
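To make the two-exception pattern concrete, here is a small hypothetical Java sketch. It is not taken from any bug in our dataset, and the class and method names (ExceptionChainDemo, load, parse, get) are made up for illustration. The first exception is caught and only logged, leaving a field unset; the second exception surfaces later, in unrelated code, and is the only one a developer would see in the crash report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a two-exception failure chain.
class ExceptionChainDemo {
    // Stays null if load() fails: this is the "faulty state".
    private Map<String, String> config;

    // First exception: thrown by parse(), caught and logged, never rethrown.
    void load(String raw) {
        try {
            config = parse(raw);
        } catch (IllegalArgumentException e) {
            System.err.println("warn: failed to load config: " + e.getMessage());
            // No rethrow: the system keeps running without a config.
        }
    }

    // Parses "key=value" pairs separated by commas.
    static Map<String, String> parse(String raw) {
        Map<String, String> m = new HashMap<>();
        for (String pair : raw.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("malformed entry: " + pair);
            }
            m.put(kv[0], kv[1]);
        }
        return m;
    }

    // Second exception: a NullPointerException here if load() failed earlier.
    String get(String key) {
        return config.get(key);
    }

    public static void main(String[] args) {
        ExceptionChainDemo demo = new ExceptionChainDemo();
        demo.load("port=8080,host"); // root cause: "host" has no value, exception swallowed
        try {
            demo.get("port");          // observed failure, far from the root cause
        } catch (NullPointerException e) {
            System.err.println("crash: " + e);
        }
    }
}
```

A report for this bug would typically show only the NullPointerException from get(); the IllegalArgumentException from load() appears at most as a warning line in the logs, which is why complete chains are so rarely documented.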

After reviewing nearly 1000 issues, I found only 10 with clear descriptions that could be reproduced. Frankly, this low yield raises concerns about pursuing further failure diagnosis research.

Niche Problem?

I am not sure it is a good idea to introduce a new analysis framework for just one specific type of failure. It might be overkill. I wish there were a more general framework that could handle all types of failures.