Apport and the retracer bot in the Canonical data center have provided server-side automatic closing of duplicate crash report bugs for quite a long time. As we have only kept Apport crash detection enabled in the development release, we got away with this as bugs usually did not get so many duplicates that they became unmanageable. Also, the number of duplicates provided a nice hint to how urgent and widespread a crash actually was.
However, it’s time to end that era and provide something better now:
- This probably caused a lot of frustration when a reporter of the crash spent time, bandwidth, and creativity to upload the crash data and create a description for it, only to find that it got closed as a duplicate 20 minutes later.
- Some highly visible crashes sometimes generated up to a hundred duplicates in Launchpad, which was prone to timeouts, and needless catch-up by the retracers.
- We plan to have a real crash database soon, and eventually want to keep Apport enabled in stable releases. This will raise the number of duplicates that we get by several magnitudes.
For common crashes we had to write manual bug patterns to avoid getting even more duplicates.
For the technically interested, this is how we detect duplicates for the “signal” crashes like SIGSEGV (as opposed to e. g. Python crashes, where we always have a fully symbolic stack trace):
As we cannot rely on symbolic stack traces, and do not want to force every user to download tons of debug symbols, Apport now falls back to generating a “crash address signature” which combines the absolute addresses of the (non-symbolic) stack trace and the
/maps mapping to a stack of libraries and the relative offsets within those, which is stable under ASLR for a given set of dependency versions. As the offsets are specific to the architecture, we form the signature as combination of the executable name, the signal number, the architecture, and the offset list. For example, the i386 signature of bug 4 looks like this:
As library dependencies can change, we have more than one architecture, and the faulty function can be called from different entry points, there can be many address signatures for a bug, so the database maintains an N:1 mapping. In its current form the signatures are taken as-is, which is much more strict than it needs to be. Once this works in principle, we can refine the matching to also detect duplicates from different entry points by reducing the part that needs to match to the common prefix of several signatures which were proven to be a duplicate by the retracer (which gets a fully symbolic stack trace).
The retracer bots now exports the current duplicate/address signature database to http://people.canonical.com/~ubuntu-archive/apport-duplicates in an indexed text format from where Apport clients can quickly check whether a bug is known.
For the Launchpad crash database implementation we actually check if the bug is readable by the reporter, i. e. it is private and the reporter is in a subscribed team, or the bug is public; if not, we let him report the bug anyway and duplicate it later through the existing server-side retracer, so that the reporter has a chance of getting subscribed to the bug. We also let the bug be filed if the currently existing symbolic stack trace is bad (tagged as apport-failed-retrace) or if a developer wants a new symbolic stack trace with the current libraries (tagged as apport-request-retrace).
As this is a major new feature, I decided that it’s time to call this Apport 2.0. This is the first public beta towards it, thus called 1.90. With Apport’s test driven and agile development the version numbers do not mean much anyway (the retracer bots in the data center always just run trunk, for example), so this is as good time as any to reset the rather large “.26” minor version that we are at right now.