ToothPicker: Apple Picking in the iOS Bluetooth Stack



Dennis Heinze Secure Mobile Networking Lab
TU Darmstadt / ERNW GmbH

Jiska Classen Secure Mobile Networking Lab
TU Darmstadt

Matthias Hollick Secure Mobile Networking Lab
TU Darmstadt

Abstract
Bluetooth enables basic communication prior to pairing as well as low-energy information exchange with multiple devices. The Apple ecosystem uses Bluetooth extensively for coordination tasks that run in the background and enable seamless device handover. To this end, Apple established proprietary protocols. Since their implementation is closed-source and over-the-air fuzzers are very limited, these protocols are largely unexplored and not publicly tested for security. In this paper, we summarize the current state of Apple’s Bluetooth protocols. Based on this, we build the iOS in-process fuzzer ToothPicker and evaluate the implementation security of these protocols. We find a zero-click Remote Code Execution (RCE) vulnerability that was fixed in iOS 13.5, as well as simple crashes.
1 Introduction
In the past, Bluetooth was mainly used as an audio-only technology. Nowadays, it is used by integral operating system services that are permanently running in the background. Within the Apple ecosystem, most of these are part of the Continuity framework [5]. Examples are searching for devices before sharing files via AirDrop [31], the initial detection of an Apple Watch by macOS to start the Auto Unlock protocol and unlocking a Mac, or beginning to write an email on a mobile device and later continuing this on a desktop via Handoff [14, 23]. Further features outside of the Continuity framework include pairing AirPods once and seamlessly using them on all devices that are logged into the same iCloud account [15]. With Exposure Notifications introduced due to SARS-CoV-2, Apple expanded their rather closed ecosystem to work with Google devices [12]. Even users with just an iPhone who do not depend on the Continuity framework might enable Bluetooth for exposure notifications or audio streaming.
Testing these services over-the-air is cumbersome. Apple’s Bluetooth stacks almost immediately disconnect upon receiving invalid packets. A disconnect is hard to distinguish from a crash because the Bluetooth service restarts within seconds, which makes feedback-based fuzzing difficult. Moreover, tool support for crafting and transmitting arbitrary packets is limited for both Bluetooth Low Energy (BLE) and Classic Bluetooth. As of now, there is no full-stack open-source Software-Defined Radio (SDR) implementation. Thus, security researchers need to use commercial tools by Ellisys starting at US$10k or extend InternalBlue [22], which is based on reverse-engineered firmware on off-the-shelf smartphones. During our research, InternalBlue proved to be a powerful tool for bug verification, but the overall nature of Apple’s Bluetooth stack requires a custom fuzzing solution.
We design and implement ToothPicker, an in-process Bluetooth daemon fuzzer running on iOS 13. To this end, we solve various harnessing challenges, such as creating virtual connections, partial chip abstraction, selecting protocol handlers, obtaining suitable corpora for undocumented protocols, and getting crash and coverage feedback from this closed-source target. ToothPicker is based on Frida and radamsa [16, 28] and harnesses the Bluetooth daemon while leaving the remaining interaction with iOS components intact. The main contributions of this paper are as follows:
• We implement ToothPicker, an iOS in-process Bluetooth fuzzer running on a physical iPhone.
• We provide an up-to-date Bluetooth protocol overview for iOS, macOS, and RTKit.
• We uncover various issues in Apple’s implementations that we can reproduce over-the-air. Based on our findings, Apple fixed the Bluetooth RCE CVE-2020-9838 in iOS 13.5 and the Denial of Service (DoS) CVE-2020-9931 in iOS 13.6.
Our results indicate that Apple never systematically fuzzed their Bluetooth stack, with issues going back to iOS 5 that still exist in iOS 13. However, fuzzing BLE/GATT, which is used by most Internet of Things (IoT) devices and supported by various Nordic Semiconductor based testing frameworks [9, 25], did not reveal any findings. Thus, ToothPicker indeed closes the gap between what is possible with over-the-air testing and fast in-process fuzzing for proprietary protocols.


This paper is structured as follows. We provide background information in Section 2. In Section 3, we start with a ToothPicker design overview, identify and prioritize various target protocols, and then harness the iOS Bluetooth daemon for fuzzing. We evaluate ToothPicker in Section 4. The fuzzing results and malicious payloads are provided in Section 5. Finally, we conclude our work in Section 6. Even though a detailed understanding of Apple’s proprietary protocols is not required for fuzzing per se, they have not been documented before and we provide descriptions in the Appendix.
2 Background on Fuzzing Apple Bluetooth
In the following, we explain why we chose the iOS Bluetooth stack and provide a background on the fuzzing options.
2.1 Selecting a Bluetooth Stack
Apple implements three different Bluetooth stacks. One is for iOS and its derivatives. macOS uses another stack with duplicate protocol implementations that behave slightly differently. Embedded devices, such as the AirPods, use the RTKit stack.
The embedded RTKit stack is the least accessible. In contrast, the macOS stack has already been tested recently with severe security issues uncovered [11]. We find that the iOS stack has received little research attention but implements the majority of Apple’s proprietary protocols. With the checkra1n jailbreak [18], it becomes fairly accessible on research iPhones.
2.2 iOS Fuzzing Options
Apple internally tests their software including fuzzing but it is unknown how exactly they do this. With neither the source code nor the fuzzing methods and targets being public, security of Apple’s software should be tested by independent researchers. Apple keeps bluetoothd closed-source, meaning that only they are able to rebuild it with fuzzing instrumentation such as edge coverage and memory sanitization. As this binary is rather complex, statically patching it without source code is out of scope, as there is currently no such tool for iOS ARM64e binaries without symbols in Mach-O format.
In general, there are multiple tools for dynamic iOS analysis. The iOS 12 kernel can be booted into an interactive shell using Quick Emulator (QEMU) but without any daemons running [1]. Moreover, KTRW enables iOS 13 kernel debugging at runtime [7]. However, bluetoothd runs in user space. In contrast to the previous options, Frida is a dynamic instrumentation framework that enables function hooking and coverage collection by rewriting instructions of user-space programs during runtime [28]. Thus, ToothPicker is based on Frida and the fuzzing-specific frizzer extension [21].
The Frida stalker follows program flow during runtime [27, 29]. To this end, Frida copies code prior to execution, modifies that copy, and runs it. A significant speedup is achieved by adding a trust-threshold on code that is not modified during runtime, and, thus, has reusable blocks. The stalker observes basic block coverage, as required for feedback-based fuzzing. If exceptions occur while running the code copy, Frida catches these and the original program does not crash.
3 ToothPicker
In the following, we design and implement ToothPicker. An overview of the ToothPicker architecture and its specifics is depicted in Figure 1. First, we provide an intuition about its design in Section 3.1. Then, we analyze various Bluetooth-based protocols and prioritize them as fuzzing targets in Section 3.2. In Section 3.3, we select corpora for these protocols and describe the adaptations required for harnessing them. Finally, we describe the fuzzer operation in Section 3.4, including its virtual connection management in Section 3.5.
3.1 General Design and Concepts
The main challenge in fuzzing the iOS Bluetooth daemon is harnessing it while maintaining sufficient state to find realistic bugs. ToothPicker achieves this by attaching itself to the running daemon with Frida [28]. However, fuzzing requires further extensions, as running on a physical iPhone has various side effects due to the remaining interactions with the Bluetooth chip, other daemons, and apps. To this end, ToothPicker creates virtual connections, partially abstracts between the physical Bluetooth chip and the changed behavior within bluetoothd, and gets coverage and crash feedback during fuzzing. Moreover, as fuzzing speed is limited, we have to understand Apple’s proprietary protocols sufficiently to obtain meaningful corpora and select interesting protocol handlers.
Chip interaction can be harmful for the overall fuzzing process. For example, if a protocol that requires an active connection is fuzzed but the chip is not aware of such a connection, it reports the connection to be terminated. Yet, it is complicated to fully abstract from the chip, as bluetoothd requires it during startup and other complex operations. Thus, ToothPicker uses virtual connections and filters communication with the chip concerning these connections manually.
Another important factor is the set of daemons and apps that keep interacting with bluetoothd while it is being fuzzed. First of all, they might crash as well due to the fuzzing, with exceptions that cannot be caught by Frida but are still captured within the iOS crash reports. Second, anything happening on the iPhone that also affects Bluetooth changes the fuzzing behavior. Simple actions such as opening the Bluetooth device dialog already crash bluetoothd immediately. Moreover, the chip might still receive BLE advertisements or connection requests from other devices and forward them to bluetoothd. All this parallelism and statefulness often make the same fuzzing input result in different coverage.

[Figure 1 diagram: On the iPhone, apps and daemons such as bluetoothaudiod and sharingd run next to bluetoothd in user space, above the drivers in kernel space. Inside bluetoothd, the general and the specialized fuzzing harness create virtual connections and fuzz inputs, hook into each basic block by runtime code rewriting with the Frida stalker, and filter specific chip interactions. On the laptop, the ToothPicker manager with its input mutation component exchanges fuzzing input and feedback with the harness and keeps coverage, corpus, and crashes. The laptop also hosts the OTA fuzzer based on InternalBlue.]
Figure 1: ToothPicker architectural overview and fuzzing setup.

The total ToothPicker fuzzing speed is limited by running on a physical iPhone 7. However, it is still significantly faster than over-the-air fuzzing. While over-the-air fuzzing based on InternalBlue achieves only 1–2 packet/s [22], ToothPicker speeds this up to 25 packet/s. The actual speedup compared to over-the-air fuzzing is even higher, as ToothPicker overwrites functions within bluetoothd that disconnect on invalid packets and optimizes its packets based on the coverage feedback. Nonetheless, running on an iPhone is a limitation and parallelization requires multiple iPhones.
While the overall Frida-based setup is quite complex, the Frida workflow also supports reverse-engineering proprietary protocols. Stalking and hooking allow observing protocols and intercepting valid payloads to build a corpus.
Fuzzed payloads are recorded during the fuzzing process. ToothPicker provides an over-the-air replay script for InternalBlue. This step is required to confirm that fuzzing results do not originate from any Frida hooking side effects. Additionally, it provides Proofs of Concept (PoCs) that can be tested against arbitrary devices, including non-jailbroken iPhones, other Apple devices, and even non-Apple devices.
3.2 Target Protocol Selection for Fuzzing
Prior to fuzzing, we need to identify protocols that we can fuzz and prioritize them.
3.2.1 Attack Surface Considerations
The zero-click RCE surface within iOS Bluetooth only affects protocol parts that are available prior to pairing. Such attacks do not require any user interaction; an attacker within wireless range could take control of a device. Protocols that become available after pairing, such as tethering, hold significantly more state but also require user interaction.

Protocols without response channel, such as BLE advertisements including exposure notifications [12], miss the feedback required to bypass Address Space Layout Randomization (ASLR). However, if further Bluetooth services on the target stop sending packets and these can be assigned to a specific target device, this indicates bluetoothd crashes, which might be a sufficient ASLR bypass primitive as this daemon restarts automatically [13]. We explicitly skip BLE advertisements, because there has been exhaustive study of Continuity and Handoff already [10, 14, 23], and getting a meaningful response channel is rather complex if possible at all.
3.2.2 Initial Protocol Analysis
Protocols are either documented within the Bluetooth specification [8] or proprietary and Apple-specific. The latter are usually undocumented and often not publicly mentioned at all, yet they are available on hundreds of millions of Apple devices. The Bluetooth PacketLogger, which is included in the Additional Tools for Xcode [3], decodes specification-compliant protocols as well as many proprietary Apple protocols on macOS and iOS. Some parsers are only available within one specific version, such as chip memory pool statistics, and were likely left in non-internal builds by mistake. We use PacketLogger to initially observe and select protocols.
As these protocols are undocumented, we provide an overview in the Appendix. Note that fuzzing focuses on implementation bugs and RCEs. Understanding these protocols in detail is not required to this end. However, such details can reveal further issues. For example, custom Logical Link Control and Adaptation Protocol (L2CAP) echo replies leak the operating system type (see Section A.1.1) and LE Audio (LEA) leaks the iOS version (see Section A.1.3).

Table 1: List of Apple’s Bluetooth protocols and potential targets.
Columns: Category, Protocol, iOS, macOS, RTKit, Exposure, Specification, Knowledge, Target. The listing below gives the Category, Protocol, and Specification entries; the per-protocol marks of the remaining columns are omitted here.

Fixed L2CAP Channels
  BLE Security Manager: [8, p. 1666ff]
  BLE Signal Channel: [8, p. 1046ff]
  Classic Security Manager: [8, p. 1666ff]
  Classic Signal Channel: [8, p. 1046ff]
  Connectionless Channel: [8, p. 1035]
  DoAP
  FastConnect Discovery
  GATT: [8, p. 1531ff]
  LEAP
  LEAS
  MagicPairing
  Magnet

Dynamic L2CAP Channels
  AAP
  Apple Pencil GATT
  External Accessory (iAP2)
  FastConnect
  Magnet Channels
  SDP: [8, p. 1206ff]

Other
  ACL (Classic+BLE): [8, p. 477ff]
  BLE Exposure Notification: [12]
  BRO/UTP
  USB OOB Pairing

Exposure describes how accessible the protocol is to an attacker. This includes protocol setup or authentication requirements of the protocol. There are three possible options: high ↑, medium •, and low ↓. The Specification column indicates whether the protocol is a proprietary protocol by Apple or specified otherwise. Knowledge indicates how much information about this protocol is either openly available or can be extracted with existing debug tools. Target indicates that the protocol is targeted for further analysis. The brackets around the checkmarks on RTKit indicate that this protocol is not available on all RTKit devices.

3.2.3 Target Prioritization
ToothPicker can hook arbitrary functions within bluetoothd for fuzzing. On iOS 13.5, bluetoothd has 24 625 functions consisting of 153 620 basic blocks. However, only a few of those directly handle incoming data prior to pairing. This allows us to focus on selected protocols once identified.
In the following, we prioritize the protocols to derive which ones are interesting for further analysis. This prioritization shown in Table 1 is based on operating system, exposure prior to pairing, publicly available specifications, and the knowledge we were able to obtain about each protocol. The target column determines whether a protocol was chosen for further analysis. While there are other protocols from the Bluetooth specification that have a high accessibility and distribution within the operating systems, such as the BLE Signal Channel, or the Security Manager protocols, we mainly focus on Asynchronous Connection-Less (ACL), Generic Attribute (GATT), the Classic Signal Channel, and Service Discovery Protocol (SDP). Due to their low exposure we skip most dynamic L2CAP channel protocols.
3.3 Harnessing and Corpus Collection
For each target in Table 1, corpora need to be collected or generated. Moreover, to properly harness some of the protocols, they need additional optimizations. An overview of the corpora and their optimizations is listed in Table 2.


3.3.1 Initial Protocol Corpus
The second column in Table 2 describes how the corpus for the particular protocol was generated. The corpora are mostly generated by intercepting (i.e., recording) both the ACL reception and transmission functions while the specific data is sent over a real physical connection. In practice, this is implemented using a Frida hook in the ACL reception handler and the Universal Asynchronous Receiver Transmitter (UART) write function that is used to send data to the Bluetooth chip. The data received by the ACL reception handler as first argument can be stored as a corpus file, as it is already of the format that ToothPicker expects. The data arriving at the UART write function still needs to be filtered, as other data, such as Host Controller Interface (HCI) commands, is also received by this function. By filtering for data starting with 0x02 [8, p. 1727], ACL data sent by bluetoothd can be captured.
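The filtering of UART writes can be sketched as follows. This is a minimal illustration in plain JavaScript, not the actual ToothPicker hook: the function name extractAcl and the sample writes are hypothetical, and the sketch assumes that each UART write carries one complete packet whose first byte is the packet type indicator, and that dropping this byte yields the format stored in the corpus.

```javascript
// Packet type indicator for ACL data on the UART transport (0x02, as stated in the text).
const H4_ACL_DATA = 0x02;

// Returns the ACL packet without the indicator byte, or null for any other traffic.
// Assumption: one UART write holds exactly one packet.
function extractAcl(uartWrite) {
  if (uartWrite.length < 1 || uartWrite[0] !== H4_ACL_DATA) {
    return null;
  }
  return uartWrite.slice(1);
}

// Hypothetical captured writes: an HCI command (indicator 0x01) and an ACL packet
// carrying the Ping message from Figure 2.
const writes = [
  Uint8Array.from([0x01, 0x03, 0x0c, 0x00]),
  Uint8Array.from([0x02, 0x0b, 0x20, 0x06, 0x00, 0x02, 0x00, 0x30, 0x00, 0xf0, 0x01]),
];

const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

// Only the ACL packet survives the filter and would be written to a corpus file.
const corpus = writes.map(extractAcl).filter((packet) => packet !== null);
console.log(corpus.map(toHex));
```

In a real hook, the same check would run inside the callback attached to the UART write function, with the buffer read from the function's arguments.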
In cases where this is not possible, the corpus is generated manually, for example for the LE Audio Protocol (LEAP). As we do not have access to any device using LEAP, the corpus was created by reverse-engineering the protocol and manually generating valid messages. In case of the MagicPairing protocol, we augment the recorded corpus with a manually created message type that would otherwise not be included, as it does not occur in regular connections. Figure 2 shows the corpus for MagicPairing.

Table 2: ToothPicker corpora and optimizations for fuzzing.

ACL (BLE)
  Corpus: • Manually created BLE Signal Channel messages
  Optimizations: none
  Technology: BLE

ACL (Classic)
  Corpus: • Record of L2CAP Echo Requests and Responses
  Optimizations: none
  Technology: Classic

Classic Signal Channel + FastConnect Discovery
  Corpus: • Record traffic by connecting and disconnecting AirPods • Record FastConnect Discovery messages
  Optimizations: • Correct ACL length
  Technology: Classic

GATT
  Corpus: • Record of interaction with GATT exploration app
  Optimizations: • Correct ACL and L2CAP length • No fragmentation
  Technology: BLE

LEAP
  Corpus: • Manually created by reverse engineering
  Optimizations: • Correct ACL and L2CAP length • No fragmentation
  Technology: BLE

MagicPairing
  Corpus: • Record of AirPods pairing • Manually created Ping message
  Optimizations: • Correct ACL and L2CAP length • No fragmentation
  Technology: Classic

Magnet
  Corpus: • Connect and disconnect Apple Watch
  Optimizations: • Correct ACL, L2CAP, and Magnet length • Keep track of protocol version • No fragmentation
  Technology: BLE

SDP
  Corpus: • Record traffic by connecting to macOS and query device with Bluetooth Explorer
  Optimizations: • Correct ACL, L2CAP, and SDP length • No fragmentation
  Technology: Classic

3.3.2 Specialized Protocol Harness
The third column in Table 2 describes the optimization that the specialized harness applies to the mutated fuzzing input. In most cases, there are three optimizations: correcting the L2CAP length field, the ACL length field, and the flags in the handle that determine the L2CAP fragmentation. While correcting the length field removes the fuzzing input’s randomness, it ensures that the generated input packets do not unnecessarily fail at length checks and rather reach deeper into the actual parsing code.
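The three common optimizations can be illustrated with a short sketch in plain JavaScript. It is not taken from ToothPicker; the function name correctLengths is hypothetical. The sketch assumes the standard HCI ACL header (handle and flags, then ACL length) followed by the L2CAP header (L2CAP length, then channel ID), all little-endian, and it treats "no fragmentation" as rewriting a continuation fragment flag into a start-of-packet flag.

```javascript
const ACL_HEADER = 4;   // handle + flags (2 bytes), ACL length (2 bytes)
const L2CAP_HEADER = 4; // L2CAP length (2 bytes), channel ID (2 bytes)

// Returns a copy of the mutated input with consistent length fields and
// without a continuation-fragment flag. Inputs shorter than both headers
// are returned unchanged.
function correctLengths(input) {
  const out = Uint8Array.from(input);
  if (out.length < ACL_HEADER + L2CAP_HEADER) {
    return out;
  }
  const view = new DataView(out.buffer);
  // ACL length covers everything after the ACL header.
  view.setUint16(2, out.length - ACL_HEADER, true);
  // L2CAP length covers everything after the L2CAP header.
  view.setUint16(4, out.length - ACL_HEADER - L2CAP_HEADER, true);
  // Bits 12-13 of the first field hold the packet boundary flag;
  // 0b01 marks a continuation fragment, 0b10 a first fragment.
  const handleAndFlags = view.getUint16(0, true);
  if (((handleAndFlags >> 12) & 0x3) === 0x1) {
    view.setUint16(0, (handleAndFlags & 0xcfff) | 0x2000, true);
  }
  return out;
}

// A mutated variant of the MagicPairing Ping message with broken lengths
// and a continuation flag.
const mutated = Uint8Array.from([0x0b, 0x10, 0xff, 0xff, 0x00, 0x00, 0x30, 0x00, 0xf0, 0x01]);
const fixed = correctLengths(mutated);
console.log(Array.from(fixed, (b) => b.toString(16).padStart(2, "0")).join(""));
```

Only the header fields are rewritten, so the mutated payload bytes still reach the protocol parser unchanged.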
We wrote an additional script that intercepts various logging functions and prints the messages to monitor the fuzzing process. While iOS’s standard logging (e.g., via the Console.app on macOS) could be used to monitor the logs, our approach has the advantage that even internal debug logging becomes visible. Before some of bluetoothd’s logging calls, there are checks that determine whether it is an internal build. We patched bluetoothd in such a way that it assumes it is an internal build to increase the verbosity even more. Monitoring the logs turns out to be useful to observe certain behavior.
During the initial fuzzing rounds of the Magnet protocol, we observed that the majority of these internal logging messages were concerning an invalid length field. This leads us to two additional optimizations for the Magnet protocol. First, Magnet’s own length field is correctly calculated, and second, the fuzzer keeps track of which version is currently negotiated. This is important for correcting the length field, as the offset and the size of the Magnet length field change with the version. Keeping track of the current version is as simple as monitoring the generated packets. As soon as a Version packet is sent, the specialized harness adapts the following version fields accordingly.
The last column in Table 2 indicates the Bluetooth technology (Classic Bluetooth or BLE). Depending on the technology the harness has to adapt the creation of the virtual connection.

Ping Message
0b20 0600 0200 3000 f001

Hint Message
0b00 3900 3500 3000 0101 0310 0010 00f3 2871 7bb7 6ed6 f2b6 07d3 0d1c 5c47 fc20 0010 00f3 9028 c260 502e 08d4 2e62 69c6 aa2a 1900 0104 0001 0000 00e8 d4

Ratchet Message
0b20 4900 4500 3000 0201 0280 0036 00a4 fac9 831d c027
32d8 ddab 70e9 ff9c b63e 37f6 7c4f e47d 956d 394e 4f40
ed91 bd66 5a12 1ea0 356f 05f3 8d70 dca4 23d3 7136 a767
50bf 5f00 0104 0001 0000 00

Ratchet AES SIV Message
0b00 5b00 5700 3000 0301 0180 0050 0002 aad4 1dca d96e
96e6 6877 9e9b 7b1d 142d f683 a623 d287 4175 b0bb deba
271a 275c a669 e849 2a36 c1fb 9ee3 e0d6 1373 0f9c 41b5
4390 ec69 baf4 115a 3a45 6586 dcca b94e c0f8 8742 8c10
e043 9bee 205d 52

Status Message
0b20 0700 0300 3000 ff01 00

Status Message (Error)
0b00 0700 0300 3000 ff01 08

Figure 2: MagicPairing corpus.
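To make the layout of these corpus entries easier to read, the following sketch decodes the header of the Status Message from Figure 2. It is an illustration only: the helper names parseHex and decodeHeader are hypothetical, and the field interpretation assumes the standard little-endian HCI ACL and L2CAP headers rather than anything specific to ToothPicker.

```javascript
// Converts a whitespace-separated hex dump into bytes.
function parseHex(dump) {
  const hex = dump.replace(/\s+/g, "");
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Splits a corpus entry into ACL header, L2CAP header, and payload.
function decodeHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const handleAndFlags = view.getUint16(0, true);
  return {
    handle: handleAndFlags & 0x0fff,       // connection handle (lower 12 bits)
    pbFlag: (handleAndFlags >> 12) & 0x3,  // packet boundary flag
    aclLength: view.getUint16(2, true),    // bytes after the ACL header
    l2capLength: view.getUint16(4, true),  // bytes after the L2CAP header
    channelId: view.getUint16(6, true),    // L2CAP channel ID
    payload: bytes.slice(8),
  };
}

const status = decodeHeader(parseHex("0b20 0700 0300 3000 ff01 00"));
console.log(status);
```

Under these assumptions, every entry in Figure 2 decodes to channel ID 0x0030, and the ACL length always exceeds the L2CAP length by the four bytes of the L2CAP header.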

3.4 Fuzzer Internals
In the following, we provide technical details about the fuzzer. Figure 1 also contains the fuzzer’s components.
3.4.1 Underlying Technology
We implement ToothPicker based on frizzer [21]. frizzer provides the basic fuzzing architecture, like coverage collection, corpus handling, input mutation, and thus a large part of the manager component. ToothPicker, like frizzer, is built on Frida, which is a dynamic instrumentation toolkit [28].


With Frida, custom code can be injected into a target process in the form of JavaScript code. Thus, the fuzzing harness is implemented in JavaScript and injected into bluetoothd. The manager is implemented in Python by using Frida’s Python bindings. The test case generator radamsa acts as the input generation component [16].
3.4.2 Fuzzer Components
The fuzzer consists of two components, the manager, running on a computer, and the fuzzing harness running on an iOS device. These components work as follows:
Manager The manager is responsible for starting and maintaining the fuzzing process. It injects the fuzzing harness into the target process and handles the communication with it. In addition to that, it maintains a set of crashes that occurred during fuzzing, a corpus to derive inputs from, and the coverage that was achieved during fuzzing. New inputs are created by the input mutation component. Based on a seed it randomly mutates data from the corpus to derive new input data. As of now, the only supported input mutation is provided by radamsa [16], and we correct length, fragmentation and version fields as listed in Table 2.
Harness The fuzzing harness consists of two parts: a general fuzzing harness and a specialized fuzzing harness. The general fuzzing harness is responsible for all general operations required for fuzzing bluetoothd. It can create virtual connections and applies the necessary patches that are required for a stable fuzzing process. It also provides the means to collect code coverage and receives the fuzzing input from the manager. The specialized harness is specific to the target function and protocol that is to be fuzzed. It is responsible for preparing the received input and calling a protocol’s function handler, as well as any other handler-specific preparations.
3.4.3 Fuzzer Operation
ToothPicker is initialized with a corpus of valid protocol messages. More specifically, this corpus consists of function arguments for the fuzzed protocol handlers. The fuzzer then starts to collect the initial coverage by sending the initial corpus to the fuzzing harness, which, by using the specialized harness, executes the payloads. The collected coverage is then returned to the manager, which stores it for later use.
Once the initial coverage is collected, the actual fuzzing starts. Each iteration works as follows. The manager picks one of the entries in the corpus and sends it along with a seed value to the input mutator. The mutator then mutates the input and sends it back to the manager. Afterward, the manager proceeds by sending the input to the specialized fuzzing harness. If desired, it can mutate the input further. This is useful in cases with input fields that require deterministic values or length fields that, for certain cases, should be correct. The specialized fuzzing harness can modify the input and send it back to the manager. Sending back the modified input before actually calling the function under test is required for cases where the target crashes as a result of the input. As the injected harness crashes together with the target, the modified input would be lost. Once the manager receives the modified input, the target function can be called with this input.
Usually, the protocol reception handlers within bluetoothd run in a separate reception thread; in our case, this thread is called RxLoop. Since bluetoothd keeps operating normally except for hooked functions, the RxLoop continues calling functions within bluetoothd in case it receives data such as BLE advertisements. Any function call within the RxLoop could interfere with our fuzzing. On iOS, many actions could trigger sending or receiving Bluetooth packets. Thus, during fuzzing, the iPhone should be in Do Not Disturb mode, locked, and with Wi-Fi switched off to ensure stable fuzzing results. Isolating the iPhone from other sources of interference by wrapping it in tinfoil can also help.
The fuzzing harness runs in its own thread, which is a Frida-specific behavior. This thread then calls the target function. This has the advantage that a custom exception handler can be implemented for this thread. If an exception occurs while fuzzing a function that would normally result in a crash, this handler can catch the exception and terminate gracefully without crashing bluetoothd. While the function is called, the harness is collecting basic block coverage. After calling the function, there are three possible outcomes:
1. Ordinary Return The function was executed successfully and returns. The collected coverage information is sent to the manager.
2. Exception The function execution results in an exception, which is caught and returned to the manager. The manager stores both the input and the exception type as a crash.
3. Uncontrolled Crash In case the target crashes in a thread or external component not controlled by the fuzzing harness via the Frida exception handler, it will crash and generate a crash report. In this case, the exception cannot be sent to the manager. However, the manager detects a crash and can store the generated input as a crash. The corresponding iOS crash report can be manually gathered from the iPhone.
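The manager's handling of these three outcomes can be sketched as a small simulation in plain JavaScript. Everything here is a stand-in: runHarness models the harness reply instead of calling into bluetoothd, the message shapes are hypothetical, and adding an input to the corpus when it reaches new basic blocks is the usual coverage-guided strategy rather than a confirmed detail of ToothPicker.

```javascript
// Stand-in for the harness: returns a coverage message, an exception message,
// or null when the target died without replying (assumed message shapes).
function runHarness(input) {
  if (input.length === 0) {
    return null;
  }
  if (input[0] === 0xff) {
    return { type: "exception", exception: "access-violation" };
  }
  return { type: "coverage", blocks: [input[0] & 0xf0] };
}

// Manager-side bookkeeping for one fuzzing iteration.
function handleResult(state, input, reply) {
  if (reply === null) {
    // Outcome 3: no reply; the input was stored before the call, so it can be kept.
    state.crashes.push({ input, kind: "uncontrolled" });
    return "uncontrolled crash";
  }
  if (reply.type === "exception") {
    // Outcome 2: the exception was caught by the harness and reported.
    state.crashes.push({ input, kind: reply.exception });
    return "exception";
  }
  // Outcome 1: ordinary return; record coverage and keep inputs that reach new blocks.
  let foundNew = false;
  for (const block of reply.blocks) {
    if (!state.coverage.has(block)) {
      state.coverage.add(block);
      foundNew = true;
    }
  }
  if (foundNew) {
    state.corpus.push(input);
  }
  return "ordinary return";
}

const state = { corpus: [], crashes: [], coverage: new Set() };
const inputs = [Uint8Array.from([0x0b, 0x20]), Uint8Array.from([0xff, 0x00]), new Uint8Array(0)];
const outcomes = inputs.map((input) => handleResult(state, input, runHarness(input)));
console.log(outcomes);
```

The crash entries collected this way are the candidates that still need over-the-air verification.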
Even a caught exception might be a false positive. Therefore, it is important to verify the identified crashes. This is done by using an over-the-air fuzzer based on InternalBlue, which opens a connection and then replays crashes stored by the manager. This validation must be done while the in-process fuzzer is not running. As with the general operation of the over-the-air fuzzer, the payload should be sent while monitoring the device with PacketLogger.
In case of an uncontrolled crash, the iOS crash logs can be examined to determine the cause. These crashes are sometimes within Frida-related threads and most of the time caused by non-fuzzed input to bluetoothd, for example in its RxLoop or StackLoop, with the latter being responsible for communication with the Bluetooth chip. When opening the scan dialog within the iOS settings, the external SpringBoard component tends to crash and sometimes even the sharingd daemon. These crashes typically do not reproduce over-the-air, as they originate from inconsistent states introduced by ToothPicker itself.
3.5 Connection Management
A protocol handler only accepts payloads if it is convinced that an active connection for this protocol exists. In Bluetooth, most data is transferred based on ACL. Thus, one of the tasks the specialized harness has to handle is creating a forged ACL connection. In the first ToothPicker version, we tried copying physical connections; however, creating virtual connections turned out to be the better solution.
3.5.1 Copying Physical Connections
The entry point for most application data arriving from a Bluetooth connection is the ACL reception handler. All data received from a connected peer is handled by this function. Therefore, the ACL reception handler can be used to potentially fuzz any application-layer Bluetooth protocol. The ACL reception handler accepts three arguments: a Bluetooth connection handle, the length of the ACL data, and a pointer to the received data.
void acl_reception_handler(uint16_t handle, size_t length, void* data);
In the first ToothPicker version, we tried the following fuzzing strategy. First, we hooked the reception handler function. Then, we created a physical over-the-air Bluetooth connection. Afterward, we sent over-the-air ACL data to the target device. We copied the connection handle value of the physical connection structure and called the reception handler with this copy. The fuzzing harness would start calling the ACL reception handler with the stored connection and arbitrary ACL data. While this simple approach ensures that the proper data structures for a connection are in place, it has similar drawbacks as fuzzing over-the-air. First, a physical connection is required, and second, when one of the peers decides to terminate the connection, the allocated connection structure is destroyed and a new connection has to be created. This also implies hooking the function again and storing a new connection structure.

3.5.2 Creating Virtual Connections
Instead of copying physical connections, we virtualize connections. This can be done by calling the function that allocates the connection structure from our fuzzing harness. While reverse-engineering bluetoothd on iOS, we identified a function that is used to allocate exactly this connection. Due to the lack of symbols we call this function allocateACLConnection. This function can now be called using the specialized harness to create a forged ACL connection and corresponding handle. It accepts two arguments, the Bluetooth address of the peer and another value that is stored directly in a field of the ACL connection structure. While reverse-engineering and dynamically analyzing this structure, we found that this seems to be a field indicating the status of the handle. This status is mostly set to the value 0 when the ACL reception handler is called. Listing 1 shows how to create such a virtual ACL connection in Frida.
The ACL reception handler requires the handle value as a first parameter. Therefore, we need to associate a known handle to our newly created connection structure. We chose 0x11 as it is within the range of handles that would usually be created on a physical connection. Based on this, Frida scripts can call protocol-specific ACL reception handlers. For example, the fixed channel L2CAP protocols in bluetoothd can be fuzzed. Similar to creating an ACL handle, BLE handles and dynamic L2CAP channels can be created.
3.5.3 Stabilizing and Using Virtual Connections
A virtual connection can still be disconnected. We identified two functions that disconnect or destroy an ACL or BLE connection, such as the function OI_HCI_ReleaseConnection. We overwrite these functions to prevent our forged connections from being disconnected.
While hooking and replacing these functions prevents the connection structures from being destroyed, this disconnection prevention technique cannot be used for the over-the-air fuzzer. During an over-the-air fuzzing session, there exist four distinct representations of the connection, two for each of the peers. The first representation is used by the data structures that the Bluetooth stack creates, such as an ACL connection. These usually store information about the controller's HCI handle and other application-layer information, such as open L2CAP channels. The second representation of the connection resides within the chip. The chip allocates an HCI handle the stack can use to reference the connection. The chip also holds additional state to keep the connection alive. Even if we manage to prevent bluetoothd from destroying the connection, we cannot easily control the other involved components. Thus, the in-process fuzzing variant with a virtual connection is the easiest to control.

// Create a buffer for the Bluetooth address
var bd_addr = Memory.alloc(6);
// Resolve function address
var base = Module.getBaseAddress("bluetoothd");
var fn_addr = base.add(symbols.allocateACLConnection);
// Create function reference to call it from JavaScript
var allocateACLConnection = new NativeFunction(fn_addr, "pointer", ["pointer", "char"]);
// Write the Bluetooth address to memory
bd_addr.writeByteArray([0xca, 0xfe, 0xba, 0xbe, 0x13, 0x37]);
// Call the function and create a forged ACL connection
// If handle is != 0 then the call was successful
var handle = allocateACLConnection(bd_addr, 0);
// Set the connection's handle value to 0x11
Memory.writeShort(handle, 0x11);
Listing 1: Creating a virtual ACL handle using Frida.
4 Evaluation
The overall performance of ToothPicker is strongly limited by the computing power of the iPhone it is running on as well as potential concurrency issues within bluetoothd. Moreover, the dynamic Frida-based instrumentation has significant performance drawbacks. In the following, we provide a coverage and performance analysis, options for performance optimization, as well as details on compatible iPhones and iOS versions.
4.1 Coverage
Figure 3 shows how the basic block coverage increases over time for a combination of all corpora, which are 29 different inputs based on Table 2. Initially, the coverage increases very fast as the non-mutated corpora cover 1295 basic blocks. We use the same corpora within each run and reuse the same seed a couple of times. Even for the same seed, the coverage differs due to concurrent operation within bluetoothd. However, the coverage still shows different tendencies depending on the initial seed. In case of an uncontrolled crash not caught by Frida, we abort the run. Uncontrolled crashes can be blacklisted by analyzing the iOS crash logs and replacing the corresponding functions with a return. While this significantly reduces uncontrolled crashes and might reach different parts of the bluetoothd binary, it also reduces the code coverage for the given inputs. bluetoothd can also hang without crashing as a result of fuzzing, which is indicated by no longer getting any new basic block coverage feedback.

Figure 3: Coverage for a combination of all corpora, iOS 13.3.1 for an iPhone 7, manager on Intel Core i7-6600U. [Plot: coverage in basic blocks (0 to 4,000) over time in seconds (0 to 20,000) for seed=1925069456, seed=7134, seed=1925069456 with 5 crashes blacklisted, and seed=0 without mutations. Annotations mark crashes not caught by Frida (which abort fuzzing, ×) and internal crashes (bluetoothd hangs).]
While we stopped execution upon a crash in Figure 3, ToothPicker can be restarted within the same configuration and continue based on the meanwhile increased corpus. Thus, running ToothPicker in a loop and killing bluetoothd from time to time automates the fuzzing process.
The overall coverage reached is small compared to the total number of basic blocks. The better runs in Figure 3 cover around 4 k basic blocks, which corresponds to 3 % of the bluetoothd binary. However, ToothPicker only fuzzes protocols prior to pairing, as pairing would require user interaction. Complex protocols like tethering and music streaming with a lot of data transfer and state handling are not considered. When checking the coverage information with Lighthouse, the protocol handlers under test are well-covered. Moreover, ToothPicker also discovers new handlers: CVE-2020-9838, discussed later in Section 5, is in a protocol not contained in any corpus of Table 2.
4.2 Speed and Bottlenecks
The major bottleneck originates from the Frida instrumentation. Moreover, the radamsa-based input mutation is slow.
When running ToothPicker with radamsa mutations on the corpus, it reaches 25 inputs/s on average on an iPhone 7. However, without the mutations, it reaches 65 inputs/s on average. Frida applies a trust-threshold on blocks it is dynamically instrumenting, which significantly increases performance [27]. Not mutating the input means that Frida is always executing the same functions with the same inputs and, thus, executing the same basic blocks.
Table 3: List of vulnerabilities identified with ToothPicker and their status as of July 2020.

| ID | Description | Effect | Detection | OS | Disclosure | Status |
|---|---|---|---|---|---|---|
| MP1 | Ratchet AES SIV | Crash | ToothPicker | iOS 13.3–13.6, 14 Beta 2 | Oct 30 2019 | Not fixed |
| MP2 | Hint | Crash | ToothPicker | iOS 13.3–13.6, 14 Beta 2 | Dec 4 2019 | Not fixed |
| MP7 | Ratchet AES SIV | Crash | ToothPicker | iOS 13.3–13.6, 14 Beta 2 | Mar 13 2020 | Not fixed |
| MP8 | Ratchet AES SIV | Crash | ToothPicker | iOS 13.3–13.6, 14 Beta 2 | Mar 13 2020 | Not fixed |
| L2CAP2 | Group Message | Crash | ToothPicker | iOS 5–13.6, 14 Beta 2 | Mar 13 2020 | Not fixed |
| LEAP1 | Version Leak | Information Disclosure | Manual | iOS 13.3–13.5 | Mar 31 2020 | Not fixed |
| SMP1 | SMP OOB | Partial PC Control | ToothPicker | iOS 13.3 | Mar 31 2020 | iOS 13.5, CVE-2020-9838 |
| SIG1 | Missing Checks | Crash | ToothPicker | iOS 13.3–13.5 | Mar 31 2020 | iOS 13.6, CVE-2020-9931 |

One radamsa input mutation, measured within the manager component of ToothPicker, takes about 8 ms on an Intel Core i7-4980HQ with 2.8 GHz running macOS and 14 ms on an Intel Core i7-6600U with 2.6 GHz running Linux. We compared two variants of this. The first opens radamsa as a subprocess to mutate the input (frizzer default implementation) and the second uses libradamsa with ctypes. Surprisingly, we found that the variant calling radamsa as a subprocess is slightly faster. libradamsa's lack of speed has been documented before [17]. When fuzzing with 25 inputs/s on average, each input takes 40 ms, so 8 ms of radamsa input generation make up 20 % of the ToothPicker runtime. For comparison, directly reading inputs from a file within the ToothPicker manager takes 0.1 ms on the same machine. Using a different fuzzing engine, such as American Fuzzy Lop (AFL), will be considered for the next version of ToothPicker. An AFL extension for Frida was not available when initially building ToothPicker but was released in July 2020 [30].
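The share of runtime attributed to radamsa above follows from simple arithmetic on the reported numbers. The following minimal sketch reproduces it; the function name is our own and only the figures (25 inputs/s, 8 ms, 0.1 ms) come from the measurements in this section.

```python
def mutation_share(inputs_per_second: float, mutation_ms: float) -> float:
    """Fraction of the per-input wall-clock time spent on input generation."""
    per_input_ms = 1000.0 / inputs_per_second  # 25 inputs/s -> 40 ms per input
    return mutation_ms / per_input_ms


if __name__ == "__main__":
    # radamsa mutation (8 ms) versus reading a stored input from a file (0.1 ms)
    print(f"radamsa:   {mutation_share(25, 8.0):.2%}")
    print(f"file read: {mutation_share(25, 0.1):.2%}")
```

Under the same throughput, replacing the mutation step with a plain file read would reduce input generation to a negligible fraction of the runtime, which is why the instrumentation remains the dominant bottleneck.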
Intuitively, fuzzing on newer iPhone models should be faster. We ported ToothPicker to an iPhone SE2, released in April 2020, as well as the iPhone 11, released in September 2019, which are much newer than the iPhone 7, released in September 2016. However, the iPhone SE2 and iPhone 11 feature an A13 CPU with Pointer Authentication (PAC). Thus, each Frida NativeFunction call needs to be signed. These extra operations reduce the speed from 20 inputs/s to 14 inputs/s on average on both A13 devices when the manager runs on an Intel Core i7-6600U.
4.3 Increasing Jetsam Limits
A general issue that arises due to the in-process fuzzing is that the resource utilization of bluetoothd is much higher than usual. Jetsam, Apple’s out-of-memory killer, terminates processes taking too many resources [20]. Thus, we change the Jetsam configuration file to increase bluetoothd’s memory limit and set its priority to the maximum of 19, reducing terminations due to resource consumption.
4.4 Supporting Multiple iOS Versions
ToothPicker supports iOS 13.3, 13.3.1, 13.5 Beta 4, and 13.5 on an iPhone 7, iOS 13.5 on an iPhone SE2, and iOS 13.3 on an iPhone 11. Symbol locations within bluetoothd change with each iOS version. For the current ToothPicker version, 19 symbols have to be defined. Since the changes between those versions are minimal and most of them even contain the same print statements, they can be easily identified using BinDiff, Diaphora, or the Ghidra versioning tool [19, 24, 32].
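One way to organize the per-version symbol definitions is a table keyed by iOS version that is validated before a fuzzing run. The following is a minimal sketch under stated assumptions: the offsets are placeholders and not real bluetoothd offsets, only three of the 19 symbols are listed, and the helper names are hypothetical rather than taken from the ToothPicker code base.

```python
# Symbols the harness needs. Names follow this paper; the list is abbreviated.
REQUIRED_SYMBOLS = (
    "acl_reception_handler",
    "allocateACLConnection",
    "OI_HCI_ReleaseConnection",
)

# Offsets relative to the bluetoothd base address (placeholder values only).
SYMBOLS = {
    "13.3": {
        "acl_reception_handler": 0x1000,
        "allocateACLConnection": 0x2000,
        "OI_HCI_ReleaseConnection": 0x3000,
    },
    "13.5": {
        "acl_reception_handler": 0x1040,
        "allocateACLConnection": 0x2080,
    },
}


def missing_symbols(version: str) -> list:
    """Return required symbols that are not yet defined for a version."""
    defined = SYMBOLS.get(version, {})
    return [name for name in REQUIRED_SYMBOLS if name not in defined]


def resolve(base: int, version: str, name: str) -> int:
    """Compute the absolute address of a symbol for a given module base."""
    return base + SYMBOLS[version][name]
```

Checking `missing_symbols` before starting makes an incomplete port to a new iOS version fail early instead of crashing bluetoothd with a call to a wrong address.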
5 Identified Issues
An overview of identified vulnerabilities is shown in Table 3. Note that ToothPicker was initially developed to analyze MagicPairing, and thus, bugs discovered within MagicPairing and L2CAP have been described previously [15]. The first issues discovered by ToothPicker are still unpatched months after reporting. As Apple does not communicate details about their patching timeline, we do not know when they will be fixed. However, the most severe bug discovered by ToothPicker, which allows RCE, was reported on March 31 and fixed in the iOS 13.5 release on May 20 as CVE-2020-9838. Moreover, Apple started fixing the DoS issues and addressed one as CVE-2020-9931 in the iOS 13.6 release on July 16. In the following, more details about the findings are provided.
SMP1: Security Manager Protocol Out-of-Bounds Jump
This vulnerability occurs in the reception handler of the Security Manager Protocol (SMP), which, as of iOS 13, is accessible via both Classic Bluetooth and BLE. The cause of this flaw is an incorrect check of the received protocol opcode value, which leads to an out-of-bounds read. The value that is read is treated as a function pointer. Listing 2 shows a pseudocode representation of the flawed opcode check. If the opcode 0x0f is sent, the bounds check still passes. However, the global function table where the handlers for the specific opcodes reside only has 15 entries. As the table is zero-indexed, valid indices range from 0 to 14, and index 15 (0x0f) is out of bounds. The table is immediately followed by other data. As the tables for Classic Bluetooth and BLE are at different locations, the result is different, depending on which technology is chosen. We found that on iOS 13.3, the data following the Classic Bluetooth SMP function table is partially controllable by an attacker. More specifically, it is followed by two 4-byte values, an unknown value that we observed to be 0x00000001 and a global counter that is incremented for each Bluetooth connection that is created. Due to this layout, the connection counter determines the most significant bytes of the address. Therefore, an attacker could crash bluetoothd and then create a specific number of connections to form the address. However, we did not find a way to influence the other value. As a result, the lower four bytes of the address are equal to 0x00000001, which is an unaligned address and thus crashes bluetoothd. In case it is possible to influence this value, or the layout of the memory following the function table changes, this flaw could potentially lead to a control flow hijack. The amount of preparation required for abusing this flaw is very high. This includes preparing or determining a useful address to jump to and also being able to precisely control the value.

// Get opcode from the input data
opcode = data[0]
// Flawed opcode check
if (opcode <= 0xf):
    // The handler is resolved from the function table
    handler = GLOBAL_FUNCTION_TABLE[opcode]
    // The code checks if the resolved pointer is not null
    if handler != None:
        handler(...)
Listing 2: Flawed SMP opcode check includes 0xf.

02 0b00 0e00 0a00 0700 0f 414243444546474849
02: H4 Type: ACL
0b00: ACL Handle+Flags
0e00: ACL Length
0a00: L2CAP Length
0700: L2CAP Channel ID (CID)
0f: SMP Opcode
414243444546474849: Payload
Figure 4: Malicious payload for SMP1 aka CVE-2020-9838.
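The effect of the off-by-one check can be modeled on a flat byte buffer. The following minimal sketch lays out 15 eight-byte handler pointers followed by the two 4-byte values described above and reads the entry that opcode 0x0f selects. The helper names and handler addresses are hypothetical, little-endian 64-bit pointers are assumed, and the stricter check in `lookup_checked` is only one possible correction, not necessarily the one Apple shipped.

```python
import struct

TABLE_ENTRIES = 15  # valid indices are 0..14


def build_memory(handlers, unknown_value, connection_counter):
    """Function table immediately followed by two 32-bit values."""
    if len(handlers) != TABLE_ENTRIES:
        raise ValueError("expected exactly 15 handler pointers")
    table = struct.pack("<15Q", *handlers)
    trailing = struct.pack("<II", unknown_value, connection_counter)
    return table + trailing


def lookup_flawed(memory, opcode):
    """Mirrors the flawed check: opcode 0x0f passes but is out of bounds."""
    if opcode <= 0x0F:
        (handler,) = struct.unpack_from("<Q", memory, 8 * opcode)
        return handler if handler != 0 else None
    return None


def lookup_checked(memory, opcode):
    """Hypothetical correction: reject every index outside the table."""
    if opcode < TABLE_ENTRIES:
        (handler,) = struct.unpack_from("<Q", memory, 8 * opcode)
        return handler if handler != 0 else None
    return None


if __name__ == "__main__":
    handlers = [0x100000000 + 0x10 * i for i in range(TABLE_ENTRIES)]
    memory = build_memory(handlers, 0x00000001, connection_counter=0x42)
    # The counter forms the upper half, the constant 1 the lower half.
    print(hex(lookup_flawed(memory, 0x0F)))
```

In this model, the value treated as a function pointer is `(connection_counter << 32) | 0x00000001`, which illustrates why the attacker controls the upper bytes through the number of connections while the lower bytes stay fixed and unaligned.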
This issue was fixed by Apple in iOS 13.5 and was assigned CVE-2020-9838. Figure 4 shows an ACL payload that triggers the vulnerability. Note that the handle needs to be adapted to the actual connection’s handle value. SMP1 affects both SMP over Classic Bluetooth and BLE. Classic Bluetooth is using the Channel ID (CID) 0x0007, while BLE is using the CID 0x0006 for SMP.
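The byte layout of Figure 4 can be assembled programmatically, which makes it easy to adapt the handle and CID. The following minimal sketch builds an H4-framed ACL packet carrying an L2CAP frame with a single SMP opcode byte and payload. The function and constant names are our own, and all multi-byte fields are little-endian as in the figure.

```python
import struct

H4_TYPE_ACL = 0x02
SMP_CID_CLASSIC = 0x0007
SMP_CID_BLE = 0x0006


def build_smp_frame(handle_and_flags, cid, opcode, payload):
    """Assemble H4 type, ACL header, L2CAP header, SMP opcode, and payload."""
    smp = bytes([opcode]) + payload
    # L2CAP header: payload length and channel ID
    l2cap = struct.pack("<HH", len(smp), cid) + smp
    # ACL header: handle plus flags and the length of the L2CAP frame
    acl = struct.pack("<HH", handle_and_flags, len(l2cap)) + l2cap
    return bytes([H4_TYPE_ACL]) + acl


if __name__ == "__main__":
    frame = build_smp_frame(0x000B, SMP_CID_CLASSIC, 0x0F, b"ABCDEFGHI")
    print(frame.hex())
```

With handle 0x000b, the Classic Bluetooth CID, and the nine-byte payload `ABCDEFGHI`, the result matches the bytes in Figure 4. Passing `SMP_CID_BLE` instead yields the BLE variant.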
As shown in Table 2, we did not explicitly target the SMP protocol. The SMP1 vulnerability was found by the ACL fuzzing pass, which had a corpus of only L2CAP Signal Channel frames. This also shows that the input generation and mutation mechanism is capable of identifying new paths and protocol messages.
SIG1: Signal Channel Missing Checks
The Classic Signal Channel fuzzing identified three crashes that all originate from the same flaw. These crashes are a result of accessing memory at the locations 0x00, 0x08, and 0x10. The crashes occur within the Configuration Request and Disconnection Response frame handlers in the Signal Channel. These frames are all related to dynamic L2CAP channels. Thus, they contain a CID. To obtain the necessary data structure for an L2CAP channel with a given CID, a lookup is done via the ChanMan_GetChannel function. This returns the channel data structure for both dynamic and static L2CAP channels. If the L2CAP channel is a dynamic channel, the data structure contains a field that points to another structure with additional information about the channel. In case of a fixed L2CAP channel, this additional data structure does not seem to exist and the field value is 0. The Signal Channel parsing code in bluetoothd does not differentiate between static and dynamic L2CAP channels. Thus, the code falsely treats this null pointer as a pointer to a structure and dereferences the fields at offsets 0x00, 0x08, and 0x10, leading to crashes due to invalid memory accesses. A pseudocode representation of this flaw is shown in Listing 3. The example is taken from the Disconnection Response frame handler. Apple fixed this issue in iOS 13.6. However, only the payloads we provided for offsets 0x08 and 0x10 stopped crashing. Our payload for 0x00 still crashes bluetoothd and might be a different issue.

// L2CAP channel structure obtained by CID
status = ChanMan_GetChannel(channel_id, &channel);
// If CID exists, obtain dynamic structure via channel
if (status == 0) {
    dyn_structure = get_dyn_structure(channel);
    // The dyn_structure pointer is not checked before dereferencing.
    // The following line will try to dereference the value 0x10,
    // leading to a crash.
    if (*(char *)(dyn_structure + 0x10) == 0x06 && [...]) {
        [...]
    }
}
Listing 3: Signal channel structure pointer dereference.
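The missing distinction between fixed and dynamic channels can be illustrated with a small model. In the following sketch, `None` stands in for the 0 pointer stored for fixed channels. The class and field names are hypothetical, the channel table is made up for illustration, and the additional check in `handle_checked` is an assumed correction rather than Apple's actual patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynInfo:
    state: int  # models the byte compared against 0x06 at offset 0x10


@dataclass
class Channel:
    cid: int
    dyn: Optional[DynInfo]  # None models the 0 pointer of a fixed channel


CHANNELS = {
    0x0001: Channel(0x0001, None),                 # fixed Signal Channel
    0x0040: Channel(0x0040, DynInfo(state=0x06)),  # a dynamic channel
}


def handle_flawed(cid: int) -> bool:
    """Dereferences the dynamic structure without checking it first."""
    channel = CHANNELS.get(cid)
    if channel is None:
        return False
    return channel.dyn.state == 0x06  # fails for fixed channels


def handle_checked(cid: int) -> bool:
    """Rejects channels that have no dynamic structure."""
    channel = CHANNELS.get(cid)
    if channel is None or channel.dyn is None:
        return False
    return channel.dyn.state == 0x06
```

Calling `handle_flawed(0x0001)` raises an `AttributeError`, the Python analogue of the invalid memory access, while `handle_checked(0x0001)` simply ignores the frame.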
6 Conclusion
The results of ToothPicker show that it is a powerful fuzzing framework that can discover new vulnerabilities within the iOS Bluetooth stack. Using virtual connections, it harnesses bluetoothd in a way that Apple likely did not apply when testing it internally. Thus, we were able to discover new vulnerabilities despite the speed limitations. Since ToothPicker runs as a Frida in-process fuzzer on an iPhone and dynamically rewrites code during runtime, it is slow compared to what Apple could reach with internal testing tools and by recompiling the stack. Our findings are quite concerning with regard to the omnipresence of Bluetooth within the Apple ecosystem. We hope that the publication of tools like ToothPicker will improve the state of Bluetooth security in the long term.
Acknowledgments
We thank Apple for handling our responsible disclosure requests. Moreover, we thank Kristoffer Schneider for continuing the work on ToothPicker, Mathias Payer for shepherding this paper, Ole André V. Ravnås for the Frida support, and Anna Stichling for the ToothPicker logo.
This work has been funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.
Availability
ToothPicker is publicly available on GitHub: https://github.com/seemoo-lab/toothpicker
