Code obfuscation

Code obfuscation is the process of transforming code into a more complex form that is harder to understand and reverse-engineer.

Any app can be hacked. It’s not a question of if, but when.

The fact that any app can be reverse-engineered may discourage developers from attempting to protect it. However, following best practices and incorporating basic protection mechanisms makes reverse engineering the app far more challenging, requiring substantially greater effort and more advanced techniques from an attacker.

This guide explores source code and machine code obfuscation techniques, as well as related code transformations that occur during compiling and building.

Compiling and building

The Xcode build system applies name mangling and symbol stripping as part of its compilation and optimization steps. Both have an obfuscating side effect and play an important role in the context of this guide, but neither was designed as a protection mechanism: each serves a different purpose of its own.

Name mangling

Name mangling is a compilation process that encodes additional information into the symbol names of functions, variables, and other identifiers. This information can include aspects like the module, type, and parameters involved with the symbol. The mangled names allow the Swift compiler to support features like overloading and namespaces while avoiding symbol clashes.

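For example, a function func add(a: Int, b: Int) -> Int in a module named Math might be mangled to something like _T04Math3addySiSi_SitF, where the encoded characters contain information about the module, function name, and parameters. The exact prefix and encoding depend on the compiler version: the _T0 prefix was used by Swift 4, while Swift 5 and later use $s.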
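To see why the compiler needs mangled names at all, consider two overloads that share the same base name. The following is a minimal sketch; the function bodies and values are chosen only for illustration.

```swift
// Two overloads with the same base name and argument labels.
// Only the parameter and return types differ, so the compiler must
// emit a distinct symbol for each one. Name mangling encodes those
// types into the symbol name to keep the two apart.
func add(a: Int, b: Int) -> Int {
    return a + b
}

func add(a: Double, b: Double) -> Double {
    return a + b
}

let intSum = add(a: 2, b: 3)         // resolves to the Int overload
let doubleSum = add(a: 2.5, b: 0.5)  // resolves to the Double overload
```

Listing the symbols of the compiled binary and passing them through a demangler such as swift-demangle should show both functions with their full signatures restored, which is also why mangling alone hides very little.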

Symbol stripping

Symbol stripping is a build step in which unnecessary symbol information is removed from the binary to reduce its size and make reverse engineering more difficult. Symbols such as function names, variable names, and other debug information that are useful during development but unnecessary in the final release version are removed or simplified.

For example, in a debug build, a symbol like _T07Example3add2a2bSi_SitF might be present, but in a release build with symbol stripping applied, this information may be completely removed or reduced to a minimal form, such as an anonymous symbol reference.

Digest:

It's important to note that not all identifiers can be stripped from the compiled binary, and some reverse engineering tools can easily decode mangled names.

Neither name mangling nor symbol stripping offers a robust defense against reverse engineering, and both can be bypassed with the appropriate tools.

Source code obfuscation

There are three distinct techniques used in source code obfuscation: identifier obfuscation, data obfuscation, and logic obfuscation.

Identifier obfuscation

Identifier obfuscation is the process of altering identifier names to conceal their purpose. If you have worked with JavaScript, you may already be familiar with identifier obfuscation through the use of minifiers in the app packaging process.

For example, renaming the checkLicense function to a1b2c3 removes the semantic clue about what it does, so its purpose can no longer be inferred from the name alone.
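The rename can be sketched as follows. The license format checked here (a "LIC-" prefix and a length of 12) is a made-up placeholder, as are the parameter names.

```swift
// Before: the function and parameter names reveal the intent.
func checkLicense(key: String) -> Bool {
    return key.hasPrefix("LIC-") && key.count == 12
}

// After: identical behavior, but the names carry no meaning.
func a1b2c3(_ p0: String) -> Bool {
    return p0.hasPrefix("LIC-") && p0.count == 12
}

let sampleKey = "LIC-12345678"
let sameResult = checkLicense(key: sampleKey) == a1b2c3(sampleKey)
```

Note that only the identifiers changed. The string literal and the control flow are untouched, so renaming on its own leaves other clues in place.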

While identifier obfuscation, name mangling, and symbol stripping all center on altering symbol names, identifier obfuscation is a unique practice carried out specifically to hide their purpose.

Data obfuscation

Data obfuscation involves transforming data structures, such as classes, structs, and enums, into different forms to erase any significant information about their original design or function, including their logical structure and relationships within the code. This may include encrypting literal values, scrambling the arrangement of data structures, and fully flattening them.

This approach can be particularly helpful when the build system fails or is unable to remove relevant symbol names, resulting in identifiers and related information being exposed in the compiled binary.

For example, flattening an enum into plain constants removes the type's structure, so less information about the original design leaks into the compiled binary.

// Original enum
enum State {
  case unpurchased
  case purchasing
  case purchased
}
// Flattened enum
let unpurchasedState = 0
let purchasingState = 1
let purchasedState = 2
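
Encrypting literal values, also mentioned above, can be sketched in a similar spirit. The example below is a minimal illustration that assumes a single-byte XOR key; the helper name xorBytes, the key 0x5A, and the hidden string "pro" are all hypothetical. XOR with a fixed key is trivially reversible, so this only keeps the plain text out of simple string scans of the binary.

```swift
// Hypothetical helper: XOR every byte with a one-byte key.
// Applying it twice with the same key restores the original bytes.
func xorBytes(_ bytes: [UInt8], key: UInt8) -> [UInt8] {
    return bytes.map { $0 ^ key }
}

let key: UInt8 = 0x5A

// The bytes of "pro" (0x70, 0x72, 0x6F), each XORed with 0x5A ahead of time,
// so the plain text never appears as a literal in the source.
let encoded: [UInt8] = [0x2A, 0x28, 0x35]

// Decode at run time, only where the value is needed.
let decoded = String(decoding: xorBytes(encoded, key: key), as: UTF8.self)
```

In practice the encoded bytes would be generated by a build script rather than written by hand, and a stronger scheme than single-byte XOR would be needed to resist anything beyond casual inspection.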

Logic obfuscation

Logic obfuscation involves complicating the control flow and the operational logic of the code by transforming it into an equivalent, but more complex and less comprehensible version. Common techniques used in logic obfuscation include adding dead code, false dependencies, opaque predicates, and conditional jumps.

  1. Dead code: This involves inserting code sequences that have no effect on the program's operation and are never called or executed. These unused portions of code can increase the overall code size and complicate the control flow graph, without changing the program's behavior.

    func calculate(x: Int, y: Int) -> Int {
      let result = x + y // Real logic
      if x < Int.min { // Never true, so this block is dead code
        return x - y
      }
      return result
    }

  2. False dependencies: Introducing dependencies on variables or conditions that don't actually influence the outcome can make the code's execution flow more complex. This includes using variables in computations whose results are never used, creating an illusion of interdependence and complexity.

    let a = 5
    let b = a * 2 // Real dependency, `b` is used later in the program
    let c = a * 3 // False dependency, `c` is never used in the program

  3. Opaque predicates: These are conditions inserted into the code that are constructed in a way that is not immediately apparent and are either always true or always false. These predicates can lead to forks in the code where one path is never taken, allowing for the inclusion of additional obfuscating elements such as dead code or false dependencies.

    let x = Int.random(in: 0..<1000)
    if (x * (x + 1)) % 2 == 0 { // Always true: the product of two consecutive integers is even
      print("This is an opaque predicate example")
    }

  4. Conditional jumps: This technique introduces jumps in the execution flow based on conditions, real or artificial, leading to a nonlinear and confusing flow of control. Conditional jumps can be based on actual runtime values or can be created using opaque predicates, leading to a maze-like structure of the code that still performs the intended operation.

    var x = 10
    if x > 5 {
      x *= 2
    } else {
      x += 1 // This block will never be executed, creates confusion in flow
    }

When using any logic obfuscation technique, it is important to understand how compiler optimizations can affect the resulting machine code. During optimization the compiler may simplify control flow, fold constant conditions, and remove unused calculations, which can weaken the obfuscation or remove it entirely. Choose the optimization level with care and inspect the compiled output to confirm that the obfuscation survived.
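
One way to give an opaque predicate a better chance of surviving optimization is to base it on a value that is only known at run time. The following is a minimal sketch under that assumption. The names alwaysTrue and isLicenseValid and the 16-character check are hypothetical, and @inline(never) only asks the compiler not to inline the helper; it does not guarantee that the predicate stays in the binary, so the disassembly still needs to be checked.

```swift
import Foundation

// Hypothetical opaque predicate: the square of any integer is congruent
// to 0 or 1 modulo 4, so this function always returns true. Wrapping
// multiplication (&*) avoids an overflow trap and preserves the value
// modulo 4, because 2^64 is divisible by 4.
@inline(never)
func alwaysTrue(_ seed: Int) -> Bool {
    let v = UInt(bitPattern: seed)
    return (v &* v) % 4 < 2
}

func isLicenseValid(_ key: String) -> Bool {
    // A value the compiler cannot know at build time.
    let seed = Int(ProcessInfo.processInfo.processIdentifier)
    if alwaysTrue(seed) {
        return key.count == 16 // Real logic (placeholder check)
    } else {
        return key.isEmpty     // Decoy path, never taken
    }
}
```

The decoy branch gives a reader of the disassembly a second, plausible-looking path to analyze, while the program's behavior stays the same.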

Digest:

Identifier obfuscation focuses on the context and semantics (the what), while logic obfuscation focuses on the process and control flow (the how) of the code.

Identifier obfuscation makes code more difficult to understand by removing meaningful context without altering the control flow.

Logic obfuscation, on the other hand, transforms the control flow into a maze-like structure and is generally the stronger of the two countermeasures against reverse engineering.

While source code obfuscation can be a very effective defense against reverse engineering, it also increases the complexity of the code, reducing readability and maintainability.

Machine code obfuscation

Machine code obfuscation modifies binary code directly, without affecting the source code. This approach avoids the possibility of source code obfuscation being simplified or removed during the compiler optimization process.

  1. Control flow obfuscation: This technique manipulates the execution paths within the binary code through the insertion of spurious conditional jumps, loops, and opaque predicates. By altering the natural flow of the program, it creates a more intricate control flow graph without affecting the program's functionality.

  2. Instruction substitution: This method involves replacing certain instructions with equivalent but less recognizable or more convoluted ones. By doing so, the overall structure of the machine code is preserved, but the specific implementation is transformed, increasing the complexity of analysis.

  3. Register renaming: This technique restructures machine code by changing the naming or ordering of registers used in the computation, or by utilizing less commonly used registers for standard operations. This obfuscation alters the surface appearance of the code while maintaining its underlying logic, complicating reverse engineering tasks.

  4. Code transposition and code integration: Code transposition rearranges independent instructions, confusing the sequential logic, while code integration consolidates multiple instructions into more complex ones. These methods alter the spatial organization of the code, making static analysis more intricate without changing the underlying operations of the program.
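
Although these transformations are applied to machine instructions, the idea behind instruction substitution can be sketched at the source level. The example below is only an illustration in Swift, not how a binary-level tool operates; the function names are hypothetical. It replaces an addition with an equivalent sequence of bitwise operations, using the identity a + b = (a XOR b) + 2 * (a AND b).

```swift
// Straightforward version: a single wrapping addition.
func plainAdd(_ a: Int, _ b: Int) -> Int {
    return a &+ b
}

// Substituted version: the same result from XOR, AND, and a shift.
// (a ^ b) is the sum without carries, and (a & b) << 1 is the carries.
func substitutedAdd(_ a: Int, _ b: Int) -> Int {
    return (a ^ b) &+ ((a & b) << 1)
}

let plain = plainAdd(1234, 5678)
let substituted = substitutedAdd(1234, 5678)
```

Both functions return the same value for every pair of inputs, including negative numbers and values that wrap around, but the second no longer looks like an addition at first glance.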

Unfortunately, obfuscating machine code for Swift and Objective-C apps isn't straightforward due to the lack of publicly available tools, and often requires a deep understanding of machine code and the specific processor architecture being targeted.

This is a more specialized field of software security and often goes beyond the scope of general app development. Nonetheless, understanding these techniques can provide valuable insight into the lengths that can be taken to secure software at the machine code level.

Digest:

Machine code obfuscation can significantly impact app performance and should generally only be applied to sensitive parts of the program.

For platforms like macOS and iOS, where code signing is required, developers must ensure that obfuscation techniques do not violate the platform's code signing policies.