When reverse engineering Linux-based firmware images, the following methodology usually works pretty well:
- use Binwalk to identify different parts of a firmware image by their magic signatures
- use dd to split the firmware image apart
- unpack parts / mount/extract the filesystem(s)
- find interesting config files/binaries
- load ELF binaries into your favorite disassembler
- start looking at beautiful MIPS/ARM/PPC ASM
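The first two steps can be sketched in a few lines of Python. This is not Binwalk, only a minimal illustration of the idea of scanning for magic signatures and then carving out a region dd-style. The signature table is a tiny hand-picked subset, and `scan_signatures` and `carve` are hypothetical helper names.

```python
# A handful of well-known magic values; real tools know hundreds more.
SIGNATURES = {
    b"\x1f\x8b\x08": "gzip stream",
    b"hsqs": "SquashFS (little endian)",
    b"\x27\x05\x19\x56": "U-Boot uImage header",
    b"\x7fELF": "ELF binary",
}


def scan_signatures(blob):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, name in SIGNATURES.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)


def carve(blob, skip, count=None):
    """dd-style extraction: drop `skip` bytes, keep `count` bytes (or the rest)."""
    end = len(blob) if count is None else skip + count
    return blob[skip:end]
```

Short magic values produce false positives on large images, so every hit still has to be checked by actually trying to unpack the carved region.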
This approach unfortunately didn’t work when I looked at firmware images for a broadband router called ‘EasyBox 803’ distributed by Vodafone Germany (formerly Arcor). Apart from two LZMA-packed segments containing information irrelevant for my research, I didn’t find anything useful in the firmware image at first.
As I had to confirm a major vulnerability (default WPA keys based on ESSID/BSSID [1][2]) I didn’t give up at this point. But let’s start right at the beginning …
I obtained a firmware update file for the EasyBox 803 from Vodafone’s support page. A Google search reveals the following:
- the device is manufactured by Astoria Networks, which is the German subsidiary of the Taiwanese company Arcadyan
- there are tools available for unpacking Arcadyan firmware (SP700EX, arcadyan_dec)
- Arcadyan uses obfuscation (xor, swapping bits/bytes/blocks) to thwart analysis of their firmware files
- Arcadyan devices don’t run Linux, instead they have their own proprietary OS
- MIPS big endian is their preferred architecture
I tried to unpack the firmware file with the tools I found, but although they can deobfuscate the firmware of other Arcadyan devices, they could not do the same for mine. Nevertheless the tools helped me in understanding the layout of my firmware image. It basically consists of several sections which are concatenated. From a high-level view a section looks like this:
(relevant words marked with '||')
beginning of section:
00000000h:|32 54 76 98|11 AF 99 D3 AC FF EA 6C 43 62 39 C8 ; 2Tv˜.¯™Ó¬ÿêlCb9È
...
end of data:
001e83a0h: 0C D8 A3 4A|CD AB 89 67|EE 50 66 2C 53 00 15 93 ; .Ø£JÍ«‰gîPf,S..“
...
end of section:
001e83e0h: 0A 01 EF 8A 73 58 DE 85 00 00 00 00|FF FF FF FF|; ..ïŠsXÞ…....ÿÿÿÿ
001e83f0h:|FF FF FF FF|A4 83 1E 00|78 56 34 12|5E E2 53 5F|; ÿÿÿÿ¤ƒ..xV4.^âS_
beginning of next section:
001e8400h:|32 54 76 98|82 FF 4D 9D CF 6A 95 5E B0 5C 96 7F ; 2Tv˜‚ÿMÏj•^°\–
...
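The layout in the hexdump can be turned into a small section splitter. The field meanings used here are my reading of the marked words, not something the tools or the vendor document: `32 54 76 98` opens a section, `CD AB 89 67` follows the last data byte, `A4 83 1E 00` is a little-endian length (0x001e83a4, which is exactly where the data ends in the dump), `78 56 34 12` is a trailer magic, and the final word is left uninterpreted. Treating the length as relative to the section start is also an assumption, since the only example starts at offset 0. `split_sections` is a hypothetical helper name.

```python
import struct

START_MAGIC = bytes.fromhex("32547698")    # first word of every section
END_OF_DATA = bytes.fromhex("cdab8967")    # word directly after the data
TRAILER_MAGIC = bytes.fromhex("78563412")  # second to last word of a section


def split_sections(blob):
    """Return (start_offset, data_length) for each section found in blob."""
    sections = []
    start = 0
    while blob[start:start + 4] == START_MAGIC:
        pos = blob.find(TRAILER_MAGIC, start + 4)
        length = None
        while pos != -1:
            # The word before the trailer magic is read as the data length.
            (candidate,) = struct.unpack_from("<I", blob, pos - 4)
            marker_at = start + candidate
            if blob[marker_at:marker_at + 4] == END_OF_DATA:
                length = candidate
                break
            pos = blob.find(TRAILER_MAGIC, pos + 1)  # false positive, keep looking
        if length is None:
            break
        sections.append((start, length))
        start = pos + 8  # skip the trailer magic and the final word
    return sections
```

For the image above this approach would report a first section starting at 0 with a data length of 0x1e83a4, which is the byte count later passed to dd.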
After I miserably failed at recognizing the obfuscation method just by looking at the hexdump, I had to move on. I suspected that the deobfuscation is handled by the bootloader itself, so that was the next thing I wanted to look at. Luckily Vodafone had to update the bootloader for the EasyBox 802 (predecessor to the EasyBox 803) to enable some random functionality and kindly provided a copy; otherwise dumping the flash would have been necessary.
unzip_fw looks like this:
As the deobfuscation and LZMA unpacking are indeed handled by the bootloader, I reversed and reimplemented their fancy deobfuscation routine (deobfuscate_ZIP3):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//xor every byte in bytes with xorchar
void xor(unsigned char* bytes, int len, char xorchar)
{
    int i;
    for (i = 0; i < len; i++) {
        bytes[i] = bytes[i] ^ xorchar;
    }
}

//swap high and low nibble of every byte
//0x12345678 -> 0x21436587
void hilobswap(unsigned char* bytes, int len)
{
    int i;
    for (i = 0; i < len; i++) {
        bytes[i] = (unsigned char) ((bytes[i] << 4) | (bytes[i] >> 4));
    }
}

//swap byte[i] with byte[i+1]
//0x12345678 -> 0x34127856
void wswap(unsigned char* bytes, int len)
{
    int i;
    unsigned char tmp;
    for (i = 0; i + 1 < len; i += 2) {
        tmp = bytes[i];
        bytes[i] = bytes[i + 1];
        bytes[i + 1] = tmp;
    }
}

int main(int argc, char *argv[])
{
    unsigned char* buffer;
    unsigned char tmpbuffer[0x400];
    long fsize;
    size_t insize, n;
    FILE *infile, *outfile;

    if (argc != 3) {
        printf("usage: easybox_deobfuscate infile outfile.bin.lzma\n");
        return -1;
    }

    //read obfuscated file
    infile = fopen(argv[1], "rb");
    if (infile == NULL) {
        fputs("cant open infile\n", stderr);
        return -1;
    }
    fseek(infile, 0, SEEK_END);
    fsize = ftell(infile);
    rewind(infile);

    //the routine touches bytes 0x4 .. 0x1003, so the input must be at least that long
    if (fsize < 0x1004) {
        fputs("infile too small\n", stderr);
        fclose(infile);
        return -1;
    }
    insize = (size_t) fsize;

    buffer = (unsigned char*) malloc(insize);
    if (buffer == NULL) {
        fputs("memory error\n", stderr);
        exit(2);
    }
    n = fread(buffer, 1, insize, infile);
    fclose(infile);
    if (n != insize) {
        fputs("read error\n", stderr);
        free(buffer);
        return -1;
    }
    printf("read \t%zu bytes\n", n);

    printf("descrambling file ...\n");

    //xor HITECH
    xor(buffer + 0x404, 0x400, 0x48);
    xor(buffer + 0x804, 0x400, 0x49);
    xor(buffer + 0x4, 0x400, 0x54);
    xor(buffer + 0x404, 0x400, 0x45);
    xor(buffer + 0x804, 0x400, 0x43);
    xor(buffer + 0xC04, 0x400, 0x48);

    //swap 0x4 0x404
    memcpy(tmpbuffer, buffer + 0x4, 0x400);
    memcpy(buffer + 0x4, buffer + 0x404, 0x400);
    memcpy(buffer + 0x404, tmpbuffer, 0x400);

    //xor NET
    xor(buffer + 0x4, 0x400, 0x4E);
    xor(buffer + 0x404, 0x400, 0x45);
    xor(buffer + 0x804, 0x400, 0x54);

    //swap 0x4 0x804
    memcpy(tmpbuffer, buffer + 0x4, 0x400);
    memcpy(buffer + 0x4, buffer + 0x804, 0x400);
    memcpy(buffer + 0x804, tmpbuffer, 0x400);

    //xor BRN
    xor(buffer + 0x4, 0x400, 0x42);
    xor(buffer + 0x404, 0x400, 0x52);
    xor(buffer + 0x804, 0x400, 0x4E);

    //fix header #1
    memcpy(tmpbuffer, buffer + 0x4, 0x20);
    memcpy(buffer + 0x4, buffer + 0x68, 0x20);
    memcpy(buffer + 0x68, tmpbuffer, 0x20);

    //fix header #2
    hilobswap(buffer + 0x4, 0x20);
    wswap(buffer + 0x4, 0x20);

    //write deobfuscated file (skipping the 4-byte section magic)
    outfile = fopen(argv[2], "wb");
    if (outfile == NULL) {
        fputs("cant open outfile\n", stderr);
        free(buffer);
        return -1;
    }
    n = fwrite(buffer + 4, 1, insize - 4, outfile);
    fclose(outfile);
    printf("wrote \t%zu bytes\n", n);

    printf("all done! - use lzma to unpack\n");
    free(buffer);
    return 0;
}
You can see that it would have been impossible to understand how the obfuscation works without looking at the actual assembly. Luckily this routine also works for EasyBox 803.
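Once the routine is known, it can be simplified. XOR is commutative and associative, so the three XOR passes ("HITECH", "NET", "BRN") and the two block swaps in the C listing above collapse into one move and one XOR key per 0x400-byte block. The sketch below ports the XOR/swap stages to Python twice, once step by step and once collapsed, so the two can be compared. The collapsed keys are derived from the listing by hand, not taken from the bootloader, and the two header fix-up steps are deliberately left out. All function names are hypothetical.

```python
BLOCK = 0x400
A, B, C, D = 0x4, 0x404, 0x804, 0xC04  # block offsets used by the C listing


def xor_block(buf, off, key):
    for i in range(off, off + BLOCK):
        buf[i] ^= key


def swap_blocks(buf, x, y):
    buf[x:x + BLOCK], buf[y:y + BLOCK] = buf[y:y + BLOCK], buf[x:x + BLOCK]


def descramble_stepwise(data):
    """Direct port of the XOR/swap stages, in the same order as the C code."""
    buf = bytearray(data)
    for off, key in ((B, 0x48), (C, 0x49), (A, 0x54),
                     (B, 0x45), (C, 0x43), (D, 0x48)):   # "HITECH"
        xor_block(buf, off, key)
    swap_blocks(buf, A, B)
    for off, key in ((A, 0x4E), (B, 0x45), (C, 0x54)):   # "NET"
        xor_block(buf, off, key)
    swap_blocks(buf, A, C)
    for off, key in ((A, 0x42), (B, 0x52), (C, 0x4E)):   # "BRN"
        xor_block(buf, off, key)
    return bytes(buf)


def descramble_collapsed(data):
    """Same result: every block is moved once and XORed with a single key."""
    buf = bytearray(data)
    for dst, src, key in ((A, C, 0x1C), (B, A, 0x43), (C, B, 0x0D), (D, D, 0x48)):
        buf[dst:dst + BLOCK] = bytes(b ^ key for b in data[src:src + BLOCK])
    return bytes(buf)
```

Both functions expect at least 0x1004 bytes of input. In other words, the first three blocks are rotated by one position and each block gets a single-byte XOR key, which is much weaker than the listing makes it look.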
Let’s unpack the first segment, which is the biggest one and therefore most likely to contain code.
>fdd if=dsl_803_752DPW_FW_30.05.211.bin of=dsl_803_s1_obfuscated count=0x1e83a4
count : 0x1e83a4 1999780
skip : 0x0 0
seek : 0x0 0
1999780+0 records in
1999780+0 records out
1999780 bytes (2.00 MB) copied, 0.009540 s, 199.90 MB/s
>easybox_deobfuscate dsl_803_s1_obfuscated dsl_803_s1.bin.lzma
read 1999780 bytes
descrambling file ...
wrote 1999776 bytes
all done! - use lzma to unpack
>xz -d dsl_803_s1.bin.lzma
>l dsl_803_s1*
-rw-r--r--+ 1 stefan None 8.3M 6. Sep 11:27 dsl_803_s1.bin
-rw-r--r--+ 1 stefan None 2.0M 6. Sep 11:25 dsl_803_s1_obfuscated
dsl_803_s1.bin:
00000000h: 40 02 60 00 3C 01 00 40 00 41 10 24 40 82 60 00 ; @.`.<..@.A.$@‚`.
00000010h: 40 80 90 00 40 80 98 00 40 1A 60 00 24 1B FF FE ; @€.@€˜.@.`.$.ÿþ
00000020h: 03 5B D0 24 40 9A 60 00 40 80 68 00 40 80 48 00 ; .[Ð$@š`.@€h.@€H.
00000030h: 40 80 58 00 00 00 00 00 04 11 00 01 00 00 00 00 ; @€X.............
00000040h: 03 E0 E0 25 8F E9 00 00 03 89 E0 20 00 00 00 00 ; .àà%é...‰à ....
00000050h: 00 00 00 00 00 00 00 00 24 04 40 00 24 05 00 10 ; ........$.@.$...
00000060h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000070h: 3C 06 80 00 00 C4 38 21 00 E5 38 23 BC C1 00 00 ; <.€..Ä8!.å8#¼Á..
00000080h: 14 C7 FF FE 00 C5 30 21 00 00 00 00 00 00 00 00 ; .Çÿþ.Å0!........
00000090h: 00 00 00 00 00 00 00 00 24 04 40 00 24 05 00 10 ; ........$.@.$...
...
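Before handing a deobfuscated segment to an unpacker, it can be worth checking that it starts with a plausible LZMA header, since a wrong deobfuscation step shows up there first. The sketch below assumes the segment uses the legacy ".lzma" (LZMA alone) container: one properties byte, a 32-bit little-endian dictionary size and a 64-bit little-endian uncompressed size. That assumption is mine; the post only establishes that xz can unpack the file. `parse_lzma_alone_header` and `unpack_lzma_alone` are hypothetical helper names, while `lzma.FORMAT_ALONE` is part of Python's standard library.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # size field value meaning "not stored"


def parse_lzma_alone_header(data):
    """Decode the 13-byte header of a legacy .lzma stream."""
    props, dict_size, size = struct.unpack_from("<BIQ", data, 0)
    if props >= 9 * 5 * 5:
        raise ValueError("properties byte out of range, probably not LZMA")
    return {
        "lc": props % 9,          # literal context bits
        "lp": (props // 9) % 5,   # literal position bits
        "pb": props // 45,        # position bits
        "dict_size": dict_size,
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }


def unpack_lzma_alone(data):
    """Decompress a legacy .lzma stream held in memory."""
    return lzma.decompress(data, format=lzma.FORMAT_ALONE)
```

A header that decodes to odd values (for example a dictionary size of several gigabytes) is a good hint that the header fix-up went wrong, even before the decompressor complains.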
Now we can load the file into IDA. This sounds easier than it is, because the unpacked firmware segment is raw code (mipsb) and data, without the segmentation information you would get from a PE or ELF binary.
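A quick way to convince yourself that the hexdump above really is big-endian MIPS code is to split the first few words into the standard MIPS32 instruction fields. The sketch below does only that field split; it is not a disassembler, and `decode_words` is a hypothetical helper name. Read as big endian, the first word `40 02 60 00` has opcode 0x10 (COP0) with rt=2 and rd=12, and the second word `3C 01 00 40` has opcode 0x0F (LUI) with rt=1 and an immediate of 0x0040, which looks like ordinary startup code touching a coprocessor 0 register.

```python
import struct


def decode_words(raw, count=4):
    """Split big-endian 32-bit words into the generic MIPS instruction fields."""
    usable = min(len(raw), 4 * count) & ~3  # whole words only
    fields = []
    for (word,) in struct.iter_unpack(">I", raw[:usable]):
        fields.append({
            "word": word,
            "opcode": word >> 26,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "imm": word & 0xFFFF,
        })
    return fields


# First 16 bytes of dsl_803_s1.bin, copied from the hexdump.
FIRST_BYTES = bytes.fromhex("40026000" "3C010040" "00411024" "40826000")

for entry in decode_words(FIRST_BYTES):
    print(f"{entry['word']:08x} opcode={entry['opcode']:#04x} "
          f"rs={entry['rs']} rt={entry['rt']} rd={entry['rd']} imm={entry['imm']:#06x}")
```

This only helps with the processor type and endianness. The load address and the boundary between code and data still have to be worked out by hand.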
Continued in E02: Reverse engineering an obfuscated firmware image – analysis
Note: Most of this research was conducted several months ago and my findings were probably not in this particular order. – I think it just makes more sense presenting it this way.
Note²: fdd is my silly Python implementation of dd. It takes HEX-offsets and has bs=1 by default.
Note³: Make sure to comply with Vodafone’s terms of use.