Plists

We’re going to kick this off with a bit about Plist files. I was recently asked about the internal structure of Plist files and wasn’t happy with my answer, so I needed to know more. Below is what I found out. Ironically, the same person that asked me about Plist files is the one that told me to stop blogging, and in doing so created this blog-a-day challenge.

 

Plist files are found sprinkled throughout OS X and iOS and contain the various configuration settings and other information of use to the OS and applications. They are one of the features that was inherited from NeXTSTEP when it became the new core of Mac OS (along with application bundles and the Mail app). Plists are key/value pairs that are stored in either text or binary. The values can be one of the following data types:

Type      Used for
string    ASCII or Unicode strings
data      Binary data
date      A date, seconds since 2001-01-01T00:00:00Z
integer   A whole number
real      A number with a decimal point
boolean   True or False
array     Array of any of these types
dict      Dictionary, array of key/value pairs where value is any of these types

The text version is XML and looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>BackupState</key>
   <string>new</string>
   <key>Date</key>
   <date>2014-08-12T18:06:28Z</date>
   <key>IsFullBackup</key>
   <false/>
   <key>SnapshotState</key>
   <string>finished</string>
   <key>UUID</key>
   <string>F29CEA33-8C0D-45BE-B9AE-6BDD05ACEE6B</string>
   <key>Version</key>
   <string>2.4</string>
</dict>
</plist>
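If you want to poke at the XML form from a script, here is a minimal sketch using Python's standard plistlib module. The XML is the example above; the helper names load_xml_plist and describe_types are my own, purely for illustration. It shows how each plist type lands as a native Python type (string as str, date as datetime, boolean as bool).

```python
import plistlib

# The XML plist from the example above, held in memory as bytes.
XML_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>BackupState</key>
   <string>new</string>
   <key>Date</key>
   <date>2014-08-12T18:06:28Z</date>
   <key>IsFullBackup</key>
   <false/>
   <key>SnapshotState</key>
   <string>finished</string>
   <key>UUID</key>
   <string>F29CEA33-8C0D-45BE-B9AE-6BDD05ACEE6B</string>
   <key>Version</key>
   <string>2.4</string>
</dict>
</plist>
"""


def load_xml_plist(raw):
    """Parse plist bytes into native Python objects."""
    return plistlib.loads(raw)


def describe_types(plist):
    """Map each key to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in plist.items()}


if __name__ == "__main__":
    settings = load_xml_plist(XML_PLIST)
    for key, type_name in describe_types(settings).items():
        print(f"{key}: {type_name} = {settings[key]!r}")
```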

But, XML is not a very efficient way to save data on disk (too much overhead created by all those < and >). So, Apple introduced a binary format. When opened in something that doesn't like it, it will look like this:

bplist00Ö    
TUUID\IsFullBackupWVersion[BackupStateTDate]SnapshotState_
$F29CEA33-8C0D-45BE-B9AE-6BDD05ACEE6BS2.4Snew3A¹š$¸NµXfinished'/;@Nuvz~‡             

Or in hex so we can see all the unprintables:

00000000: 62 70 6C 69 73 74 30 30 – D6 01 02 03 04 05 06 07 |bplist00        |
00000010: 08 09 0A 0B 0C 54 55 55 – 49 44 5C 49 73 46 75 6C |     TUUID\IsFul|
00000020: 6C 42 61 63 6B 75 70 57 – 56 65 72 73 69 6F 6E 5B |lBackupWVersion[|
00000030: 42 61 63 6B 75 70 53 74 – 61 74 65 54 44 61 74 65 |BackupStateTDate|
00000040: 5D 53 6E 61 70 73 68 6F – 74 53 74 61 74 65 5F 10 |]SnapshotState_ |
00000050: 24 46 32 39 43 45 41 33 – 33 2D 38 43 30 44 2D 34 |$F29CEA33-8C0D-4|
00000060: 35 42 45 2D 42 39 41 45 – 2D 36 42 44 44 30 35 41 |5BE-B9AE-6BDD05A|
00000070: 43 45 45 36 42 08 53 32 – 2E 34 53 6E 65 77 33 41 |CEE6B S2.4Snew3A|
00000080: B9 9A 8F 24 B8 4E B5 58 – 66 69 6E 69 73 68 65 64 |   $ N Xfinished|
00000090: 08 15 1A 27 2F 3B 40 4E – 75 76 7A 7E 87 00 00 00 |   '/;@Nuvz~    |
000000A0: 00 00 00 01 01 00 00 00 – 00 00 00 00 0D 00 00 00 |                |
000000B0: 00 00 00 00 00 00 00 00 – 00 00 00 00 90          |                |
    

Apple includes a utility, plutil, that converts binary plists to XML (for example, "plutil -convert xml1 file"). It is native to the OS in v10.2+ and is available on Windows as part of the iTunes install (\Program Files (x86)\Common Files\Apple\Apple Application Support\plutil.exe). Keep in mind that it converts the file in place, meaning that the original binary gets overwritten by the new XML version. If you don't want this to happen (cause, eh, forensics), use the "-o path" option to specify a different output file name/location. Using the "-p" switch produces output that is "easy" for humans to read and looks like this:

\>plutil.exe -p Status.plist
{
  "UUID" => "F29CEA33-8C0D-45BE-B9AE-6BDD05ACEE6B"
  "IsFullBackup" => 0
  "Version" => "2.4"
  "BackupState" => "new"
  "Date" => 2014-08-12 18:06:28 +0000
  "SnapshotState" => "finished"
}
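If plutil isn't handy, the same "don't touch the original" idea can be sketched in Python with the standard plistlib module. This is not how plutil works internally, just a stand-in. The function name convert_to_xml and the file names in the demo are hypothetical.

```python
import plistlib
import tempfile
from pathlib import Path


def convert_to_xml(source, destination):
    """Write an XML copy of a plist without touching the original."""
    source, destination = Path(source), Path(destination)
    if destination.exists() and destination.samefile(source):
        raise ValueError("destination would overwrite the original")
    with source.open("rb") as handle:
        contents = plistlib.load(handle)  # detects binary or XML input
    with destination.open("wb") as handle:
        plistlib.dump(contents, handle, fmt=plistlib.FMT_XML)
    return destination


if __name__ == "__main__":
    # Demo on a throwaway binary plist; the file names are made up.
    with tempfile.TemporaryDirectory() as folder:
        original = Path(folder) / "Status.plist"
        original.write_bytes(
            plistlib.dumps({"Version": "2.4"}, fmt=plistlib.FMT_BINARY))
        copy = convert_to_xml(original, Path(folder) / "Status.xml")
        print(copy.read_text())
```

Refusing to write over the source is the design choice that matters for forensics: the evidence file is only ever opened for reading.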

In forensics, we like to know how our tools are getting the data they present us. So, how do WE read that binary file? Unfortunately, I had to go to source code to find the answer. There is a comment in Apple's CFBinaryPList.c source file (linked in the References) that explains the structure of these files.

First there is an 8 byte header that provides a signature and version number. The signature is always "bplist". So far, the version is "00", but this can change in the future. 

Next is a series of variable sized objects, with each object starting with a 1 byte marker whose high nibble gives the object type and whose low nibble gives a size or count.

Last is a 32-byte trailer: a few single-byte fields followed by a series of numbers that consume 8 bytes each, which give us some tips for reading the plist.

 

To help us fully understand how binary plists work, let’s take the binary above and carve it into its elements.

00000000: 62 70 6C 69 73 74                                 |bplist          |

The first 6 bytes are "bplist" and provide the magic signature that identifies this as a binary plist file.

00000006: 30 30                                             |00              |

The next two bytes provide us a version number so we know which format this plist will follow. So far, this is always "00" but could change in the future, someday, maybe.
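As a quick illustration, the header check can be sketched in a few lines of Python. The function name read_header is my own; the only facts used are the 6-byte signature and 2-byte version described above.

```python
def read_header(data):
    """Split the 8-byte header into (signature, version) strings."""
    if len(data) < 8:
        raise ValueError("too short to be a binary plist")
    signature, version = data[:6], data[6:8]
    if signature != b"bplist":
        raise ValueError("missing bplist signature")
    return signature.decode("ascii"), version.decode("ascii")


if __name__ == "__main__":
    # First nine bytes of the sample file: header plus the dict marker.
    print(read_header(bytes.fromhex("62 70 6C 69 73 74 30 30 D6")))
```

For the sample this returns ("bplist", "00").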

Now we start reading the objects. 

00000008: D6 01 02 03 04 05 06 07 – 08 09 0A 0B 0C          |                |

The first object's first byte is xD6, which tells us this is a dict object (xD) with 6 elements. If we consider a dict a special type of array that consists of key/value pairs, this means that it will contain 12 objects for the 6 keys and 6 values. The next 12 bytes provide object reference numbers to those 12 objects, one byte per reference in this file, so programs reading this can refer to the objects by number. The top level object is usually a dict, though the format also allows another type, such as an array, at the top.

The objects in a dict are key/value pairs with all the keys listed first and values last, in order, respectively. So, objects x01 and x07 go together, and x02/x08, and x03/x09, and so on.
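To make the pairing concrete, here is a small Python sketch that reads the dict object from the sample and matches each key reference with its value reference. The name pair_dict_refs is hypothetical, and the sketch assumes 1-byte references and a count below 15, which is what this file uses.

```python
# The top-level dict object from offset x08 of the sample.
DICT_OBJECT = bytes.fromhex("D6 01 02 03 04 05 06 07 08 09 0A 0B 0C")


def pair_dict_refs(obj, ref_size=1):
    """Return (key_ref, value_ref) pairs for a dict object."""
    marker = obj[0]
    if marker >> 4 != 0xD:
        raise ValueError("not a dict marker")
    count = marker & 0x0F
    if count == 0x0F:
        raise ValueError("extended counts are not handled in this sketch")
    refs = [
        int.from_bytes(obj[1 + i * ref_size:1 + (i + 1) * ref_size], "big")
        for i in range(2 * count)
    ]
    # Keys come first, values second, in the same order.
    return list(zip(refs[:count], refs[count:]))


if __name__ == "__main__":
    print(pair_dict_refs(DICT_OBJECT))
```

For the sample this gives [(1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12)].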

Objects in an array are similarly structured, using a xAn marker where n is the number of elements. Arrays only contain values, though, so an array has half as many object references as a dict with the same count. Like every other object, its elements are located through the file-wide offset table (more on that below). The code also references a set type using a xCn marker that is structured exactly like an array, but no other documentation or plist editors I've looked at include a set as a data type.

Object references are global throughout the file, so an array or child dict within the main, top-level dict will not restart their numbering at x01. If the array/dict is in the middle of the top-level dict, then the top-level will skip numbering to account for it. For example, let's sidetrack over to this example:

    : 62 70 6C 69 73 74 30 30 – D2 01 02 03 08 57 73 75 |bplist00     Wsu|
    : 62 64 69 63 74 56 73 74 – 72 69 6E 67 D2 04 05 06 |bdictVstring    |
    : 07 54 73 75 62 31 54 73 – 75 62 32 53 61 62 63 53 | Tsub1Tsub2SabcS|

The two object reference lists, one from the main dict and one from the child dict, show us these objects:
    dict
    01   subdict key
    02   string key
    03   subdict value
       04    sub1 key
       05    sub2 key
       06    sub1 value
       07    sub2 value
    08   string value

Part of the reason for the unique object references is to provide a way to prevent repeating the same objects over again. Any time there are multiple objects that are identical, the object itself will only be written once and each object reference list will refer to it rather than repeating it.
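One way to see this in action without a Mac is Python's plistlib, whose binary writer also stores identical strings once. That is an observation about Python's writer used as a stand-in, not a statement about Apple's code. The helper count_objects is my own name; it reads the object count out of the trailer.

```python
import plistlib
import struct


def count_objects(binary_plist):
    """Return the object count stored in the 32-byte trailer."""
    _, _, num_objects, _, _ = struct.unpack(">6xBBQQQ", binary_plist[-32:])
    return num_objects


if __name__ == "__main__":
    repeated = {"first": "shared-value", "second": "shared-value"}
    encoded = plistlib.dumps(repeated, fmt=plistlib.FMT_BINARY)
    # How many times the string bytes appear, and how many objects exist.
    print(encoded.count(b"shared-value"), count_objects(encoded))
```

Two keys share one value here, so the file holds four objects (the dict, two keys, one value) rather than five.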

00000015: 54 55 55 49 44                                    |TUUID           |
0000001A: 5C 49 73 46 75 6C 6C 42 – 61 63 6B 75 70          |\IsFullBackup   |
00000027: 57 56 65 72 73 69 6F 6E                           |WVersion        |
0000002F: 5B 42 61 63 6B 75 70 53 – 74 61 74 65             |[BackupState    |
0000003B: 54 44 61 74 65                                    |TDate           |
00000040: 5D 53 6E 61 70 73 68 6F – 74 53 74 61 74 65       |]SnapshotState  |

These six are all the same type (string), so I'll only describe them once. 

A string object's first byte is x5n where 5 means ASCII string and n tells us how many characters to read. With ASCII, each character is one byte, so this is fairly straightforward. There is also a Unicode string type that uses a x6n marker; remember to read n x 2 bytes, since each character is a big endian uint16_t.
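Here is a rough Python sketch of reading one of these short strings. The name decode_short_string is made up for this example, and it deliberately refuses the xF "count follows" case.

```python
def decode_short_string(data, offset=0):
    """Decode an ASCII (x5n) or Unicode (x6n) string with n below 15."""
    marker = data[offset]
    kind, count = marker >> 4, marker & 0x0F
    if count == 0x0F:
        raise ValueError("extended length is not handled in this sketch")
    start = offset + 1
    if kind == 0x5:
        # One byte per character.
        return data[start:start + count].decode("ascii")
    if kind == 0x6:
        # Two bytes per character, big endian.
        return data[start:start + 2 * count].decode("utf-16-be")
    raise ValueError("not a string marker")


if __name__ == "__main__":
    print(decode_short_string(bytes.fromhex("54 55 55 49 44")))
    print(decode_short_string(b"\x5CIsFullBackup"))
```

The first call decodes the bytes at offset x15 of the sample and the second those at x1A.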

Incidentally, that was six objects, so now I expect the next 6 objects to be the values that go with the above named keys in the order listed.

0000004E: 5F 10 24 46 32 39 43 45 – 41 33 33 2D 38 43 30 44 |_ $F29CEA33-8C0D|
        : 2D 34 35 42 45 2D 42 39 – 41 45 2D 36 42 44 44 30 |-45BE-B9AE-6BDD0|
        : 35 41 43 45 45 36 42                              |5ACEE6B         |

This is still a x5n marker telling us it is also an ASCII string, but it is a little different. The F in x5F tells us the length does not fit in one nibble (it is 15 or more characters), so the length is stored immediately after the marker as an integer object. That integer has its own marker byte, x1n, where the 1 means integer and n says the value occupies 2^n bytes: x10 is 1 byte, x11 is 2 bytes, x12 is 4 bytes, and x13 is 8 bytes. So, x10 x24 means we read one byte, x24, which is 36 characters, and the string itself starts right after it.

Several data types (string, data, array, set, and dict, namely) use this when their count is 15 or more somethings.
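A small Python sketch of that length rule, going by the marker table's int entry (x1n, 2^n bytes). The function name read_count is hypothetical. It returns the count and the offset where the payload starts.

```python
def read_count(data, offset=0):
    """Return (count, payload_offset) for a counted object at offset."""
    low = data[offset] & 0x0F
    if low != 0x0F:
        # The count fits in the marker's low nibble.
        return low, offset + 1
    int_marker = data[offset + 1]
    if int_marker >> 4 != 0x1:
        raise ValueError("expected an integer object after the marker")
    width = 1 << (int_marker & 0x0F)  # 2^n bytes
    start = offset + 2
    return int.from_bytes(data[start:start + width], "big"), start + width


if __name__ == "__main__":
    uuid_object = bytes.fromhex("5F 10 24") + b"F29CEA33-8C0D-45BE-B9AE-6BDD05ACEE6B"
    count, start = read_count(uuid_object)
    print(count, uuid_object[start:start + count].decode("ascii"))
```

For the UUID object the count comes back as 36 and the payload starts three bytes in.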

00000075: 08                                                |                |

Boolean objects are x08 = false and x09 = true. So, for our example, "IsFullBackup = false", which matches the <false/> in the XML version and the 0 that plutil printed.

00000076: 53 32 2E 34                                       |S2.4            |
0000007A: 53 6E 65 77                                       |Snew            |

A couple of ASCII strings that we already know how to read.

0000007E: 33 41 B9 9A 8F 24 B8 4E – B5                      |3A   $ N        |

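Dates are identified with the marker x33 and the next 8 bytes are a big endian 8-byte float that denotes the number of seconds since 2001-01-01T00:00:00Z (Jan 1, 2001). Here the float works out to roughly 429,559,588.72 seconds, which is 2014-08-12T18:06:28Z, the same date shown in the XML version.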
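The date math can be sketched in Python like this. The names APPLE_EPOCH and decode_date are my own; the bytes are the date object from offset x7E of the sample.

```python
import struct
from datetime import datetime, timedelta, timezone

# Reference point for plist dates: 2001-01-01T00:00:00Z.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

DATE_OBJECT = bytes.fromhex("33 41 B9 9A 8F 24 B8 4E B5")


def decode_date(obj):
    """Decode a x33 date object into a timezone-aware datetime."""
    if obj[0] != 0x33:
        raise ValueError("not a date marker")
    (seconds,) = struct.unpack(">d", obj[1:9])  # big endian 8-byte float
    return APPLE_EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    print(decode_date(DATE_OBJECT).isoformat())
```

The binary form keeps fractional seconds that the XML version drops, so expect a few extra digits after the 18:06:28.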

00000087: 58 66 69 6E 69 73 68 65 – 64                      |Xfinished       |

One last string brings us to the sixth value and the end of the dict.

00000090: 08 15 1A 27 2F 3B 40 4E – 75 76 7A 7E 87          |   '/;@Nuvz~    |

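This is an offset table that tells us at what offsets into the file we will find all of the objects. This is global, like the object references in the dict and array types, thus it shows the location of all objects regardless of their placement in the tree. Entries are in object reference order, so entry 0 (x08) is the top level dict at offset x08, entry 1 (x15) is the "UUID" key, and so on.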
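Reading the table is just a matter of slicing fixed-width integers. A minimal Python sketch, assuming the entry size (1 byte) and entry count (13) that the trailer supplies for this file. The function name read_offset_table is made up.

```python
# The offset table from offset x90 of the sample.
OFFSET_TABLE = bytes.fromhex("08 15 1A 27 2F 3B 40 4E 75 76 7A 7E 87")


def read_offset_table(data, table_offset, num_objects, offset_size):
    """Return the file offset of every object, indexed by object reference."""
    offsets = []
    for index in range(num_objects):
        start = table_offset + index * offset_size
        offsets.append(int.from_bytes(data[start:start + offset_size], "big"))
    return offsets


if __name__ == "__main__":
    for ref, offset in enumerate(read_offset_table(OFFSET_TABLE, 0, 13, 1)):
        print(f"object x{ref:02X} starts at x{offset:02X}")
```

Here the table bytes are passed on their own, so table_offset is 0; against the whole file it would be x90.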

0000009D: 00 00 00 00 00 00 01 01                           |                |

Six bytes of x00 padding followed by two single-byte values. The first is the size in bytes of each entry in the offset table (immediately above), and the second is the size of each object reference (the index numbers at the start of the dict or array objects). Both are 1 in this file.

000000A5: 00 00 00 00 00 00 00 0D                           |                |

This tells us the number of entries in the offset table, and thus the number of objects in the file. 

000000AD: 00 00 00 00 00 00 00 00                           |                |

This number tells us the element number in the offset table that points to the top level dict object (0 here, the first entry).

000000B5: 00 00 00 00 00 00 00 90                           |                |

This number tells us the offset to the offset table.
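Putting the four trailer pieces together, here is a sketch in Python using the struct module. The function name parse_trailer and the dictionary keys are my own labels for the fields described above.

```python
import struct

# The last 32 bytes of the sample file, starting at offset x9D.
TRAILER = bytes.fromhex(
    "00 00 00 00 00 00 01 01 "
    "00 00 00 00 00 00 00 0D "
    "00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 90"
)


def parse_trailer(data):
    """Unpack the 32-byte trailer found at the very end of the file."""
    # 6 padding bytes, two 1-byte sizes, then three big endian 8-byte numbers.
    fields = struct.unpack(">6xBBQQQ", data[-32:])
    names = ("offset_size", "ref_size", "num_objects",
             "top_object", "offset_table_offset")
    return dict(zip(names, fields))


if __name__ == "__main__":
    print(parse_trailer(TRAILER))
```

For the sample this yields offset_size 1, ref_size 1, num_objects 13, top_object 0, and offset_table_offset 144 (x90).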

For slightly faster reference, here is a table of the various markers for each of the data types. 

    type    binary       hex        meaning
    null    0000 0000    x00        null
    bool    0000 1000    x08        false
    bool    0000 1001    x09        true
    fill    0000 1111    x0F        fill byte
    int     0001 nnnn    x1n        Integer, # of bytes is 2^n
    real    0010 nnnn    x2n        Floating Point Number, # of bytes is 2^n
    date    0011 0011    x33        Date, 8-byte float, # of seconds since 2001-01-01
    data    0100 nnnn    x4n        Binary data, n is # of bytes or F if count follows
    string  0101 nnnn    x5n        ASCII string, n is # of chars or F if count follows
    string  0110 nnnn    x6n        Unicode string, n is # of chars or F if count follows
    uid     1000 nnnn    x8n        n+1 is # of bytes (only used by NSKeyedArchiver)
    array   1010 nnnn    xAn        objref*, n is count or F if count follows
    set     1100 nnnn    xCn        objref*, n is count or F if count follows
    dict    1101 nnnn    xDn        keyref* objref*, n is count or F if count follows
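To tie the marker table to the sample file, here is a compact Python sketch of a reader. It is my own illustration rather than how Apple's code or any forensic tool does it: the names parse_bplist, read_uints and read_object are made up, uid objects are skipped, and there is no protection against malformed or malicious files (bad offsets, reference loops).

```python
import struct
from datetime import datetime, timedelta, timezone

# The 189-byte sample from the hex dump earlier in this post.
SAMPLE = bytes.fromhex(
    "62 70 6C 69 73 74 30 30 D6 01 02 03 04 05 06 07 "
    "08 09 0A 0B 0C 54 55 55 49 44 5C 49 73 46 75 6C "
    "6C 42 61 63 6B 75 70 57 56 65 72 73 69 6F 6E 5B "
    "42 61 63 6B 75 70 53 74 61 74 65 54 44 61 74 65 "
    "5D 53 6E 61 70 73 68 6F 74 53 74 61 74 65 5F 10 "
    "24 46 32 39 43 45 41 33 33 2D 38 43 30 44 2D 34 "
    "35 42 45 2D 42 39 41 45 2D 36 42 44 44 30 35 41 "
    "43 45 45 36 42 08 53 32 2E 34 53 6E 65 77 33 41 "
    "B9 9A 8F 24 B8 4E B5 58 66 69 6E 69 73 68 65 64 "
    "08 15 1A 27 2F 3B 40 4E 75 76 7A 7E 87 00 00 00 "
    "00 00 00 01 01 00 00 00 00 00 00 00 0D 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 90"
)
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def parse_bplist(data):
    """Decode a version 00 binary plist into Python objects."""
    if data[:8] != b"bplist00":
        raise ValueError("not a version 00 binary plist")
    offset_size, ref_size, num_objects, top, table_at = struct.unpack(
        ">6xBBQQQ", data[-32:])

    def read_uints(pos, count, size):
        # Big endian unsigned integers packed back to back.
        return [int.from_bytes(data[pos + i * size:pos + (i + 1) * size], "big")
                for i in range(count)]

    offsets = read_uints(table_at, num_objects, offset_size)

    def read_object(ref):
        pos = offsets[ref]
        marker = data[pos]
        kind, low = marker >> 4, marker & 0x0F
        pos += 1
        if marker == 0x00:
            return None
        if marker in (0x08, 0x09):           # x08 false, x09 true
            return marker == 0x09
        if kind == 0x1:                      # int, 2^low bytes
            return int.from_bytes(data[pos:pos + (1 << low)], "big",
                                  signed=(low >= 3))
        if kind == 0x2 and low in (2, 3):    # real, 4 or 8 bytes
            return struct.unpack(">f" if low == 2 else ">d",
                                 data[pos:pos + (1 << low)])[0]
        if marker == 0x33:                   # date
            seconds = struct.unpack(">d", data[pos:pos + 8])[0]
            return APPLE_EPOCH + timedelta(seconds=seconds)
        if kind not in (0x4, 0x5, 0x6, 0xA, 0xC, 0xD):
            raise ValueError(f"unhandled marker 0x{marker:02X}")
        count = low
        if low == 0x0F:                      # count follows as an int object
            width = 1 << (data[pos] & 0x0F)
            count = int.from_bytes(data[pos + 1:pos + 1 + width], "big")
            pos += 1 + width
        if kind == 0x4:
            return data[pos:pos + count]
        if kind == 0x5:
            return data[pos:pos + count].decode("ascii")
        if kind == 0x6:
            return data[pos:pos + 2 * count].decode("utf-16-be")
        refs = read_uints(pos, count * (2 if kind == 0xD else 1), ref_size)
        if kind == 0xD:                      # keys first, then values
            return {read_object(k): read_object(v)
                    for k, v in zip(refs[:count], refs[count:])}
        return [read_object(r) for r in refs]  # array or set

    return read_object(top)


if __name__ == "__main__":
    for key, value in parse_bplist(SAMPLE).items():
        print(f"{key} => {value!r}")
```

Run against the sample, the keys come out in the same order plutil -p printed them, which is a handy sanity check that the key and value references were paired correctly.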

 

 

References:

http://fileformats.archiveteam.org/wiki/Property_List/Binary

http://opensource.apple.com/source/CF/CF-550/CFBinaryPList.c

http://www.appleexaminer.com/MacsAndOS/Analysis/PLIST/PLIST.html

https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/PropertyLists/AboutPropertyLists/AboutPropertyLists.html#//apple_ref/doc/uid/10000048i-CH3-SW2

 
