Microsoft Excel Workbook Encryption

Summary

I recently found myself considering how a non-technical user of Microsoft Windows in a business environment might go about encrypting text within a file. Since Microsoft Office is ubiquitous in the workplace, I decided to look into what it has to offer in terms of producing encrypted files. For the purpose of this post I’m focusing specifically on Microsoft Excel workbook files.

XLS Workbooks

Microsoft Office 2003 was the last version of Microsoft Office to favour file formats that were based on the OLE Compound File Binary Format. The file extension for Excel workbooks under this format was XLS.

At the time of writing (November 2021) it’s still possible to save a workbook as an XLS; indeed, Excel 2021 and Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) provide the option to save a workbook using two variations of this format:

  1. Excel 97-2003 Workbook (*.xls)
  2. Microsoft Excel 5.0/95 Workbook (*.xls)

Saving a workbook using one of the two variations above will produce a file that’s faithful to the respective implementation of the format including adherence to its password protection specification. So in the case of Excel 97-2003 Workbook (*.xls), protection is achieved using RC4 encryption and MD5 hashing, whereas Microsoft Excel 5.0/95 Workbook (*.xls) uses XOR obfuscation. For further details see Microsoft Office encryption evolution: from Office 97 to Office 2019.

To demonstrate these differences I created two password protected XLS files in Excel 2021:

  1. Excel_97-2003_Workbook.xls saved using the file format Excel 97-2003 Workbook (*.xls).
  2. Microsoft_Excel_5.0_95_Workbook.xls saved using the file format Microsoft Excel 5.0/95 Workbook (*.xls).

Within each file I launched the Visual Basic Editor (Alt+F11), opened the Immediate Window (Ctrl+G) and obtained the workbook’s password protection related properties:

Debug.Print ActiveWorkbook.Name
Excel_97-2003_Workbook.xls

Debug.Print Application.Name & ", Version: " & Application.Version & ", Build: " & Trim(Str(Application.Build))
Microsoft Excel, Version: 16.0, Build: 14326

Debug.Print ActiveWorkbook.PasswordEncryptionAlgorithm
RC4

Debug.Print ActiveWorkbook.PasswordEncryptionFileProperties
False

Debug.Print ActiveWorkbook.PasswordEncryptionKeyLength
 128 

Debug.Print ActiveWorkbook.PasswordEncryptionProvider
Microsoft Enhanced Cryptographic Provider v1.0

Debug.Print ActiveWorkbook.Name
Microsoft_Excel_5.0_95_Workbook.xls

Debug.Print Application.Name & ", Version: " & Application.Version & ", Build: " & Trim(Str(Application.Build))
Microsoft Excel, Version: 16.0, Build: 14326

Debug.Print ActiveWorkbook.PasswordEncryptionAlgorithm
OfficeXor

Debug.Print ActiveWorkbook.PasswordEncryptionFileProperties
False

Debug.Print ActiveWorkbook.PasswordEncryptionKeyLength
-1 

Debug.Print ActiveWorkbook.PasswordEncryptionProvider
Office

I also repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same except for the version and build line (line 5 of each listing), where the output was: Microsoft Excel, Version: 16.0, Build: 14430.

An alternative way to obtain this information is to parse the file, which is what you’d need to do if you don’t have a copy of Excel or you don’t know a workbook’s password and therefore can’t open it to run VBA. I’m not going to cover this in detail, but one option is the Python script office2john.py. It’s intended to be run from the command line and retrieves the password hash contained within a Microsoft Office file so that it can be fed into the password cracking tool John the Ripper (specifically the community-enhanced “jumbo” version). When I ran this script against Microsoft_Excel_5.0_95_Workbook.xls the output helpfully included the message Excel 95 XOR obfuscation detected. The output returned for Excel_97-2003_Workbook.xls wasn’t quite so helpful, but by stepping through the code with some strategically placed print statements (aka printf() debugging) I was able to determine the encryption algorithm, key length and encryption provider.
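As a first step before any deeper parsing, a file's container format can be guessed from its leading bytes. The following is a minimal Python sketch and is not taken from office2john.py; the function names sniff_container and sniff_file are hypothetical. It relies on two well-known signatures: OLE Compound File Binary files begin with the bytes D0 CF 11 E0 A1 B1 1A E1, and ZIP archives (which is what an unencrypted XLSX is) begin with PK followed by 03 04.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header signature


def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(OLE_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at 'path' and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

Note that a password protected XLSX is itself wrapped in an OLE compound file, so a result of "ole" doesn't by itself distinguish an XLS from an encrypted XLSX; the streams inside the container would need to be inspected for that.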

XLSX Workbooks

Since the release of Microsoft Office 2007, the default file formats in Microsoft Office have been based on Office Open XML. This led to the introduction of a new password protection specification using AES encryption and SHA hashing. For Microsoft Excel workbooks, the adoption of Office Open XML resulted in the creation of a new file extension, XLSX.

VBA can be used to obtain the password protection related settings of an XLSX file in exactly the same way as an XLS file. Here’s an example of what’s returned against the XLSX workbook that was created in Excel 2021:

Debug.Print ActiveWorkbook.Name
Excel_Workbook.xlsx

Debug.Print Application.Name & ", Version: " & Application.Version & ", Build: " & Trim(Str(Application.Build))
Microsoft Excel, Version: 16.0, Build: 14326 

Debug.Print ActiveWorkbook.PasswordEncryptionAlgorithm

Debug.Print ActiveWorkbook.PasswordEncryptionFileProperties
True

Debug.Print ActiveWorkbook.PasswordEncryptionKeyLength
 256 

Debug.Print ActiveWorkbook.PasswordEncryptionProvider

I also repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same except for the version and build line (line 5 of the listing), where the output was: Microsoft Excel, Version: 16.0, Build: 14430.

You may have noticed that in the output above the PasswordEncryptionAlgorithm and PasswordEncryptionProvider properties are empty. These properties don’t appear to be populated in XLSX files. Regarding PasswordEncryptionAlgorithm, its absence is no great loss because we know it’s going to be AES. Regarding PasswordEncryptionProvider, I’ve struggled to find a proper definition of this property, but for the purpose of this blog post I don’t think it’s something we need to be concerned with.

What is of concern is that none of the password protection related properties exposed in VBA cover hashing. This is a problem when dealing with XLSX files because the hashing implementation isn’t static; it’s subject to change between Office versions, so assumptions cannot be made. The aspects of hashing that I was specifically interested in were the algorithm and the number of iterations performed (hashing the password + salt and then repeatedly hashing the result n times).

In pursuit of answers I stumbled upon the ExcelTable README which helped tip me off that Office 2007 uses something called the Standard encryption method and subsequent versions use the Agile encryption method. I decided to confine my research to the Agile method only; to find out more I consulted the [MS-OFFCRYPTO]: Office Document Cryptography Structure documentation, the latest version of which was published on 2021-10-05, see: [MS-OFFCRYPTO]-211005.pdf which contains the following:

Equipped with this information, I proceeded to create a password protected XLSX file in Excel 2021 and inspected its EncryptionInfo stream using PowerShell as follows:

# 1. Extract workbook file using 7-zip.
& 'C:\Program Files\7-Zip\7z.exe' x <INPUT FILE> -o<OUTPUT FOLDER>

# 2. Read the 'EncryptionInfo' file and treat everything from the 9th character onwards as XML.
$xml = [xml](Get-Content <OUTPUT FOLDER>\EncryptionInfo -Raw).Substring(8)

I then retrieved only the values that were of interest to me from the XML:

$xml.encryption.keyData | Select-Object cipherAlgorithm, keyBits, hashAlgorithm

cipherAlgorithm keyBits hashAlgorithm
--------------- ------- -------------
AES             256     SHA512       

$xml.encryption.keyEncryptors.keyEncryptor.encryptedKey | Select-Object cipherAlgorithm, keyBits, hashAlgorithm, spinCount | ft

cipherAlgorithm keyBits hashAlgorithm spinCount
--------------- ------- ------------- ---------
AES             256     SHA512        100000

I repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same.
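For anyone without PowerShell to hand, the same inspection can be sketched in Python. This is a rough equivalent of the steps above rather than a tested tool: it assumes the EncryptionInfo stream has already been extracted to a file (for example with the 7-Zip command shown earlier), that the first 8 bytes are a version header followed by a reserved field, and that the rest is XML. The function names are hypothetical, and the sample blob is hand-written using the values reported above, not bytes copied from a real workbook (its namespace URIs are from memory and should be checked against the specification; the parser ignores them).

```python
import struct
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_encryption_info(blob: bytes) -> dict:
    """Return the keyData and encryptedKey attributes of an EncryptionInfo stream."""
    # Assumption: two uint16 version fields then a uint32 reserved field (8 bytes).
    major, minor, _reserved = struct.unpack_from("<HHI", blob, 0)
    root = ET.fromstring(blob[8:])  # everything after the header is treated as XML
    wanted = ("cipherAlgorithm", "keyBits", "hashAlgorithm", "spinCount")
    result = {"version": (major, minor)}
    for element in root.iter():
        name = local_name(element.tag)
        if name in ("keyData", "encryptedKey"):
            result[name] = {k: element.get(k) for k in wanted if element.get(k) is not None}
    return result


# Hand-written, heavily trimmed sample for illustration only.
SAMPLE_XML = (
    b'<encryption xmlns="http://schemas.microsoft.com/office/2006/encryption" '
    b'xmlns:p="http://schemas.microsoft.com/office/2006/keyEncryptor/password">'
    b'<keyData cipherAlgorithm="AES" keyBits="256" hashAlgorithm="SHA512"/>'
    b'<keyEncryptors><keyEncryptor>'
    b'<p:encryptedKey spinCount="100000" cipherAlgorithm="AES" keyBits="256" '
    b'hashAlgorithm="SHA512"/>'
    b'</keyEncryptor></keyEncryptors></encryption>'
)
sample = struct.pack("<HHI", 4, 4, 0x40) + SAMPLE_XML

info = parse_encryption_info(sample)
print(info["keyData"])
print(info["encryptedKey"])
```

Matching on the local element name rather than a fully qualified one keeps the sketch independent of the exact namespace strings, at the cost of being less strict than a real parser should be.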

If you’re wondering why the property names cipherAlgorithm, keyBits and hashAlgorithm appear twice in the XML, bear with me; I’ll attempt to explain this a bit later. What I will explain now is the purpose of spinCount, because it’s perhaps not immediately obvious from the name: it refers to the number of hash iterations performed.
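To make the idea of a spin count concrete, here is a minimal Python sketch of iterated hashing. It reflects my reading of the password hashing described in [MS-OFFCRYPTO] and rests on several assumptions that I haven't verified against a real workbook: the password is encoded as UTF-16LE, the first hash is taken over salt + password, and each subsequent round hashes a 4-byte little-endian counter followed by the previous digest. The function name iterated_hash is hypothetical, and the further steps that turn this value into an actual key are not shown.

```python
import hashlib
import struct


def iterated_hash(password: str, salt: bytes, spin_count: int,
                  algorithm: str = "sha512") -> bytes:
    """Hash salt + password once, then re-hash the result spin_count times."""
    # Round 0 (assumption): H(salt + password as UTF-16LE).
    digest = hashlib.new(algorithm, salt + password.encode("utf-16-le")).digest()
    # Rounds 1..n (assumption): H(counter + previous digest), counter starting at 0.
    for i in range(spin_count):
        digest = hashlib.new(algorithm, struct.pack("<I", i) + digest).digest()
    return digest


# Example with made-up inputs; 100000 matches the spinCount reported above.
result = iterated_hash("correct horse", b"\x00" * 16, 100000)
print(len(result), "byte digest")
```

The point of the loop is cost: every password guess has to pay for all 100000 rounds, which slows down brute-force attempts far more than it inconveniences someone opening a file with the correct password.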

In addition to Microsoft Office Pro Plus 2021 and Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314), I also happened to have access to the Pro Plus version of 2010, 2013 and 2016, so I decided to extract the password protection settings from an XLSX workbook created in each of these versions for comparison purposes:

$xml.encryption.keyData | Select-Object cipherAlgorithm, keyBits, hashAlgorithm

Version                                  cipherAlgorithm keyBits hashAlgorithm
-------                                  --------------- ------- -------------
Excel 2010                               AES             128     SHA1
Excel 2013                               AES             256     SHA512
Excel 2016                               AES             256     SHA512
Excel 2021                               AES             256     SHA512
Excel for Microsoft 365                  AES             256     SHA512
(Version 2109, Build 16.0.14430.20314)

$xml.encryption.keyEncryptors.keyEncryptor.encryptedKey | Select-Object cipherAlgorithm, keyBits, hashAlgorithm, spinCount

Version                                  cipherAlgorithm keyBits hashAlgorithm spinCount
-------                                  --------------- ------- ------------- ---------
Excel 2010                               AES             128     SHA1          100000
Excel 2013                               AES             256     SHA512        100000
Excel 2016                               AES             256     SHA512        100000
Excel 2021                               AES             256     SHA512        100000
Excel for Microsoft 365                  AES             256     SHA512        100000
(Version 2109, Build 16.0.14430.20314)

Regarding the XML containing two instances of the property names cipherAlgorithm, keyBits and hashAlgorithm, Chris Morgan has written a blog post on Default Encryption Settings and Behaviors for OneNote 2013 (Office 365) which explains that a user’s password is used to generate a key (referred to as the user key) which encrypts an intermediate key (produced from a random array of bytes) and it’s the intermediate key that’s responsible for encrypting the data. Knowing that there are two layers of encryption goes some way to explaining the repetition of these property names.
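The two-layer arrangement described above can be modelled with a toy Python sketch. To be clear about what is swapped in: Office uses AES, whereas this uses a throwaway XOR keystream built from SHA-256 purely so that the example runs with the standard library, and the single-step key derivation is likewise a simplification. None of the names below come from the specification; they are all hypothetical.

```python
import hashlib
import os


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 derived keystream. A stand-in for AES, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "little")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def user_key(password: str, salt: bytes) -> bytes:
    """Hypothetical one-step key derivation, for illustration only."""
    return hashlib.sha256(salt + password.encode("utf-16-le")).digest()


# Layer 2: a random intermediate key encrypts the workbook data.
intermediate_key = os.urandom(32)
encrypted_data = toy_cipher(intermediate_key, b"cell A1: hello")

# Layer 1: the key derived from the password encrypts the intermediate key.
salt = os.urandom(16)
wrapped_key = toy_cipher(user_key("old password", salt), intermediate_key)

# A password change only requires the intermediate key to be re-encrypted.
recovered = toy_cipher(user_key("old password", salt), wrapped_key)
new_salt = os.urandom(16)
rewrapped_key = toy_cipher(user_key("new password", new_salt), recovered)

# The encrypted data was never touched, yet it opens with the new password.
opened_key = toy_cipher(user_key("new password", new_salt), rewrapped_key)
plaintext = toy_cipher(opened_key, encrypted_data)
print(plaintext)
```

In this model each layer has its own cipher and hash settings, which is at least consistent with cipherAlgorithm, keyBits and hashAlgorithm appearing once under keyData and once under encryptedKey.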

Here’s my attempt at trying to identify the role of each property based on Chris Morgan’s breakdown of the encryption process using a sample OneNote 2013 file:

When each occurrence of cipherAlgorithm, keyBits and hashAlgorithm contains the same value there’s really no need to try and understand the precise purpose that each serves. However, there may be occasions when the values differ; read on for a few such examples…

I wanted to know what effect (if any) manipulating an XLSX workbook in Excel 2021 would have on a file that was created in Excel 2010, so I conducted a few tests:

1) Action: Opened and closed the file.
   Outcome: No change in encryption settings.

2) Action: Added some text to a cell and saved the file.
   Outcome: No change in encryption settings.

3) Action: Changed the existing password.
   Outcome: See below.

$xml.encryption.keyEncryptors.keyEncryptor.encryptedKey | Select-Object cipherAlgorithm, keyBits, hashAlgorithm, spinCount | ft

cipherAlgorithm keyBits hashAlgorithm spinCount
--------------- ------- ------------- ---------
AES             256     SHA512        100000


$xml.encryption.keyData | Select-Object cipherAlgorithm, keyBits, hashAlgorithm

cipherAlgorithm keyBits hashAlgorithm
--------------- ------- -------------
AES             128     SHA1
              

4) Action: Removed the existing password and then added a new password.
   Outcome: See below.

$xml.encryption.keyEncryptors.keyEncryptor.encryptedKey | Select-Object cipherAlgorithm, keyBits, hashAlgorithm, spinCount | ft

cipherAlgorithm keyBits hashAlgorithm spinCount
--------------- ------- ------------- ---------
AES             256     SHA512        100000


$xml.encryption.keyData | Select-Object cipherAlgorithm, keyBits, hashAlgorithm

cipherAlgorithm keyBits hashAlgorithm
--------------- ------- -------------
AES             256     SHA512
             

I repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same.

The examples above illustrate an important point for system administrators: just because you’ve deployed the latest version of Microsoft Office in your environment, it doesn’t mean that existing password protected files will automatically begin to use the strongest AES and SHA settings it has to offer. In these tests, a file created in Excel 2010 kept its original AES 128 and SHA1 keyData settings until the password was removed and a new one added.
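An administrator wanting to find such files could compare each workbook's settings against a baseline. The sketch below is one hedged way to express that check in Python. It assumes the settings have already been extracted into a dictionary with keyData and encryptedKey entries (a shape I've made up for this example), and the baseline of 256-bit keys with anything other than SHA1 is simply what Excel 2013 and later produced in my tests, not an official recommendation. The function name find_legacy_settings is hypothetical.

```python
def find_legacy_settings(settings: dict, min_key_bits: int = 256,
                         legacy_hashes: tuple = ("SHA1",)) -> list:
    """List the settings that fall below the assumed baseline."""
    findings = []
    for section in ("keyData", "encryptedKey"):
        values = settings.get(section)
        if not values:
            continue  # section not present, nothing to report
        if int(values.get("keyBits", 0)) < min_key_bits:
            findings.append(f"{section}: keyBits={values.get('keyBits')}")
        if values.get("hashAlgorithm") in legacy_hashes:
            findings.append(f"{section}: hashAlgorithm={values.get('hashAlgorithm')}")
    return findings


# Settings observed after changing the password on the Excel 2010 file (test 3).
after_password_change = {
    "keyData": {"cipherAlgorithm": "AES", "keyBits": "128", "hashAlgorithm": "SHA1"},
    "encryptedKey": {"cipherAlgorithm": "AES", "keyBits": "256",
                     "hashAlgorithm": "SHA512", "spinCount": "100000"},
}
print(find_legacy_settings(after_password_change))
# ['keyData: keyBits=128', 'keyData: hashAlgorithm=SHA1']
```

Checking both sections matters because, as test 3 showed, one can be upgraded while the other is left behind.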

Excel for the Web

Excel for the web is a browser based version of the product that’s available free of charge with a Microsoft account or through a paid subscription using Microsoft 365.

I first began performing research for this blog post in early October 2021 and at that time the Microsoft Support article entitled Differences between using a workbook in the browser and in Excel stated that: “Workbooks that are protected (encrypted with password protection) cannot be viewed in a browser window. To edit, open the workbook in Excel on the desktop.”. Contrary to this claim, I found that password protected workbooks opened just fine using Excel for the web but I guess this limitation must have existed at some point in the product’s history. Anyhow, the article has since been updated and as at 2021-11-23 it now states: “Workbooks which are protected (encrypted with password protection) can be viewed and edited in Excel for the web.”.

Whilst opening existing password protected workbooks in Excel for the web is not a problem, it’s not possible to modify a password or create a new password protected workbook.

Copyright © 2018 - 2022 thecliguy.co.uk
For details, see Licences and Copyright