The following is a puzzle constructed by Trinity
Somebody has transmitted one of the personnel files of a fictitious company from the corporate network to a host on the internet, in contravention of the company's IT policy. The personnel files are held as unencrypted, unencoded .txt files in a secure location on the company servers, and the following screenshot shows a listing of the directory in question:
Unfortunately the company were not using the services of e2e-assure to protect their IT infrastructure (apologies for the blatant advert for our services, but marketing made me do it). When e2e-assure were called in to investigate, the only evidence of the theft that could be recovered was the following partial packet capture, which came from a piece of malware that was encoding the file contents before transmitting them:
The ASCII from the partial packet capture is:
From the evidence presented here, e2e-assure were able to determine which personnel file had been stolen.
Whose file was stolen, and how do you know?
Answer from Trinity
Looking at the content of the partial packet capture, the first step is to recognise that it is encoded using base64, with the ‘=’ sign at the end of the text being one of the tell-tale signs of base64 encoding (base64 is an encoding, not encryption: no key is involved and anyone can reverse it). The next logical step is to decode the text, which can be done using an online base64 decoding webpage. When you do this, you find that the text in the pcap decodes to: "onnel file is private and confidential. Any unauthorised use of this file is prohibited by law."
The decoded text is not specific to any user and could have come from any personnel file, so at first glance there appears to be no way of solving the puzzle. However, a full understanding of how base64 encoding works will help us make some progress.
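The decoding step can also be done locally instead of on a web page. The following is a minimal Python sketch using the standard library's base64 module. The function name decode_fragment is our own, and because the captured text is not reproduced here, the sample input is a stand-in phrase that the sketch encodes itself.

```python
import base64


def decode_fragment(encoded: str) -> str:
    """Decode a base64 string and return the result as text."""
    # validate=True makes the decoder reject characters that are
    # not part of the base64 alphabet instead of silently skipping them.
    raw = base64.b64decode(encoded, validate=True)
    # Assumption: the payload is plain text, as a .txt file would be.
    return raw.decode("utf-8", errors="replace")


# Stand-in input: NOT the captured data, just a phrase encoded for the demo.
sample = base64.b64encode(b"private and confidential").decode("ascii")
print(decode_fragment(sample))
```

Note that a fragment cut from the middle of a base64 stream only decodes cleanly if it starts on a four-character boundary of the encoded text.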
Base64 encoding works by taking three single-byte characters from the source file and encoding them as four base64 characters, each of which carries 6 bits. The following diagram explains the system in terms of bytes and bits for base64 encoding the letters ‘abc’ into the resulting ‘YWJj’:
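The same ‘abc’ to ‘YWJj’ conversion can be reproduced by hand in a few lines of Python. This is an illustrative sketch only: the names ALPHABET and encode_group are ours, and it handles a single complete 3-byte group, not a whole file.

```python
# The standard base64 alphabet: index 0 is 'A', index 63 is '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Join the three 8-bit values into one 24-bit number.
    bits = int.from_bytes(three_bytes, "big")
    # Slice the 24 bits into four 6-bit indexes, most significant first.
    indexes = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_group(b"abc"))  # YWJj
```

For ‘abc’ the four 6-bit indexes are 24, 22, 9 and 35, which map to ‘Y’, ‘W’, ‘J’ and ‘j’.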
The 3 bytes of input (24 bits) are converted to 4 base64 characters of 6 bits each (also 24 bits), so the arithmetic balances. However, the system runs into a problem if the length of the input data is not divisible by three, because the final one or two bytes do not fill a complete group. Base64 gets around this by padding the last group with zero bits and appending ‘=’ characters to the output, so that the encoded text is always a whole number of four-character blocks. Two leftover bytes produce a single ‘=’ at the end of the encoded text, one leftover byte produces ‘==’, and an input whose length is an exact multiple of three produces no padding at all. For files of arbitrary length, each of these three outcomes is equally likely.
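The three padding cases are easy to confirm with the standard library. The following is a small sketch; padding_count is our own helper name, and the sample inputs are arbitrary.

```python
import base64


def padding_count(data: bytes) -> int:
    """Return how many '=' characters base64 appends for this input."""
    return base64.b64encode(data).decode("ascii").count("=")


# Inputs whose lengths leave remainders 0, 1 and 2 when divided by 3.
for sample in (b"abc", b"abcd", b"abcde"):
    encoded = base64.b64encode(sample).decode("ascii")
    print(len(sample) % 3, encoded, padding_count(sample))
```

This prints ‘YWJj’ with no padding for remainder 0, ‘YWJjZA==’ for remainder 1, and ‘YWJjZGU=’ for remainder 2.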
Back to the puzzle, and we can now see that the single ‘=’ sign at the end of the encoded text not only indicates base64 encoding, but also tells us that the final group of the original file was one byte short of a full three, so exactly one padding character was needed. We therefore know that the original file length divided by 3 must give a remainder of 2, or put another way, dividing the file length by 3 must leave a fractional part of .666666. So, if you divide each of the file lengths shown in the directory in figure 1 by 3, you will find that the only file that would have needed just one padding character in the encoding process is Paul.txt.
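The remainder check can be automated for a whole directory listing. The sketch below is only illustrative: the function name files_with_single_pad is ours, and the file names and sizes are hypothetical placeholders, not the values from the directory listing in the puzzle.

```python
def files_with_single_pad(sizes: dict) -> list:
    """Return the names whose size leaves remainder 2 when divided by 3.

    Those are the files whose base64 encoding ends in exactly one '='.
    """
    return [name for name, size in sizes.items() if size % 3 == 2]


# HYPOTHETICAL names and byte counts, invented for this example only.
example_sizes = {
    "alice.txt": 1200,  # 1200 % 3 == 0 -> no padding
    "bob.txt": 1201,    # 1201 % 3 == 1 -> '=='
    "carol.txt": 1202,  # 1202 % 3 == 2 -> '='
}

print(files_with_single_pad(example_sizes))  # ['carol.txt']
```

With real data, the sizes would be the byte counts from the directory listing, and the puzzle only has a unique answer when exactly one file passes the test.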
Keep an eye out for more puzzles in the future.