In this post, we will determine a file's type using its magic number.
What is a Magic Number?
From Wikipedia's "Magic number" article:
A magic number is a constant numerical or text value used to identify a file format or protocol.
For files, the magic number is a short byte sequence, usually written in hex, at the beginning of the file. It indicates the type of content but is not normally visible to users.
You may be wondering why we don't simply check the file extension to find the file type. We made exactly that mistake!
In common scenarios, checking the file extension is enough, but we hit a case where the extension cannot be trusted. The problem surfaced during penetration testing of our website. One form lets users upload files, and only PDF files should be accepted. We validated uploads by file extension alone, which left the form vulnerable.
During the penetration test, our QA team found that an attacker could upload even an exe file by renaming it to end in .pdf, like some-dangerous-file.exe.pdf. If you check only the file extension, this file is uploaded to the server successfully.
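To make the failure concrete, here is a minimal sketch of an extension-only check. The names ExtensionCheckDemo and HasPdfExtension are hypothetical and are not taken from our production code; the sketch only assumes the check compared the file name's extension against .pdf.

```csharp
using System;
using System.IO;

public static class ExtensionCheckDemo
{
    // Hypothetical extension-only check: it inspects the file NAME, never the contents.
    public static bool HasPdfExtension(string fileName)
    {
        // Path.GetExtension returns the last extension, including the leading dot.
        string extension = Path.GetExtension(fileName);
        return string.Equals(extension, ".pdf", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // The renamed executable passes because only its name is inspected.
        Console.WriteLine(HasPdfExtension("some-dangerous-file.exe.pdf")); // True
        Console.WriteLine(HasPdfExtension("some-dangerous-file.exe"));     // False
    }
}
```

Nothing in this check ever opens the file, so renaming is all it takes to get past it.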
To correctly determine the content type of a file, we have to check the magic number of the target file.
Magic numbers vary in length between file types: for example, 2 bytes (4D-5A) for an exe file and 5 bytes (25-50-44-46-2D) for a pdf file. In this example, I read the first 20 bytes to find the magic number. You may need to read more if the file format you are targeting has a magic number longer than 20 bytes.
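The hex notation used for these magic numbers is the format that BitConverter.ToString produces. As a small sketch, the exe and pdf signatures are the ASCII strings "MZ" and "%PDF-", and converting them shows both the notation and the byte counts. The class and method names here are made up for illustration.

```csharp
using System;
using System.Text;

public static class SignatureHexDemo
{
    // Formats an ASCII signature the same way BitConverter.ToString formats file bytes.
    public static string ToHex(string asciiSignature)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(asciiSignature);
        return BitConverter.ToString(bytes); // uppercase hex pairs joined by hyphens
    }

    public static void Main()
    {
        Console.WriteLine(ToHex("MZ"));    // 4D-5A (2 bytes)
        Console.WriteLine(ToHex("%PDF-")); // 25-50-44-46-2D (5 bytes)
    }
}
```

Because BitConverter.ToString emits uppercase hex, any comparison against a lowercase value such as 2d has to ignore case.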
Let's move to the code. This sample has two functions. IsMagicNumberMatched() does the real work: it checks the magic number of the file at the path passed as a parameter. GetAuditorOpinionFromFile() is a wrapper that tests the magic numbers of different files. I cover four common file types here: exe, pdf, xml and rar.
The wrapper returns a status message as a string, showing for each file whether its magic number matches the value we passed.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static string GetAuditorOpinionFromFile()
{
    string filePath_EXE = @"C:\SOME_PATH_TO\MyFile.exe";
    string filePath_PDF = @"C:\SOME_PATH_TO\MyFile.pdf";
    string filePath_XML = @"C:\SOME_PATH_TO\MyFile.xml";
    string filePath_RAR = @"C:\SOME_PATH_TO\MyFile.rar";

    // Map each file type to its magic number, in BitConverter.ToString format.
    Dictionary<string, string> numberList = new Dictionary<string, string>();
    numberList.Add("exe", "4D-5A");                 // "MZ"
    numberList.Add("pdf", "25-50-44-46-2d");        // "%PDF-"
    numberList.Add("xml", "3c-3f-78-6d-6c-20");     // "<?xml " (only files that start with an XML declaration)
    numberList.Add("rar", "52-61-72-21-1A-07-00");  // RAR 4.x archives; RAR 5 uses a different signature

    StringBuilder sb = new StringBuilder();
    sb.AppendFormat("File Path: {0}, File Magic No: {1}, IsMatched: {2}", filePath_EXE, numberList["exe"], IsMagicNumberMatched(filePath_EXE, numberList["exe"])).AppendLine();
    sb.AppendFormat("File Path: {0}, File Magic No: {1}, IsMatched: {2}", filePath_PDF, numberList["pdf"], IsMagicNumberMatched(filePath_PDF, numberList["pdf"])).AppendLine();
    sb.AppendFormat("File Path: {0}, File Magic No: {1}, IsMatched: {2}", filePath_XML, numberList["xml"], IsMagicNumberMatched(filePath_XML, numberList["xml"])).AppendLine();
    sb.AppendFormat("File Path: {0}, File Magic No: {1}, IsMatched: {2}", filePath_RAR, numberList["rar"], IsMagicNumberMatched(filePath_RAR, numberList["rar"])).AppendLine();
    return sb.ToString();
}
private static bool IsMagicNumberMatched(string filePath, string candidateMagicNo)
{
    byte[] data;
    // The using block closes the reader (and the underlying stream) even if reading throws.
    using (BinaryReader reader = new BinaryReader(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.None)))
    {
        // Read the first 20 bytes (fewer if the file is shorter).
        // Formats with a magic number longer than 20 bytes need a larger read.
        data = reader.ReadBytes(20);
    }
    // Convert the bytes to a hex string such as "25-50-44-46-2D".
    string dataAsHex = BitConverter.ToString(data);
    // A file shorter than the candidate cannot match; this check also keeps Substring from throwing.
    if (dataAsHex.Length < candidateMagicNo.Length)
    {
        return false;
    }
    // Compare the leading characters of the hex string with the candidate, ignoring case.
    string currentMagicNo = dataAsHex.Substring(0, candidateMagicNo.Length);
    return string.Equals(currentMagicNo, candidateMagicNo, StringComparison.OrdinalIgnoreCase);
}
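In the upload scenario that motivated this post, the content usually arrives as a stream rather than as a path on disk. The following is a minimal sketch of the same idea applied to a stream, comparing raw bytes instead of hex strings. It is not the code we deployed: StreamSignatureCheck and StartsWith are hypothetical names, and the sketch assumes the stream is positioned at the start of the uploaded content.

```csharp
using System;
using System.IO;

public static class StreamSignatureCheck
{
    // Returns true when the stream begins with the given signature bytes.
    public static bool StartsWith(Stream stream, byte[] signature)
    {
        byte[] header = new byte[signature.Length];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop until
        // the header is full or the stream ends.
        while (total < header.Length)
        {
            int read = stream.Read(header, total, header.Length - total);
            if (read == 0)
            {
                return false; // the stream is shorter than the signature
            }
            total += read;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        byte[] pdfSignature = { 0x25, 0x50, 0x44, 0x46, 0x2D };   // "%PDF-"
        byte[] renamedExe = { 0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00 }; // begins with "MZ"
        using (var upload = new MemoryStream(renamedExe))
        {
            Console.WriteLine(StartsWith(upload, pdfSignature)); // False
        }
    }
}
```

A matching signature only shows that the file starts like a PDF, so it works best as one check alongside the extension check rather than as a replacement for it.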
I hope you find this post helpful. I welcome your comments and suggestions for improving it.