PHP: Getting The Real File Type
A couple of days ago I had to build a file upload system for one of my clients. He wanted to upload images to a gallery, and the only image file types users should be able to upload are jpeg, png, and gif.
I started to design the application’s logic, and then I realized that the MIME type sent by the client’s browser with the form varies: it is based on the file’s extension. So, for example, a GIF file named sample.gif.jpg would arrive with a MIME type of image/jpeg, which is not good.
This method is very unreliable, so I started to Google for alternatives. I did find a PECL extension called Fileinfo, but it isn’t included in the PHP distribution by default: you have to download the source and compile it into PHP yourself. This is a major drawback.
So I wrote my own little file-type detection function. Many file formats have a specific marker (a "magic number") in their header. Take jpeg, for example: it has a start of image (SOI) marker at the beginning of the file (offset 0) that is 2 bytes long, with the hex value FFD8.
The function to check this is below:
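function is_jpeg($file_path)
{
    $ret_val = false;
    $fp = fopen($file_path, 'rb'); // 'b' = binary mode, safe on all platforms
    if ($fp === false)
    {
        return false; // the file could not be opened
    }
    $raw_marker = fread($fp, 2); // the SOI marker is the first 2 bytes
    fclose($fp);
    if ($raw_marker !== false && strlen($raw_marker) == 2)
    {
        $marker = unpack("H*", $raw_marker);
        if (intval($marker[1], 16) == 0xFFD8)
        {
            $ret_val = true;
        }
    }
    return $ret_val;
}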
And the explanation:
$fp = fopen($file_path, 'rb');
$raw_marker = fread($fp, 2);
The first line opens the file for reading ('b' selects binary mode, which matters on Windows).
The second line reads the first 2 bytes of the file, which for the jpeg format are the marker.
Note that a marker doesn’t always start at the beginning of the file. For example, the tar archive format’s marker ("ustar") starts at offset 257 and is 5 bytes long.
If the marker is not at the beginning of the file, we should seek to its position first. For tar archives that means calling fseek($fp, 257) before fread($fp, 5).
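Here is a minimal sketch of reading a marker at an arbitrary offset, using the tar "ustar" marker at offset 257 from the paragraph above. The helper names read_bytes_at() and is_tar() are hypothetical, not part of PHP. The helper returns false when the file can’t be opened or is too short, so a truncated file never passes the check.

```php
<?php
// Hypothetical helper: read $length bytes starting at $offset.
// Returns the raw bytes, or false if the file can't be opened
// or doesn't contain enough bytes at that position.
function read_bytes_at($file_path, $offset, $length)
{
    $fp = fopen($file_path, 'rb');
    if ($fp === false)
    {
        return false;
    }
    // fseek() returns 0 on success and -1 on failure
    if (fseek($fp, $offset) !== 0)
    {
        fclose($fp);
        return false;
    }
    $bytes = fread($fp, $length);
    fclose($fp);
    // a short read means the file ended before the marker did
    if ($bytes === false || strlen($bytes) !== $length)
    {
        return false;
    }
    return $bytes;
}

// Hypothetical tar check: the 5-byte "ustar" marker at offset 257.
function is_tar($file_path)
{
    return read_bytes_at($file_path, 257, 5) === 'ustar';
}
```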
$marker = unpack("H*", $raw_marker);
This line does the real magic. unpack() is a PHP function that extracts data from a binary string into an array according to a format string. In our case the format is H*: H means a hex string with the high nibble first, and the * repeat count means "until the end of the string". So the two bytes 0xFF 0xD8 become the string "ffd8".
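To see what unpack() returns for the jpeg marker, here is a quick sketch. To keep it self-contained it assumes the two SOI bytes are already in memory as a binary string rather than read from a file.

```php
<?php
// The two JPEG SOI bytes as a PHP binary string literal.
$raw_marker = "\xFF\xD8";

// H* = hex string, high nibble first, covering the whole input.
$marker = unpack("H*", $raw_marker);
// $marker is [1 => "ffd8"] (lowercase hex digits)
```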
You can extract data in other formats too; unpack() uses the same format codes as pack(), and the full list is documented under pack() in the PHP manual.
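As an illustration of other format codes, this sketch unpacks the same two bytes as integers (again assuming the marker bytes are already in memory). JPEG stores its markers big-endian, so the "n" code yields 0xFFD8 directly and makes the intval() step unnecessary; "v" shows what the wrong byte order produces.

```php
<?php
$raw_marker = "\xFF\xD8";

// "n": unsigned 16-bit integer, big-endian (network byte order)
$big = unpack("n", $raw_marker);      // [1 => 65496], i.e. 0xFFD8

// "v": unsigned 16-bit integer, little-endian, for comparison
$little = unpack("v", $raw_marker);   // [1 => 55551], i.e. 0xD8FF

// "C2": two unsigned bytes
$bytes = unpack("C2", $raw_marker);   // [1 => 255, 2 => 216]

// With "n" the comparison needs no string-to-integer conversion
$is_soi = ($big[1] === 0xFFD8);
```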
if (intval($marker[1],16) == 0xFFD8)
{
$ret_val = true;
}
This checks whether the marker equals the hex value FFD8 (in PHP, hex literals are written with the 0x prefix). The intval() function with base 16 parses the hex string, so "ffd8" becomes the integer 65496, which is the same number as 0xFFD8.
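A quick sketch of the intval() conversion, plus a simpler alternative: comparing the raw bytes with a binary string literal, which skips unpack() and intval() entirely. starts_with_soi() is a hypothetical helper name.

```php
<?php
// intval() with base 16 parses a hex string into an integer
$value = intval("ffd8", 16);     // 65496
$matches = ($value === 0xFFD8);  // true: 0xFFD8 is the same integer

// Hypothetical alternative: compare the 2 raw bytes directly
function starts_with_soi($raw_marker)
{
    return $raw_marker === "\xFF\xD8";
}
```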
Note: unpack() numbers unnamed elements starting at 1 instead of 0. This is documented behavior, not a bug. You can also give the element a name in the format string, for example "H*hex", and then read it as $marker['hex'].
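A small sketch of the named-element form next to the default numbering, using the same two SOI bytes as an in-memory string:

```php
<?php
// A name after the format code ("hex") becomes the array key
$marker = unpack("H*hex", "\xFF\xD8");
$hex = $marker["hex"];           // "ffd8"

// Without a name, the first element's key is 1
$unnamed = unpack("H*", "\xFF\xD8");
$first = $unnamed[1];            // "ffd8"
```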
The advantage of this method is that you can read other information from the file header, not just markers, for example an image’s width and height.
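As an example of reading more than the marker, here is a sketch that reads a GIF’s dimensions. It assumes the standard GIF header layout: the 6-byte signature "GIF87a" or "GIF89a", followed by the logical screen width and height as unsigned 16-bit little-endian values. Strictly these are the logical screen dimensions, which for typical single-image GIFs match the image size. gif_dimensions() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns ['width' => ..., 'height' => ...] or false.
// Header bytes 0-5: signature; 6-7: width; 8-9: height (little-endian).
function gif_dimensions($file_path)
{
    $fp = fopen($file_path, 'rb');
    if ($fp === false)
    {
        return false;
    }
    $header = fread($fp, 10);
    fclose($fp);
    if ($header === false || strlen($header) < 10)
    {
        return false;
    }
    $signature = substr($header, 0, 6);
    if ($signature !== 'GIF87a' && $signature !== 'GIF89a')
    {
        return false;
    }
    // "v" = unsigned 16-bit little-endian; the names become array keys
    return unpack("vwidth/vheight", substr($header, 6, 4));
}
```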
The disadvantage is that every check has to open and read the file, and each format needs its own hand-written check. If you’re working with files on the server side extensively, then I recommend using the Fileinfo extension, because it is compiled into PHP and it is faster.
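For comparison, a minimal Fileinfo sketch. It assumes PHP 5.3 or later, where Fileinfo ships with PHP and is enabled by default; if the extension isn’t loaded the helper returns null. detect_mime() and is_allowed_image() are hypothetical helper names, and the whitelist matches the gallery’s jpeg, png and gif requirement.

```php
<?php
// Hypothetical helper: MIME type detected from the file's contents,
// or null if Fileinfo is unavailable or detection fails.
function detect_mime($file_path)
{
    if (!class_exists('finfo'))
    {
        return null;
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($file_path);
    return ($mime === false) ? null : $mime;
}

// Hypothetical whitelist check for the gallery's three image types
function is_allowed_image($file_path)
{
    return in_array(detect_mime($file_path), ['image/jpeg', 'image/png', 'image/gif'], true);
}
```

Unlike the browser-supplied type, this looks at the file’s bytes, so renaming sample.gif to sample.gif.jpg does not change the result.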
For other file header information you can visit this great site called Filext.
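Putting the pieces together for the original requirement (jpeg, png and gif only), here is a sketch that checks all three signatures at offset 0. The signatures are assumptions taken from the formats’ published headers: FF D8 FF for jpeg (the SOI marker plus the first byte of the following marker, slightly stricter than is_jpeg() above), the 8-byte PNG signature 89 50 4E 47 0D 0A 1A 0A, and "GIF87a"/"GIF89a" for gif. detect_image_type() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns "jpeg", "png", "gif", or null.
function detect_image_type($file_path)
{
    // All three signatures start at offset 0
    $signatures = [
        'jpeg' => ["\xFF\xD8\xFF"],
        'png'  => ["\x89PNG\r\n\x1A\n"],
        'gif'  => ['GIF87a', 'GIF89a'],
    ];

    $fp = fopen($file_path, 'rb');
    if ($fp === false)
    {
        return null;
    }
    $header = fread($fp, 8); // the longest signature (PNG) is 8 bytes
    fclose($fp);
    if ($header === false)
    {
        return null;
    }

    foreach ($signatures as $type => $candidates)
    {
        foreach ($candidates as $signature)
        {
            // compare only as many bytes as this signature has
            if (strncmp($header, $signature, strlen($signature)) === 0)
            {
                return $type;
            }
        }
    }
    return null;
}
```

A matching signature only shows that the header looks right; it does not prove the rest of the file is a valid image.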

