#1  
31st August 2011, 19:08
Brian_A
UTF-8 BOM and PHP

We had a site that had to be internationalized so that it would be available in several European languages, so we used UTF-8 encoding throughout. This came with a fair number of headaches: all kinds of display issues in the browsers, mainly extra blank lines that could not be explained from the HTML source, and of course IE (7, 8 and 9) dropping into quirks mode. The problem turned out to be that we had BOMs (byte order marks). http://en.wikipedia.org/wiki/Byte_order_mark

So, to offer some help to others who go down this route, here is what we found.

1. If any UTF-8 encoded file that takes part in generating your page contains a BOM, the BOM ends up in the resulting file or output stream. PHP does not strip it: the three BOM bytes sit in front of the opening <?php tag, so PHP treats them as ordinary output. This means that if you include another PHP file, or read in, concatenate, copy or echo any text file (a JavaScript file, a CSS file, an HTML template, whatever you like), and any of these files contains a BOM, then the BOM will be in the final result.
2. None of the browsers we tested (Firefox, Chrome, and IE) handled these stray BOMs gracefully.
3. If you use Windows Notepad to save a UTF-8 file, it automatically adds a BOM. So never, ever use Notepad. Of course, the browser with the biggest BOM problems is IE.
4. We use NetBeans as an IDE. If a file contains a BOM and you edit and save it with NetBeans, it will still contain the BOM. If you copy/paste a file in NetBeans that has a BOM, the result will also have a BOM. If you start a new UTF-8 file in NetBeans, it will not have a BOM.
5. So how did we identify this problem? The browser picks up the encoding from the meta tag if it is present: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>. We loaded the page in Firefox, then told it to change the encoding to ISO-8859-1. The BOM then shows up before the <!DOCTYPE HTML> at the beginning of the file as 3 strange marks (ï»¿).
6. So how did we find out which of our files had BOMs? We used the following code. We didn't write it; we found it on another site but did not make a note of the author. If the original author sees this post, please feel free to add your credit, or post a reply and I will do it for you.

PHP Code:
<?php
// Tell me the root folder path.
// You can also try this one
// $HOME = $_SERVER["DOCUMENT_ROOT"];
// Or this
// $HOME = dirname(__FILE__);
$HOME = $_SERVER["DOCUMENT_ROOT"] . '/V2';
// Is this a Windows host ? If it is, change this line to $WIN = 1;
$WIN = 0;

// That's all I need
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>UTF8 BOM FINDER</title>
<style>
body { font-size: 10px; font-family: Arial, Helvetica, sans-serif; background: #FFF; color: #000; }
.FOUND { color: #F30; font-size: 14px; font-weight: bold; }
</style>
</head>
<body>
<?php

$BOMBED = array();
RecursiveFolder($HOME);
echo '<h2>These files have UTF8 BOM:</h2><p class="FOUND">';
foreach ($BOMBED as $utf) {
  // Show each path relative to the scanned root folder
  echo substr($utf, strlen($HOME)) . "<br />\n";
}
echo '</p>';

// Recursive finder
function RecursiveFolder($sHOME) {
  global $BOMBED, $WIN;

  // Path separator for the host operating system
  $win32 = ($WIN == 1) ? "\\" : "/";

  $folder = dir($sHOME);

  $foundfolders = array();
  // Compare against false so that an entry named "0" does not end the loop early
  while (($file = $folder->read()) !== false) {
    if ($file != "." and $file != "..") {
      if (filetype($sHOME . $win32 . $file) == "dir") {
        // Remember sub-folders and visit them once this folder is closed
        $foundfolders[] = $sHOME . $win32 . $file;
      } else {
        $BOM = SearchBOM(file_get_contents($sHOME . $win32 . $file));
        if ($BOM) $BOMBED[] = $sHOME . $win32 . $file;
      }
    }
  }
  $folder->close();

  foreach ($foundfolders as $folder) {
    RecursiveFolder($folder);
  }
}

// Searching for BOM in files
function SearchBOM($string) {
  // The UTF-8 BOM is the three bytes EF BB BF at the very start of the file
  return substr($string, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf);
}
?>
</body>
</html>
7. Now to remove the offending BOMs. We didn't have that many infected files, so we did it by hand. We used a text editor called BabelPad that lets you save a file with or without the BOM. http://www.babelstone.co.uk/Software/BabelPad.html
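
If there are too many files to fix by hand, the removal in point 7 could also be scripted. What follows is only a minimal sketch, assuming PHP 7 or later; the names UTF8_BOM, stripBom and removeBomFromFile are made up for this example and are not part of the finder script above or of PHP itself.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original post.
const UTF8_BOM = "\xEF\xBB\xBF";

// Return $contents without a leading UTF-8 BOM, if one is present.
function stripBom(string $contents): string
{
    // strncmp() compares only the first three bytes
    if (strncmp($contents, UTF8_BOM, 3) === 0) {
        return (string) substr($contents, 3);
    }
    return $contents;
}

// Rewrite the file at $path without its BOM.
// Returns true only if the file was actually changed.
function removeBomFromFile(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false; // unreadable file: leave it alone
    }
    $clean = stripBom($contents);
    if ($clean === $contents) {
        return false; // no BOM at the start, nothing to do
    }
    return file_put_contents($path, $clean) !== false;
}
```

Calling removeBomFromFile() on each path reported by the finder would do the same job as re-saving the files without a BOM in an editor. It overwrites files in place, so work on a backup copy first.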

Having removed all the BOMs, everything on the site compiles and runs without problems.
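
One way to check that generated output really is clean is to capture it and look for the BOM bytes anywhere in it, not just at the start. The sketch below is a rough illustration, assuming PHP 7 or later with default settings (zend.multibyte off). The helper findBomOffsets and the throwaway include file are invented for the example; the file stands in for a template saved with a BOM, as in point 1.

```php
<?php
// Hypothetical helper: list the byte offsets of every UTF-8 BOM in $bytes.
function findBomOffsets(string $bytes): array
{
    $bom = "\xEF\xBB\xBF";
    $offsets = array();
    $pos = 0;
    while (($pos = strpos($bytes, $bom, $pos)) !== false) {
        $offsets[] = $pos;
        $pos += strlen($bom); // continue searching after this BOM
    }
    return $offsets;
}

// Throwaway include file that starts with a BOM.
$partial = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($partial, "\xEF\xBB\xBF<?php echo 'menu'; ?>");

// Capture the page output instead of sending it to the browser.
ob_start();
echo '<!DOCTYPE html>';
include $partial;
$page = ob_get_clean();
unlink($partial);

// Report where the BOM bytes ended up in the generated page.
$offsets = findBomOffsets($page);
foreach ($offsets as $offset) {
    echo "BOM at byte offset $offset\n";
}
```

Here the BOM from the included file lands directly after the doctype, at byte offset 15, which is the sort of mid-page BOM that caused our display problems. Once the files are clean, the same check on a real page should report nothing.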

Hope we can save you the time it took us to identify and solve this problem.

Last edited by Ben; 1st September 2011 at 19:23.
  #2  
1st September 2011, 19:26
Ben

I just wrapped your code in PHP vBB tags.

Besides this, good work!