Latest Posts

Alternative Error Messages From CakePHP Custom Validation Methods

Posted in CakePHP by Karl on

Sometimes when validating a model in CakePHP, you need to use a custom validation method:

public $validate = array(
    'fieldName' => array(
        'rule' => 'customMethod',
        'message' => 'Default error message'
    ),
);

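public function customMethod($check) {
    // $check is an array keyed by field name, e.g. array('fieldName' => 'value')
    $value = current($check);
    return $value == 'correct!';
}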

This works well, but what if you want your custom method to check for several conditions and respond with a different error message for each? It is very simple to do, although it is not documented in the Book. Instead of returning a boolean from your custom method, return the error message as a string. Cake automatically treats any non-empty string as a validation failure and uses it as the error message. That’s it!

public function customMethod($check) {
    // $check is an array keyed by field name; pull out the submitted value.
    $value = current($check);
    if ($value == 'foo') {
        return 'You cannot use foo!';
    } elseif ($value == 'bar') {
        return 'bar is not allowed!';
    }
    return true;
}
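To make the behaviour concrete, here is a small standalone sketch of how a validator can interpret the three possible return values (true, false, or a message string). It is an illustration only and not CakePHP's internal code: resolve_validation_message() is a hypothetical helper, and custom_method() is a plain-function copy of the method above.

```php
<?php
// Hypothetical helper, not part of CakePHP: turns a custom rule's return
// value into either null (valid) or the error message to display.
function resolve_validation_message($result, $defaultMessage) {
    if ($result === true) {
        return null; // validation passed
    }
    if (is_string($result) && strlen($result) > 0) {
        return $result; // the rule supplied its own message
    }
    return $defaultMessage; // false (or an empty string) uses the default
}

// Plain-function version of the custom method, for demonstration only.
function custom_method(array $check) {
    $value = current($check);
    if ($value == 'foo') {
        return 'You cannot use foo!';
    } elseif ($value == 'bar') {
        return 'bar is not allowed!';
    }
    return true;
}

$default = 'Default error message';
foreach (array('foo', 'bar', 'baz') as $input) {
    $result = custom_method(array('fieldName' => $input));
    $message = resolve_validation_message($result, $default);
    echo $input . ': ' . ($message === null ? 'valid' : $message) . "\n";
}
```

The default 'message' from the $validate array is still worth keeping, since it is what you would expect to see if the method ever returns a plain false.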

PHP Sudoku Solver Class

Posted in PHP by Karl on

I wrote this small class to solve Sudoku puzzles. It also has a very basic Sudoku puzzle generator method. The code uses a backtracking algorithm to solve the puzzles and should be fairly fast even on slower computers.

class Sudoku {
   
    private $_matrix;
   
    public function __construct(array $matrix = null) {
        if (!isset($matrix)) {
            $this->_matrix = $this->_getEmptyMatrix();
        } else {
            $this->_matrix = $matrix;
        }
    }
   
    public function generate() {
        $this->_matrix = $this->_solve($this->_getEmptyMatrix());
        $cells = array_rand(range(0, 80), 30);
        $i = 0;
        foreach ($this->_matrix as &$row) {
            foreach ($row as &$cell) {
                if (!in_array($i++, $cells)) {
                    $cell = null;
                }
            }
        }
        return $this->_matrix;
    }
   
    public function solve() {
        $this->_matrix = $this->_solve($this->_matrix);
        return $this->_matrix;
    }
   
    public function getHtml() {
        // Build and return the markup so that callers can echo it themselves.
        $html = '<table border="1">';
        for ($row = 0; $row < 9; $row++) {
            $html .= '<tr>';
            for ($column = 0; $column < 9; $column++) {
                $html .= '<td>' . $this->_matrix[$row][$column] . '</td>';
            }
            $html .= '</tr>';
        }
        $html .= '</table>';
        return $html;
    }
   
    private function _getEmptyMatrix() {
        return array_fill(0, 9, array_fill(0, 9, 0));
    }
   
    private function _solve($matrix) {
        while(true) {
            $options = array();
            foreach ($matrix as $rowIndex => $row) {
                foreach ($row as $columnIndex => $cell) {
                    if (!empty($cell)) {
                        continue;
                    }
                    $permissible = $this->_getPermissible($matrix, $rowIndex, $columnIndex);
                    if (count($permissible) == 0) {
                        return false;
                    }
                    $options[] = array(
                        'rowIndex' => $rowIndex,
                        'columnIndex' => $columnIndex,
                        'permissible' => $permissible
                    );
                }
            }
            if (count($options) == 0) {
                return $matrix;
            }
           
            usort($options, array($this, '_sortOptions'));
           
            if (count($options[0]['permissible']) == 1) {
                $matrix[$options[0]['rowIndex']][$options[0]['columnIndex']] = current($options[0]['permissible']);
                continue;
            }
           
            foreach ($options[0]['permissible'] as $value) {
                $tmp = $matrix;
                $tmp[$options[0]['rowIndex']][$options[0]['columnIndex']] = $value;
                if ($result = $this->_solve($tmp)) {
                    return $result;
                }
            }
           
            return false;
        }
    }
   
    private function _getPermissible($matrix, $rowIndex, $columnIndex) {
        $valid = range(1, 9);
        $invalid = $matrix[$rowIndex];
        for ($i = 0; $i < 9; $i++) {
            $invalid[] = $matrix[$i][$columnIndex];
        }
        $box_row = $rowIndex % 3 == 0 ? $rowIndex : $rowIndex - $rowIndex % 3;
        $box_col = $columnIndex % 3 == 0 ? $columnIndex : $columnIndex - $columnIndex % 3;
        $invalid = array_unique(array_merge(
            $invalid,
            array_slice($matrix[$box_row], $box_col, 3),
            array_slice($matrix[$box_row + 1], $box_col, 3),
            array_slice($matrix[$box_row + 2], $box_col, 3)
        ));
        $valid = array_diff($valid, $invalid);
        shuffle($valid);
        return $valid;
    }
   
    private function _sortOptions($a, $b) {
        $a = count($a['permissible']);
        $b = count($b['permissible']);
        if ($a == $b) {
            return 0;
        }
        return ($a < $b) ? -1 : 1;
    }
   
}

Usage is very simple. You can either pass the constructor a two-dimensional array representing the grid (9 arrays of 9 elements) and then call the solve() method, or pass nothing in and call generate() to create a new puzzle. Either way, the results can be displayed as an HTML table by calling the getHtml() method.

$grid = array(
    array(0,0,0,0,0,0,2,0,3),
    array(8,0,7,0,0,0,0,6,0),
    array(0,0,2,6,5,0,0,0,8),
    array(0,3,0,0,0,0,0,0,0),
    array(7,5,0,2,0,0,1,0,0),
    array(0,0,1,0,3,0,5,0,0),
    array(4,0,0,5,0,0,8,7,0),
    array(6,0,0,0,4,2,0,0,0),
    array(0,9,5,0,6,0,0,2,0)
);
$s = new Sudoku($grid);
$s->solve();
echo $s->getHtml();

$s2 = new Sudoku();
$s2->generate();
echo $s2->getHtml();
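If you want to sanity-check a solved grid, the Sudoku rules themselves are easy to verify: every row, column and 3x3 box must contain the digits 1 to 9 exactly once. The following is a minimal sketch of such a check. The function names is_complete_unit() and is_valid_solution() are my own and are not part of the class above.

```php
<?php
// Hypothetical helper: true when the nine cells are exactly the digits 1-9.
function is_complete_unit(array $cells) {
    $cells = array_map('intval', $cells);
    sort($cells);
    return $cells === range(1, 9);
}

// Hypothetical helper: checks every row, column and 3x3 box of a 9x9 grid.
function is_valid_solution(array $grid) {
    for ($i = 0; $i < 9; $i++) {
        $row = $grid[$i];

        $column = array();
        for ($r = 0; $r < 9; $r++) {
            $column[] = $grid[$r][$i];
        }

        // Box number $i starts at row ($i - $i % 3) and column (($i % 3) * 3).
        $boxRow = $i - $i % 3;
        $boxCol = ($i % 3) * 3;
        $box = array_merge(
            array_slice($grid[$boxRow], $boxCol, 3),
            array_slice($grid[$boxRow + 1], $boxCol, 3),
            array_slice($grid[$boxRow + 2], $boxCol, 3)
        );

        if (!is_complete_unit($row) || !is_complete_unit($column) || !is_complete_unit($box)) {
            return false;
        }
    }
    return true;
}
```

Since solve() returns the matrix, something like is_valid_solution($s->solve()) should work for a solvable puzzle. Bear in mind that solve() returns false when no solution exists, so check for that first.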

PHP Regular Expression Fails Silently on Long Strings

Posted in PHP by Karl on

I had an odd bug today which took me a while to track down. I was using preg_replace_callback to match blocks of code in a string and hand them off to GeSHi for syntax highlighting. However, I found that some blocks which should have matched were not being matched. I couldn’t find any explanation for it at all. I was using the following non-greedy-match-anything sub-pattern:

(.*?)

There really shouldn’t be any reason why that would fail to match. I gradually started removing bits of text from my string to try to find the cause, and suddenly after a few chunks were gone the pattern matched. I couldn’t see anything in what I had removed which could be causing an issue, so I assumed the string length itself was the issue, and this assumption proved correct.

As of PHP 5.2, a new ini setting called pcre.backtrack_limit was introduced. The documentation for this setting is very sparse, but it basically sets an upper limit on how many times the regular expression engine may backtrack while trying to find a match. This affects things like non-greedy patterns, and I assume lookahead and lookbehind assertions (though I have not tested this). The default value for this setting is a meagre 100000, which a non-greedy match like the one above can exhaust on a string of around 100KB. Prior to 5.2, this setting did not exist and longer strings would match without problem. The really annoying thing about all this is that the regex function will just fail silently, leaving you to start madly pulling your hair out while you try to see what could be preventing your pattern from matching. A notice or warning error would have saved me a couple of hours!

The pcre.backtrack_limit setting can be altered either in your php.ini, or at runtime. I set mine to 1048576 and have not had any issues.

ini_set('pcre.backtrack_limit', '1048576');
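Although no notice or warning is raised, the failure can be detected: the preg_replace family returns NULL on error, and preg_last_error() (also available from PHP 5.2) reports the reason. Here is a minimal sketch of a wrapper that makes the failure loud. The name replace_callback_or_fail() is my own, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Hypothetical wrapper around preg_replace_callback() that refuses to fail
// silently: any PCRE error is turned into an exception.
function replace_callback_or_fail($pattern, $callback, $subject) {
    $result = preg_replace_callback($pattern, $callback, $subject);
    if ($result === null) {
        $error = preg_last_error();
        if ($error === PREG_BACKTRACK_LIMIT_ERROR) {
            throw new RuntimeException('pcre.backtrack_limit was exhausted');
        }
        throw new RuntimeException('PCRE error code ' . $error);
    }
    return $result;
}

// Example usage with a non-greedy sub-pattern like the one discussed above.
$highlighted = replace_callback_or_fail(
    '/\[code\](.*?)\[\/code\]/s',
    function ($matches) {
        return '<pre>' . htmlspecialchars($matches[1]) . '</pre>';
    },
    'Some text [code]echo 1 < 2;[/code] more text'
);
echo $highlighted, "\n";
```

With a check like this in place, an overly long string produces an exception pointing straight at the backtrack limit instead of an unexplained non-match.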

Find the System Temp Directory with PHP

Posted in PHP by Karl on

Sometimes it is useful to be able to save a file to the system’s temporary directory in a PHP application. Depending on the platform and its configuration, however, this directory can be in a variety of places. As of PHP 5.2.1, there is a native function, sys_get_temp_dir, which will do the job. For earlier versions, though, the following function will try to determine the temp directory for you and return its path as a string. If you use it and upgrade later, the code will degrade gracefully and switch to the native function.

if (!function_exists('sys_get_temp_dir')) {
    function sys_get_temp_dir() {
        // check environment variables.
        foreach (array('TMP', 'TEMP', 'TMPDIR') as $env_var) {
            if ($temp = getenv($env_var)) {
                return $temp;
            }
        }
        // test for a temp directory by having PHP create a temporary file.
        // __FILE__ is not a directory, so tempnam() falls back to the system temp directory.
        $temp = tempnam(__FILE__, '');
        if (file_exists($temp)) {
            unlink($temp);
            return dirname($temp);
        }
        // couldn't find a temp directory.
        return null;
    }
}
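As a usage example, here is a small sketch that writes some data to a uniquely named file in the temp directory. The helper name write_temp_file() and its parameters are hypothetical. It only relies on sys_get_temp_dir(), whether native or the fallback above, plus tempnam() and file_put_contents().

```php
<?php
// Hypothetical helper: writes $contents to a new, uniquely named file in the
// system temp directory and returns its path, or false on failure.
function write_temp_file($contents, $prefix = 'tmp') {
    $path = tempnam(sys_get_temp_dir(), $prefix);
    if ($path === false) {
        return false;
    }
    if (file_put_contents($path, $contents) === false) {
        unlink($path); // do not leave an empty file behind
        return false;
    }
    return $path;
}

$path = write_temp_file("some cached data\n", 'demo');
if ($path !== false) {
    echo 'Wrote ' . $path . "\n";
    unlink($path); // clean up after the demonstration
}
```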

Editing Virtualhost Settings in WHM / cPanel

Posted in Apache by Karl on

If you are using WHM and you try to edit your Virtualhost settings in your httpd.conf, you will find the following warning message:

# DO NOT EDIT. AUTOMATICALLY GENERATED.  IF YOU NEED TO MAKE A CHANGE PLEASE USE THE INCLUDE FILES.

This is not the most helpful warning message. Your first thought is probably that “the include files” refers to the pre_main, pre_virtualhost and post_virtualhost includes, but obviously they are no use when you are trying to edit the Virtualhost settings themselves.

Looking a bit closer inside the Virtualhost blocks in httpd.conf, the following advice is uncovered:

# To customize this VirtualHost use an include file at the following location
# Include "/usr/local/apache/conf/userdata/std/2/username/domain.com/*.conf"

This helps a bit, but it’s still not especially clear what you need to do. For one thing this file doesn’t exist, and even if you create it the Include line is commented out, and the whole point of this exercise is that the httpd.conf file cannot be manually edited, so you can’t just uncomment it!

For anyone with this problem, hopefully the following should explain the full process needed.

First create the directory mentioned inside the Virtualhost block. Note that the path depends on your username and domain, so don’t just copy and paste the path below!

mkdir -p /usr/local/apache/conf/userdata/std/2/username/domain.com

Navigate to the directory you just created:

cd /usr/local/apache/conf/userdata/std/2/username/domain.com

Create a configuration file using your editor of choice (filename must end in .conf), and add any settings you need:

nano extra.conf

Now to uncomment the include, run the following command:

/scripts/ensure_vhost_includes --all-users

This script will uncomment any Include lines in httpd.conf Virtualhost blocks where it finds at least one *.conf file in the relevant directory. It will restart Apache for you so no need to do that – your changes should be immediate.

Configure Apache MaxClients via WHM

Posted in Apache by Karl on

We recently moved to a new server provider at work, and we soon noticed that at around 4pm each day the site would slow to a crawl. At this time of day we get in the region of 50-60 requests per second, but we have a beefy server so this shouldn’t be an issue.

The first request to each domain would sit for ages, anywhere up to 10 seconds, then subsequent requests would be in the order of a couple of hundred milliseconds. This led me to think that there were no available processes on the server and the request was being queued.

Looking into this, I found that indeed Apache was reporting “150 requests currently being processed, 0 idle workers”.

I had a look at the prefork MPM settings, and they were set to the default values, targeted at a much smaller site than ours. In the past I’d just have edited httpd.conf, restarted Apache and been sorted, but the new server uses WHM, which means editing the Apache conf directly won’t stick: WHM will just overwrite any changes.

Having never used WHM before, I googled the issue and came across a post advocating using the pre-main include to add these settings. However, this does not work, since the settings from there are overwritten with the measly defaults. Instead, the changes need to be made in the pre-virtualhost include.

Here is a step-by-step guide to changing the relevant settings to allow your site to handle more concurrent clients.

  1. Log into WHM
  2. Under Service Configuration in the left menu, select Apache Configuration.
  3. Click Include Editor
  4. In the Pre-Virtualhost Include section, choose All Versions from the drop down.
  5. Enter your config settings. For our server I used the following which works for me:
    <IfModule prefork.c>
        StartServers 32
        MinSpareServers 10
        MaxSpareServers 30
        ServerLimit 512
        MaxClients 512
    </IfModule>

    One thing to bear in mind here. ServerLimit must come before MaxClients! If you swap them you’ll find that you can never use more than 256 for MaxClients and you will get a warning from Apache:

    WARNING: MaxClients of 512 exceeds ServerLimit value of 256 servers,
     lowering MaxClients to 256.  To increase, please see the ServerLimit
     directive.
  6. Click the Restart Apache button to apply the new config
  7. You can check this is working by clicking Server Status > Apache Status. This should tell you how many requests are being processed. If all has gone well this will be a number less than your MaxClients setting!

One word of caution with this. You need to make sure that the MaxClients setting does not cause so many processes to be spawned that you run out of RAM and the OS starts to use swap space. You can check this by finding the amount of RAM an httpd process uses. I use the following command for this, which gives you the average size of all the httpd processes (obviously change “httpd” for whatever your Apache runs as):

ps -ef | grep httpd | grep -v ^root | awk '{ print $2 }' | xargs pmap -d | grep ^mapped: | awk '{ print $4 }' | cut -dK -f1 | awk '{ SUM += $1} END { print SUM/NR"KB" }'

Take this value and divide it into your RAM capacity, after allowing enough for other processes such as the OS, MySQL etc. This will give you a rough idea of how many Apache processes you can afford to spawn. On modern hardware you can generally run more than enough, but if you find you are causing swap to be used, you will either need to remove unnecessary modules to make the Apache process smaller, lower MaxClients or other memory-related directives, or else throw more hardware at the problem!

To give you an idea though, serving all 512 clients should use approx 1.2GB of RAM with my Apache setup.
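The arithmetic above can be sketched in a few lines of PHP. The function name and parameters are hypothetical, and the figures in the example are assumed rather than measured. The 2400KB average simply matches my numbers above: 512 processes at roughly 2.4MB each comes to about 1.2GB.

```php
<?php
// Hypothetical helper: estimates how many Apache processes fit in RAM.
// All sizes are in kilobytes; the parameter names are illustrative only.
function estimate_max_clients($totalRamKb, $reservedKb, $avgProcessKb) {
    if ($avgProcessKb <= 0 || $reservedKb >= $totalRamKb) {
        return 0; // nothing sensible can be calculated
    }
    return (int) floor(($totalRamKb - $reservedKb) / $avgProcessKb);
}

// Assumed example figures: 4GB of RAM, 1GB kept back for the OS and MySQL,
// and httpd processes averaging 2400KB each.
$totalRamKb = 4 * 1024 * 1024;
$reservedKb = 1024 * 1024;
echo estimate_max_clients($totalRamKb, $reservedKb, 2400), "\n";
```

Treat the result as an upper bound rather than a target, since process sizes vary under load.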

CakePHP “empty” Files in Each Directory

Posted in CakePHP by Karl on

If you use CakePHP you have probably noticed that each empty directory contains a file named “empty” which is, unsurprisingly, empty. The purpose of this file is to allow the CakePHP project to be managed via Git, which tracks only files and not directories. In order to add a directory to a project, it must contain a file.

Usually this isn’t a problem and you can just ignore these files, but if you want to put your source code under revision control (which is always a good idea), you probably don’t want to include these empty files. Here is a quick and easy way to get rid of them before your initial import:

$ cd /path/to/your/working/directory
$ find . -type f -name 'empty' -print0 | xargs -0 rm -f

You should now be a few dozen empty files lighter.

Javascript Resistor Calculation

Posted in Javascript by Karl on

I recently wrote the following JavaScript class which can be used to calculate information about electronic resistors. There are various online apps which already do this sort of thing, but I thought I would have a go at creating something as it meant I would learn a bit about resistors in the process. It may even help someone who is interested in seeing how these calculations work, as well as getting the results.

The idea is, you create a Resistor object and provide some known information such as the colour bands, or the value, or the SMT code. The object will then calculate any missing information based on what you have provided. For instance, if you only have the resistor in front of you, you can set the object’s colour bands and it will calculate the resistor’s value, tolerance, temperature coefficient and SMT code.

Here is the class:

function Resistor() {
   
    this.smtCode = null;
    this.value = 0;
    this.scaledUnit = null;
    this.scaledValue = 0;
    this.tolerance = 0;
    this.tempCoefficient = 0;
    this.bands = new Array();
   
    var that = this;
    var colorMap = {
        black:  {color: "black",  value: 0,    multiplier: 1,      tolerance: null, tempCoefficient: null},
        brown:  {color: "brown",  value: 1,    multiplier: 1e1,    tolerance: 1,    tempCoefficient: 100},
        red:    {color: "red",    value: 2,    multiplier: 1e2,    tolerance: 2,    tempCoefficient: 50},
        orange: {color: "orange", value: 3,    multiplier: 1e3,    tolerance: null, tempCoefficient: 15},
        yellow: {color: "yellow", value: 4,    multiplier: 1e4,    tolerance: null, tempCoefficient: 25},
        green:  {color: "green",  value: 5,    multiplier: 1e5,    tolerance: 0.5,  tempCoefficient: null},
        blue:   {color: "blue",   value: 6,    multiplier: 1e6,    tolerance: 0.25, tempCoefficient: null},
        violet: {color: "violet", value: 7,    multiplier: 1e7,    tolerance: 0.1,  tempCoefficient: null},
        gray:   {color: "gray",   value: 8,    multiplier: 1e8,    tolerance: 0.05, tempCoefficient: null},
        white:  {color: "white",  value: 9,    multiplier: 1e9,    tolerance: null, tempCoefficient: null},
        gold:   {color: "gold",   value: null, multiplier: 1e-1,   tolerance: 5,    tempCoefficient: null},
        silver: {color: "silver", value: null, multiplier: 1e-2,   tolerance: 10,   tempCoefficient: null},
        none:   {color: "none",   value: null, multiplier: 1,      tolerance: 20,   tempCoefficient: null}
    };
    var unitMap = new Array("pΩ", "nΩ", "µΩ", "mΩ", "Ω", "kΩ", "MΩ", "GΩ", "TΩ");

    this.setColorBands = function(colors) {
        for (var index in colors) {
            if (typeof colorMap[colors[index]] != "undefined") {
                this.bands.push(colorMap[colors[index]]);
            }
        }
    }
   
    this.setSmtCode = function(code) {
        this.smtCode = String(code);
    }
   
    this.setValue = function(value) {
        this.value = Number(value);
        this.scaledUnit = calculateScaledUnit(this.value);
        this.scaledValue = calculateScaledValue(this.value);
    }
   
    this.calculate = function() {
        if (!(this.value > 0)) {
            this.setValue(calculateValue());
        }
        if (this.smtCode == null) {
            this.setSmtCode(calculateSmtCode(this.value));
        }
        if (!(this.bands.length > 0)) {
            this.setColorBands(calculateBands(this.value));
        }
        if (this.bands.length > 4) {
            this.tolerance = this.bands[4].tolerance;
            if (this.bands.length == 6) {
                this.tempCoefficient = this.bands[5].tempCoefficient;
            }
        }
    }
   
    this.toString = function() {
        var string = "Value: " + this.scaledValue + this.scaledUnit + " - ";
        string += "Colours: ";
        for (var i in this.bands) {
            string += this.bands[i].color + ", ";
        }
        string = string.slice(0, -2) + " - ";
        string += "SMT Code: " + this.smtCode + " - ";
        string += "Tolerance: " + (this.tolerance == 0 ? "Unknown" : this.tolerance + "%") + " - ";
        string += "Temperature Coefficient: " + (this.tempCoefficient == 0 ? "Unknown" : this.tempCoefficient + "ppm");
        return string;
    }
   
    var calculateValue = function() {
        if (that.bands.length > 3) {
            return calculateValueViaBands(that.bands);
        }
        if (that.smtCode != null) {
            return calculateValueViaSmtCode(that.smtCode);
        }
        return false;
    }
   
    var calculateValueViaBands = function(bands) {
        var value = String(bands[0].value) + String(bands[1].value);
        var multiplier = bands[2].multiplier;
        if (bands.length > 4) {
            value += String(bands[2].value);
            multiplier = bands[3].multiplier;
        }
        return Number(value) * multiplier;
    }
   
    var calculateValueViaSmtCode = function(smtCode) {
        var value;
        if (smtCode.indexOf("R") > -1) {
            value = smtCode.replace(/R/i, ".");
        } else {
            var multiplier = 1;
            switch (smtCode.length) {
                case 2:
                    value = smtCode;
                    break;
                case 3:
                    value = smtCode.substring(0, 2);
                    multiplier = smtCode.substring(2);
                    break;
                case 4:
                    value = smtCode.substring(0, 3);
                    multiplier = smtCode.substring(3);
                    break;
                default:
                    return false;
            }
            value = Number(value) * Math.pow(10, Number(multiplier));
        }
        return value;
    }
     
    var calculateBands = function(value) {
        var multiplier = Math.pow(10, String(value).length - 2);
        var values = String(value / multiplier).split("");
        var result = new Array();
        for (var i in colorMap) {
            if (colorMap[i].multiplier == multiplier) {
                multiplier = i;
                break;
            }
        }
        for (i in values) {
            for (var j in colorMap) {
                if (colorMap[j].value == Number(values[i])) {
                    result.push(j);
                }
            }
        }
        result.push(multiplier);
        return result;
    }
   
    var calculateSmtCode = function(value) {
        var indeces = calculateBands(value);
        var multiplier = indeces.pop();
        // Round to avoid floating-point results such as 2.9999999999999996 for 1e3.
        var lastDigit = Math.round(Math.log(colorMap[multiplier].multiplier) / Math.LN10);
        if (lastDigit > 9) {
            return false;
        }
        var code = "";
        for (var i in indeces) {
            code += colorMap[indeces[i]].value;
        }
        return code + lastDigit;
    }
   
    var calculateScaledValue = function(value) {
        return value/Math.pow(1000, getExponent(value));
    }
   
    var calculateScaledUnit = function(value) {
        if (value == 0) {
            return "Ω";
        }
        var unit = unitMap[getExponent(value) + 4];
        if (typeof unit == "undefined") {
            return false;
        }
        return unit;
    }
   
    var getExponent = function(value) {
        return Math.floor(Math.log(value) / Math.log(1000));
    }
   
}

It should be pretty self-explanatory, but here are a few very simple examples of usage (these are really just tests and will only work in Firefox due to the console.log and String::toString methods):

var resistor = new Resistor();
resistor.setColorBands(["brown", "black", "black", "black", "brown", "yellow"]);
resistor.calculate();
console.log(resistor.toString());
// Value: 100Ω - Colours: brown, black, black, black, brown, yellow - SMT Code: 101 - Tolerance: 1% - Temperature Coefficient: 25ppm
       
var smtResistor = new Resistor();
smtResistor.setSmtCode("101");
smtResistor.calculate();
console.log(smtResistor.toString());
// Value: 100Ω - Colours: brown, black, brown - SMT Code: 101 - Tolerance: Unknown - Temperature Coefficient: Unknown
     
var knownResistor = new Resistor();
knownResistor.setValue(220);
knownResistor.calculate();
console.log(knownResistor.toString());
// Value: 220Ω - Colours: red, red, brown - SMT Code: 221 - Tolerance: Unknown - Temperature Coefficient: Unknown

Tidy PHP: Fatal Error: Class ‘tidy’ Not Found

Posted in PHP by Karl on

I was playing around with Tidy on my development machine at work, however even simple examples copied directly from the PHP manual were giving me errors such as:

Fatal error: Class 'tidy' not found in /var/www/dev.test.domain.org/tidy.php on line 149

I checked my PHP version, which is 5.2.6, and made sure I had the php5-tidy Ubuntu package installed. All was well on those fronts, yet I still had the problem.

I thought maybe the extension wasn’t loading, but running the following confirmed that it was indeed loaded:

<?php echo extension_loaded('tidy') ? "LOADED" : "NOT LOADED" ?>

I tested a bit further using the procedural syntax. More oddness ensued. I used the following code:

<?php ob_start() ?>

<html>
  <head>
   <title>test</title>
  </head>
  <body>
   <p>error<br>another ĨńtêrʼnåtȉΌnժlizǽtioǸ line</i>
  </body>
</html>

<?php

$buffer = ob_get_clean();

$tidy_config = array(
    'clean' => true,
    'output-xhtml' => true,
    'wrap' => 200,
);

$tidy = tidy_parse_string($buffer, $tidy_config, 'UTF8');
$tidy->cleanRepair();
echo $tidy;

?>

Which resulted in the following error:

Warning: tidy_parse_string() expects exactly 1 parameter, 3 given in /var/www/dev.test.domain.org/tidy.php on line 147

This made me suspicious. The fact that the tidy_parse_string function existed but did not expect the documented number of parameters made me suspect I had somehow got the wrong version installed. Sure enough, on further inspection I found I had at some point in the past installed the PECL version of tidy, which is 1.2. This was obviously taking precedence over the version installed via the php5-tidy package.

So the fix was pretty simple:

Remove the PECL version of Tidy:

# pecl uninstall tidy

For good measure I uninstalled the php5-tidy package too, but this is probably unnecessary:

# apt-get remove php5-tidy

Restart Apache (again probably not necessary at this point but I did it anyway):

# apache2ctl restart

Reinstall the php5-tidy package:

# apt-get install php5-tidy

Restart Apache:

# apache2ctl restart

After this everything worked as expected.
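If you hit something similar, a quick way to see what PHP has actually loaded is to ask for the extension's reported version and whether its class exists. The following is a minimal diagnostic sketch. The function name extension_report() is my own, and the version string it shows depends entirely on your install.

```php
<?php
// Hypothetical diagnostic helper: summarises what PHP knows about an extension.
function extension_report($extension, $className = null) {
    $loaded = extension_loaded($extension);
    return array(
        'loaded'       => $loaded,
        // phpversion() with an argument returns the extension's own version.
        'version'      => $loaded ? phpversion($extension) : null,
        // Pass false so that no autoloader is triggered by the check.
        'class_exists' => $className === null ? null : class_exists($className, false),
    );
}

print_r(extension_report('tidy', 'tidy'));
```

In my case, a report showing the extension as loaded but the tidy class as missing would have pointed at the wrong version much sooner.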

Replace HTML Special Characters With Entities – But Without Touching Tags

Posted in PHP by Karl on

I came across a problem during the development of a CMS at work where I had to take a string of HTML source code and make sure all special HTML characters were replaced with their entities. For example, & (ampersand) should become &amp;.

PHP has a couple of useful functions for this sort of thing, namely htmlentities and htmlspecialchars. However, running my string through either of these was no good to me because doing so would convert the characters used in the HTML tags too. For example, the following:

<p class="foo">This is a paragraph & that ampersand needs fixing</p>

Would become:

&lt;p class=&quot;foo&quot;&gt;This is a paragraph &amp; that ampersand needs fixing&lt;/p&gt;

The ampersand is converted nicely, but now the HTML is useless. The first thought that struck me was to parse the string using PHP’s XML parser in order to get at the cdata directly, but of course that idea didn’t last long since the very characters I was trying to fix would have broken the parser.

In the end I settled on using a regular expression to match content in between tags, but leave the tags themselves alone. I also added some functionality to leave anything between <? and ?> tags alone, so I could pass through HTML with embedded PHP and not have it break.

Here is the function. It is coded to work with UTF-8, hence the multibyte functions and the /u modifier on the regex, but if you are working with a single byte character set you can just swap this out accordingly.

<?php
function clean_entities($string) {
   
    $string = htmlspecialchars_decode($string);
   
    $parts = preg_split('/(<\?.*?\?>)/us', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
   
    $string = '';
   
    foreach ($parts as $part) {
        if (false === mb_strpos(trim($part), '<?')) {
            $string .= preg_replace_callback(
                '/(?<=\>)((?![<](\?|\/)*[a-z][^>]*[>]).)+/ius',
                // Anonymous function (PHP 5.3+); create_function() was removed in PHP 8.
                function ($matches) {
                    return htmlspecialchars($matches[0]);
                },
                $part
            );
        } else {
            $string .= $part;
        }
    }
   
    return $string;
   
}
?>

This results in nice valid entities, but the tags and any embedded php are left alone:

<p class="foo">This is a paragraph &amp; that ampersand <?php echo "has been" ?> fixed!</p>
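For comparison, here is a sketch of a different way to get a similar result. This is not the function above but a swapped-in technique: it splits the string on tags and embedded PHP blocks with preg_split, then escapes only the pieces in between. The name escape_text_nodes() is hypothetical, and the tag pattern is deliberately simple, so treat it as a starting point rather than a full HTML parser.

```php
<?php
// Hypothetical alternative: split on tags and embedded PHP, escape the rest.
function escape_text_nodes($html) {
    // The capture group makes preg_split() return the delimiters as well, so
    // the result alternates: text, tag, text, tag, ... starting with text.
    $parts = preg_split(
        '/(<\?.*?\?>|<\/?[a-z][^>]*>)/is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    $out = '';
    foreach ($parts as $index => $part) {
        if ($index % 2 === 1) {
            $out .= $part; // a tag or PHP block: leave untouched
        } else {
            // ENT_NOQUOTES leaves quotes alone; the final false stops
            // existing entities from being encoded a second time.
            $out .= htmlspecialchars($part, ENT_NOQUOTES, 'UTF-8', false);
        }
    }
    return $out;
}

echo escape_text_nodes('<p class="foo">Fish & chips</p>'), "\n";
```

Unlike the lookbehind version, this also handles text that appears before the first tag, at the cost of a cruder definition of what counts as a tag.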