Allowing Syntax Highlighting In Comments

I have installed the SyntaxHighlighter Evolved plugin so that I can post snippets of code to my blog. However, as far as I can tell, it does not natively support syntax highlighting in comments. While reviewing the old posts pulled in from my Drupal site, and in particular the comments I had made to update some of the older posts, I realised that I had quoted code in them which would also be useful to highlight. I decided to try to add syntax highlighting to comments in as simple a way as possible.

I noticed that the highlighting seems to work by adding a class to the <pre> tags that delineate the code to be highlighted, in the form class="brush:lang;", where lang is the language being highlighted. I felt this was a little complex for commenters to include in their "semi" HTML markup, but that it should be simple enough for them to write <pre lang="lang"> instead.

The first step was to add this to the "allowed tags" for the site. So in my functions.php file, in the function called on init, I added the following code:

global $allowedtags;
// Change the allowed comment tags to include <pre> with a lang attribute
$allowedtags['pre'] = array('lang' => array());

The next step was to give commenters some extra information at the bottom of the post to let them know how to do this. I used the following call, passing an additional parameter to the comment_form() function. In my theme, which is copied from twentyten, this was also in functions.php:

<?php comment_form(
  array('comment_notes_after' =>
    '<p class="form-allowed-tags">' . sprintf(
      __( 'You may use these <abbr title="HyperText Markup Language">HTML</abbr> tags and attributes: %s' ),
      ' <code>' . allowed_tags() . '</code>'
    ) . '</p>' . __( '<p class="form-allowed-tags">Use the &lt;pre lang="lang"&gt; tag to syntax highlight for language "lang"</p>' ))); ?>

The last step was to change the output of the comment itself. The comment.php file held this code:

<div class="comment-body"><?php
  $comment_text = get_comment_text();
  $comment_text = preg_replace('#<pre.*?lang=(")?([^>"]*)(")?(.*?)>#sm',
    '<pre class=$1brush:$2;$3$4>', $comment_text); // find <pre lang="xxx"> and replace it with <pre class="brush:xxx;">
  echo $comment_text;
  unset($comment_text);
?></div>

What we have done is use the preg_replace function to search the comment text for <pre lang="xxx"> tags and replace them with <pre class="brush:xxx;"> tags. It seems to work well. When the page displays, the JavaScript highlighter spots these tags and highlights them like the rest of the page.
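
To see the replacement in isolation, here is a minimal standalone sketch that runs outside WordPress. It uses the same pattern and replacement string as the comment template; the function name convert_pre_lang and the sample comment are made up purely for illustration.

```php
<?php
// Hypothetical helper name: wraps the same preg_replace call used in the template.
function convert_pre_lang($comment_text) {
    return preg_replace(
        '#<pre.*?lang=(")?([^>"]*)(")?(.*?)>#sm',
        '<pre class=$1brush:$2;$3$4>',
        $comment_text
    );
}

// Made-up sample comment containing a <pre lang="..."> tag.
$sample = 'Try this: <pre lang="php">echo 1;</pre>';

echo convert_pre_lang($sample), "\n";
// prints: Try this: <pre class="brush:php;">echo 1;</pre>
```

The optional quote captured in $1 and $3 is copied back around the class value, so a quoted lang attribute produces a quoted class attribute.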

As I am a newbie at this sort of thing, there are still a couple of issues that worry me, and I would be grateful for any comments on them.

Firstly, should I be filling the $allowedtags entry with the languages allowed? I don't know what the array entry for the lang attribute does. Might it be possible to use it to validate the allowed languages?

Secondly, am I opening any security holes with the preg_replace function? Could someone use it to inject something nasty into the page? I think it would be quite difficult, as the pattern ends the lang value at either a > or a " character, which doesn't leave much scope. However, I am still not sure and will need to keep a close eye on it.
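
One possible way to narrow that scope further is sketched below. It is an assumption on my part, not something the plugin provides. It uses preg_replace_callback with [^>]* so that a match cannot run past the end of a single <pre> tag, and it only emits a brush class when the language is on an allow-list. The function name convert_pre_lang_strict and the list of languages are hypothetical, and note that this variant discards any other attributes on the tag.

```php
<?php
// Hypothetical allow-list: adjust to the brushes actually available on the site.
$allowed_langs = array('php', 'js', 'css', 'xml', 'sql');

// Hypothetical helper: rewrite <pre lang="xxx"> only for allow-listed languages.
function convert_pre_lang_strict($comment_text, array $allowed_langs) {
    return preg_replace_callback(
        // [^>]* keeps the whole match inside one <pre ...> tag;
        // \1 requires the closing quote (if any) to match the opening one.
        '#<pre\b[^>]*?\blang=(["\']?)([A-Za-z0-9_+-]+)\1[^>]*>#i',
        function ($m) use ($allowed_langs) {
            $lang = strtolower($m[2]);
            if (!in_array($lang, $allowed_langs, true)) {
                return '<pre>'; // unknown language: fall back to a plain <pre>
            }
            return '<pre class="brush:' . $lang . ';">';
        },
        $comment_text
    );
}

echo convert_pre_lang_strict('<pre lang="php">echo 1;</pre>', $allowed_langs), "\n";
// prints: <pre class="brush:php;">echo 1;</pre>
```

Because the language name is restricted to a small character set and then checked against the list, nothing typed by the commenter is copied into the class attribute unchecked.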