17 Aug, 2007, Guest wrote in the 1st comment:
Votes: 0
This may need better explanation at some point; if so, feel free to say where.

I'm trying to come up with some code that will allow site administrators to create custom bbcode tags for their sites. I figure I need basically this:

* DB table to store them in. With the bbcode tag name and HTML translation of the tag. This is easy, I created it inside of 30 seconds :)
* A way for users to input their tag names, eg: [blah]text[/blah]. Stuck on this. Not sure how to gather the input from the user.
* Second input area to take the HTML translation, eg: <font face="$1">$2</font> where $1 is the font name, $2 is user text. This part I think I can handle ok. The person inputting the code is going to need to know to format it that way anyway.
* A function to query the bbcode table and then go over them and translate as needed. Again, pretty easy stuff.

So if someone could lend a hand on the second part, I'd appreciate it. If it was a simple tag like [b]text[/b] I could just have the user input the "b" and let the code assume the rest during translation. My problem comes in wanting to enter stuff like the font tag, where you need an attribute to specify what font you want. My guess would be this requires regex, in which case I'm going to be in a heap of trouble since I don't understand it.

I'd also rather avoid digging through code in another forum package to figure it out.
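
One way the tag-name input from the list above could work without hand-writing any regex, as a rough sketch: let the admin type a template in the same $1/$2 style as the HTML side, escape it with preg_quote, and swap the placeholders for capture groups. The function name bbcode_template_to_pattern is made up for illustration. The sketch assumes the placeholders appear in numeric order ($1 before $2) and that the post text has already been HTML-escaped.

```php
<?php
// Hypothetical helper: turns an admin-entered template such as
// "[font=$1]$2[/font]" into a PCRE pattern. Assumes $1, $2, ... appear
// in numeric order, so placeholder N lines up with capture group N.
function bbcode_template_to_pattern(string $template): string
{
    // Escape everything so brackets and other symbols match literally.
    $quoted = preg_quote($template, '~');

    // preg_quote turned "$1" into "\$1"; replace each with a lazy group.
    $pattern = preg_replace('~\\\\\$\d~', '(.*?)', $quoted);

    // i = case-insensitive tags, s = captured text may span lines.
    return '~' . $pattern . '~is';
}

$pattern = bbcode_template_to_pattern('[font=$1]$2[/font]');

// The HTML translation can be stored and used as-is, because
// preg_replace already understands $1 and $2 in the replacement.
echo preg_replace($pattern, '<font face="$1">$2</font>', 'Say [font=Arial]hi[/font]!');
```

Both strings could go straight into the DB table; only the first needs converting before use.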
17 Aug, 2007, Scandum wrote in the 2nd comment:
Votes: 0
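Basic regexp isn't overly complicated:

#include <ctype.h>

#define TRUE  1
#define FALSE 0

/* Returns TRUE if str matches exp. '?' matches any one character,
   '*' matches any run of characters, '\' escapes the next character. */
int regexp(char *exp, char *str)
{
    short cnt;

    for ( ; *exp != 0 ; exp++, str++)
    {
        switch (*exp)
        {
            case '?':
                if (*str == 0)
                {
                    return FALSE;
                }
                break;

            case '*':
                /* Try every possible length for the wildcard, including
                   zero characters and the whole rest of the string. */
                for (cnt = 0 ; ; cnt++)
                {
                    if (regexp(exp + 1, &str[cnt]))
                    {
                        // copy cnt chars to string or something
                        return TRUE;
                    }
                    if (str[cnt] == 0)
                    {
                        break;
                    }
                }
                return FALSE;

            case '\\':
                /* The escaped character must match exactly. */
                if (*++exp == 0 || *exp != *str)
                {
                    return FALSE;
                }
                break;

            default:
                if (tolower((unsigned char) *exp) != tolower((unsigned char) *str))
                {
                    return FALSE;
                }
                break;
        }
    }
    return (*exp == *str);
}

I'm sure you can take it from there.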
17 Aug, 2007, Guest wrote in the 3rd comment:
Votes: 0
I suppose I probably should have been more clear in this, but what I need is in PHP. Basic regex may not seem complicated to you or others who know it well, but it's gibberish to me :)
17 Aug, 2007, Scandum wrote in the 4th comment:
Votes: 0
Samson said:
I suppose I probably should have been more clear in this, but what I need is in PHP. Basic regex may not seem complicated to you or others who know it well, but it's gibberish to me :)

Well, recursion is gibberish to me as well, but I grasp the basic concept, and if it works it works :)

Didn't have anything better to do, so here's some php code:

<?php

// Captured wildcard values, indexed by the digit that follows '%'.
$var = array("0", "1", "2", "3", "4", "5", "6", "7", "8", "9");

// Returns "1" if $str matches $exp, "0" otherwise.
// %N matches any run of characters and stores it in $var[N].
function regexp($exp, $str)
{
    global $var;

    $cnt = 0; // position in $exp
    $tnc = 0; // position in $str

    while ($cnt < strlen($exp))
    {
        switch ($exp[$cnt])
        {
            case '%':
                $num = $exp[$cnt+1];

                // Try ever longer captures until the rest of the pattern matches.
                for ($end = $tnc ; $end <= strlen($str) ; $end++)
                {
                    if (regexp(substr($exp, $cnt+2), substr($str, $end)) == "1")
                    {
                        $var[$num] = substr($str, $tnc, $end - $tnc);

                        return "1";
                    }
                }
                return "0";

            case '\\':
                // A backslash escapes the next pattern character.
                $cnt++;
                // fall through

            default:
                if (strcasecmp(substr($exp, $cnt, 1), substr($str, $tnc, 1)))
                {
                    return "0";
                }
                break;
        }
        $cnt++;
        $tnc++;
    }

    // The pattern is used up; the string must be used up as well.
    return ($tnc == strlen($str) ? "1" : "0");
}

// Replaces every %N in $input with the captured value $var[N].
function substitute($input)
{
    global $var;
    $output = "";

    for ($cnt = 0 ; $cnt < strlen($input) ; $cnt++)
    {
        if ($input[$cnt] == "%")
        {
            $cnt++;
            $num = $input[$cnt];
            $output = $output . $var[$num];
        }
        else
        {
            $output = $output . $input[$cnt];
        }
    }
    return $output;
}

if (regexp("[font='%1']%2[/font]", "[font='test']hiyas[/font]") == "1")
{
    echo "%1: $var[1] %2: $var[2]<br>";

    echo "substitute: " . substitute("[bla='%1']%2[/bla]");
}
else
{
    echo "no match";
}
?>
18 Aug, 2007, Scandum wrote in the 5th comment:
Votes: 0
Figured I'd finish it up since the above was an overly rough draft.

The following (works standalone) should get the job done.

<?php

// Each entry: opening tag prefix, closing tag, bbcode pattern, HTML translation.
// The empty entry marks the end of the list.
$bbcodes = array(
    array("[b]", "[/b]", "[b]%5[/b]", "<b>%5</b>"),

    array("[font size", "[/font]", "[font size='%0']%5[/font]", "<font size='%0'>%5</font>"),

    array("", "", "", "")
);


// Captured wildcard values, indexed by the digit that follows '%'.
$var = array("0", "1", "2", "3", "4", "5", "6", "7", "8", "9");

// Returns "1" if $str matches $exp, "0" otherwise.
// %0 to %4 capture the shortest possible run, %5 to %9 the longest.
function bbmatch($exp, $str)
{
    global $var;

    $expcnt = 0;
    $strcnt = 0;

    while ($expcnt < strlen($exp))
    {
        switch ($exp[$expcnt])
        {
            case '%':
                $num = $exp[$expcnt+1];

                if ($num < 5)
                {
                    // Lazy: try the shortest capture first.
                    for ($strlen = $strcnt ; $strlen <= strlen($str) ; $strlen++)
                    {
                        if (bbmatch(substr($exp, $expcnt+2), substr($str, $strlen)) == "1")
                        {
                            $var[$num] = substr($str, $strcnt, $strlen - $strcnt);

                            return "1";
                        }
                    }
                }
                else
                {
                    // Greedy: try the longest capture first.
                    for ($strlen = strlen($str) ; $strlen >= $strcnt ; $strlen--)
                    {
                        if (bbmatch(substr($exp, $expcnt+2), substr($str, $strlen)) == "1")
                        {
                            $var[$num] = substr($str, $strcnt, $strlen - $strcnt);

                            return "1";
                        }
                    }
                }
                return "0";

            default:
                if (strcasecmp(substr($exp, $expcnt, 1), substr($str, $strcnt, 1)))
                {
                    return "0";
                }
                break;
        }
        $expcnt++;
        $strcnt++;
    }

    if (strlen($exp) == strlen($str))
    {
        return "1";
    }
    else
    {
        return "0";
    }
}

// Replaces every %N in $input with the captured value $var[N].
function substitute($input)
{
    global $var;
    $output = "";

    for ($cnt = 0 ; $cnt < strlen($input) ; $cnt++)
    {
        if ($input[$cnt] == "%")
        {
            $cnt++;
            $num = $input[$cnt];
            $output = $output . $var[$num];
        }
        else
        {
            $output = $output . $input[$cnt];
        }
    }
    return $output;
}

// Translates the tag that $text starts with; returns "error" if it cannot.
function bblicious($text)
{
    global $bbcodes;

    $level = 0;

    // Find the bbcode entry whose opening tag starts $text.
    for ($ind = 0 ; $bbcodes[$ind][0] != "" ; $ind++)
    {
        if (strncasecmp($text, $bbcodes[$ind][0], strlen($bbcodes[$ind][0])) == 0)
        {
            break;
        }
    }

    if (strcmp($bbcodes[$ind][0], "") == 0)
    {
        return "error";
    }

    // Count nesting levels to find the closing tag that belongs to this opening tag.
    for ($cnt = 0 ; $cnt < strlen($text) ; $cnt++)
    {
        if (strncasecmp(substr($text, $cnt), $bbcodes[$ind][0], strlen($bbcodes[$ind][0])) == 0)
        {
            $level++;
        }
        else if (!strncasecmp(substr($text, $cnt), $bbcodes[$ind][1], strlen($bbcodes[$ind][1])))
        {
            $level--;

            if ($level == 0)
            {
                $cnt += strlen($bbcodes[$ind][1]);

                break;
            }
        }
    }

    if ($level != 0)
    {
        return "error";
    }

    if (bbmatch($bbcodes[$ind][2], substr($text, 0, $cnt)) == "1")
    {
        return substitute($bbcodes[$ind][3]) . substr($text, $cnt);
    }
    else
    {
        return "error";
    }
}

// Walks through $text and translates every tag it can.
function bbfication($text)
{
    for ($cnt = 0 ; $cnt < strlen($text) ; $cnt++)
    {
        if ($text[$cnt] == "[")
        {
            $result = bblicious(substr($text, $cnt));

            if (strcmp($result, "error"))
            {
                $text = substr($text, 0, $cnt) . $result;

                // Look at this position again on the next pass.
                $cnt--;
            }
        }
    }
    return $text;
}

echo "[B]hello[/b]: " . bbfication("[b]hello[/b]") . "<br>";

echo "[fOnt siZe='6']Hello[/font]: " . bbfication("[font size='6']Hello[/font]") . "<br>";
echo "[fOnt siZe='6'][b]hello[/b][/font]: " . bbfication("[font size='6'][b]hello[/b][/font]") . "<br>";

echo "[font size='6']he[font size='3']ll[/font]o[/font]: " . bbfication("[font size='6']he[font size='3']ll[/font]o[/font]") . "<br>";

?>
18 Aug, 2007, Davion wrote in the 6th comment:
Votes: 0
We moved away from a regexp parser a few months back because it has problems nesting things (quotes were funky!). The new parser works to build a DOM tree. Re-adding a regexp parser (btw, PHP comes with preg_match and preg_replace, which would make your code much smaller!) would be a step back. The way we have it, bbcodes can be interpreted correctly even as an attribute, eg.
[quote=[url=http://www.google.ca]Google[/url]]
We own the internet
[/quote]


Of course, that's not a real quote. Anyways… I'm sure just expanding the bbcode handlers array dynamically (go php go!) will do the trick. We can even use templates!
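
For comparison, a rough sketch of what the built-in route looks like for the font example used earlier in the thread. The two function names are invented for illustration, and this is not the Quicksilver Forums parser, just preg_match and preg_replace on a single tag.

```php
<?php
// Hypothetical helper: returns array(font, text) for a string of the form
// "[font='...']...[/font]", or null if it does not match.
function font_parts(string $input): ?array
{
    // preg_match fills $m with the captures: $m[1] is the font, $m[2] the text.
    if (preg_match('~^\[font=\'(.*?)\'\](.*)\[/font\]$~is', $input, $m)) {
        return array($m[1], $m[2]);
    }
    return null;
}

// Hypothetical helper: matching and substitution in one preg_replace call.
function font_to_bla(string $input): string
{
    return preg_replace('~\[font=\'(.*?)\'\](.*?)\[/font\]~is', '[bla=\'$1\']$2[/bla]', $input);
}

$parts = font_parts("[font='test']hiyas[/font]");
echo "%1: $parts[0] %2: $parts[1]\n";
echo font_to_bla("[font='test']hiyas[/font]"), "\n";
```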
18 Aug, 2007, Scandum wrote in the 7th comment:
Votes: 0
My code probably needs to know when to either take the shortest or longest match. I guess I'll edit my post when I have some time to add support for that.

Using preg_match and preg_replace would save some code, but I'm not sure if regexp can be told to either pick the shortest or longest match.

My parser would deal correctly with urls nested within the quote name, but only if the urls are checked before the quotes. I guess I could have it bb-fy the highest nests first to fix that.

Anywho, are you just bragging or are you gonna give Samson your bb code parser?
18 Aug, 2007, Justice wrote in the 8th comment:
Votes: 0
Regex is capable of handling that with its quantifiers. I'd have to play with it to be sure, but I think you'd need a greedy quantifier vs a lazy one. One of those areas where a character or two will exhibit vastly different behavior within the pattern matcher.
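
A small sketch of that difference in PHP's PCRE syntax, where a single ? after the quantifier switches it from greedy to lazy. The sample text is made up.

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .* grabs as much as it can, so the match runs to the last [/b].
preg_match('~\[b\](.*)\[/b\]~', $text, $greedy);

// Lazy: .*? grabs as little as it can, so the match stops at the first [/b].
preg_match('~\[b\](.*?)\[/b\]~', $text, $lazy);

echo $greedy[1], "\n"; // one[/b] and [b]two
echo $lazy[1], "\n";   // one
```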
18 Aug, 2007, Scandum wrote in the 9th comment:
Votes: 0
Made %0 to %4 lazy and %5 to %9 greedy which should settle that.

The standard regexp syntax gives me road rage :)
18 Aug, 2007, Davion wrote in the 10th comment:
Votes: 0
Scandum said:
Anywho, are you just bragging or are you gonna give Samson your bb code parser?

The boards already use the parser. It was committed to Quicksilver forums 1.3.0 or 1.3.1, can't remember…
See…


Click your name, should throw you to your sourceforge page! Feel free to keep going on the code though, looks interesting… we should also make an easier way of getting the pid for links like that too, might be useful.
18 Aug, 2007, Scandum wrote in the 11th comment:
Votes: 0
Oh I see =)

I think my code gets the nesting right now.

Tags as attributes should work if you execute bbfication twice, once for tags that have no attributes and once for tags that do. Would still mess up with attribute tags within an attribute, but it'd be the easiest way to go about it.

I'd say expanding the bbcode array dynamically will indeed do the trick, don't think my dummy regexp would be of any use to that. It was fun to write though.
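
Another way to get nesting right, sketched with preg_replace instead of the hand-rolled matcher: only convert tags whose content contains no further "[", and repeat until a pass changes nothing, so the innermost tags go first. The function name bbcode_nested and the two-entry tag table are assumptions for illustration.

```php
<?php
// Hypothetical sketch: innermost-first conversion by repeated passes.
function bbcode_nested(string $text): string
{
    // [^\[]* only matches content without an opening bracket, which means
    // a tag is converted only once everything nested inside it is done.
    $tags = array(
        '~\[b\]([^\[]*)\[/b\]~i' => '<b>$1</b>',
        '~\[font size=\'(\d+)\'\]([^\[]*)\[/font\]~i' => '<font size=\'$1\'>$2</font>',
    );

    do {
        // $count receives the number of replacements made in this pass.
        $text = preg_replace(array_keys($tags), array_values($tags), $text, -1, $count);
    } while ($count > 0);

    return $text;
}

echo bbcode_nested("[font size='6']he[font size='3']ll[/font]o[/font]"), "\n";
echo bbcode_nested("[font size='6'][b]hello[/b][/font]"), "\n";
```

The catch is that a stray "[" in the text (one that never becomes HTML) stops the enclosing tags from being converted, and tags used as attributes are still not handled.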
21 Aug, 2007, Fizban wrote in the 12th comment:
Votes: 0
Quote
The standard regexp syntax gives me road rage :)


*refuses to use any regexp syntax besides UNIX's.