parser

Building tree structures

Misha v.3 [May 19, 2002]

DISCLAIMER.
This article is not a set of examples which will work right after you copy/paste them. Moreover, I have deliberately added certain syntactical errors to make you think before you use it, instead of just using it blindly. The reason why I did it is because, as long as you have understood what is written in the article, you will be able to easily produce this kind of code without any cheating.
I have also omitted certain subjects not directly related to the subject of this article, which is building tree structures by means of Parser.
Neither will we discuss how to store data in text files, which programs to use to connect to the MySQL database, or how to interpret the queries syntax. Answers to all these questions may be found in the documentations.

To fully understand this article, I recommend that you first read the article on the compatibility of SQL queries.


We have to build and store tree structures quite often: when we make forums, create site structures, or create classifications of goods and store them to the database.

I will not focus on the cases when tree structures are retrieved from the database by means of SQL servers, as long as servers which can do it are rare. For example, MySQL—the most popular due to its being free of charge—cannot do it.

In my example, I will use a simple forum, powered by MySQL, as database server, and Parser 3, as the server’s scripting language.

First, we need to agree on the way the forum messages will be stored in the database.

Let us create simple table and several indexes:

CREATE TABLE forum_message (
	forum_message_id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
	forum_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 1,
	parent_id INTEGER UNSIGNED NOT NULL DEFAULT 0,
	thread_id INTEGER UNSIGNED NOT NULL DEFAULT 0,
	is_published TINYINT(1) UNSIGNED NOT NULL DEFAULT 1,
	dt_published DATETIME NOT NULL,
	author VARCHAR(255) NOT NULL,
	email VARCHAR(255),
	title VARCHAR(255) NOT NULL,
	body TEXT
);
CREATE INDEX ix_forum_message_0 ON forum_message (
	parent_id,
	is_published,
	dt_published
);

CREATE INDEX ix_forum_message_1 ON forum_message (
	thread_id,
	is_published,
	dt_published
);

This table will contain all forum messages.

forum_message_id—the message’s unique identifier.
forum_id—used to let us use one table for storing messages from several forums. So far we will write just 1 to it (if you want, you can develop this example further and this field will contain reference to the appropriate forum ID).
parent_id—reference to the ID of the parent message. For root messages, it will be 0.
thread_id—thread ID. In fact, it has forum_message_id of the root message for all messages of this thread. This field is extra; we need it only to facilitate retrieving data matching the specified criteria.
is_published—tells is the message has to be displayed on site. For example, you may virtually delete this message by simply setting this flag to 0.
dt_published—date/time indicating when the message was published. The field has index to speed up sorting.
author—the name of the one who wrote the message. This field is required.
email—the author’s email address.
title—message title. This is a required field, so it is declared not null. We also add an index to speed up search process.
body—message text.

Now let’s write several methods which will retrieve the data from the database and which we will need later on.

1. the first method will retrieve all data related to a certain message. Its use is not shown in the given example, but you will need it if you decide to expand this example into a full-fledged forum.

@getMessageById[iMessageId]
$result[^table::sql{
	SELECT
		forum_message_id AS id,
		parent_id,
		thread_id,
		title,
		author,
		email,
		dt_published,
		body
	FROM
		forum_message
	WHERE
		forum_message_id = ^iMessageId.int(0)
		AND is_published = 1
}]
#end @getMessageById[]

2. The second method will retrieve all data related to the messages having the same parent.

@getMessagesByParent[iParentId;iLimit]
$result[^table::sql{
	SELECT
		forum_message_id AS id,
		parent_id,
		thread_id,
		title,
		author,
		email,
		dt_published,
		IF(body IS NULL, 1, 0) AS is_empty
	FROM
		forum_message
	WHERE
		is_published = 1
		^if(def $iParentId){
			AND parent_id = ^iParentId.int(0)
		}
	ORDER BY
		dt_published
}[
	^if(^iLimit.int(0)){
		$.limit(^iLimit.int(0))
	}
]]
#end @getMessagesByParent[]

3. The third method will retrieve all data on the titles of those threads which are related to the given thread/list of threads.

@getMessagesByThread[uThreadIds]
$result[^table::sql{
	SELECT
		forum_message_id AS id,
		parent_id,
		thread_id,
		title,
		author,
		email,
		dt_published,
		IF(body IS NULL, 1, 0) AS is_empty
	FROM
		forum_message
	WHERE
		is_published = 1
		^if($uThreadIds is "table"){
			^if($uThreadIds){
				AND thread_id IN (^uThreadIds.menu{$uThreadIds.thread_id}[,])
			}
		}{
			AND thread_id = ^uThreadIds.int(0)
		}
	ORDER BY
		dt_published DESC
}]
#end @getMessagesByThread[]

4. The fourth method will build tree structure and call itself recursively (see below why this method is bad and how you can improve them).

@printMessagesByParent[tMessage;iParentId][tLevel]
$tLevel[^tMessage.select($tMessage.parent_id == $iParentId)]
^tLevel.menu{
	^printTreeItem[$tLevel.fields;^printMessagesByParent[$tMessage;$tLevel.id]]
}
#end @printMessagesByParent[]

5. The fifth method will output one element of the tree structure.

@printTreeItem[hMessage;sBody]
<tree-item
	id="$hMessage.id"
	date="$hMessage.dt_published"
	is-empty="$hMessage.is_empty"
>
	<title>$hMessage.title</title>
	<author>$hMessage.author</author>
	<email>$hMessage.email</email>
	$sBody
</tree-item>
#end @printTreeItem[]

You have certainly noticed that I use strange tags:

<tree-item />
<title />
<author />
etc. These are XML.

My task now is to create tree structure of elements. I don’t care what these elements are: they may be forum trees, site structure, or whatever else. Neither do I care about the number or the type of these elements. My purpose is to organize the data logically.

For example, if one day I decide to add a parameter “link to the message,” I will just make a change like this:

<tree-item
	href="./?id=$tLevel.forum_id"
	id="$tLevel.id"
	date="$tLevel.dt_published"
	is-empty="$tLevel.is_empty"
>

And a person who wants this tree structure to be displayed will just output this element's attribute. This is it. And if next time I decide to include mod_rewrite, which will need all url’s to be changed, all I will have to do is just change the content href="./${tLevel.id}.html"

There’s more to it. Say, in a couple of days we decide to change the output of the forum from the tree-view to a list of root messages, which will display all children messages only when they are referred to. In this case, I won’t have to change anything at all, as long as all information needed for display is already there, in the tree structure I have at hand. The only matter to decide will be how to output it, and in solving this task xsl will do just fine.

Well, this was just a thought of how we can make it. In our case, we’d better simplify it all by changing the content of ^printTreeItem[] like this:

@printTreeItem[hMessage;sBody]
$result[<li>$hMessage.title ^if($hMessage.is_empty){(-)},
	$hMessage.author [$hMessage.dt_published]
	^if(def $sBody){<ul>$sBody</ul>}
</li>]
#end @printTreeItem[]

Actually, after all these methods are written, we won’t need much to create a simple forum:

@main[]
^pSQL.server{
	^rem{ *** get no more than 20 root messages (parent_id == 0) *** }
	$tRootMessage[^getMessagesByParent[0;20]]

	^rem{ *** get all messages in the threads we just retrieved *** }
	$tMessage[^getMessagesByThread[$tRootMessage]]

	^rem{ *** output all data on the messages as a tree *** }
	^printMessagesByParent[$tMessage;0]
}
#end @main[]

That’s it and that’s all. Who said that forum task was complicated? :)

If you studied the method printMessagesByParent closely enough, you most probably came up with a question:
“Why the hell should we fetch the table $tMessage everywhere we go? Why don’t we just make it global?”

Well, we might, but… First, we better look at the following modification of the method’s first line (it’s only useful when we shift from the root element to the first; its totally useless afterwards):

^printTreeItem[$tLevel.fields;^printMessagesByParent[^tMessage.select($tLevel.thread_id == $tMessage.thread_id);$tLevel.id]]
(you better spend some time and figure out why we need it and how it works). Second, it practically doesn’t use Parser’s memory, as long as data is sent by means of linking. At that, we still have an opportunity to introduce extra filtering tools.


However, the tree epic is not all over.

The given example of the recursive bypass is not good at all.

In the given example, it is convenient, but if we needed to check something, for example, if we have an element with a specified id, we would have to do ^locate[] over and over again, which would dramatically slow down the whole process. In this case we’d better do it like this:
1. retrieve all elements of the tree (or just the part we need), just as we did in the first case
2. create the tables’ hash ($hTree), with the element’s id being the key, and the table with its children—being the value.

In this method, we run over all messages just once, running up a check for each one. We check if in our hash there’s an element with the key equal to parent_id of this element. If there is not, we create this element in the hash, after which we add the element to the table with the children of the created object.

@createHashTree[tMessage][tEmpty]
$tEmpty[^table::create[$tMessage][$.limit(0)]]
$result[^hash::create[]]
^tMessage.menu{
	^if(!$result.[$tMessage.parent_id]){
		$result.[$tMessage.parent_id][^table::create[$tEmpty]]
	}
	^result.[$tMessage.parent_id].join[$tMessage][$.limit(1)$.offset[cur]]
}
#end @createHashTree[]

3. during any iteration of the recursive bypass, we may get the children and output them the same way. Here is the way it will look:

@printMessagesByParent[hTree;iParentId][tLevel]
^if($hTree.[$iParentId]){
	^hTree.[$iParentId].menu{
		^printTreeItem[$hTree.[$iParentId].fields;^if($hTree.[$hTree.[$iParentId].id]){^printMessagesByParent[$hTree;$hTree.[$iParentId].id]}]
	}
}
#end @printMessagesByParent[]

Although this method looks more complicated, the end justifies the means. It is almost three times as fast as the methods employing ^select[] and it uses up to twice as little memory.

If the complexity of the method @createHashTree makes you feel uneasy, don’t worry: Parser 3.0.8 and later releases have an internal method creating a hash like this and @createHashTree[] will now look like the following:

@createHashTree[tMessage]
$result[^tMessage.hash[parent_id][$.distinct[tables]]]
#end @createHashTree[]

One more thing is that if you use lib.p, the above-mentioned method is already there, and it fully uses the capabilities of Parser’s newer versions.

You should notice that, as long as the chosen example provides an easy (and very fast) way of checking if there exist elements with specified parent_id (^if($hTree.[$iParentId]){there are children with id equal to $iParentId}{there are no children}, we find it easy to avoid recursion on elements that don’t have children. In the previous example, we could achieve the same result by means of a construction
^if(^tMessage.locate[parent_id;$iParentId]{the element has children with id = $iParentId}{there are no children}). This way, however, wouldn’t be good: in cases when there are many elements in table $messages, such locate routine would be rather slow.

@main[]
^pSQL.server{
	^rem{ *** retrieve no more than 20 root messages (parent_id == 0) *** }
	$tRootMessage[^getMessagesByParent[0;20]]

	^rem{ *** retrieve all messages in threads we just got *** }
	$tMessage[^getMessagesByThread[$tRootMessage]]

	^rem{ *** create a hash with id elements being the keys, and table with all children elements - being the values *** }
	$hTree[^createHashTree[$tMessage]]

	^rem{ *** output all elements as tree *** }
	^printMessagesByParent[$hTree;0]
}

General notes on recursion and methods given here:
1. recursion is by no means the most effective algorithm. When you use it, do not create too many local variables, as long as they will use too much memory when you work with big trees (I mean Parser3).
2. the methods given here as examples are not so effective. Moreover, I omitted some useful features for the sake of simplicity. There are some cases when you need to follow a different logic.
3. I didn’t focus on tools creating dates and analyzing whether the message was created today or not. For this sake you better use SQL tools, for example:

$result[^table::sql{
	SELECT
		forum_message_id AS id,
		…
		email,
		^pSQL.date_format[dt_published;%d.%m %H:%i] AS dt_published,
		IF(dt_published >= ^pSQL.today[], 1, 0) AS is_new
		…
	FROM
		forum_message
	WHERE
		…
	ORDER BY
		dt_published DESC
}]

I tried to get the date from SQL server as it is and format it by means of Parser…

I don’t really like it: although I can use it to form whatever I want, memory and time management is a far cry from that ensured by an SQL server, which is cut out for such operations.

It doesn’t mean that all work should be given up to SQL, as long as use of GROUP BY + ORDER BY may make very simple queries the very straw that breaks the camel’s back…

Before using any query intensively, please turn to your SQL-server documentation for best ways to handle a task, and, if you use MySQL, let EXPLAIN become your favourite command :)

P.S. Great thanks to for many helpful ideas.