Parsing the BibTex File

Parsing is done by stepping through every character of the text:

for($i = 0 ; $i < strlen($this->content) ; $i++)

Here $this->content is the BibTex data. Inside this loop, the current character is stored in $char and the previous character in $lastchar; $lastchar is needed to detect whether a brace is escaped. The number of open braces is stored in $open. The boolean $entry indicates whether the current character is inside an entry; if it is not, the character belongs to a comment. Every character of an entry is appended to $buffer. Finally, when an error is detected (in this case unbalanced braces), $valid is set to false.

First, the beginning of an entry is detected:

if( ( 0 == $open ) && ( '@' == $char ) ) {
    $entry = true;
}

Inside an entry, $open should be greater than zero, so every '@' is ignored. Outside an entry, $open has to be zero (see the second item in the Requisites); there an '@' marks the beginning of an entry, and $entry is therefore set to true.

Next, an opening brace is detected:

elseif ( $entry && ( '{' == $char ) && ( '\\' != $lastchar ) ) {
    $open++;
}

Opening braces are only counted if they are inside an entry and not escaped. If a brace is escaped, the previous character is a backslash ('\'). When an opening brace is detected, the value stored in $open is incremented.

Finally, a closing brace is detected:

elseif ( $entry && ( '}' == $char ) && ( '\\' != $lastchar ) ) {
    $open--;
    if( $open < 0 ) {
        $valid = false;
    }
    if( 0 == $open ) {
        $entry = false;
        $entrydata = $this->_parseEntry($buffer);
        if(!$entrydata) {
            $valid = false;
        } else {
            $this->data[] = $entrydata;
        }
        $buffer = '';
    }
}

Closing braces are only counted if they are inside an entry and not escaped. When a closing brace is detected, the value stored in $open is decremented. Next, $open is checked. If the value is less than zero, there are more closing than opening braces, which is not possible in a valid BibTex file. If the value stored in $open equals zero at this point, the end of an entry has been reached. $entry is then set to false, since we are no longer inside an entry. The entry stored in $buffer is parsed separately in the private function _parseEntry. If this function returns false, something went wrong and $valid is set to false. Note that _parseEntry returns false if the opening delimiter of a value does not match the closing one, which usually happens when the order of the braces is incorrect. If the entry is parsed correctly, it is appended to the array $this->data. Finally, the buffer is cleared.

There are two final things to do in the loop:

if( $entry ) {
    $buffer .= $char;
}
$lastchar = $char;

If we are inside an entry, the current character is appended to the buffer. The current character is then stored as the previous character for the next cycle. After the loop finishes, an error is raised if $valid is false; the message is then 'Unbalanced parenthesis'. Note that comments are simply dropped.
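The steps described above can be put together into one self-contained sketch. The function name splitBibtexEntries and its return format are hypothetical and not part of the package. Because _parseEntry is private and not shown here, the sketch simply collects the raw text of each entry instead of parsing it.

```php
<?php
// Hypothetical helper, not part of the package: it mirrors the loop
// described above but stores the raw entry text instead of calling
// the private _parseEntry function.
function splitBibtexEntries(string $content): array
{
    $entries  = [];
    $buffer   = '';
    $lastchar = '';
    $open     = 0;      // number of currently open braces
    $entry    = false;  // true while inside an entry
    $valid    = true;   // set to false on unbalanced braces

    $length = strlen($content);
    for ($i = 0; $i < $length; $i++) {
        $char = $content[$i];
        if ((0 == $open) && ('@' == $char)) {
            // An '@' outside any braces starts an entry.
            $entry = true;
        } elseif ($entry && ('{' == $char) && ('\\' != $lastchar)) {
            $open++;
        } elseif ($entry && ('}' == $char) && ('\\' != $lastchar)) {
            $open--;
            if ($open < 0) {
                $valid = false;
            }
            if (0 == $open) {
                $entry = false;
                // The closing brace has not been appended yet,
                // so the stored text ends just before it.
                $entries[] = $buffer;
                $buffer = '';
            }
        }
        if ($entry) {
            $buffer .= $char;
        }
        $lastchar = $char;
    }
    return ['entries' => $entries, 'valid' => $valid];
}

$result = splitBibtexEntries(
    "a comment\n@misc{key1, note = {mail a@b.org}}\nignored text\n@book{key2, author = \"X\"}"
);
```

With this input, $result['entries'] holds two strings, '@misc{key1, note = {mail a@b.org}' and '@book{key2, author = "X"', and $result['valid'] is true. The '@' inside the braces of the first entry is ignored because $open is 2 at that point, and the text outside the entries is dropped as a comment.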

Elmar Pitschke 2006-07-28