The PHP search script
For this website
A micro search engine (quite neat).
Match the search terms
Match $terms and collect snippets (if any exist) into $matches, each with up to two words to the left of the term(s) and up to three words to the right. The number of matches goes into the $count variable. As it loops through the pages, search shows up to three snippets with each result, with $count being the number of matches for that result. The results are presented in rows.
$count = preg_match_all("/((\S*\s){0,2})(\b$terms\b)((\s?\S*){0,3})/ui", $alltext, $matches, PREG_SET_ORDER);
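To see what the call produces, here is a minimal sketch run against a made-up sentence. The sample text and the $snippets array are assumptions for illustration only; the real script's surrounding code is not shown here.

```php
<?php
// Hypothetical sample data: in the real script $alltext is one page's content
// and $terms comes from the search form.
$alltext = 'The quick brown fox jumps over the lazy dog while another fox sleeps.';
$terms   = 'fox';

$count = preg_match_all(
    "/((\S*\s){0,2})(\b$terms\b)((\s?\S*){0,3})/ui",
    $alltext,
    $matches,
    PREG_SET_ORDER
);

// Keep at most three snippets per page, as described above.
$snippets = [];
foreach (array_slice($matches, 0, 3) as $m) {
    $snippets[] = $m[0]; // $m[0] is the whole snippet: left words + term + right words
}

echo $count, " matches\n";
echo implode(' ... ', $snippets), "\n";
```

With this sample the call should report two matches, one snippet around each "fox".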
Explanation
$alltext is the content of a page. Search loops through all the pages and excludes any that are password-protected.
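The loop itself might look something like the sketch below. The $pages array and its 'title', 'protected' and 'text' keys are hypothetical; how the real site stores its pages and marks them as password-protected is not shown here.

```php
<?php
// Hypothetical page list; the real script's storage and field names are not shown.
$pages = [
    ['title' => 'Home',    'protected' => false, 'text' => 'Welcome to the search demo page.'],
    ['title' => 'Members', 'protected' => true,  'text' => 'Private search notes for members.'],
    ['title' => 'About',   'protected' => false, 'text' => 'This page says nothing relevant.'],
];
$terms = 'search';

$results = [];
foreach ($pages as $page) {
    if ($page['protected']) {
        continue; // password-protected pages are never searched
    }
    $alltext = $page['text'];
    $count = preg_match_all(
        "/((\S*\s){0,2})(\b$terms\b)((\s?\S*){0,3})/ui",
        $alltext,
        $matches,
        PREG_SET_ORDER
    );
    if ($count) {
        // One row per page that matched, with its match count.
        $results[] = ['title' => $page['title'], 'count' => $count];
    }
}

foreach ($results as $row) {
    echo $row['title'], ': ', $row['count'], "\n";
}
```

Here only "Home" ends up in the results: "Members" is skipped before the regex runs, and "About" has no match.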
PREG_SET_ORDER orders results so that $matches[0] is an array of the first set of matches, $matches[1] is an array of the second set of matches, and so on. Each set is one snippet, which makes it easy to show the snippets together with the count of matches in each row of results. It would be even better if the results were ordered with the most matches at the top, but that seems difficult without slowing it down.
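One possible way to put the pages with the most matches first is to collect the rows in an array and sort them once the loop has finished. This is only a sketch: the $results rows and their 'title' and 'count' keys are hypothetical, and whether the extra sort is fast enough on the real site has not been measured.

```php
<?php
// Hypothetical result rows, one per matching page, as they might be
// collected while looping through the pages.
$results = [
    ['title' => 'About',   'count' => 1],
    ['title' => 'Regex',   'count' => 4],
    ['title' => 'Contact', 'count' => 2],
];

// Sort rows by match count, highest first.
usort($results, function (array $a, array $b): int {
    return $b['count'] <=> $a['count'];
});

foreach ($results as $row) {
    echo $row['count'], ' ', $row['title'], "\n";
}
```

Only the pages that matched are sorted, so the cost depends on the number of results rather than on the number of pages searched.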
/ / = delimiters
\S = non-whitespace character
\s = whitespace character
* = 0 or more
? = 0 or 1 (optional); placed after another quantifier such as *, it makes that quantifier ungreedy
(.*?) is not used in the pattern above but shows the ungreedy form, see below:
Eg: in WORD1(.*?)WORD2, (.*?) tells the regex engine: "Match any character zero or more times, as few times as possible". The ? after * makes it ungreedy. If you leave it out, it will match everything between the first WORD1 and the last WORD2. If there are multiple occurrences of WORD2, the ungreedy version only matches up to the first WORD2.
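The difference is easy to check with a small sketch. The sample string is made up, and WORD1 and WORD2 simply stand in for any two marker words.

```php
<?php
// Made-up sample with one WORD1 and two occurrences of WORD2.
$text = 'WORD1 alpha WORD2 beta WORD2';

preg_match('/WORD1(.*)WORD2/', $text, $greedy);    // greedy: runs to the last WORD2
preg_match('/WORD1(.*?)WORD2/', $text, $ungreedy); // ungreedy: stops at the first WORD2

var_dump($greedy[1]);   // ' alpha WORD2 beta '
var_dump($ungreedy[1]); // ' alpha '
```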
(\S*\s) = 0 or more non-whitespace characters followed by a whitespace character
i.e. a string of characters (a word) followed by a space
{0,2} = match 0-2
((\S*\s){0,2}) = 0-2 'words' each followed by a space
(\b$terms\b) = search terms enclosed by a word boundary
(\s?\S*) = an optional whitespace character followed by 0 or more non-whitespace characters
i.e. a space followed by a string of characters (a word)
{0,3} = match 0-3
((\s?\S*){0,3}) = 0-3 'words' each preceded by a space
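Putting the pieces together, each set in $matches holds the whole snippet plus the individual groups. The sketch below shows which index is which and uses them to wrap the term in bold tags. The sample text and the bolding are assumptions for illustration, not necessarily what the site does.

```php
<?php
// Hypothetical sample text.
$alltext = 'A micro search engine written in PHP.';
$terms   = 'search';

preg_match_all(
    "/((\S*\s){0,2})(\b$terms\b)((\s?\S*){0,3})/ui",
    $alltext,
    $matches,
    PREG_SET_ORDER
);

$m = $matches[0];   // first set of matches
// $m[0] = whole snippet
// $m[1] = up to two words on the left, each followed by a space
// $m[3] = the search term as it appears in the text
// $m[4] = up to three words on the right, each preceded by a space
// ($m[2] and $m[5] only hold the last repetition of the inner groups)

$html = htmlspecialchars($m[1])
      . '<b>' . htmlspecialchars($m[3]) . '</b>'
      . htmlspecialchars($m[4]);

echo $html, "\n";
```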
/ui = pattern modifiers: 'u' is the unicode flag (pattern and text are treated as UTF-8) and 'i' is the ignore case flag
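A quick sketch of what the two flags change, using made-up sample strings:

```php
<?php
// 'i': ignore case. Hypothetical sample text.
$alltext = 'Search the site, then search again.';
$terms   = 'search';

$withI    = preg_match_all("/\b$terms\b/ui", $alltext); // matches 'Search' and 'search'
$withoutI = preg_match_all("/\b$terms\b/u", $alltext);  // matches 'search' only

// 'u': treat pattern and text as UTF-8, so a multi-byte character counts as one character.
$e = "\u{00E9}";                          // e with an acute accent, two bytes in UTF-8
$oneCharWithU    = preg_match('/^.$/u', $e); // 1: a single character
$oneCharWithoutU = preg_match('/^.$/', $e);  // 0: seen as two separate bytes

echo "$withI $withoutI $oneCharWithU $oneCharWithoutU\n";
```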