some mnogosearch thoughts:
- the documentation is out of date and generally unclear. (at both ends of the scale—it's missing some things that would help a new user get going, and the information to stretch it in more interesting ways is hard to find.)
- the order in which Disallow and Allow configuration options are processed is not documented. (the first rule matched from indexer.conf will be used.)
- the minimal configuration file is so minimal, it lacks the bits that cause page content to actually get indexed. (you need the various Section lines that you can grab from etc/indexer.conf.)
- the indexer doesn't have a mode that corresponds to "index the whole site right now." it only indexes pages that are new or expired when the indexer is started, and records the addresses of new pages it finds so they will be indexed later.
- while it checks robots.txt files, it doesn't use that information to avoid storing a url in its table of urls to be visited. (it just deletes the url when it goes to index that page and realizes that robots.txt disallows it.)
- oh yeah, the robots.txt support is broken in the most recent version. (here is a patch to fix this.)
- there's no simple command-line search tool. you have to run the cgi version and deal with the html output.
all that said, the results seem to be pretty good, and searching is fast (using mysql for the backend, of course). once you've figured out how the disallow and allow rules work, they appear to allow for more flexibility than htdig does.