Daboo Meta Tag Generator Deluxe Robot
David Dienhart
 
       
 
 

Daboo Meta Tag Generator Deluxe Bot

Version 0.7
Copyright 1997-2004 David Dienhart All Rights Reserved.
Release Date: 1-8-2004
http://www.dienhart.com
 

License Agreement

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

See GNU-GPL_2.html for the complete license.

 

Files

  • DMetaGenDlx.pl (Spider Script)
  • DMetaGenDlx.exe (Compiled Spider)
  • DMetaGenDlx.html (Documentation)
  • site.txt (search index, autogenerated)
  • GNU-GPL_2.html (License Agreement)
  • progresslog.txt (log of every page the spider indexed or attempted to index)
 

Requirements

  • Linux, UNIX, or Windows
  • PERL 5.8.4 (Required by DMetaGenDlx.pl)
  • HTML::LinkExtor (Required by DMetaGenDlx.pl)
  • LWP::UserAgent (Required by DMetaGenDlx.pl)
  • HTML::TokeParser (Required by DMetaGenDlx.pl)

* Note: DMetaGenDlx.exe does not require PERL to be installed. It is a standalone Windows executable.

 

Description

  • DMetaGenDlx.pl crawls the selected site and generates an index based on each page's address and content.
  • DMetaGenDlx.exe does the same and does not require PERL to be installed.
  • The primary purpose of this program is to use the power and ease of PERL to index sites for use with other applications.
 

Setup

Little setup is required: DMetaGenDlx runs from the command line and takes its input as a command-line argument.

  • If you are using DMetaGenDlx.pl on a UNIX or Linux box, set its permissions to 755 (chmod 755 DMetaGenDlx.pl).
 

Usage

DMetaGenDlx is executed from the command line as follows:

  • perl DMetaGenDlx.pl http://yourhost.com (to run the PERL source)
  • DMetaGenDlx.exe http://yourhost.com (to run the binary)
    • Note: Pass only the site address. Do not append index.html or any other specific page, or the robot will not function correctly.
  • The spider generates an ASCII table (site.txt) containing the URL and body of each page.
  • As the spider indexes your site, its progress is written to the command prompt window. This is useful if you develop an application that calls DMetaGenDlx.exe and want to capture the output and provide feedback.
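
For a calling program that wants to capture this feedback, the following is a minimal sketch in Python. It is not part of the Daboo distribution. The function name `stream_crawler_output` is hypothetical, and the sketch assumes only what is stated above: that the spider writes its progress to standard output one line at a time.

```python
import subprocess


def stream_crawler_output(command):
    """Run `command` and yield each line it prints, as it is printed.

    `command` is a list such as ["DMetaGenDlx.exe", "http://yourhost.com"].
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the same stream
        text=True,                 # decode bytes to str
    )
    try:
        for line in process.stdout:
            # Strip the trailing newline so callers get clean text.
            yield line.rstrip("\r\n")
    finally:
        process.stdout.close()
        process.wait()
```

A caller would pass one of the commands from the list above, for example `["DMetaGenDlx.exe", "http://yourhost.com"]`, and display or log each yielded line as it arrives.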
 

Notes

  • Daboo Meta Tag Generator Deluxe Bot will not index outside the domain it is set to crawl, and it treats sub-domains as separate domains. For example, if DMetaGenDlx is set to index "help.dienhart.com" and the site links to "free.dienhart.com", the spider considers "free.dienhart.com" to be another domain and does not index it.
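
The rule above amounts to an exact host-name comparison. The sketch below illustrates that idea in Python; it is not the spider's actual code, which is not reproduced in this document, and the function name `same_host` is hypothetical.

```python
from urllib.parse import urlsplit


def same_host(start_url, link_url):
    """Return True only if both URLs have exactly the same host name."""
    start_host = urlsplit(start_url).hostname
    link_host = urlsplit(link_url).hostname
    # hostname is None for relative URLs and for schemes such as mailto:,
    # so those are treated as "not the same host".
    if start_host is None or link_host is None:
        return False
    # urlsplit lowercases the host name, so the comparison is case-insensitive.
    return start_host == link_host
```

With this check, a link from "help.dienhart.com" to "free.dienhart.com" is rejected, because the two host names differ even though they share a parent domain.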
 

Troubleshooting

  • Crawler progress is printed to the screen during execution. The following messages indicate what the crawler is doing:
    • ADDED: The page was successfully indexed and added to site.txt.
    • DUPLICATE NOT ADDED: The page has already been added to the index.
    • INVALID LINK NOT ADDED: The link points outside the domain being spidered, or it is a broken link within the domain.
    • FILTERED NOT ADDED: Email links and certain file types are ignored to improve crawler performance.
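
A calling program can summarize a crawl by counting these messages. The Python sketch below is an illustration only. It assumes that each progress line begins with one of the four labels listed above; the exact line format is not specified in this document, and the names `classify_line` and `tally_statuses` are hypothetical.

```python
from collections import Counter

# Status labels as listed in the Troubleshooting section.
STATUS_LABELS = (
    "DUPLICATE NOT ADDED",
    "INVALID LINK NOT ADDED",
    "FILTERED NOT ADDED",
    "ADDED",
)


def classify_line(line):
    """Return the status label that `line` starts with, or None."""
    stripped = line.strip()
    for label in STATUS_LABELS:
        if stripped.startswith(label):
            return label
    return None


def tally_statuses(lines):
    """Count how many progress lines carry each status label."""
    counts = Counter()
    for line in lines:
        label = classify_line(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Lines that do not start with a known label are skipped rather than counted, so unexpected output from the spider does not distort the totals.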
 

History

0.05 (11-30-2004)
  • Initial Release
  • Based on DLBot.pl Revision 0.04
 
0.7 (1-8-2005)
  • Improved script performance. DMetaGenDlx.exe now writes the index to a file as the site is indexed and, at the same time, returns the pages currently being indexed to the calling program. This integrates well with Delphi.
Contact: DSupport@Dienhart.com