Data Mining
What is Data Mining?
Data mining is the processing of data to extract information. Examples include:
- Process a web site to extract product catalog and cost information, which can then be used to compare prices between different suppliers.
- Process a web site to extract email addresses or web URLs.
- Harvest the data on a web site for your own purposes.
Extracted data is designed to be easily loaded into a database for further analysis.
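TextPipe does the actual extraction, but the idea behind the second example above can be shown in a few lines of Python. This sketch is not TextPipe or Offline Explorer code. It assumes the page text is already in memory, and the regular expressions, the function names extract_records and to_csv, and the sample page are all illustrative. It pulls email addresses and http/https URLs out of HTML and writes them as CSV with a header row, a form most databases can bulk-load.

```python
import csv
import io
import re

# Deliberately simple patterns (illustrative assumption); real address and URL syntax is far richer.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_records(html: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for each distinct email address and URL, emails first."""
    seen = set()
    records = []
    for kind, pattern in (("email", EMAIL_RE), ("url", URL_RE)):
        for match in pattern.findall(html):
            # Keep only the first occurrence of each value.
            if (kind, match) not in seen:
                seen.add((kind, match))
                records.append((kind, match))
    return records


def to_csv(records: list[tuple[str, str]]) -> str:
    """Write the records as CSV with a header row, ready for a database bulk load."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["kind", "value"])
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical sample page standing in for a downloaded file.
page = '<a href="mailto:sales@example.com">Sales</a> <a href="https://example.com/catalog">Catalog</a>'
print(to_csv(extract_records(page)))
```

The extraction and the CSV output are kept in separate functions so the same records could be written to another format without touching the patterns.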
How does it work in Offline Explorer Enterprise?
To data mine a Web site, first create a Project and download the site to your hard disk. When the download is complete, select the Project and choose Tools | Data Mining from the main menu. Offline Explorer Enterprise then uses an external utility, TextPipe, to process the downloaded Web site.
How can TextPipe help?
TextPipe can extract data from any text data source, including web sites. It can also perform data cleansing and other additional processing, for example:
- add a header record (e.g. provide column titles for .CSV files)
- remove unwanted data
- replace specific text
- convert line endings to DOS, Unix or Mac format
- expand tabs
- fix capitalization
- convert from EBCDIC to ASCII
- collapse repeated whitespace
- remove columns, lines or fields
- remove duplicate records
- sort
- extract email addresses from specific fields
- discard records matching a pattern
- and much more
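To show what several of these steps do to the data, they can be approximated in Python. This is a hypothetical stand-in, not TextPipe's filter engine: the function cleanse, its parameters and the sample data are invented for illustration, and EBCDIC input is assumed to use code page 037 (Python's cp037 codec).

```python
import re


def cleanse(raw: bytes, header: str, discard: str, encoding: str = "ascii") -> str:
    """Apply a few TextPipe-style cleansing steps to raw extracted data (illustrative only)."""
    # Decode the input; pass encoding="cp037" for EBCDIC (US/Canada) data.
    text = raw.decode(encoding)
    # Convert DOS (\r\n) and classic Mac (\r) line endings to Unix (\n).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    records = []
    for line in text.split("\n"):
        # Collapse runs of spaces and tabs into one space and trim both ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Skip blank lines and records matching the discard pattern.
        if not line or re.search(discard, line):
            continue
        records.append(line)
    # Remove duplicate records, then sort what remains.
    records = sorted(set(records))
    # Prepend a header record (column titles for a .CSV file).
    return "\n".join([header, *records]) + "\n"
```

For example, cleanse(b"widget,9.99  \r\ngadget,4.50\r\nwidget,9.99\r\n# note\r\n", "product,price", r"^#") returns "product,price\ngadget,4.50\nwidget,9.99\n": the comment line is discarded, trailing spaces are trimmed and the duplicate widget record is dropped. Because duplicates are removed through a set and then sorted, the original record order is not preserved.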
More information about TextPipe is available at: http://www.crystalsoftware.com.au/offlineexplorer.html.
You can download TextPipe from: http://www.crystalsoftware.com.au/textpipe.zip.
You can also run TextPipe automatically when a Project download is complete. Simply add the following line to the URLs field of the Project:
TextPipe=c:\path\filter_filename.fll
To make TextPipe quit after processing downloaded files, add ;/Q at the end:
TextPipe=c:\path\filter_filename.fll;/Q
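If you set up many Projects, the URLs field line can be generated rather than typed. The helper below is a hypothetical Python sketch, not part of Offline Explorer Enterprise. It only assembles the TextPipe=<filter file> text with the optional ;/Q suffix described above, and it normalizes forward slashes in the filter path to Windows backslashes.

```python
from pathlib import PureWindowsPath


def textpipe_line(filter_path: str, quit_after: bool = False) -> str:
    """Build the TextPipe=... line for a Project's URLs field (hypothetical helper)."""
    # PureWindowsPath converts "/" separators to "\" without touching the file system.
    line = f"TextPipe={PureWindowsPath(filter_path)}"
    if quit_after:
        # ;/Q makes TextPipe quit after processing the downloaded files.
        line += ";/Q"
    return line


print(textpipe_line(r"c:\path\filter_filename.fll", quit_after=True))
```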