The rep file: ebay.rep
The source file: ebay-src.htm
The result file: ebay-res-1.htm
(The two HTML files above need an internet connection to display their images.)
Here is a screenshot of a typical eBay search results page. As you can see, half of the page is taken up by advertisements and information about the website, while all we need are the item entries listed further down.
Screenshot of the source file: ebay-src.htm
After the page is processed by Ultra Replace with the rep file, you get a tidy page like the one below:
Screenshot of the result file: ebay-res-1.htm
Steps for using this example
You will then see the result: the eBay entries you just downloaded have become a tidy table of items, which serves as your item list. By modifying the rep file, you can control the styling and decide whether links and images are kept.
Below is another result file with no images or links; you can even copy it into an Excel file.
Screenshot of a tidier result without images
View the result HTML file without images
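If you would rather produce a file than copy and paste, a small Python sketch (standard library only, and not part of Ultra Replace or HTML Page Cleaner) can turn the image-free result table into CSV, which Excel opens directly. It assumes the result is a plain `<table>` made of `<tr>`/`<td>` rows; the names `TableRowParser` and `table_to_csv` are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def _finish_cell(self):
        # Close the open cell (if any), collapsing runs of whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        # Close the open row; rows with no cells are dropped.
        self._finish_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr":
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._finish_row()  # flush a row left open at end of input


def table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

Cells are reduced to their text, so any leftover `<img>` or `<a>` tags are ignored; the `csv` module quotes cells that contain commas, which keeps Excel's columns aligned.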
Here is the source of the rep file:
============================================
Repcmd-ebay 0.1 beta version
This is an example replacing command file.
For search result pages from ebay.com
Copyright reserved 2008 Wonder Studio
=============================================

/// delete all information before the table and impose a new style sheet
regrep ---^(.|\n)+?<table class="ebItemlist single"[^>]+>---
/// if you want to keep the head of the table, use this command instead
/// rep ---$$begin---
/// to ---<div id="ebContent"><div class="ebFrame">---
with ---<html><head><title>ebay Sample page</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style>
body,p {font:normal 11px verdana;}
table {border:solid 2px gray;border-collapse:collapse;}
td {border-bottom:solid 1px gray;font:normal 11px verdana;color:black;padding:3px 5px;}
h3 {font:bold 11px verdana;color:gray;margin-bottom:3px}
</style></head>
<body>
<table>---;

/// delete all information after the table
///regrep ---<div id="expHigh"><h3>Get more results</h3>(.|\n)+$---
rep ---<div id="paging">---
to ---$$end---
with ---</body></html>---;

///optional: delete all links
rep ---<a href="#Enlarge">Enlarge</a>---;
regrep ---<a [^>]+>---;
rep ---</a>---;

/// delete all javascripts
regrep ---<script(.|\n)+?</script>---;

/// replace image tags, keeping only the src attribute
regrep ---<img[^>]* src="([^">]+?)"[^>]*?>---
with ---<img src="$1">---;

/// delete all class attributes (inner style)
regrep --- class=[^>]+---;

/// delete useless info
rep ---<span>---;
rep ---</span>---;
rep ---<SPAN > ---;
rep ---<div>---;
rep ---</div>---;

/// optional: delete images
regrep ---<img[^>]+>---;

/// delete empty table cells
rep ---<td></td>---;
rep ---<td> </td>---;
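To see what the two clipping commands do, or to reproduce them outside Ultra Replace, here is a minimal Python sketch using the standard `re` module. It assumes that `regrep` performs a regular-expression replacement and that `rep A to B with C` replaces everything from the first A through B, with `$$end` meaning the end of the file; these semantics are inferred from the comments in the rep file, not taken from Ultra Replace documentation. `HEADER`, `FOOTER` and `clip_to_item_table` are hypothetical names, and the header is abbreviated (no style sheet).

```python
import re

# Abbreviated stand-in for the header the rep file inserts (style sheet omitted).
HEADER = "<html><head><title>ebay Sample page</title></head>\n<body>\n<table>"
FOOTER = "</body></html>"


def clip_to_item_table(page: str) -> str:
    """Keep only the item table of a saved search page (sketch)."""
    # Step 1, like the first regrep: a lazy match from the start of the page
    # through the first <table class="ebItemlist single" ...> tag is replaced
    # by HEADER. re.DOTALL lets "." match newlines, like (.|\n) in the rep file.
    page = re.sub(r'^.+?<table class="ebItemlist single"[^>]+>',
                  HEADER, page, count=1, flags=re.DOTALL)
    # Step 2, like rep ---<div id="paging">--- to ---$$end---: everything from
    # the first paging marker to the end of the page is replaced by FOOTER.
    cut = page.find('<div id="paging">')
    if cut != -1:
        page = page[:cut] + FOOTER
    return page
```

The original `</table>` survives because it sits before the paging marker, so it closes the `<table>` that HEADER opens.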
The commands that delete links and images are optional: to keep links or images, comment out the corresponding lines with ///.
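Commenting out the optional commands can be modeled as two switches. The Python sketch below uses the same patterns as the rep file with `re.sub`; `handle_links_and_images` and its flags are hypothetical, and the defaults mirror the rep file as listed, where both optional commands are active.

```python
import re


def handle_links_and_images(html: str, keep_links: bool = False,
                            keep_images: bool = False) -> str:
    """Apply the optional link/image commands; the flags act like commenting them out."""
    if not keep_links:
        # Like "optional: delete all links": drop the Enlarge links, then
        # every remaining <a ...> opening tag and every </a> closing tag.
        html = html.replace('<a href="#Enlarge">Enlarge</a>', "")
        html = re.sub(r"<a [^>]+>", "", html)
        html = html.replace("</a>", "")
    if keep_images:
        # Like "replace image tags": keep only the src attribute.
        # Python writes the captured group as \1 where the rep file uses $1.
        html = re.sub(r'<img[^>]* src="([^">]+?)"[^>]*?>', r'<img src="\1">', html)
    else:
        # Like "optional: delete images": remove every <img> tag.
        html = re.sub(r"<img[^>]+>", "", html)
    return html
```

For example, `handle_links_and_images(page, keep_images=True)` corresponds to the result with images but without links.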
This example is more complex than the previous BBC News example. Besides clipping out the data entries, we have to do more work to clean them up: removing the embedded JavaScript, handling links and images, and wiping out useless class definitions inside tags. When composing your own rep file, you can choose to write more commands for a better result, or fewer for less work.
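The extra clean-up steps (removing scripts, class attributes, bare `<span>`/`<div>` wrappers and empty cells) can likewise be sketched in Python. `strip_clutter` is a hypothetical name; the patterns are copied from the rep file, with `.` under `re.DOTALL` standing in for `(.|\n)`.

```python
import re


def strip_clutter(html: str) -> str:
    """Remove scripts, class attributes, bare wrappers and empty cells."""
    # Delete every <script ...>...</script> block; the lazy match may span lines.
    html = re.sub(r"<script.+?</script>", "", html, flags=re.DOTALL)
    # Delete class attributes. Like " class=[^>]+" in the rep file, this also
    # removes any attributes written after class=... in the same tag.
    html = re.sub(r" class=[^>]+", "", html)
    # With classes gone, <span class="x"> has become <span>, so these literal
    # replacements now catch it.
    for tag in ("<span>", "</span>", "<SPAN > ", "<div>", "</div>"):
        html = html.replace(tag, "")
    # Delete empty table cells.
    for cell in ("<td></td>", "<td> </td>"):
        html = html.replace(cell, "")
    return html
```

Because every step is a plain text substitution, order matters: the class attributes must be stripped before the literal `<span>` and `<div>` rules run, which is also the order used in the rep file.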
Following this example, you can handle other pages that present data as a table or list and convert them into a clean, professional data table. With the extensions planned for the next version of HTML Page Cleaner, you will be able to export such tables to an Excel worksheet or a database.