are you looking for some automated means to do this, or just to be able to import into Excel, for example?
It may or may not be possible. Is the information you are collecting all going to be from the same website?
well here is the deal: because scraping directly from a website requires the scraper to be programmed against that site's exact code, if the website changed its code, or if you went to a different site, the scraper would no longer function.
So option 1 is to create a scraper that pulls from a single site, provided they don't alter their code.
Option 2 is to manually copy and paste each table you want into Excel, save it as a CSV, then have a script that imports the CSV files, assuming that all of the files have the same number of columns.
you will need a PHP script, and essentially it will need to find a specific line of code as the starting point. For example, say the table starts with something like '<div id="table1">'; you would search the code for that starting point, then look for every <td></td> combination and extract the data into a DB from that. I could write such a script, assuming you pick one site and can give me some more examples.
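The extraction logic described above can be sketched roughly like this (shown in Python for brevity; the delivered script would be PHP, and the 'table1' marker and sample HTML here are assumptions for illustration):

```python
import re

# Sketch of the approach: find a marker unique to the site, then pull
# out every <td>...</td> pair after it. Marker and HTML are assumptions.
def extract_cells(html, marker='<div id="table1">'):
    start = html.find(marker)  # locate the site-specific starting point
    if start == -1:
        return []  # marker not found: the site may have changed its code
    # capture the contents of every <td>...</td> pair after the marker
    return re.findall(r'<td>(.*?)</td>', html[start:],
                      re.IGNORECASE | re.DOTALL)

sample = ('<div id="table1"><table><tr>'
          '<td>2011-01-01</td><td>42</td></tr></table></div>')
print(extract_cells(sample))  # ['2011-01-01', '42']
```

This is the fragile part the messages warn about: if the site renames the marker or restructures the table, the script stops matching.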
Also, it is a very complex script; I couldn't do it for less than $300.
those 2 sites are in very different formats, so you would need different scripts, and that adds to the cost. I am not saying it's impossible, but any script I create would depend on a uniform format among sites, and most likely would take the first table it sees on any page.
well, that could be done once the data was in the DB. Then you can run whatever queries you need to get sums and averages for any date range.
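Those date-range queries could look something like this (a minimal sketch using SQLite as a stand-in for the MySQL database; the table name, columns, and values are all assumptions):

```python
import sqlite3

# Toy database standing in for the MySQL DB the scraper would populate.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scraped (day TEXT, value REAL)")
db.executemany("INSERT INTO scraped VALUES (?, ?)",
               [("2011-01-01", 10.0),
                ("2011-01-02", 20.0),
                ("2011-01-03", 30.0)])

# Sum and average over an arbitrary date range, computed at query time
# rather than stored in the database.
total, avg = db.execute(
    "SELECT SUM(value), AVG(value) FROM scraped WHERE day BETWEEN ? AND ?",
    ("2011-01-01", "2011-01-02")).fetchone()
print(total, avg)  # 30.0 15.0
```

The point is that the sums and averages never need to live in the DB; any range the client picks is just a different WHERE clause.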
The key to understand in this process is that any site you want to do this for will require its own script. It will usually look for a piece of code that is unique to the specific site, such as an ID# ***** tag name, and use that as a starting point, then iterate through a table and copy the data into its own MySQL database. The tables will need to be predefined in the database with the same number of columns as the table on the website you choose. I assume each site has a different number of columns, which is why each script would need to be modified to match, and that requires a different script for each site.
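The iterate-and-copy step above can be sketched like this (again a rough illustration, with SQLite standing in for MySQL; the column count of 2 and the cell values are assumptions):

```python
import sqlite3

# The predefined column count must match the table on the chosen site;
# 2 columns here is an assumption for illustration.
NUM_COLS = 2

# Flat list of cell values as a scraper would produce them.
cells = ['2011-01-01', '42', '2011-01-02', '43']

# Group the flat cell list into rows of NUM_COLS each.
rows = [tuple(cells[i:i + NUM_COLS])
        for i in range(0, len(cells), NUM_COLS)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scraped (day TEXT, value TEXT)")  # predefined layout
db.executemany("INSERT INTO scraped VALUES (?, ?)", rows)
print(db.execute("SELECT COUNT(*) FROM scraped").fetchone()[0])  # 2
```

If another site's table had, say, 5 columns, both the grouping step and the predefined DB table would have to change, which is why each site needs its own script.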
Which presents you with a couple of options. Option 1: I set up one script for one site, reducing your cost, and you can try to modify it yourself for other sites, but I cannot guarantee that the script would work as-is on other sites.
Option 2, increased cost: I create a script for each site you need it for.
also you should note that any scripts I create run natively in PHP5 and do not require any plugins, other than a PHP-enabled server and a MySQL database set up.
There are some out-of-the-box solutions for parsing the HTML DOM, but I personally try to stay away from those. It is a little more costly, but in the end it is better, because my scripts are standalone and more efficient.
ok, so you will have 2 scripts: one will scrape and populate the data, and the second will display it from the DB. The calculations for sum / average can be performed by the script that displays the data. The summed data is not stored in the DB.
So the first script would be run whenever you want to collect data, as many times as you need; the second would run each time you want to display the data.
I'd like to make a suggestion: maybe it would be better to talk on the phone, and I can explain everything there and listen to what you need. It shouldn't take more than a few minutes that way, and it's better than this chat. Let me know if you are interested.
Haven't heard back from you in days. Are you no longer interested in this project?
I understand that we all get involved in things; I am happy to accept payment. However, before I could quote you a fair price for the completion of your project, I would need to know more details.
I seem to have lost the thread here, but if memory serves, you wanted to take specific information from a specific site and copy it to a DB.
I believe I said that anything I set up would be for 1 specific site as it is currently configured; if the site changes its setup, the code may need to be modified.
that will be tough, because the way this works is that it looks at the specific code of a specific site, and it may not be the same on all sites. So using it for different sites may require significant modification. Also, there is no telling whether all sites have the same number of columns in their tables, and that would make it exceedingly difficult.
I'm still not entirely sure of the purpose of this project. If this information is contained within these sites, why not just go to these sites?
so if you want to proceed with this project, you could give me one specific URL, and I will write the program for that specific URL. If you say that the website has 100 pages of the same content and it is just a matter of counting from 1 to 100, then a loop can easily be created to do that. However, the source will be specific to that site. If at a later time you decide you want to alter it for another site, we can address that, but there would be a charge for altering it. You would be welcome to attempt to do so yourself, but the code will be a bit complicated and you may not fully understand it. If you do choose to make your own modifications, I would suggest making a backup copy of the code first, in case you break it.
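The counting-from-1-to-100 loop mentioned above is trivial to sketch (the URL pattern here is an assumption; a real site would number its pages in its own way):

```python
# Hypothetical URL pattern for a site whose pages are numbered 1..100.
BASE = "http://example.com/data?page={}"

# Build the list of page URLs the scraper would fetch in turn.
urls = [BASE.format(n) for n in range(1, 101)]
print(len(urls), urls[0])  # 100 http://example.com/data?page=1
```

The scraper would then fetch and parse each URL in the list with the same site-specific extraction code.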