How JustAnswer Works:

  • Ask an Expert
    Experts are full of valuable knowledge and are ready to help with any question. Credentials confirmed by a Fortune 500 verification firm.
  • Get a Professional Answer
    Via email, text message, or notification as you wait on our site.
    Ask follow up questions if you need to.
  • 100% Satisfaction Guarantee
    Rate the answer you receive.

Ask The-PC-Guy Your Own Question

The-PC-Guy
The-PC-Guy, Software Engineer
Category: Programming
Satisfied Customers: 1844
Experience:  Extensive Knowledge in PHP, MYSQL, CSS & Javascript
62934938
Type Your Programming Question Here...
The-PC-Guy is online now
A new question is answered every 9 seconds

I am seeking best way to download information stored in a

Customer Question

I am seeking for the best way to download information stored in a HTML table (such as a League Standings table) to a format that can be stored, sorted, filtered, and have math operations run on.
For example:
http://www.justsportsstats.com/hockeystandings.php?league=NHA&year=1909
I would like to take that table and export it to csv or mysql so that I can run various operations on it. So I could take the 'W' column and sum it, or run a sum on the 'W' and 'L' and 'T' columns to insert a GP (Games Played) column.
The purpose would be to then add this data to a HTML page or ideally Wordpress page.
Submitted: 1 year ago.
Category: Programming
Expert:  The-PC-Guy replied 1 year ago.

are you looking for some automated means to do this or just to be able to import to excel for example?

Customer: replied 1 year ago.
In thinking further on this, I would like to see an automated means to get the info (with me identifying the pages to scrape) and then info being compiled into a CSV, XML, MYSQL.
I am fine with data being to imported in Excel as long as I can also run some filtering and then save the results so it can be easily viewed.
One website that inspires me is http://www.hockeydb.com/scoreboard/ where you can drill down into leagues, teams, players. But I believe that would be too ambitious right now. I am just looking for ways to collect League Standings (or other tables as I see fit) and then be able to run operations on the data so I can come up with new insights on my website.
Expert:  The-PC-Guy replied 1 year ago.

may are may not be possible. Is the information you are collecting all going to be from the same website.

Customer: replied 1 year ago.
I am wanting to get the information from one website primarily, but also want to be able to edit the code to go to other websites as well. I am comfortable making the edits as long as I have instructions.
Expert:  The-PC-Guy replied 1 year ago.

well here is the deal, because scraping directly from a website would require the exact code of the website to be viewed and the scraper to be programmed for that exact code, if the website changed the code or if you went to a different site the scraper would no longer function.

So option 1 is to create a scraper that pulls from a single site, provided they don't alter their code.

Option 2 is to manually copy and paste each table you want into excel, save that as a csv, then have a file that imports the CSV files, assuming that all of the files have the same number of columns.

Customer: replied 1 year ago.
I have already done Option 2 - and its pretty slow. This does not include running algorithms on it.If I was to do Option 1, how hard would it be for me to change the code assuming that things like sports league standings do not change too much (essentially team name, games, wins, etc.)? I have seen some online websites that basically convert tables into CSV, but I am wanting to automate it. I am imagining a script where I would tell it to look in a website address and download all tables that are x columns long, etc.
Expert:  The-PC-Guy replied 1 year ago.

you will need a PHP script, and essentially will need to find a specific line of code as the starting point for example say the table starts with something like '<div id="table1">; so you would search the code for that starting point, and essentially look for every <TD></TD> combination and then just extract the data into a DB from that. I could write such a script assuming you pick one site and can give me some more examples.

Also it is a very complex script, I couldn't do it for less than $300.

Customer: replied 1 year ago.
Here is some examples of websites I would like to pull data from:http://www.hockeydb.com/ihdb/stats/leagues/seasons/nhl19272015.html - League standings, but I would also like to get league standings for other sports leagues like MLB, NFL like here
http://www.pro-football-reference.com/years/2015/You can see a script similar to what I am wanting here:
http://www.convertcsv.com/html-table-to-csv.htmI am wanting something like that but actually will go through the website for tables of my choosing. Is that possible for it to spider and scrape a site?
Expert:  The-PC-Guy replied 1 year ago.

those 2 sites are in very different formats, so you would need different scripts and that adds to the cost. I am not saying its impossible. But any script I create would depend on a uniform format among sites. And most likely would take the first table it sees on any page.

Customer: replied 1 year ago.
Okay, so the script you create would take the first table found. What about an additional script that allows me to create filters to do things like running sum on columns, averages, pulling team data across years?
Expert:  The-PC-Guy replied 1 year ago.

well that could be done once the data was in the DB. Then you can run whatever queries to do sums and averages for any date range.

The key to understand in this process, is that any site you want to do this for will require its own script. It will usually looks for a piece of code that is unique to the specific site such as an ID# ***** tag name, and use that as a starting point, then just itterate through a table and copy any data to its own mysql database. The tables will need to be predefined in the database with the same number of columns as the table on the website you choose. I assume each site has a different number of columns, so that is why each script would need to be modified in such a way as to work with that which would require a different script for each site.

Which presents you with a couple of options. I set up one script for one site reducing your cost, and you can try to modify it yourself for other sites, but I cannot garantee that the script would work as it is on other sites.

Option 2, increased cost, I create a script for each site you need it for.

Expert:  The-PC-Guy replied 1 year ago.

also you should note, that any scripts I create run natively in PHP5, and do not require any plugins other then you have a PHP enabled server and a mysql database set up.

There are some out of the box solutions for parsing HTML DOM, but I personally try to stay away from those, it is a little more costly, but in the end it is better because my scripts are standalone and more efficient.

Customer: replied 1 year ago.
Would the queries that are run on the data need to be run each time I want to see a sum of a column or would the query just run once and insert the new value (say a sum of a column) in the MySQL?I know the data I want to collect, but don't know how it should be structured in a database. Would you know how to do that in addition to the script?
Customer: replied 1 year ago.
I have a PHP5 shared host that can run that script.Also, how does the data contained in the MySQL get displayed? Would the script be able to show the data that I tell it to?
Expert:  The-PC-Guy replied 1 year ago.

ok, so you will have 2 scripts, one will scrape and populate the data. The second one will display it from the DB. The calculations for sum / average can be preformed by this script that displays the data. The summed data is not stored in the db.

So the first script would need to be run for whatever data you wish to collect, as many times as you need

the 2nd one would run each time you want to display the data.

Expert:  The-PC-Guy replied 1 year ago.

id like to make a suggestion maybe it would be better to talk on the phone and I can explain everything there and listen to what you need. Shouldn't take more than a few minutes that way and its better then this chat. Let me know if you are interested

Customer: replied 1 year ago.
I think talking on the phone is good, but let me do some draft mockups of what I am thinking so I can demonstrate as much as possible what I am thinking of. You have been a big help so far.
Expert:  The-PC-Guy replied 1 year ago.

ok

Expert:  The-PC-Guy replied 1 year ago.

havent heard back from you in days. are you no longer interested in this project?

Expert:  The-PC-Guy replied 1 year ago.

havent heard back from you in days. are you no longer interested in this project?

Customer: replied 11 months ago.
Hi there. I am sorry, but I got sidetracked with doing more research on what exactly I am asking for. I would like to send you the payment as you have been very helpful.
Expert:  The-PC-Guy replied 11 months ago.

I understand that we all get involved in things, I am happy to accept payment. However before I could quote you a fair price for the completion of your project, I would need to know more details.

Customer: replied 11 months ago.
Okay, I would be happy with that - getting a quote would be helpful.Based on what I already shared what more info do you want?
Should we communicate live? Is that through Skype?
Expert:  The-PC-Guy replied 11 months ago.

i seam to have lost the thread here. But if memory serves, you wanted to take specific information from a specific site, and copy that to a db.

I belive I said that anything set up, would be for 1 specific site as it is currently configured, if the site changes their setup, the code may need to be modified.

Customer: replied 11 months ago.
that is correct. I know some basic PHP and can change code as long as I know what to change. Do you think it would be possible to give me some pointers on how I modify the script to fit changes to a website's setup? For example, maybe the headings would change or something else.
Expert:  The-PC-Guy replied 11 months ago.

that will be tough, because the way this works is that it looks at specific code of a specific site, and it may not be the same on all sites. So using it for different sites may require some significant modification. Also there is no telling if all sites have the same number of columns in their tables so that would make it exceedingly difficult.

I'm still not entirely sure of the purpose of this project. If this information is containing within these sites why not just goto these sites.

Expert:  The-PC-Guy replied 11 months ago.

so if you want to proceed with this project you could give me one specific url. I will write the program for that specific url. If you say that the website has 100 pages of the same content and it is just a matter of counting from 1 to 100 then a loop can easily be created to do that. However the source will be specific to that site. If at a later time you decide you want to alter it for another site we can address that, but there would be a charge for altering it. You would be welcome to attempt to do so yourself, but the code will be a bit complicated and you may not fully understand it. If you did choose to make your own modifications, I would suggest making backup copy of the code first, in case you break it.