It is very easy to scap data for academic purpose.
Suppose i want to careta enggstaff Directory how to do it. Let us deep dive into this.
Step 1:
Open Chrome, look for Engg staff detailes, find out the url. Suppose for chemical engg department we found out the following url for display faculty details. Copy the URL in the addressbar.
Chemical Engg. Open this. it will appear likethis.
In Cell A1 type Dept, B1 type URL
Enter E01 in A2, copy the URL : https://annamalaiuniversity.ac.in/E01_factmem.php?dc=E01 in B2 cell
Enter E02 in A3.
enter ="https://annamalaiuniversity.ac.in/"&A2&"_factmem.php?dc="&A2"" in B2.
You will use excel Fill Pattern feature by pulling + sign at the lower bottom of the A2 cell and B2 cell
You will get the following excel contents. Once you convert all the contents as Table
by pressing CNTL + T
Step 3:
Now click Data, From Web on the toolbar. it will open a dialog box and ask for url. Paste the copies url in to the textbox as shown below:
Press ok to get the details.
Click Transform Data to get the screen below.
Please note that unwanted blank columns are deleted in the above snap shot.
Step 4:
Click Home tab and click the Advanced Editor to get the following screen:
In the shown Text area, enter
(URL) as table=>
in the top as shown. Modify the function web.contents(URL) removing actual URL inside the quotes.
Now click Close& Load button.
Now select anywhere in the table displayed. Click Add Column, Custom Column to get window similar to this:
type after = fxTablefetch([URL]).
type after = fxTablefetch([URL]).
Once you invokde like this, by pressing OK button. you get the follwing continue button screen.
Press Continue button to get the following dialog.
Click the Ignore Check Box and Save Button. It will bring all the tables as shown below.
Select Contact Column in the table, Click the Add Column, Column by Example to get the Column 1 Col as shown. Select the the second cell from top in the contact col, you will get
drcktech@rediffmail.com9865356561. In which select only email and past in the first cell of Text before Delimeter Column. You can see the screen similar to the following screen: Click OK to add to the table. Similary add Mobile. Finally you will the screen like this.
Remove URL, Contact Details. Rename new coulmns are as email and Mobile.
You will get sheet similar to the below:
Enjoy Web Scrapping !!!!😊😊😊😊
No comments:
Post a Comment