{"id":996943,"date":"2021-09-06T17:13:00","date_gmt":"2021-09-06T09:13:00","guid":{"rendered":"https:\/\/geetests.com\/article\/web-scraping-nba-salary"},"modified":"2025-09-15T12:05:56","modified_gmt":"2025-09-15T04:05:56","slug":"web-scraping-nba-salary","status":"publish","type":"post","link":"\/en\/article\/web-scraping-nba-salary","title":{"rendered":"How Does Web Scraping Become Simpler and How To Prevent It? (With Scraping NBA Players&#8217; Salary Example)"},"content":{"rendered":"<div class=\"vgblk-rw-wrapper limit-wrapper\">\n<p class=\"ql-align-justify\">\n<h2 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Introduction<\/strong><\/h2>\n<p class=\"ql-align-justify\"><span class=\"ql-font-serif\">\u00a0<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Web scraping means extracting the content of a website programmatically. Specifically, developers create bots that fetch the HTML code of a website, parse it, and export the result to an external data store.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Developers do this for different purposes. Search engines scrape data from websites and index it so that we can find information more easily. However, there are also many bad bots on the internet (<\/span><a style=\"background-color: transparent; color: #1155cc;\" href=\"https:\/\/www.imperva.com\/blog\/bad-bot-report-2021-the-pandemic-of-the-internet\/\" target=\"_blank\" rel=\"noopener noreferrer\">25.6% of all website traffic comes from bad bots<\/a><span style=\"background-color: transparent; color: #000000;\">). These bad bots may try to steal your content, e.g. 
<\/span><a style=\"background-color: transparent; color: #1155cc;\" href=\"https:\/\/www.wsj.com\/articles\/alibaba-falls-victim-to-chinese-web-crawler-in-large-data-leak-11623774850\" target=\"_blank\" rel=\"noopener noreferrer\">the data leak at Alibaba&#8217;s Taobao caused by web scraping<\/a><span style=\"background-color: transparent; color: #000000;\">.<\/span><\/p>\n<p class=\"ql-align-justify\"><span class=\"ql-font-serif\">\u00a0<\/span><\/p>\n<h2 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Web Scraping NBA Players&#8217; Information<\/strong><\/h2>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Today, web scraping has become much easier thanks to advances in technology. We illustrate this with a simple example: scraping NBA players&#8217; information, such as height, birth date, and salary.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Here is the main page for NBA players&#8217; information:<\/span><\/p>\n<p class=\"ql-align-justify\"><a style=\"background-color: transparent; color: #000000;\" href=\"https:\/\/hoopshype.com\/salaries\/players\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/hoopshype.com\/salaries\/players<\/a><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">From this page, we can navigate to a separate page that contains each player&#8217;s basic information.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/6AJsq6J4gbJtoR_A81yQY0UBRIM5FlmtpaDdhIa3XTD6UZyaHWf5CST9Z4hOSaJC8oe2tsl1Ts0o4kMzMarGgUoK3RCSa1Gho7S0v1b1kF0ikrPH01F7IFyQCI4XmpRR8D1XpHY5s0.png\" width=\"499\" height=\"479\" 
alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><em style=\"background-color: transparent; color: #000000;\">Main page<\/em><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/HyvKBySdRb4Bpz3Z6NrF-ExF39hXXaoIQs1vZQAbNSfbwMS5340X5VF8Rr4SkHd7bp8flPYn-rYy4fcCKSo79DS5HKADT_Rax-HDFNJC4mII0_yJBMgFtGOkj7xY_1D1fzCiVAses0.png\" width=\"602\" height=\"337\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><em style=\"background-color: transparent; color: #000000;\">Stephen Curry&#8217;s basic information<\/em><\/p>\n<p class=\"ql-align-justify\">\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Prerequisites<\/strong><\/h3>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Basic knowledge of Python and HTML is required.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">We will use Python for web scraping, with the following modules:<\/span><\/p>\n<ol>\n<li class=\"ql-align-justify\"><a style=\"background-color: transparent; color: #000000;\" href=\"https:\/\/selenium-python.readthedocs.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Selenium<\/a><\/li>\n<li class=\"ql-align-justify\"><a style=\"background-color: transparent; color: #000000;\" href=\"https:\/\/beautiful-soup-4.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noopener noreferrer\">Beautiful Soup<\/a><\/li>\n<li class=\"ql-align-justify\"><a style=\"background-color: transparent; color: #000000;\" href=\"https:\/\/pandas.pydata.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Pandas<\/a><\/li>\n<\/ol>\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Selenium Driver Installation<\/strong><\/h3>\n<p 
class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">For the Selenium module to work, we need to install a browser driver (ChromeDriver in this example). The right driver depends on your operating system and the version of your web browser.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">We illustrate the installation steps for Windows; refer to <\/span><a style=\"background-color: transparent; color: #000000;\" href=\"https:\/\/selenium-python.readthedocs.io\/installation.html#drivers\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/selenium-python.readthedocs.io\/installation.html#drivers<\/a><span style=\"background-color: transparent; color: #000000;\"> for more detail.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">1. Download the zip file containing chromedriver.exe.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\"><img decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/rwFl4eTL4lqv4QiC7D2Gxmuy7I8F82U5P-zbXwz5x9DS-lLyxnbL5MPB8BxP8_NuoCVUQYcNI540m44Vv9JKl8I9evWqJDhsd_435TklmGd0Rk7DDnXm9yPIPk-Kj_R-zXu7pCsKs0.png\" width=\"602\" height=\"83\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/ZDBJ4Wfl7Juw3__bYeOIw9RdetJqj8OR1aL2Efj40OWkg23KjCLdul3bFFv5UPck-7Z176dnp5vtJtTGftvdg8S8nS1NI1TPoFzuLXh0WsGYLYpNlbwkzQyrzlLuVZI7t9ZDo1n9s0.png\" width=\"602\" height=\"89\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">2. Unzip the folder. 
Optionally, you can move the folder to another directory.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">3. Type &#8220;Environment Variables&#8221; in the Start menu and select &#8220;Edit the system environment variables&#8221;.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/cRqgQTQkaABBs5Tagwm9vpB5y5V1M7Xi2Vt_WBj2koRyvQ0G2dDBtLYi7llubV5OeLKdez2eozbXGaU8LQyTbU5Hy9HMnSwyEtmG7Z5P9MqEeccuaychKLpQ6CmUUnPLNJShX-lis0.png\" width=\"280\" height=\"550\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">4. Update the PATH variable to include the path of the folder that contains the driver program.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/klKkHnhE034NufOLTSr4mt11LYK4LfYs2mXDN5Kw6C6vyjY5CPxU9fcnCWP2ZFn1s4J0zyCbTob9LUB7rjRa2eDlEbnmCoDmVdPD8idS0ZTZte3LvuNA_bJTN2sK4PVtiOq9J1rqs0.png\" width=\"636\" height=\"478\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Get the HTML code of the website<\/strong><\/h3>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">First, we use Selenium to launch a new browser and get the source code of the NBA players&#8217; salary page.<\/span><\/p>\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/aowqpm6lLgNWLo1ZfzgkndCkxUgRVTbDWRZTDku8KFUtf8UzDaJpMsXinWSDDZqS2orVlMLht2M0zQx_hdpADswX5I8-zB76O3FonmZlayJT1cbQ2yW85rP4UCtwwRDGgcDJjOGMs0.png\" width=\"723\" height=\"212\" alt=\"\"><\/p>\n<p 
class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">The output should be the page&#8217;s HTML code, as shown below.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/GPJspbSBDIFxZj4dIEOv-pv4r5gLjga2htCZmeIm5hTZXavlb7RTzpAyoY1DQ-mEW1oJtr0HjT2_7_Jwrd8SjrKbJVG2wL-8qjmo-wy1MFJtwg-ojkNkbDlP6g3WqVfd0ZWyVjFxs0.png\" width=\"696\" height=\"263\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">You should also see a new browser window launch.<\/span><\/p>\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/5FRUQDqr5HPpGxVzlhtRfNzfSlGKUi7RfFcTaeBWPr8vbeD0fZYl6lq4ORq9WhDrV9FVEPyto4sNLQ0ZE4Hk83dN-3Bc7wZP7oaIETdXnWt6dR640MiDGYDKl-d1wm6dkvOdZMKzs0.png\" width=\"706\" height=\"98\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Navigate to each NBA player&#8217;s basic information page<\/strong><\/h3>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">To get each player&#8217;s basic information, we need to navigate to the corresponding page and extract the data. 
The links to these pages are already in a table on the main page.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/iV9l9GpnelQTYWDi0zix81vhNp6UMWlyVxUHJD-SxrJlhVR5vvlWsHtDkM0WdfU_qm5lM-JhByCy525rOOKlKrvmUAm92qEZQwGc0eR7otVkhImQMZFBz7WN7v2_ZCLBC-t9cVBVs0.png\" width=\"641\" height=\"296\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">To get the links in this table, we first need to find the corresponding HTML elements. The steps below show how to find the HTML element behind a link to a player&#8217;s basic information page.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">1. Right-click one of the links.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/BvjERK9s0SzJ6ljjToVmUoTmgY2HmIHVNzuviF4StMdR5U6Zz-M2WsoCyBO4nrvlfPbQfbXyCVTfDv-P4p20SwLVMFrE9Co1UJuyk6MLG1V1UASxSOY2pJ1O1D5WG49qIOCP_dhcs0.png\" width=\"626\" height=\"280\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">2. Select &#8220;Inspect&#8221;.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/BZuQA4mjOO5yy1uxQQc4eWf2VlPC_t5O6HVEyst1GuWsBNiAh9vFI_5HalkgPUwyKUJsLNZMCQhr4MRMD3wfGhruPks0TGPiyQ5MpmjjslgHeMjDbbDJ95mfe3fjWaVIkIM-XuWWs0.png\" width=\"615\" height=\"274\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">3. 
The developer tools should appear, with the corresponding HTML element highlighted.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/uv00cmgm4q6SMOzv_ZdjsdkLmy6u3B_uKngctTizucVoUrOhO4VM-ckPHUtFg-11RvFv8BquVBbSjfyXVOwUnpYE8RkSo2n0kQ2OkwuERnmhmZtG8xlLlJQTP0a3QsL12MDqDDpks0.png\" width=\"621\" height=\"353\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent;\">4. The HTML elements for the other players are similar to this one.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Next, we use Beautiful Soup to extract the links to the basic information pages of all players.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/dKK5ENd1_91BgOc6LB-x_N4iTihYWUKDNm-Tg0BqqvlMETtSbd_4Y-dtJG_A-abS4L1_6km-XFnOMgs6R3MGPWXbXc2daJgVeSx6qCkR5Hgb5_vTEzBvxArtI5sUOiNZLWKCXAvs0.jpeg\" width=\"643\" height=\"334\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">There are numerous ways to query the HTML code with Beautiful Soup. 
Here, we locate the salary table by its &#8220;table&#8221; tag and its classes, then extract all links inside it.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/4Ox4t9dIaJ5ahF1IEzv2JJ39SvWbf9g7pVIYeptBD7fzxRIwJhEUHl9lIYv0MZKglkYkqqiY4FbFlIr0B3AE6VQXRUSyS3PM3hlC6HXI2YitmhtCESzsPA1UEwqJUfHGt5oKEBHQs0.png\" width=\"620\" height=\"546\" alt=\"\"><\/span><\/p>\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Extract basic information<\/strong><\/h3>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">After getting the link to each player&#8217;s basic information page, we extract the basic information for each player. Most of these pieces of information are plain, non-clickable text, so we locate their elements by highlighting the text and right-clicking it, as shown below.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/g3RQs5bL05WBlw6Ri8rE9958YRkXsppVST7OGS24m3YPFPQpQiKDfpKQ-JJWpdeRDRIzk-u9y1TL0_gwLM4X3LDOra1h0bGQCDNSJegGzXnqTV9eK_NJo686-AcVxtrb-GwcUHQOs0.png\" width=\"587\" height=\"311\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Once again, you can use Beautiful Soup to extract the elements that hold these pieces of information by running a few queries.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/jBfujaD1y-y-Q_lWxFXNo03r01lzz0ZZ0tWeVoy_WxL-lKNgTeLeJ_24D9zR4yShtkSds9m_ogsAfakezb6dHVxu_gIp-RfvAjKSZVbACSQ87xfgpRXZnMkz5m0qF_ysh3VPOlq2s0.jpeg\" width=\"716\" height=\"532\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Unfortunately, there is no way to select position, birth date, height, weight and salary individually, because their elements all share the same attributes. Therefore, we fetch all the relevant elements and match them to the fields one by one.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">The output should look like this.<\/span><\/p>\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/Q9O8ceyqfzw4npxfg1RREldNhME5XiQwcbpmFedYqz7W6-Fbb7LznGA63hJlunF99OZenOISK0HpRE-0YYXJg3XFmIg4F615dRGVLGP_2RX_Vfre1nVmQkKtR1hJfoh02mjLvrYPs0.jpeg\" width=\"720\" height=\"226\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Repeat the steps for each player<\/strong><\/h3>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Now that we can extract the information for one player, we just need to repeat the whole process for each player and store the results.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/eJ86aUH9__kUdOVz-gtxOdSOfTmbe829PxxCkTxVW3qLwNzm-3_-nY2NhizTAE1chWu744e4lpbzZbpMvG201XcnSPyDF2fjLWlNu-FoVd_F73FSnpOWadZacQx3EdgA1JvfL2Gs0.jpeg\" width=\"655\" height=\"854\" 
alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\">\n<h3 class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Export the result to a CSV file<\/strong><\/h3>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">We first convert the data to a table-like format.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/AIoOSXGuRRVkMNm8HlhHsjpHJgknwJU-Zv7e_HNWEQ2uadmR86Ch-f1Pf5lbtNrpaT1DqLlJNKqwqgO-YJjMBidjiagNLK0ApY6mfL3V6yXdAPmbQ0K5YslfGwXnV2rAhHq45ldws0.png\" width=\"645\" height=\"399\" alt=\"\"><\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">You should see a table like the one below.<\/span><\/p>\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/vWVO_GCeaJBlW4tmADvEPhjVuAKU7L6FyXWYqGhCg2LEMoYBvmXVMh4mwCvSm2BIUqXD13AohK5vNY7t3J4xHCIUK3Qm8ivqB09yyfdOF8fPaOv2y_CXseeHVjCZl4rkKAS401-ps0.png\" width=\"1099\" height=\"122\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Finally, we export the information as a CSV file.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/JGWDUSlODdH2Ug7uxibIf4D5IzXV7SLNj5h5zMPzyrqLTmAgmwJekp9msuZay14VseAo0gGdVZCSK6Cvzjy7xyfRg6c75keV6BwCYTmdIFOBpT6jDU3xl1BNZhYUeYhaCxaT-M52s0.png\" width=\"616\" height=\"193\" alt=\"\"><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<h2 
class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Conclusion<\/strong><\/h2>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">The tutorial above shows how to scrape data from web pages with just three Python modules. In fact, anyone with basic knowledge of Python and HTML can learn web scraping quickly, because many mature tools are available. In other words, anybody can easily steal your web content if it has no protection. It is therefore crucial to protect your content by adopting cybersecurity technologies.<\/span><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">These technologies can monitor your website&#8217;s traffic, verify the authenticity of incoming traffic, and block malicious traffic. For example,<\/span> <a style=\"background-color: transparent; color: #1155cc;\" href=\"https:\/\/www.geetest.com\/en\/\" target=\"_blank\" rel=\"noopener noreferrer\">GeeTest&#8217;s<\/a> <span style=\"background-color: transparent; color: #000000;\">BotSonar<\/span><span style=\"background-color: transparent; color: #0066cc;\">, <\/span><span style=\"background-color: transparent; color: #000000;\">which has been adopted by multinational companies such as KFC and Nike, monitors your website 24\/7 and uses AI to distinguish bad bots from human visitors. On top of that, you can choose how to handle bad incoming traffic, e.g. by blocking it or by serving it fake content. 
In addition, GeeTest respects your data privacy: its products are GDPR compliant, which is a plus for enterprise users.<\/span><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/geetests.com\/wp-content\/uploads\/2025\/09\/pKVFvN7DIyv5146ZXFvjpOv-_2ZPHYSro8fycjjrKABYEVVUNifs7emcundFd5guXUMpjTjOVcOTkHxV4gWQORpMQ2uV_QJFNiAuy8yTaDlzTn1TeF2NeyR6oAyek6lMHrItoBbis0.png\" width=\"602\" height=\"337\" alt=\"\"><\/span><\/p>\n<h5 class=\"ql-align-justify\"><a style=\"background-color: transparent; color: #1155cc;\" href=\"https:\/\/www.geetest.com\/en\/\" target=\"_blank\" rel=\"noopener noreferrer\"><em>GeeTest&#8217;s<\/em><\/a><em style=\"background-color: transparent; color: #000000;\"> anti-web-scraping protection<\/em><\/h5>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-right\"><strong>Written by: <\/strong><a href=\"https:\/\/joeho.xyz\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Joe<\/strong><\/a><\/p>\n<p class=\"ql-align-justify\">\n<p class=\"ql-align-justify\"><strong style=\"background-color: transparent; color: #000000;\">Source Code<\/strong><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">Source code is available at <\/span><\/p>\n<p class=\"ql-align-justify\"><a style=\"background-color: transparent; color: #1155cc;\" href=\"https:\/\/github.com\/JoeHO888\/How-does-web-scraping-become-simpler-and-how-to-prevent-it\/blob\/main\/How\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/JoeHO888\/How-does-web-scraping-become-simpler-and-how-to-prevent-it\/blob\/main\/How<\/a><\/p>\n<p class=\"ql-align-justify\"><span style=\"background-color: transparent; color: #000000;\">does web scraping become simpler and how to prevent it &#8211; Source Code.ipynb<\/span><\/p>\n<h2 
class=\"ql-align-right\"><\/h2>\n<\/div>\n<p><!-- .vgblk-rw-wrapper --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web scraping has become much easier thanks to advances in technology. We illustrate this with a simple example: scraping NBA players&#8217; information, such as height, birth date, and salary.<\/p>\n","protected":false},"author":7,"featured_media":995786,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[89],"tags":[],"class_list":["post-996943","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-fraud-prevention"],"_links":{"self":[{"href":"\/en\/wp-json\/wp\/v2\/posts\/996943","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/en\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"\/en\/wp-json\/wp\/v2\/comments?post=996943"}],"version-history":[{"count":2,"href":"\/en\/wp-json\/wp\/v2\/posts\/996943\/revisions"}],"predecessor-version":[{"id":997632,"href":"\/en\/wp-json\/wp\/v2\/posts\/996943\/revisions\/997632"}],"wp:featuredmedia":[{"embeddable":true,"href":"\/en\/wp-json\/wp\/v2\/media\/995786"}],"wp:attachment":[{"href":"\/en\/wp-json\/wp\/v2\/media?parent=996943"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/en\/wp-json\/wp\/v2\/categories?post=996943"},{"taxonomy":"post_tag","embeddable":true,"href":"\/en\/wp-json\/wp\/v2\/tags?post=996943"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}