measures are in place, threat intelligence helps professionals understand cybercriminal methods and goals, trains security teams and leads to the creation of tools and systems that protect data and prevent future attacks.

Cyber threat intelligence addresses cybercrime with information and skills that identify, minimize and manage cyber attacks. This intelligence is typically gathered from all levels of the web, including darknet forums and websites. Quality intelligence that is current and relevant is critical to the success of cybersecurity strategies.

To obtain high-level insights, cybersecurity experts use web scraping to crawl the web and extract information from target websites. The web scraping process comprises three main steps: 1) sending data requests to the target website's server; 2) extracting and parsing the data into an easily readable format; and 3) data analysis.

Cybercriminals attempt to escape detection by identifying cybersecurity company servers and blocking their IP addresses. To address this issue, datacenter and residential proxies are used to maintain anonymity, avoid geo-location restrictions and balance server requests to prevent bans.
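To make the proxy pattern concrete, here is a minimal Python sketch of routing scraper requests through a rotating pool of proxies. It is only an illustration under assumed details: the gateway addresses, ports and credentials in PROXY_POOL are hypothetical placeholders rather than real endpoints, and fetch_via_proxy is simply a helper name chosen for this example, built on the widely used requests library.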
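```python
import random
import requests

# Hypothetical proxy endpoints -- real gateways, ports and
# credentials would come from a proxy provider.
PROXY_POOL = [
    "http://user:pass@datacenter-proxy.example.com:8080",
    "http://user:pass@residential-proxy.example.com:8080",
]

def fetch_via_proxy(url: str, timeout: float = 10.0) -> requests.Response:
    """Send a GET request through a randomly chosen proxy.

    Rotating requests across proxy exit IPs helps avoid bans,
    sidestep geo-location restrictions and balance load.
    """
    proxy = random.choice(PROXY_POOL)
    response = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=timeout,
    )
    response.raise_for_status()
    return response

if __name__ == "__main__":
    page = fetch_via_proxy("https://example.com/")
    print(page.status_code, len(page.text))
```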
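In practice, many proxy providers expose a single gateway that rotates exit IPs on their side, so a client-side pool like the one above is just one way to spread requests across addresses.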
Threat intelligence strategies typically consist of a process, or cycle, that includes several steps. The first step is to determine the data that needs to be protected and set goals for what intelligence is required to minimize threats and prevent attacks. Additionally, analysis is conducted to identify potential impacts and outline remediation efforts. Once the project scope is outlined, data is extracted via web scraping from websites, news, blogs, forums and all other relevant locations. In addition, some closed sources may be identified and infiltrated on the dark web. Following the web scraping process, analysts examine the collected data to determine potential threats and their sources. The collected data and analysis are then forwarded to organizations through distribution channels; some cybersecurity companies build threat intelligence distribution platforms or feeds that provide real-time information. Following plan implementation, results are recorded and feedback is sent to fine-tune the strategy.

Update on personal data scraping

During 2022, the scraping industry could breathe a sigh of relief as at least one enduring issue was put to rest: companies combating scrapers can no longer use the Computer Fraud and Abuse Act (CFAA) to stop the scraping of public-facing data. But according to Denas Grybauskas, head of legal at public web data-acquisition solution provider Oxylabs, we can expect that during 2023 other legal grounds and arguments against data scraping companies, such as infringement of terms of service and intellectual property protection, will be tried and become more popular in the courts.

As 2022 ended with quite a few stories of personal data scraping and data breaches (Clearview fines in Europe, the Meta database leak that affected more than 500 million users and Meta's GDPR fines in Europe, among others), we can expect more spotlight on personal data scraping from regulators and authorities. "Finally, 2023 might be the year when the scraping and data collection industry will begin self-regulation initiatives," said Denas.

Meanwhile, Gediminas Rickevičius, vice president of global partnerships at Oxylabs, is confident that with evolving AI capabilities this year, the same as in 2022, the importance and scale of web data applications in commerce will continue to grow. Gediminas also predicts further parallel evolution of web scraping and blocking systems, which means a greater need for resources and know-how. "Therefore, I suggest leaving web scraping in the expert's hands. Although the cost of commercial scraping will increase, doing it yourself will be even more expensive than with professionals' help," predicted Gediminas.

Andrius Palionis is vice president of enterprise sales at Oxylabs.io.

Step-by-step Web Scraping Process

Understanding how web scraping works requires an explanation of its basic steps. There are three main steps involved, illustrated in the sketch at the end of this section:

1. Sending requests to targeted websites. Web scraping tools (also called web scrapers) make HTTP requests, such as GET and POST, to the target websites for the contents of a specific URL.
2. Extracting required data. The requested web servers return the data in HTML format. However, you might need to extract specific information from the HTML file; in this case, web scrapers parse the data according to the requirements.
3. Storing scraped data. This is the final step of the whole web scraping process. The required data needs to be stored in CSV or JSON format, or in any database, for further use.

Data acquisition for threat intelligence is a challenging task that requires a lot of resources. Cybersecurity companies mostly face these issues: gathering real-time information, large-scale operations, multiple targets and maintaining anonymity.

[Diagram: Sending requests → Extracting data → Storing data]
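As a concrete illustration of the three steps above, below is a minimal end-to-end Python sketch that sends a request, parses the returned HTML and stores the extracted records. The target URL and the extracted fields (link text and address) are hypothetical stand-ins, since a real scraper would target specific pages and selectors, and beautifulsoup4 is just one common choice of HTML parser.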
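```python
import csv
import json
import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

TARGET_URL = "https://example.com/"  # hypothetical target

# Step 1: send an HTTP GET request to the target website.
response = requests.get(TARGET_URL, timeout=10)
response.raise_for_status()

# Step 2: parse the returned HTML and extract the required data --
# here, every link's text and address, as a stand-in for whatever
# fields a real scraper would pull.
soup = BeautifulSoup(response.text, "html.parser")
records = [
    {"text": a.get_text(strip=True), "href": a["href"]}
    for a in soup.find_all("a", href=True)
]

# Step 3: store the scraped data in CSV and JSON for further use.
with open("links.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "href"])
    writer.writeheader()
    writer.writerows(records)

with open("links.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)
```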
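At the scale described above (real-time feeds, large operations, multiple targets), a single-request sketch like this one would typically be wrapped in scheduling, retries and the proxy rotation shown earlier.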