This article shows you how to build a web scraping tool with Node.js that extracts financial data from a website. If you are new to web scraping or don’t have much experience with Node.js, start with a quality Node.js web scraping tutorial that also covers the relevant JavaScript topics.
The final application uses two npm packages: Puppeteer and Express.js. Puppeteer automates a browser that navigates to a financial website and scrapes data from the page’s markup. Express.js lets users communicate with the final application through an API client or browser.
Starting a New Project
To begin your new project, create a new folder wherever you’d like on your computer. This folder will contain your final application. Once you’ve created the folder, open a command-line interface (CLI) and navigate to it. Windows users can do this through the Command Prompt, while Mac users can use Terminal.
After you’ve navigated to the project folder, enter “npm init” to start the project. npm will ask you several questions, which you can skip by pressing the Enter key to accept the defaults. When it finishes, npm writes a “package.json” file to the folder.
Install the Necessary Packages
You’ll need two packages to run the application: Puppeteer and Express.js. Express.js lets you create the API, and Puppeteer extracts the financial data from your chosen website.
Enter the following command to install Puppeteer and Express.js in your project folder: “npm install express puppeteer”. After you enter the command, npm creates a folder called “node_modules” and installs both Puppeteer and Express.js along with their dependencies.
Application Coding
The finance data scraper requires around 30 lines of code, split across two files that you can write in any text editor.
index.js
To begin, you’ll have to create a file called “index.js” inside your project directory. You will need to enter the following code on lines 2 through 12:
- Line 2: var express = require('express');
- Line 4: var api = require('./api');
- Line 6: var app = express();
- Line 8: app.use('/', api);
- Line 10: app.listen(3000, function() {
- Line 11: console.log('Node app is running on port 3000');
- Line 12: });
Line 2 imports the Express.js module you just installed, and line 6 uses it to create a new Express application.
Line 4 imports the “api.js” file that you will create in the next step.
Line 8 tells Express to pass every incoming request to the router exported by that file.
Lines 10 through 12 start listening for requests on port 3000. Once the server is ready, the callback function logs a message confirming that the app is running.
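For intuition about what the listen call does, here is a minimal sketch of the same listen-and-log pattern built on Node’s built-in http module, which Express uses underneath. The names handleRequest and startServer are hypothetical and are not part of the tutorial’s files; the handler simply echoes the requested path.

```javascript
// Sketch only: a bare Node http server that mirrors the listen-and-log
// pattern from index.js. handleRequest and startServer are hypothetical names.
const http = require('http');

// Reply to any request with a small JSON body that echoes the path.
function handleRequest(req, res) {
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ path: req.url }));
}

// Create the server and start listening. As with app.listen, the
// callback runs once the port is bound and the server is ready.
function startServer(port, onReady) {
  const server = http.createServer(handleRequest);
  server.listen(port, onReady);
  return server;
}

// Example call (left commented out so the file does not open a port on load):
// startServer(3000, () => console.log('Node app is running on port 3000'));
```

Express adds routing and middleware on top of this, which is why index.js only needs a single app.use call to hand requests to the router.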
api.js
In the project directory, create a second file called “api.js.” Enter the following for lines 2 through 28:
- Line 2: var router = require('express').Router();
- Line 3: var puppeteer = require('puppeteer');
- Line 5: router.route('/stock/:ticker').get(async (req, res) => {
- Line 6: let ticker = req.ticker;
- Line 8: const browser = await puppeteer.launch({
- Line 9: headless: true,
- Line 10: args: ['--no-sandbox', '--disable-setuid-sandbox']
- Line 11: });
- Line 12: const page = await browser.newPage();
- Line 13: let url = `https://finance.yahoo.com/quote/${ticker}?p=${ticker}&.tsrc=fin-srch`;
- Line 15: await page.goto(url);
- Line 16: await page.waitForSelector('#quote-market-notice', {timeout: 1000});
- Line 17: let price = await page.evaluate(() => document.querySelector('#quote-header-info > div.Pos\\(r\\) > div > div > span').textContent);
- Line 18: await browser.close();
- Line 20: res.send({ticker, price});
- Line 21: });
- Line 23: router.param('ticker', (req, res, next, ticker) => {
- Line 24: req.ticker = ticker.toUpperCase();
- Line 25: next();
- Line 26: });
- Line 28: module.exports = router;
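In plain terms, this code defines a route, “/stock/:ticker”, on your application. When a request arrives, the program launches a headless browser, opens the Yahoo Finance quote page for the requested ticker, waits for the page to load, reads the current price from the page, and sends the ticker and price back as the response. The “router.param” handler converts the ticker to uppercase before the route runs.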
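The ticker handling and URL construction in api.js can be pulled into two small functions, which makes them easy to check without launching a browser. This is only a sketch: normalizeTicker and buildQuoteUrl are hypothetical names, and the trimming and the encodeURIComponent call are additions that the code above does not perform.

```javascript
// Hypothetical helpers; the names are illustrative, not from api.js.

// Upper-case the ticker, as the router.param handler does, and
// additionally trim surrounding whitespace (an added assumption).
function normalizeTicker(raw) {
  return String(raw).trim().toUpperCase();
}

// Build the quote URL used in api.js. encodeURIComponent guards against
// characters that would otherwise change the structure of the URL.
function buildQuoteUrl(rawTicker) {
  const ticker = encodeURIComponent(normalizeTicker(rawTicker));
  return `https://finance.yahoo.com/quote/${ticker}?p=${ticker}&.tsrc=fin-srch`;
}

console.log(buildQuoteUrl('msft'));
// prints "https://finance.yahoo.com/quote/MSFT?p=MSFT&.tsrc=fin-srch"
```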
By the end, you should be able to enter a company’s ticker symbol (for example, Apple is AAPL) and get back its latest stock price as shown on the website.
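One thing to keep in mind with the route in api.js: if the wait times out or the selector matches nothing, an error is thrown before the browser.close() call runs, so the browser process stays open. A common way to guard against this is a try/finally wrapper. The sketch below assumes a hypothetical helper named withBrowser and takes the launch function as a parameter, so it can be tried with a stand-in object instead of a real browser.

```javascript
// Sketch: run work(browser) and always close the browser afterwards,
// whether work succeeds or throws. withBrowser is a hypothetical name.
// `launch` is any function that returns a promise of an object with close().
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    return await work(browser);
  } finally {
    await browser.close();
  }
}

// Stand-in for a real browser, used only to demonstrate the control flow.
const fakeLaunch = async () => ({
  close: async () => console.log('browser closed')
});

withBrowser(fakeLaunch, async () => {
  throw new Error('selector not found');
}).catch((err) => console.log('request failed:', err.message));
```

In api.js, launch would be something like () => puppeteer.launch({ headless: true }), and work would contain the page.goto and page.evaluate steps. The route handler could then catch the error and send an error status instead of leaving the request unanswered.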
Testing Your New Application
To test your new application, return to the CLI and make sure it is still open in the project directory. Enter the following command: “node index.js”. A message should appear indicating that the application is running on port 3000.
Once the application is running, you can use either an API client or a browser to test it. Open “localhost:3000/stock/MSFT” to get Microsoft’s latest stock price. Change the ticker symbol at the end of the URL to obtain the price of other stocks.
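The response is a JSON object with the ticker and the price, and the price arrives as text because it is read straight from the page. If you want to do arithmetic on it, a small parsing step helps. The following is a sketch under assumptions: parseQuote is a hypothetical name, and the sample body is made up for illustration rather than real market data.

```javascript
// Sketch: convert the API's JSON body into an object with a numeric price.
// parseQuote is a hypothetical helper, not part of the tutorial's files.
function parseQuote(jsonText) {
  const { ticker, price } = JSON.parse(jsonText);
  // The scraped price is page text, so strip thousands separators first.
  const text = String(price).replace(/,/g, '').trim();
  const value = text === '' ? NaN : Number(text);
  if (!Number.isFinite(value)) {
    throw new Error(`Unexpected price text: ${price}`);
  }
  return { ticker, price: value };
}

// Made-up sample body in the shape that res.send({ticker, price}) produces.
const sample = '{"ticker":"MSFT","price":"1,234.50"}';
console.log(parseQuote(sample));
// prints { ticker: 'MSFT', price: 1234.5 }
```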
Conclusion
We hope this introduction to web scraping financial data has been helpful. Practice on your own with this tutorial and others to enhance your ability to analyze data. To further your capacity for web scraping and coding in general, visit Zenscrape for more coding tutorials.