Thursday 27 December 2018

Using an Azure Function, Puppeteer and a headless Chrome to drive a Web site like Selenium

These notes describe how you can use Azure Functions, Puppeteer and headless Chrome to drive and parse a Web site, much as you would with Selenium. The aim is to pull data from a Web site periodically without needing a Virtual Machine or a Cloud Service.

Install nvm-windows

Install the correct version of Node.js (at the time of writing, 10.6.0 was compatible with Azure Functions).

nvm install 10.6.0
nvm use 10.6.0

A bug meant that some packages were missing after the install, so running npm displayed the error message "npm-cli.js not found". The solution was to uninstall that version of Node.js and install it again.

nvm uninstall 10.6.0
nvm install 10.6.0

Install Visual Studio Code.

Install the Azure Functions extension for Visual Studio Code.

Create a folder for the Azure Function.

Change directory to the folder.
Initialise npm first, otherwise you will get an error.
npm init
npm i puppeteer
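
To confirm that Puppeteer works locally before involving Azure, a short script along these lines can help. It is only a sketch: the file name check.js and the helper name fetchPageTitle are made up for illustration, and the URL is whatever you pass on the command line.

```javascript
// check.js: hypothetical local smoke test for the Puppeteer install.
// The puppeteer module is passed in as a parameter so the helper can
// also be exercised with a stand-in object.
async function fetchPageTitle(puppeteer, url) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so scripted pages can settle.
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.title();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Only touch the real puppeteer package when a URL is given as an argument.
const cliUrl = process.argv[2];
if (cliUrl) {
  fetchPageTitle(require('puppeteer'), cliUrl)
    .then(title => console.log(title))
    .catch(err => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Run it with, for example, node check.js https://example.com. If the page title is printed, Node.js, npm and the bundled Chromium are working together on the local machine.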

Using the Azure icon in Visual Studio Code, log in to the Azure Subscription.

Create the function using the Azure Functions integration in Visual Studio Code.

Test the function.

Deploy the function.

Once the function has been deployed, open a Kudu console and navigate to D:\home\site\wwwroot\[FunctionName].

Install Puppeteer there by typing 'npm i puppeteer'.

However, this doesn't work properly: Chromium won't launch because of limitations of the Azure Functions hosting environment.

Another option is to run the browser remotely on browserless.io, but this is a paid service.
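
With a remote browser, the function would connect to an already running Chrome instead of launching one. The sketch below uses Puppeteer's connect call for this. It has not been tried against browserless.io here; the helper name fetchTitleRemotely and the BROWSER_WS_ENDPOINT setting are hypothetical, and the WebSocket URL (including any API token) would have to come from the service's own documentation.

```javascript
// Sketch: attach to a remote Chrome over WebSocket instead of launching
// Chromium locally. fetchTitleRemotely is a hypothetical helper name.
async function fetchTitleRemotely(puppeteer, browserWSEndpoint, url) {
  // connect() attaches to an existing browser; nothing is started locally.
  const browser = await puppeteer.connect({ browserWSEndpoint });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    // Ends the remote browser session.
    await browser.close();
  }
}
```

Inside the function this might be called as fetchTitleRemotely(require('puppeteer'), process.env.BROWSER_WS_ENDPOINT, url). Because no local Chromium is needed in this arrangement, the puppeteer-core package, which skips the Chromium download, may be a lighter dependency to deploy.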

Addendum

This project uses an Azure Function to pull data from a website.
