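1. puppeteer and puppeteer-core
Installing puppeteer downloads a Chrome build by default, one that is known to work with that Puppeteer release.
Installing puppeteer-core does not download Chrome. If the program needs to open a browser at run time, you must point it at the Chrome executable already installed on the system.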
2. puppeteer-core: specifying the system Chrome path
import puppeteer from 'puppeteer-core';
// Pass the executablePath option to launch
await puppeteer.launch({executablePath: '/path/to/Chrome'});
To find the local Chrome path, open chrome://version in Chrome's address bar and read the "Executable Path" entry.
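When the same script runs on several machines, hard-coding the path gets awkward. The following is a minimal sketch of one way to pick the path automatically. The helper name findChromeExecutable, the DEFAULT_CHROME_PATHS table, and the CHROME_PATH environment variable in the usage comment are all hypothetical, and the listed locations are only common defaults that may not match a given installation.

```javascript
// Common default install locations, keyed by process.platform.
// These are assumptions; a given machine may use a different path.
const DEFAULT_CHROME_PATHS = {
  darwin: ['/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'],
  linux: ['/usr/bin/google-chrome', '/usr/bin/google-chrome-stable'],
  win32: [
    'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe',
    'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
  ],
};

// Return the first candidate path for which `exists` returns true, else null.
// `exists` is injected (for example fs.existsSync) so the lookup is easy to test.
// `override` is an optional path that is tried before the defaults.
function findChromeExecutable(platform, exists, override) {
  const candidates = [override, ...(DEFAULT_CHROME_PATHS[platform] ?? [])];
  const found = candidates.find(
    (p) => typeof p === 'string' && p !== '' && exists(p),
  );
  return found ?? null;
}

// Hypothetical usage together with puppeteer-core:
// import fs from 'fs';
// const executablePath = findChromeExecutable(
//   process.platform,
//   fs.existsSync,
//   process.env.CHROME_PATH, // hypothetical variable name
// );
// if (executablePath) await puppeteer.launch({ executablePath });
```

If the helper returns null, the path shown on chrome://version is still the most reliable source.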
3. A simple launch example
import fs from 'fs';
import puppeteer from 'puppeteer';

// Placeholder values; replace them with your own settings.
const config = { account: 'your-account', pwd: 'your-password' };
const typeDelay = 100; // delay between keystrokes, in milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const userAgents = [
  // 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
  // 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
  // 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.95 Safari/537.36 QIHU 360SE',
  // 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
  // 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
];

// Pick a random User-Agent
function getRandomUserAgent() {
  return userAgents[Math.floor(Math.random() * userAgents.length)];
}

(async () => {
  // Launch the browser
  const browser = await puppeteer.launch({
    userDataDir: './userData', // directory for the browser profile data
    headless: false, // disable headless mode so a Chrome window opens
    args: [
      '--start-maximized', // maximize the window
      // `--proxy-server=${proxySettings.proxy}`, // configure a proxy
      '--no-sandbox',
      '--disable-setuid-sandbox',
    ],
    defaultViewport: { // default page viewport size
      width: 1920,
      height: 1080,
    },
    devtools: true, // open DevTools
  });
  try {
    let isLogin = false;
    const page = await browser.newPage();
    // Set a random User-Agent
    await page.setUserAgent(getRandomUserAgent());

    // Read cookies from file, if a previous run saved them
    if (fs.existsSync('qcccookies.json')) {
      const cookiesJson = fs.readFileSync('qcccookies.json', 'utf8');
      const cookies = cookiesJson ? JSON.parse(cookiesJson) : [];
      // Set the cookies before the page loads
      if (cookies.length > 0) {
        await page.setCookie(...cookies); // spread the cookies array
      }
    }

    // Open the target page
    await page.goto('https://www.baidu.com', { waitUntil: 'networkidle2' });

    // If the account avatar is present, automatic login succeeded
    const userImg = await page.$('xxxxx.img');
    if (userImg) {
      isLogin = true;
    }

    // Log in by typing into the page's input fields
    if (!isLogin) {
      // Enter the account name
      await page.type('body > input', config.account, { delay: typeDelay });
      // Enter the password
      await page.type('body > input', config.pwd, { delay: typeDelay });
      // Click the login button
      await page.click('body > button');
      // Take a screenshot
      await page.screenshot({ path: 'test2.png' });
      // Wait for manual login verification to finish and the page to navigate
      await page.waitForNavigation({ waitUntil: 'load' });
      // Save the current page's cookies to file as JSON
      const cookies = await page.cookies();
      console.log(cookies);
      fs.writeFileSync('qcccookies.json', JSON.stringify(cookies, null, 2));
    }

    // List the pages that are currently open
    const pages = await browser.pages();
    console.log(pages);

    // Wait for a page opened from `page`; give up after 30 seconds
    const newPage = await new Promise((resolve) => {
      const timer = setTimeout(() => resolve(null), 30000);
      browser.on('targetcreated', async (target) => {
        if (target.opener() === page.target()) {
          clearTimeout(timer);
          resolve(await target.page());
        }
      });
    });

    if (newPage) {
      await sleep(3000);
      await newPage.waitForSelector('body'); // e.g. wait for the page to finish loading
      // Save the page as a PDF
      await newPage.pdf({
        path: 'xxxx.pdf',
        format: 'A3',
        // displayHeaderFooter: true,
        margin: { top: '5mm', right: '5mm', bottom: '5mm', left: '5mm' },
      });
    }
  } catch (e) {
    console.error(e);
  } finally {
    // await browser.close()
  }
})();
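Waiting for a newly opened page is the step most likely to hang, because a promise that listens for 'targetcreated' stays pending forever if no matching tab appears. One possible way to bound the wait is sketched below. The helper name waitForEvent and its parameters are hypothetical; it assumes only that the emitter exposes on and off methods, which Puppeteer's Browser object does.

```javascript
// Wait for the first `eventName` event whose payload passes `predicate`.
// Resolves with that payload, or with null once `timeoutMs` has elapsed.
// The listener is removed in both cases so it cannot fire again later.
function waitForEvent(emitter, eventName, predicate, timeoutMs) {
  return new Promise((resolve) => {
    const finish = (value) => {
      clearTimeout(timer);
      emitter.off(eventName, handler);
      resolve(value);
    };
    const handler = (payload) => {
      if (predicate(payload)) finish(payload);
    };
    const timer = setTimeout(() => finish(null), timeoutMs);
    emitter.on(eventName, handler);
  });
}

// Hypothetical usage with the `browser` and `page` objects from the example:
// const target = await waitForEvent(
//   browser,
//   'targetcreated',
//   (t) => t.opener() === page.target(),
//   30000,
// );
// const newPage = target ? await target.page() : null;
```

Returning null on timeout keeps the calling code on the normal control path, so the existing `if (newPage)` check still decides whether the PDF step runs.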
4. Reading data from the DOM
// page.$(selector): runs document.querySelector inside the page.
// page.$$(selector): runs document.querySelectorAll inside the page.
// page.evaluate(fn): runs fn inside the page and returns its serializable result.
const pageData = await page.evaluate(() => {
  // Collect the container nodes
  const items = Array.from(document.querySelectorAll('#id li'));
  // Extract the text of each node
  return items.map((item) => ({
    title: item.querySelector('.xxx a').innerText.replaceAll('/', '//'),
  }));
});
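The extraction above throws if any list item lacks a matching link, because querySelector returns null for that item. A minimal sketch of a more defensive variant follows. The function name toTitles is hypothetical, and the selectors in the usage comment are the same placeholders used above. It is written as a standalone function so it can be passed to page.$$eval, which hands the matched elements to the callback as an array.

```javascript
// Map list items to { title } objects, skipping items without a matching link.
// `items` is an array of elements; `linkSelector` locates the link inside each.
function toTitles(items, linkSelector) {
  return items
    .map((item) => item.querySelector(linkSelector))
    .filter((link) => link !== null)
    .map((link) => ({ title: link.innerText.trim() }));
}

// Hypothetical usage: extra arguments after the function are forwarded to it.
// const pageData = await page.$$eval('#id li', toTitles, '.xxx a');
```

Because the function is serialized and executed in the page, it must not refer to variables from the Node.js side; everything it needs is passed in as an argument.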
5. Simple countermeasures against anti-scraping checks
1. Set the user-agent dynamically
await page.setUserAgent(userAgent);
2. Read and set cookies
const cookies = await page.cookies();
await page.setCookie(...cookies);
3. Persist the user data directory: log in once and, until the session expires, later runs are logged in automatically
const browser = await puppeteer.launch({
  userDataDir: './userData',
});
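Cookies saved to a file can go stale between runs. As a small sketch, assuming the cookie objects have the shape returned by page.cookies() (an expires field in seconds since the Unix epoch, with -1 marking a session cookie), expired entries could be filtered out before they are restored. The function name dropExpiredCookies is hypothetical.

```javascript
// Keep session cookies (expires missing or -1) and cookies that expire in the future.
// `nowSeconds` defaults to the current time in seconds since the Unix epoch.
function dropExpiredCookies(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter((cookie) => {
    const isSession = cookie.expires === undefined || cookie.expires === -1;
    return isSession || cookie.expires > nowSeconds;
  });
}

// Hypothetical usage with cookies loaded from a JSON file:
// const fresh = dropExpiredCookies(savedCookies);
// if (fresh.length > 0) await page.setCookie(...fresh);
```

If every saved cookie has expired, the filtered list is empty and the script falls back to the normal login path.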