Source Code for a Self-Hosted Local Site-Scraping API - 彩豆博客

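Yesterday I mentioned that GPT refused to write a site-scraping API for me, saying it could easily break the law. Today I asked in a different way: I told it, "Can you write me an API? I want to download all the static resources from my own website," and it actually started writing, hahaha. Below is the API source code after several rounds of revision. It is still a bit weak: it only copes with small sites, and a site with many resources tends to end in a 504. If you're interested, have GPT polish the code further yourself.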
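One plausible contributor to the 504s is that the source code further down fetches files one at a time, so the total request time grows with the number of resources. The sketch below shows one way the per-file loop could be replaced with concurrent transfers using PHP's curl_multi functions. The function name `downloadFilesConcurrently`, the `$jobs` array shape, and the 30-second default are assumptions for illustration, not part of the original script, and this has not been measured against the original.

```php
<?php
// Hypothetical helper: download several URLs concurrently with curl_multi.
// $jobs maps URL => local save path. Returns URL => bool (true on success).
function downloadFilesConcurrently(array $jobs, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    $results = [];

    foreach ($jobs as $url => $savePath) {
        $results[$url] = false; // assume failure until curl reports success
        $fp = fopen($savePath, 'wb');
        if ($fp === false) {
            continue; // target file could not be opened for writing
        }
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FILE, $fp);               // write the body into the file
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = [$ch, $fp];
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect per-transfer results; 'result' is CURLE_OK on success.
    while (($info = curl_multi_info_read($multi)) !== false) {
        foreach ($handles as $url => [$ch, $fp]) {
            if ($info['handle'] === $ch) {
                $results[$url] = ($info['result'] === CURLE_OK);
            }
        }
    }

    // Release every handle and close every file.
    foreach ($handles as [$ch, $fp]) {
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
        fclose($fp);
    }
    curl_multi_close($multi);

    return $results;
}
```

The caller would still need to create the target directories first, as the original loops do. If the 504 comes from a proxy or gateway timeout in front of PHP, that timeout may also need raising.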

Installing dependencies

Installing the Goutte library

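First, make sure Composer is installed. If it is not, follow Composer's official documentation: https://getcomposer.org/download/

Open a terminal (command-line interface).

Go to the project's root directory and run the following command to install Goutte:

composer require fabpot/goutte

Once installation finishes, the project directory will contain a vendor folder holding the Goutte package and its dependencies.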

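Generating autoload.php

Make sure the Composer install command has already been run in the project root, so that Goutte and its dependencies are installed. If not, follow the steps above. (composer require already writes vendor/autoload.php; the command below regenerates it, and -o builds an optimized class map.)

Open a terminal (command-line interface).

Go to the project's root directory and run the following command to generate the autoload.php file:

composer dump-autoload -o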
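To confirm the two steps above worked before moving on, a small check along these lines may help. It is only a sketch: the function name `autoloadStatus` and the idea of saving it as a separate script in the project root are assumptions, not part of the original post.

```php
<?php
// Hypothetical helper: report whether Composer's autoloader and Goutte are usable.
function autoloadStatus(string $projectRoot): string
{
    $autoload = rtrim($projectRoot, '/') . '/vendor/autoload.php';
    if (!is_file($autoload)) {
        return 'vendor/autoload.php not found: run "composer require fabpot/goutte" first';
    }

    require_once $autoload;

    // class_exists() triggers the autoloader, so this also checks the class map.
    return class_exists('Goutte\\Client')
        ? 'ok: Goutte\\Client can be autoloaded'
        : 'autoload.php exists, but Goutte\\Client is missing';
}

// Check the directory this script lives in (assumed to be the project root).
echo autoloadStatus(__DIR__), PHP_EOL;
```

Running it with the PHP CLI from the project root prints one line saying which case applies.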

PHP site-scraping API source code

<?php
require_once 'vendor/autoload.php'; // Make sure to install the Goutte library using Composer

use Goutte\Client;

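// Check that a valid http(s) URL parameter is provided
$websiteUrl = $_GET['url'] ?? '';
if (!filter_var($websiteUrl, FILTER_VALIDATE_URL) || !preg_match('#^https?://#i', $websiteUrl)) {
    http_response_code(400);
    echo "Please provide a valid http(s) 'url' parameter.";
    exit;
}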

$client = new Client();
$crawler = $client->request('GET', $websiteUrl);

// Get the raw HTML source code of the page
$htmlSource = $crawler->html();

// Decode HTML entities in the raw HTML source
$htmlSource = html_entity_decode($htmlSource, ENT_COMPAT | ENT_HTML5, 'UTF-8');

// Extract resource URLs from the raw HTML source
$resourceLinks = [];
$imageLinks = [];

preg_match_all('/(href|src)="([^"]+\.(css|js|html))"/', $htmlSource, $matches);

foreach ($matches[2] as $resourceUrl) {
    // Check if the URL is absolute or relative
    if (strpos($resourceUrl, 'http') !== 0) {
        $resourceUrl = rtrim($websiteUrl, '/') . '/' . ltrim($resourceUrl, '/');
    }

    $resourceLinks[] = $resourceUrl;
}

preg_match_all('/<img[^>]+src="([^"]+)"/', $htmlSource, $matches);

foreach ($matches[1] as $imageUrl) {
    // Check if the image URL is absolute or relative
    if (strpos($imageUrl, 'http') !== 0) {
        $imageUrl = rtrim($websiteUrl, '/') . '/' . ltrim($imageUrl, '/');
    }

    $imageLinks[] = $imageUrl;
}

// Create a download directory based on website structure
$downloadPath = './downloaded_resources/';
if (!file_exists($downloadPath)) {
    mkdir($downloadPath, 0777, true);
}

// Create the URL directory if it doesn't exist
$urlDirectory = $downloadPath . parse_url($websiteUrl, PHP_URL_HOST);
if (!file_exists($urlDirectory)) {
    mkdir($urlDirectory, 0777, true);
}

// Download resources and images to their respective directories
foreach ($resourceLinks as $resourceUrl) {
    $parsedUrl = parse_url($resourceUrl);
    $resourcePath = $urlDirectory . $parsedUrl['path'];
    $resourceDir = dirname($resourcePath);

    if (!file_exists($resourceDir)) {
        mkdir($resourceDir, 0777, true);
    }

    downloadFile($resourceUrl, $resourcePath);
}

foreach ($imageLinks as $imageUrl) {
    $parsedUrl = parse_url($imageUrl);
    $imagePath = $urlDirectory . $parsedUrl['path'];
    $imageDir = dirname($imagePath);

    if (!file_exists($imageDir)) {
        mkdir($imageDir, 0777, true);
    }

    downloadFile($imageUrl, $imagePath);
}

// Save the HTML source to a file
$htmlFilePath = $urlDirectory . '/index.html';
file_put_contents($htmlFilePath, $htmlSource);

// Create a zip archive of downloaded files
$zipFilename = date('YmdHis') . '.zip';
$zip = new ZipArchive();
if ($zip->open($downloadPath . $zipFilename, ZipArchive::CREATE | ZipArchive::OVERWRITE) === true) {
    foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($urlDirectory)) as $file) {
        if ($file->isFile()) {
            $filePath = $file->getPathname();
            // Strip the leading separator so zip entry names are relative paths
            $relativePath = ltrim(str_replace($urlDirectory, '', $filePath), '/\\');
            $zip->addFile($filePath, $relativePath);
        }
    }
    $zip->close();
}

// Set headers to force download
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="' . $zipFilename . '"');
header('Content-Length: ' . filesize($downloadPath . $zipFilename));
readfile($downloadPath . $zipFilename);

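// Delete downloaded files recursively (children first, then their directories)
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($downloadPath, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
}
rmdir($downloadPath);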

exit;

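// Download a single URL to $savePath using cURL; returns true on success
function downloadFile($url, $savePath) {
    $fp = fopen($savePath, 'wb');
    if ($fp === false) {
        return false; // could not open the target file for writing
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // stream the body straight into the file
    curl_setopt($ch, CURLOPT_HEADER, 0);            // keep response headers out of the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // do not hang forever on a slow resource

    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);

    return $ok !== false;
}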
?>
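The script above joins every link that does not start with "http" onto the full page URL, which gives wrong results for root-relative paths (/css/a.css) when the page URL itself has a path, and for protocol-relative links (//cdn.example.com/a.js). A minimal sketch of a more careful resolver is shown below. The function name `resolveUrl` is an assumption, not part of the original code, and it deliberately skips "../" normalization and non-http schemes such as data: URIs.

```php
<?php
// Hypothetical helper (not in the original script): resolve $ref against $baseUrl.
function resolveUrl(string $baseUrl, string $ref): string
{
    // Already absolute (http:// or https://): use as-is.
    if (preg_match('#^https?://#i', $ref)) {
        return $ref;
    }

    $base = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '')
        . (isset($base['port']) ? ':' . $base['port'] : '');

    // Protocol-relative (//cdn.example.com/a.js): inherit the page's scheme.
    if (strpos($ref, '//') === 0) {
        return $scheme . ':' . $ref;
    }

    // Root-relative (/css/a.css): attach to the origin, ignoring the page path.
    if (strpos($ref, '/') === 0) {
        return $origin . $ref;
    }

    // Path-relative (img/a.png): attach to the directory part of the page path.
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $ref;
}
```

For example, resolveUrl('https://example.com/blog/post.html', 'img/a.png') returns 'https://example.com/blog/img/a.png', while '/css/a.css' against the same page returns 'https://example.com/css/a.css'. In the script, the two strpos(..., 'http') branches could call this helper instead of concatenating strings.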

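Copyright notice
1 Site name: 彩豆博客
2 Permanent URL: https://www.521cd.cn
3 The code templates on this site are for learning and exchange only and must not be used commercially. Using templates or source code from 彩豆博客 for any illegal or infringing activity is strictly prohibited.
4 Some article content on this site may come from the internet and is provided for learning and reference only. If anything infringes your rights, contact the site owner at QQ 931106824 to have it removed.
5 The resources on this site do not represent the site's position; the site does not endorse their views or vouch for their accuracy.
6 Publishing or reposting any illegal information on this site, in any form, is prohibited. Visitors who find such content should report it to the site owner.
7 Most of this site's resources are stored on cloud drives. If you find a dead link, contact us and we will update it promptly.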