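Yesterday I mentioned that GPT refused to write a site-ripping API for me, saying it could easily break the law. Today I asked a different way. I told it: "Can you write an API for me? I want to download all the static resources from my own website." And it actually started writing, hahaha. Below is the API source after several rounds of revision. It is still a bit weak: it only copes with small sites, and a site with many resources tends to end in a 504. If you are interested, have GPT polish the code further.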
Install dependencies
Install the Goutte library
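First, make sure Composer is installed. If it is not, follow the official Composer documentation: https://getcomposer.org/download/
Open a terminal (command-line interface).
Go to the project root and run the following command to install Goutte:
composer require fabpot/goutte
When the installation finishes, the project directory will contain a vendor folder holding the Goutte package and its dependencies.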
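Generate autoload.php
Make sure the Composer install command has already been run in the project root, so that Goutte and its dependencies are present. If not, follow the installation steps above. composer require already writes vendor/autoload.php; the command below regenerates it with an optimized class map.
Open a terminal (command-line interface).
Go to the project root and run the following command to generate the autoload.php file:
composer dump-autoload -o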
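To confirm the setup before calling the API, a quick check along these lines can help. This is only a minimal sketch: the function name missingSetupItems is made up for illustration, and the script is assumed to be saved in the project root. The curl and zip checks are included because the API source calls the curl_* functions and ZipArchive.

```php
<?php
// Hypothetical helper (not part of the original post): lists what is
// still missing from the Composer-based setup described above.
function missingSetupItems(string $projectRoot): array
{
    $missing = [];

    // The API source relies on these two PHP extensions.
    if (!extension_loaded('curl')) {
        $missing[] = 'ext-curl';
    }
    if (!class_exists('ZipArchive')) {
        $missing[] = 'ext-zip';
    }

    // Composer writes the autoloader to vendor/autoload.php.
    $autoload = rtrim($projectRoot, '/') . '/vendor/autoload.php';
    if (!is_file($autoload)) {
        $missing[] = 'vendor/autoload.php';
        return $missing; // Goutte cannot be checked without the autoloader
    }

    require_once $autoload;
    if (!class_exists('Goutte\\Client')) {
        $missing[] = 'Goutte\\Client';
    }

    return $missing;
}

$missing = missingSetupItems(__DIR__);
if ($missing) {
    echo 'Missing: ' . implode(', ', $missing) . PHP_EOL;
} else {
    echo 'Setup looks complete.' . PHP_EOL;
}
```

Run it with the PHP command-line binary from the project root. An empty list means the autoloader, Goutte, and both extensions were found.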
PHP site-ripper API source code
<?php
// Load Composer's autoloader (install the Goutte library with Composer first)
require_once 'vendor/autoload.php';

use Goutte\Client;

// Reject requests that omit the 'url' parameter
if (!isset($_GET['url'])) {
    http_response_code(400);
    echo "Please provide the 'url' parameter.";
    exit;
}
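// Target website URL
$websiteUrl = $_GET['url'];
// Only accept well-formed http(s) URLs
$scheme = strtolower((string) parse_url($websiteUrl, PHP_URL_SCHEME));
if (!filter_var($websiteUrl, FILTER_VALIDATE_URL) || !in_array($scheme, ['http', 'https'], true)) {
    http_response_code(400);
    echo "The 'url' parameter must be a valid http or https URL.";
    exit;
}
$client = new Client();
$crawler = $client->request('GET', $websiteUrl);
// Get the raw HTML source code of the page
$htmlSource = $crawler->html();
// Decode HTML entities in the raw HTML source
$htmlSource = html_entity_decode($htmlSource, ENT_COMPAT | ENT_HTML5, 'UTF-8');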
// Extract resource URLs (CSS, JS, HTML) from the raw HTML source
$resourceLinks = [];
$imageLinks = [];
preg_match_all('/(href|src)="([^"]+\.(css|js|html))"/', $htmlSource, $matches);
foreach ($matches[2] as $resourceUrl) {
    // Prefix relative URLs with the target URL; leave absolute http(s) URLs alone
    if (!preg_match('#^https?://#i', $resourceUrl)) {
        $resourceUrl = rtrim($websiteUrl, '/') . '/' . ltrim($resourceUrl, '/');
    }
    $resourceLinks[] = $resourceUrl;
}
// Extract image URLs from <img> tags
preg_match_all('/<img[^>]+src="([^"]+)"/', $htmlSource, $matches);
foreach ($matches[1] as $imageUrl) {
    // Prefix relative URLs with the target URL; leave absolute http(s) URLs alone
    if (!preg_match('#^https?://#i', $imageUrl)) {
        $imageUrl = rtrim($websiteUrl, '/') . '/' . ltrim($imageUrl, '/');
    }
    $imageLinks[] = $imageUrl;
}
// Create the download directory if it doesn't exist
$downloadPath = './downloaded_resources/';
if (!file_exists($downloadPath)) {
    mkdir($downloadPath, 0777, true);
}
// Create a per-site directory named after the host if it doesn't exist
$urlDirectory = $downloadPath . parse_url($websiteUrl, PHP_URL_HOST);
if (!file_exists($urlDirectory)) {
    mkdir($urlDirectory, 0777, true);
}
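// Download resources and images, mirroring each URL path under the site directory
foreach ($resourceLinks as $resourceUrl) {
    $path = parse_url($resourceUrl, PHP_URL_PATH);
    // Skip URLs with no file path, and paths that try to climb out of the site directory
    if (!is_string($path) || substr($path, -1) === '/' || strpos($path, '..') !== false) {
        continue;
    }
    $resourcePath = $urlDirectory . $path;
    $resourceDir = dirname($resourcePath);
    if (!file_exists($resourceDir)) {
        mkdir($resourceDir, 0777, true);
    }
    downloadFile($resourceUrl, $resourcePath);
}
foreach ($imageLinks as $imageUrl) {
    $path = parse_url($imageUrl, PHP_URL_PATH);
    // Same checks as above
    if (!is_string($path) || substr($path, -1) === '/' || strpos($path, '..') !== false) {
        continue;
    }
    $imagePath = $urlDirectory . $path;
    $imageDir = dirname($imagePath);
    if (!file_exists($imageDir)) {
        mkdir($imageDir, 0777, true);
    }
    downloadFile($imageUrl, $imagePath);
}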
// Save the HTML source to a file
$htmlFilePath = $urlDirectory . '/index.html';
file_put_contents($htmlFilePath, $htmlSource);
// Create a zip archive of the downloaded files
$zipFilename = date('YmdHis') . '.zip';
$zip = new ZipArchive();
if ($zip->open($downloadPath . $zipFilename, ZipArchive::CREATE | ZipArchive::OVERWRITE) === true) {
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($urlDirectory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile()) {
            $filePath = $file->getPathname();
            // Store entries relative to the site directory, without a leading slash
            $relativePath = ltrim(substr($filePath, strlen($urlDirectory)), '/\\');
            $zip->addFile($filePath, $relativePath);
        }
    }
    $zip->close();
}
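// Set headers to force download
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="' . $zipFilename . '"');
header('Content-Length: ' . filesize($downloadPath . $zipFilename));
readfile($downloadPath . $zipFilename);
// Delete downloaded files. unlink() cannot remove directories, so walk the
// tree children-first and remove files and folders in turn.
$items = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($downloadPath, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($items as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
}
rmdir($downloadPath);
exit;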
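// Download one URL to $savePath with cURL; returns true on success
function downloadFile($url, $savePath) {
    $fp = fopen($savePath, 'wb');
    if ($fp === false) {
        return false; // Could not open the target file for writing
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // Give up on a stalled transfer
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    return $ok !== false;
}
?>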
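One thing worth polishing is how relative links are resolved. The listing appends every relative link to the requested URL. If the page is https://example.com/blog/post.html, a link such as /css/site.css therefore becomes https://example.com/blog/post.html/css/site.css. The following is a minimal sketch of a resolver that could replace that logic. The name resolveUrl is hypothetical, and the function deliberately leaves "./" and "../" segments untouched.

```php
<?php
/**
 * Hypothetical helper (not part of the original listing): resolve $link
 * against the page it was found on. Handles absolute, protocol-relative,
 * root-relative and document-relative links. It does not normalize
 * "./" or "../" segments.
 */
function resolveUrl(string $pageUrl, string $link): string
{
    // Already an absolute http(s) URL: return it unchanged.
    if (preg_match('#^https?://#i', $link)) {
        return $link;
    }

    $parts  = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative link such as //cdn.example.com/app.js
    if (strpos($link, '//') === 0) {
        return $scheme . ':' . $link;
    }

    // Root-relative link such as /css/site.css
    if (strpos($link, '/') === 0) {
        return $origin . $link;
    }

    // Document-relative link: resolve against the directory of the page path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $link;
}
```

With this helper, each loop body would call resolveUrl($websiteUrl, $resourceUrl) in place of the rtrim/ltrim concatenation.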