A High-Definition AI Presenter Built on Wav2Lip + GFPGAN

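After the previous post on a Wav2Lip-based AI presenter, many readers reported the same problem: the generated AI character is not very sharp, and visible artifacts show up once the clip is dropped into a video editor. This post therefore covers a high-definition Wav2Lip + GFPGAN version. If you are not sure what this project does, take a look at the results first. The project currently has no Chinese-language introduction, so this should be the first one.

This project extends the environment from the Wav2Lip-based AI presenter project. If you have not set up that environment yet, please follow that post to configure it first.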

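A self-made HD version based on Wav2Lip: a digital human built from your own likeness comes out much sharper

Although I call it self-made, it is based on source code from a GitHub author, modified to suit my own needs. The overall principle is to enhance every frame of the video, merge the enhanced frames back into a video, and finally mux in the audio to produce the complete video.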
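
To make the "enhance frames, stitch, then add audio" flow a little more concrete, here is a minimal sketch of just the stitching plan: how a sorted run of frames might be cut into fixed-size batches, and what the resulting ffmpeg concat list looks like. The batch size of 600 and the `batch_NNNN.mp4` naming follow the script later in this post; the function names `plan_batches` and `concat_list` are hypothetical and exist only for illustration.

```python
def plan_batches(frame_count, batch_size=600):
    """Return (start, end, file_name) for each batch of frames."""
    plan = []
    for index, start in enumerate(range(0, frame_count, batch_size)):
        # The last batch may be shorter than batch_size
        end = min(start + batch_size, frame_count)
        plan.append((start, end, "batch_" + str(index).zfill(4) + ".mp4"))
    return plan


def concat_list(plan):
    """Build the text of an ffmpeg concat list for the planned batches."""
    return "".join("file " + name + "\n" for _, _, name in plan)


if __name__ == "__main__":
    plan = plan_batches(1300)
    for start, end, name in plan:
        print(start, end, name)
    print(concat_list(plan), end="")
```

Writing the video in batches keeps only one batch of decoded frames in memory at a time, which is presumably why the script does not load every frame at once.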

Table of contents

Preparation
pip dependencies
Production workflow
- inputs data files
- Run the code for hands-off production
- outputs data files
[Share] Wav2Lip-GFPGAN

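Preparation

The Python environment should be based on Anaconda. See "A step-by-step guide for Python beginners to installing Python on different systems".
Set up a GPU build of PyTorch. See "The simplest way to set up Win10 + Python 3.9 + GPU PyTorch".
Download the source code from GitHub. There is no rush to fetch the pretrained models: if they are missing on the first run, they are downloaded automatically. If your connection is too slow, paste the download URLs into a download manager such as Xunlei (迅雷) and copy the downloaded files into the corresponding project directory.
Create your own virtual environment. See "Installing and using Python virtual environments".
If none of this works for you, or you would rather not bother, go straight to the one-click package shared via cloud drive at the bottom of this post.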

pip dependencies

pip install "basicsr>=1.3.4.0"
pip install "facexlib>=0.2.3"
pip install lmdb
pip install pyyaml
pip install scipy
pip install tb-nightly
pip install yapf
pip install realesrgan
pip install ffmpeg

pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
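
Before moving on, it can help to confirm that the environment actually has everything. The following is a minimal sketch using only the standard library. The import names in `REQUIRED_MODULES` are my assumption of how the packages above are imported (for example, `pyyaml` is imported as `yaml`), and `cv2` and `tqdm` are included because the script later in this post imports them. The helper names `missing_modules` and `has_ffmpeg` are hypothetical.

```python
import importlib.util
import shutil

# Assumed import names for the packages installed above,
# plus cv2 and tqdm, which the production script imports.
REQUIRED_MODULES = ["basicsr", "facexlib", "lmdb", "yaml", "scipy",
                    "yapf", "realesrgan", "torch", "torchvision", "cv2", "tqdm"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_ffmpeg():
    """True if an ffmpeg executable is on PATH (the production script shells out to it)."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("missing modules:", missing_modules(REQUIRED_MODULES))
    print("ffmpeg on PATH:", has_ffmpeg())
```

An empty list and `True` suggest the environment is ready. Anything listed as missing can be installed with the pip commands above.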

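Production workflow

First, settle on the working directories used by the code below.

inputs: the source material for the talking-head video.
outputs: the finished video along with intermediate data.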

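inputs data files

Create a project of your own, for example a folder named after your nickname. Audio files and the digital human's video files must be stored in separate folders.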
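
As a rough illustration of that layout, the sketch below creates the folders the production script expects: `inputs/<user>/source_audio`, `inputs/<user>/source_video`, and `outputs`. The folder names come from the script later in this post. The function name `create_project_dirs` and the user name `my_nickname` are placeholders of mine.

```python
import os
import tempfile


def create_project_dirs(base_path, user_name):
    """Create inputs/<user>/source_audio, inputs/<user>/source_video and outputs."""
    dirs = [
        os.path.join(base_path, "inputs", user_name, "source_audio"),
        os.path.join(base_path, "inputs", user_name, "source_video"),
        os.path.join(base_path, "outputs"),
    ]
    for d in dirs:
        # exist_ok=True makes the call safe to repeat
        os.makedirs(d, exist_ok=True)
    return dirs


if __name__ == "__main__":
    # Demo in a throwaway directory; pass "." to create the folders in the project root
    with tempfile.TemporaryDirectory() as tmp:
        for d in create_project_dirs(tmp, "my_nickname"):
            print(os.path.relpath(d, tmp))
```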

The script automatically scans the files under source_audio and creates a matching directory under outputs for each one.
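
The mapping from audio file to output directory can be sketched as follows. This is an illustration rather than the script itself: the helper name `planned_output_dirs` is hypothetical, and it strips the extension with `os.path.splitext`, which is my choice for handling file names that contain more than one dot.

```python
import os


def planned_output_dirs(audio_names, base_path="."):
    """Map each audio file name to the output directory named after it."""
    return {
        name: base_path + "/outputs/" + os.path.splitext(name)[0]
        for name in audio_names
    }


if __name__ == "__main__":
    for audio, out_dir in planned_output_dirs(["intro.wav", "episode.01.mp3"]).items():
        print(audio, "->", out_dir)
```

Each audio file therefore gets its own folder, so several audio files can be processed in one run without overwriting each other's results.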

source_video holds the video footage for your digital human. Keep each clip under one minute; I won't go into the reasons here.

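Run the code for hands-off production

If you trust your machine's performance, you can uncomment the multi-process section in the code below and run the enhancement step with multiple processes.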

import os
import random
import shutil
import cv2
from tqdm import tqdm
from os import path
import numpy as np
import threading

basePath = "."

# Directories of the required frameworks
wav2lipFolderName = 'Wav2Lip-master'
gfpganFolderName = 'GFPGAN-master'
wav2lipPath = basePath + '/' + wav2lipFolderName
gfpganPath = basePath + '/' + gfpganFolderName

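# The user whose videos are being produced
userPath = "Mr数据杨"

# Audio files to synthesize videos for in this run
userAudioPathList = os.listdir("inputs/" + userPath + "/source_audio")

for sourceAudioName in userAudioPathList:
    # Name of the audio file without its extension
    # (splitext also copes with names that contain several dots or none)
    title = os.path.splitext(sourceAudioName)[0]
    # Pick one of the user's source videos at random as the footage for this run
    userVideoPathList = os.listdir("inputs/" + userPath + "/source_video")
    sourceVideoName = random.choice(userVideoPathList)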

    # Output directory for this project
    outputPath = basePath + "/outputs/" + title
    if not os.path.exists(outputPath):
        os.makedirs(outputPath)
    # Input audio path
    inputAudioPath = basePath + "/inputs/" + userPath + "/source_audio/" + sourceAudioName
    # Input video path
    inputVideoPath = basePath + "/inputs/" + userPath + "/source_video/" + sourceVideoName
    # Output path of the lip-synced video
    lipSyncedOutputPath = basePath + '/outputs/' + title + "/result.mp4"

    # Build the Wav2Lip command line and run it
    # (raw string so the backslashes in the Windows path are not read as escapes;
    # the paths are quoted so file names with spaces survive)
    cmd = r'F:\MyEnvsProject\Wav2Lip\python.exe {}/inference.py --checkpoint_path {}/checkpoints/wav2lip_gan.pth --face "{}" --audio "{}" --outfile "{}" --resize_factor 2 --fps 60  --face_det_batch_size 8 --wav2lip_batch_size 128'.format(
        wav2lipFolderName, wav2lipFolderName, inputVideoPath, inputAudioPath, lipSyncedOutputPath)
    os.system(cmd)

    # Dump every frame of the video as an image into the frames directory
    inputVideoPath = outputPath + '/result.mp4'
    unProcessedFramesFolderPath = outputPath + '/frames'

    if not os.path.exists(unProcessedFramesFolderPath):
        os.makedirs(unProcessedFramesFolderPath)

    # gpu_frame = cv2.cuda_GpuMat()

    vidcap = cv2.VideoCapture(inputVideoPath)
    numberOfFrames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    # print("FPS: ", fps, "Frames: ", numberOfFrames)

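    for frameNumber in tqdm(range(numberOfFrames)):
        ok, image = vidcap.read()
        if not ok:
            # The reported frame count can be off; stop at the first failed read
            break
        cv2.imwrite(path.join(unProcessedFramesFolderPath, str(frameNumber).zfill(4) + '.jpg'), image)
    vidcap.release()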

    # Enhance every frame with GFPGAN
    cmd = r'F:\MyEnvsProject\Wav2Lip\python.exe {}/inference_gfpgan.py -i "{}" -o "{}" -v 1.3 -s 2 --only_center_face --bg_upsampler None'.format(
        gfpganPath,
        unProcessedFramesFolderPath,
        outputPath)
    os.system(cmd)

    restoredFramesPath = outputPath + '/restored_imgs/'
    if not os.path.exists(restoredFramesPath):
        os.makedirs(restoredFramesPath)
    processedVideoOutputPath = outputPath

    dir_list = os.listdir(restoredFramesPath)
    dir_list.sort()

    batch = 0
    batchSize = 600

    for start in tqdm(range(0, len(dir_list), batchSize)):
        img_array = []
        end = start + batchSize
        print("processing ", start, end)
        for filename in tqdm(dir_list[start:end]):
            img = cv2.imread(restoredFramesPath + filename)
            if img is None:
                continue
            height, width, layers = img.shape
            size = (width, height)
            img_array.append(img)

        if not img_array:
            # No readable frame in this batch, so there is nothing to write
            continue

        # Write at the frame rate read from result.mp4 so the video stays in sync with the audio
        out = cv2.VideoWriter(processedVideoOutputPath + '/batch_' + str(batch).zfill(4) + '.mp4',
                              cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
        batch = batch + 1

        for img in img_array:
            out.write(img)
        out.release()

    # Assemble the final video
    concatTextFilePath = outputPath + "/concat.txt"
    with open(concatTextFilePath, "w", encoding='utf8') as concatTextFile:
        for ips in range(batch):
            concatTextFile.write("file batch_" + str(ips).zfill(4) + ".mp4\n")

    concatedVideoOutputPath = outputPath + "/concated_output.mp4"
    cmd = 'ffmpeg -y -f concat -i "{}" -c copy "{}"'.format(concatTextFilePath, concatedVideoOutputPath)
    os.system(cmd)

    # Mux the original audio onto the enhanced video
    finalProcessedOutputVideo = processedVideoOutputPath + '/final_with_audio.mp4'
    cmd = 'ffmpeg -y -i "{}" -i "{}" -map 0 -map 1:a -c:v copy -shortest "{}"'.format(concatedVideoOutputPath, inputAudioPath,
                                                                                      finalProcessedOutputVideo)
    os.system(cmd)
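
For the multi-process option mentioned before the script, one possible approach is sketched below. This is my assumption of how it could be done, not the author's code: split the extracted frames into one chunk per worker, place each chunk in its own folder (for example `frames_00`, `frames_01`), and run one GFPGAN command per folder in a process pool. The GFPGAN flags mirror the call in the script above. The helper names `split_into_chunks`, `gfpgan_command`, `run_command`, and `enhance_in_parallel` are hypothetical.

```python
import subprocess
from multiprocessing import Pool


def split_into_chunks(items, chunk_count):
    """Split a list into at most chunk_count contiguous, near-equal chunks."""
    if not items:
        return []
    chunk_count = max(1, min(chunk_count, len(items)))
    base, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for index in range(chunk_count):
        # The first `extra` chunks take one additional item
        end = start + base + (1 if index < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def gfpgan_command(python_exe, gfpgan_dir, frames_dir, output_dir):
    """Argument list mirroring the GFPGAN call in the script above."""
    return [python_exe, gfpgan_dir + "/inference_gfpgan.py",
            "-i", frames_dir, "-o", output_dir,
            "-v", "1.3", "-s", "2", "--only_center_face", "--bg_upsampler", "None"]


def run_command(args):
    """Worker: run one command and return its exit code."""
    return subprocess.run(args).returncode


def enhance_in_parallel(commands, workers=2):
    """Run the given commands in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(run_command, commands)


if __name__ == "__main__":
    # Dry run: only show how ten frames would be divided among three workers
    frames = [str(n).zfill(4) + ".jpg" for n in range(10)]
    for index, chunk in enumerate(split_into_chunks(frames, 3)):
        print("frames_" + str(index).zfill(2), chunk)
```

Each worker process loads its own copy of the model, so GPU memory use grows with the number of workers. Starting with two workers and increasing only if memory allows is probably the safer choice.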