
php - Parse video id and start time from differently formatted youtube URL strings - Stack Overflow

I need to extract the video ID and the start time from any kind of YouTube URL that users can enter. I have a working solution, but it is not right.

Questions:

  • Could someone help me to fix the preg_match pattern to handle the urls commented in the tests?
  • Is there any other kind of youtube url?

UPDATE 2024/01/16: It has to work with playlists too.

I have checked this stackoverflow page to build my own youtube url parser.

This preg_match can extract the video Id and the start time but cannot handle the many different youtube url formats:

preg_match("/[a-zA-Z\/\/:\.]*youtu(?:be\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)(?:[&?\/]t=)?(\d*)(?:[a-zA-Z0-9\/\*\-\_\?\&\;\%\=\.]*)/i", $url, $matches);

This preg_match handles many different youtube urls (maybe all kinds?) but doesn't extract the start time:

preg_match("/^(?:http(?:s)?:\/\/)?(?:www\.)?(?:m\.)?(?:youtu\.be\/|youtube\\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)\/))([^\?&\"'>]+)/", $url, $matches);

I have changed it and it works for me, but I know that my change is not right because I don't parse the end of the url properly:

preg_match("/^(?:http(?:s)?:\/\/)?(?:www\.)?(?:m\.)?(?:youtu\.be\/|youtube\\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)\/))([^\?&\"'>]+)(?:[&?\/]t=)?(\d*)/", $url, $matches);

The code

<?php
declare(strict_types=1);

namespace AppBundle\Value;

class YoutubeVideoData
{
    private function __construct(public ?string $videoId = null, public ?int $time = null)
    {
    }

    public static function fromUrl(string $url): self
    {
        // `#action=share` is not supported
        preg_match("/^(?:http(?:s)?:\/\/)?(?:www\.)?(?:m\.)?(?:youtu\.be\/|youtube\\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)\/))([^\?&\"'>]+)(?:[&?\/]t=)?(\d*)/", $url, $matches);

        $videoId = null;
        if (isset($matches[1])) {
            $videoId = $matches[1];
        }

        $time = null;
        if (isset($matches[2]) && $matches[2] !== "") {
            $time = (int) $matches[2];
        }

        return new self($videoId, $time);
    }

}

The tests:

<?php

namespace Justimmo\Tests\Value;

use AppBundle\Value\YoutubeVideoData;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

/**
 * @covers \AppBundle\Value\YoutubeVideoData
 */
class YoutubeVideoDataTest extends TestCase
{
    #[DataProvider('urlProvider')]
    public function testUrls(string $url, ?string $expectedVideoId, ?int $expectedTime)
    {
        $videoData = YoutubeVideoData::fromUrl($url);

        $this->assertSame($expectedVideoId, $videoData->videoId);
        $this->assertSame($expectedTime, $videoData->time);
    }

    public static function urlProvider(): iterable
    {
        // vimeo
        yield 'vimeo' => ['https://vimeo/1016625668', null, null];

        // playlist
        yield 'youtube_link_pl1' => ['https://www.youtube/watch?v=YjdIF7PuUug&list=PLiIQbaWYR99iZFpLIJ5SImA2y8DaDTv9G&index=21', 'YjdIF7PuUug', null];
        yield 'youtube_link_pl2' => ['https://www.youtube/watch?list=PLiIQbaWYR99iZFpLIJ5SImA2y8DaDTv9G&v=YjdIF7PuUug&index=21', 'YjdIF7PuUug', null];
        yield 'youtube_link_pl3' => ['https://youtu.be/YjdIF7PuUug?list=PLiIQbaWYR99iZFpLIJ5SImA2y8DaDTv9G', 'YjdIF7PuUug', null];

        // without https://www
        yield 'youtube_link_1' => ['youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_2' => ['youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_3' => ['youtube/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_4' => ['youtube/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_5' => ['youtube/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_6' => ['youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_7' => ['youtube/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_8' => ['youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_9' => ['youtube/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_10' => ['youtube/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_11' => ['m.youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // without https://
        yield 'youtube_link_12' => ['www.youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_13' => ['www.youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_14' => ['www.youtube/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_15' => ['www.youtube/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_16' => ['www.youtube/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_17' => ['www.youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_18' => ['www.youtube/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_19' => ['www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_20' => ['www.youtube/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_21' => ['www.youtube/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // http
        yield 'youtube_link_22' => ['http://youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_23' => ['http://youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_24' => ['http://youtube/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_25' => ['http://www.youtube/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_26' => ['http://www.youtube/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_27' => ['http://www.youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_28' => ['http://www.youtube/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_29' => ['http://www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_30' => ['http://youtube/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_31' => ['http://www.youtube/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_32' => ['http://m.youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // https
        yield 'youtube_link_33' => ['https://youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_34' => ['https://youtube/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_35' => ['https://youtube/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_36' => ['https://www.youtube/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_37' => ['https://www.youtube/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_38' => ['https://www.youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_39' => ['https://www.youtube/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_40' => ['https://www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_41' => ['https://youtube/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_42' => ['https://www.youtube/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_43' => ['https://m.youtube/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // with start time
        yield 'youtube_link_44' => ['https://youtube/v/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_45' => ['https://youtube/v/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_46' => ['https://youtube/vi/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_47' => ['https://www.youtube/?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_48' => ['https://www.youtube/?vi=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_49' => ['https://www.youtube/watch?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_50' => ['https://www.youtube/watch?vi=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_51' => ['https://www.youtu.be/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_52' => ['https://youtube/embed/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_53' => ['https://www.youtube/shorts/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_54' => ['https://m.youtube/watch?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];

        // with feature
        yield 'youtube_link_55' => ['https://www.youtube/watch?dev=inprogress&v=7HCZvhRAk-M&feature=related', '7HCZvhRAk-M', null];

        yield 'youtube_link_56' => ['https://youtube/v/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_57' => ['https://youtube/v/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_58' => ['https://youtube/vi/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_59' => ['https://www.youtube/?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_60' => ['https://www.youtube/?vi=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_61' => ['https://www.youtube/watch?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_62' => ['https://www.youtube/watch?vi=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_63' => ['https://www.youtu.be/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_64' => ['https://youtube/embed/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_65' => ['https://www.youtube/shorts/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_66' => ['https://m.youtube/watch?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        // with #action=share
        yield 'youtube_link_67' => ['https://youtube/v/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];

        yield 'youtube_link_68' => ['https://youtube/v/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
        yield 'youtube_link_69' => ['https://youtube/vi/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];

        yield 'youtube_link_70' => ['https://www.youtube/?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
        yield 'youtube_link_71' => ['https://www.youtube/?vi=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];

        yield 'youtube_link_72' => ['https://www.youtube/watch?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
        yield 'youtube_link_73' => ['https://www.youtube/watch?vi=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];

        yield 'youtube_link_74' => ['https://www.youtu.be/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
        yield 'youtube_link_75' => ['https://youtube/embed/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
        yield 'youtube_link_76' => ['https://www.youtube/shorts/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
        yield 'youtube_link_77' => ['https://m.youtube/watch?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
    }

}

Working and PHPStan safe code - 2025/07/21

Based on mickmackusa and Rob Eyre's code.

<?php
declare(strict_types=1);

namespace AppBundle\Value;

class YoutubeVideoData
{
    private function __construct(public ?string $videoId = null, public ?int $time = null)
    {
    }

    public static function fromUrl(string $url): self
    {
        if (!str_starts_with($url, 'http')) {
            $url = 'https://' . $url;
        }

        $urlParts    = parse_url($url);
        $queryParams = [];

        if (!isset($urlParts['host'])) {
            return new self(null, null);
        }

        if (!in_array($urlParts['host'], ['www.youtube', 'youtube', 'www.youtu.be', 'youtu.be', 'm.youtube'])) {
            return new self(null, null);
        }

        if (isset($urlParts['query'])) {
            parse_str($urlParts['query'], $queryParams);
        }

        if (isset($queryParams['vi']) && is_string($queryParams['vi'])) {
            $videoId = $queryParams['vi'];

        } elseif (isset($queryParams['v']) && is_string($queryParams['v'])) {
            $videoId = $queryParams['v'];

        } elseif (isset($urlParts['path'])) {
            $videoId = basename($urlParts['path']);

        } else {
            $videoId = null;
        }

        $time = isset($queryParams['t']) ? (int) $queryParams['t'] : null;

        return new self($videoId, $time);
    }

}

Asked Jan 15 at 14:37 by Zoltán Süle; edited Jan 21 at 17:19.

Comments:

  • Do you really need to use regex for this? It would be much easier to use parse_url() php/parse_url – Rob Eyre, Jan 15 at 15:22
  • @Zol if you want to share your implemented solution, you must post it as an "answer". Resolving advice and solutions never belong in the question. Please rollback your edit and post an answer if you would like to share your implementation. – mickmackusa, Jan 21 at 21:32
  • @mickmackusa my solution is based on yours, so it is okay. – Zoltán Süle, Jan 22 at 8:37

4 Answers

Instead of using a regex, you could use PHP's parse_url and parse_str functions.

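$urlParts = parse_url($url);
$queryParams = [];
// Only parse the query string when the URL actually has one
if (isset($urlParts['query'])) {
    parse_str($urlParts['query'], $queryParams);
}

// Prefer the `vi` parameter, then `v`, then the last path segment
$videoId =
    !empty($queryParams['vi']) ? $queryParams['vi'] : (
        !empty($queryParams['v']) ? $queryParams['v'] : (
            basename($urlParts['path'] ?? '')
        )
    );

$time =
    isset($queryParams['t']) ? (int) $queryParams['t'] :
    null;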

I haven't tried it against all your test cases, but it seems to be a more robust approach.

You could even combine this with KIKO Software's answer (https://stackoverflow/a/79358821/20418616) if you want to have more confidence in the resulting $videoId value.

My opinion is that your codebase will smell of "updoc" if you use regex or other string surgery tools to parse something that PHP already has a native parser for.

Also, your task seems to be more about text extraction than text validation, so I'll assume that we are always working with legitimate youtube formatted strings.

Parse the URL, then parse the query string if it exists. Then null coalesce while you attempt to extract the desired values from the generated arrays. This is going to save you LOOOOOOADS of headaches versus maintaining a cumbersome regex.

foreach (urlProvider() as $data) {
    unset($params);  // prevent previous iteration data bleeding into current iteration
    $components = parse_url($data[0]);
    if (isset($components['query'])) {
        parse_str($components['query'], $params);
    }
    var_export(
        [
            'id' => $params['vi'] ?? $params['v'] ?? basename($components['path'] ?? '') ?: null,
            't' => $params['t'] ?? null,
        ]
    );
}

This approach appears to work well for all of your provided test cases.
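
To illustrate why the `#action=share` links need no special handling with this approach, here is a minimal sketch of what `parse_url()` returns for a short link that carries both a query string and a fragment. The sample URL and the helper name `describeUrl` are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the path, query and fragment of a URL
 * as separate strings (null when a part is absent).
 */
function describeUrl(string $url): array
{
    $components = parse_url($url);
    if ($components === false) {
        // parse_url() returns false for seriously malformed URLs
        return ['path' => null, 'query' => null, 'fragment' => null];
    }

    return [
        'path'     => $components['path'] ?? null,
        'query'    => $components['query'] ?? null,
        'fragment' => $components['fragment'] ?? null,
    ];
}

$parts = describeUrl('https://youtu.be/dE5jPNvLvOk?t=30#action=share');

// The fragment never leaks into the path or the query string,
// so the ID and start time can be read without extra clean-up.
parse_str($parts['query'] ?? '', $params);
$videoId = basename($parts['path'] ?? '') ?: null;
$time    = isset($params['t']) ? (int) $params['t'] : null;
```

Because the fragment is split off before the query string is parsed, the regex concern about "parsing the end of the URL" does not arise here.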

Now that I am reviewing other answers, my answer seems to be a more solid version of Rob's answer (their answer doesn't sufficiently leverage null coalescing).

The below regex clears 80 links (updated to clear the 3 playlist links).*

(https?:\/\/)?(((www\.)?youtu\.be\/)|(www\.|m\.)?youtube\\/((embed\/|shorts\/)|(\?|(watch\?(dev=inprogress&|list=[0-9a-zA-Z_-]+&)?)?)?vi?(\/|=)))(([0-9a-zA-Z_-])+)([$=#&?\/'"\s])(t=([0-9]+(:[0-9]{2})*)*)?

(NOTE: Above updated to include playlist 19JAN2025. 2nd update to clean up mistakes)

It will capture the <video-id> and <start-time>. Start time can be an integer, or an integer followed by a colon followed by a 2-digit-integer followed by colon followed by 2-digit integer..., e.g. t=2055577:33:44:00


In PHP**, you can name a capturing group, in this case the <video-id> and <start-time> patterns, by adding ?<video-id> and ?<start-time> immediately after the opening parenthesis "(" of the capturing group, like this(*): (?<video-id>([0-9a-zA-Z_-])+) and (?<start-time>t=([0-9]+(:[0-9]{2})*)*)?

Here is the updated regex for PHP with the <video-id> and <start-time> group identifiers:

(https?:\/\/)?(((www\.)?youtu\.be\/)|(www\.|m\.)?youtube\\/((embed\/|shorts\/)|(\?|(watch\?(dev=inprogress&|list=[0-9a-zA-Z_-]+&)?)?)?vi?(\/|=)))(?<video-id>([0-9a-zA-Z_-])+)([$=#&?\/'"\s])(?<start-time>t=([0-9]+(:[0-9]{2})*)*)?

(NOTE: Above updated to include playlist 19JAN2025.)

LINK TO REGEX DEMO: https://regex101/r/fl1IPw/1

** Regex Capturing Groups for PHP at * https://www.phptutorial/php-tutorial/regex-capturing-groups/
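
One caveat when moving this pattern into `preg_match()`: as far as I know, PCRE only accepts letters, digits and underscores in group names, so `video-id` and `start-time` would need to be spelled `video_id` and `start_time`. The sketch below shows the named-group mechanics on a deliberately reduced pattern that only covers the `youtu.be` short form with an optional `?t=` directly after the ID. The pattern and the function name `matchShortUrl` are illustrative assumptions, not the full regex above.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: matches the youtu.be short form and returns the
 * named captures. The group names use underscores, not hyphens.
 */
function matchShortUrl(string $url): ?array
{
    $pattern = '~^(?:https?://)?(?:www\.)?youtu\.be/'
        . '(?<video_id>[0-9A-Za-z_-]+)'
        . '(?:\?t=(?<start_time>[0-9]+))?~';

    if (preg_match($pattern, $url, $matches) !== 1) {
        return null;
    }

    // An optional group that did not take part in the match is
    // either missing from $matches or an empty string.
    $startTime = $matches['start_time'] ?? '';

    return [
        'video_id'   => $matches['video_id'],
        'start_time' => $startTime !== '' ? (int) $startTime : null,
    ];
}

$withTime    = matchShortUrl('https://youtu.be/dE5jPNvLvOk?t=30');
$withoutTime = matchShortUrl('youtu.be/dE5jPNvLvOk');
$otherSite   = matchShortUrl('https://vimeo/1016625668');
```

The same renaming should carry over to the longer pattern; only the group names change, not the logic.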


Visual regex with the IDs so that we can see the logic easier with the <video-id> and <start-time> (***):

(https?:\/\/)?
(
    (
        (www\.)?
        youtu\.be\/
    )
    |
    (www\.|m\.)?
    youtube\\/
        (
            (
                embed\/
                |
                shorts\/
            )
            |
            (
                \?
                |
                (watch\?
                    (
                        dev=inprogress&
                        |
                        list=[0-9a-zA-Z_-]+&
                    )?
                )?
            )?
            vi?
            (
                \/
                |
                =
            )
        )
    )
    (?<video-id>
        ([0-9a-zA-Z_-])+
    )
    ([$=#&?\/'"\s])
    (?<start-time>
        t=
        (
            [0-9]+
            (
                :[0-9]{2}
            )*
        )*
    )?

*** Added 20 JAN 2025:

I don't really like long and complex regular expressions. They are difficult to understand and too much can go wrong. Why not a slightly different approach?

Suppose we take a YouTube URL like this:

https://www.youtube/watch?v=Og40mpl8VNc

The video id is Og40mpl8VNc. This is a base64 encoded number. It's the only thing in the URL that is base64 encoded. All the other parts, like https, www, youtube, com, etc, are not valid base64 encoded strings. Perhaps we can use this?

A simple way to check if something is a valid base64 string is to decode and re-encode it. That shouldn't change the string. Only Og40mpl8VNc can be decoded and re-encoded without changing it.

We can apply this check to all parts of all your YouTube video URLs, using your urlProvider() method:

$youtubeId = [];
foreach (urlProvider() as $data) {
    foreach (array_reverse(preg_split('/[\/?=#&]/', $data[0])) as $part) {
       if (trim(base64_encode(base64_decode($part, true)) , '=') == $part) {
          $youtubeId[$data[0]] = $part; 
          break;
       }
    }
}

Live demo: https://3v4l./KKuRl

I split the URL on these five characters: /?=#&, to get all the parts. Then reverse those, because the video id is often at the end, and then walk all the parts looking for a valid base64 string.

Note that I also remove any base64 = padding.

Now admittedly, I haven't thoroughly tested this. It's just an idea. There is obviously a very tiny risk of false positives, but I hope this risk is negligible.

Now I didn't do the time, because I think you can easily get at that since it is always a parameter. I also didn't check whether the URL, as a whole, is valid, but I don't think that's really what you want to know.
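
Two details may be worth guarding against when trying this idea. IDs such as `7HCZvhRAk-M` contain `-` or `_`, which belong to the URL-safe base64 alphabet and are rejected by strict `base64_decode()`. Also, any four-letter word made only of base64 characters (`list`, for example) survives the round trip unchanged. Below is a minimal sketch of a stricter check. The function name `looksLikeVideoId` and the fixed length of 11 characters are assumptions based on the IDs in the test data, not on any documented guarantee.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when $part round-trips through base64
 * after mapping the URL-safe alphabet to the standard one.
 * Assumption: a video ID is exactly 11 characters long.
 */
function looksLikeVideoId(string $part): bool
{
    if (strlen($part) !== 11) {
        return false;
    }

    // Map URL-safe base64 ("-" and "_") to standard base64 ("+" and "/")
    $standard = strtr($part, '-_', '+/');

    $decoded = base64_decode($standard, true);
    if ($decoded === false) {
        return false;
    }

    // Re-encode and drop the "=" padding before comparing
    return rtrim(base64_encode($decoded), '=') === $standard;
}

$url = 'https://youtu.be/YjdIF7PuUug?list=PLiIQbaWYR99iZFpLIJ5SImA2y8DaDTv9G';

// Same walk as above: split, reverse, take the first part that qualifies
$found = null;
foreach (array_reverse(preg_split('/[\/?=#&]/', $url)) as $part) {
    if (looksLikeVideoId($part)) {
        $found = $part;
        break;
    }
}
```

The length check is the part that filters out short words like `list`; the alphabet mapping is what lets IDs with `-` or `_` pass.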
