macos - Is there any way to combine .audio and .microphone data into a single PCM stream in Swift - Stack Overflow

I'm new to Swift and to native development in general. I'm working on an application that captures application audio and microphone audio, formats it to PCM16, and sends it to another service for processing. I'm having trouble combining the application and microphone audio; if you have experience with this, any advice would help.

Here is the code that processes the audio data:

import AVFoundation
import ScreenCaptureKit

class AudioStreamOutputHandler: NSObject, SCStreamOutput {
    private let audioEngine = AVAudioEngine()

    let deepgramService: DeepgramService

    init(deepgramService: DeepgramService) {
        self.deepgramService = deepgramService
        super.init()
    }
    
    nonisolated func stream(_ stream: SCStream, didStopWithError error: Error) {
        print("SCStream stopped with error: \(error.localizedDescription)")
    }


    // Called when a sample buffer is received.
    nonisolated func stream(
        _ stream: SCStream,
        didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
        of type: SCStreamOutputType
    ) {
        switch type {
        case .audio:
            let pcmBuffer: AVAudioPCMBuffer? = createPCMBuffer(from: sampleBuffer)

            guard let pcmBuffer else { return }

            if let convertedPCMData: Data = convertTo16BitPCM(from: pcmBuffer) {
                deepgramService.sendAudioData(convertedPCMData)
            }
        case .microphone:
            break
        default:
            break
        }
    }

    private func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
        guard CMSampleBufferIsValid(sampleBuffer),
            let formatDescription = sampleBuffer.formatDescription,
            let absd = formatDescription.audioStreamBasicDescription
        else {
            NSLog("Invalid CMSampleBuffer or missing format description.")
            return nil
        }

        var audioBufferListCopy: AudioBufferList?
        do {
            try sampleBuffer.withAudioBufferList { audioBufferList, _ in
                audioBufferListCopy = audioBufferList.unsafePointer.pointee
            }
        } catch {
            NSLog("Error accessing AudioBufferList: \(error.localizedDescription)")
            return nil
        }

        guard
            let format = AVAudioFormat(
                standardFormatWithSampleRate: absd.mSampleRate,
                channels: AVAudioChannelCount(absd.mChannelsPerFrame)
            )
        else {
            NSLog("Failed to create AVAudioFormat.")
            return nil
        }

        return AVAudioPCMBuffer(
            pcmFormat: format,
            bufferListNoCopy: &audioBufferListCopy!
        )
    }

    private func convertTo16BitPCM(from buffer: AVAudioPCMBuffer) -> Data? {
        guard let floatChannelData = buffer.floatChannelData else {
            NSLog("Failed to get floatChannelData.")
            return nil
        }

        let frameLength = Int(buffer.frameLength)
        let channelCount = Int(buffer.format.channelCount)
        var pcmData = Data(capacity: frameLength * channelCount * MemoryLayout<Int16>.size)

        for channel in 0..<channelCount {
            let channelData = floatChannelData[channel]
            for sampleIndex in 0..<frameLength {
                let intSample = Int16(
                    max(-1.0, min(1.0, channelData[sampleIndex])) * Float(Int16.max)
                )
                pcmData.append(contentsOf: withUnsafeBytes(of: intSample.littleEndian) { Data($0) })
            }
        }

        return pcmData
    }


}

asked Feb 16 at 21:38 by Garfield is on board.

1 Answer

Yes! You can combine .audio and .microphone into a single PCM stream in Swift. You simply mix the two streams together by summing their samples.
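
As a minimal sketch of that naive mix, assuming both buffers have already been converted to the same deinterleaved float32 format with equal frame lengths (the helper name is made up for illustration):

import AVFoundation

// Naive mix of two already-matched buffers: same sample rate, channel count,
// deinterleaved float32 format, and equal frame lengths. Purely illustrative.
func mixBuffers(_ a: AVAudioPCMBuffer, _ b: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    guard a.format == b.format,
          a.frameLength == b.frameLength,
          let out = AVAudioPCMBuffer(pcmFormat: a.format, frameCapacity: a.frameLength),
          let aData = a.floatChannelData,
          let bData = b.floatChannelData,
          let outData = out.floatChannelData
    else { return nil }

    out.frameLength = a.frameLength
    for channel in 0..<Int(a.format.channelCount) {
        for frame in 0..<Int(a.frameLength) {
            // Sum and clamp to [-1, 1]; mixing in float avoids integer overflow.
            let sum = aData[channel][frame] + bData[channel][frame]
            outData[channel][frame] = max(-1.0, min(1.0, sum))
        }
    }
    return out
}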

With some caveats:

  1. the sample timestamps may not line up
  2. the stream formats may not match
  3. the stream sample rates may not match
  4. the sample channel counts may not match
  5. the .microphone stream often contains a delayed copy of .audio
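
To see which of these caveats apply on your machine, it can help to log what each stream actually delivers; this reuses the same formatDescription access the question's code already performs (the helper below is only illustrative):

import AVFoundation
import ScreenCaptureKit

// Illustrative: log the basic description of an incoming buffer so the
// .audio and .microphone formats can be compared directly.
func logFormat(of sampleBuffer: CMSampleBuffer, label: String) {
    guard let asbd = sampleBuffer.formatDescription?.audioStreamBasicDescription else { return }
    let isFloat = (asbd.mFormatFlags & kAudioFormatFlagIsFloat) != 0
    let isInterleaved = (asbd.mFormatFlags & kAudioFormatFlagIsNonInterleaved) == 0
    print("\(label): \(asbd.mSampleRate) Hz, \(asbd.mChannelsPerFrame) ch, " +
          "\(asbd.mBitsPerChannel)-bit \(isFloat ? "float" : "int"), " +
          "\(isInterleaved ? "interleaved" : "non-interleaved"), " +
          "pts \(sampleBuffer.presentationTimeStamp.seconds)")
}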

That's a lot of caveats! Here are some possibilities for dealing with them:

  1. you could do your best to line the sample timestamps up, or you could make the simplifying assumption that the most recently arrived CMSampleBuffers are "close enough" to mix together, regardless of their timestamps.

  2. you've already got code for converting float samples to integers for .audio, so you can do the same for .microphone. This is slightly odd, because I thought ScreenCaptureKit's .audio was often integer and .microphone was float, but the same approach applies either way. If you're mixing the samples as integers, take care to avoid overflow.

  3. to match sample rates, most people reach for an AVAudioConverter, which can also cover points 2 and 4 (see the converter sketch just after this list). Note that ScreenCaptureKit delivers CMSampleBuffers rather than AVAudioPCMBuffers, so you'll need to convert between the two. The conversion is per stream, and converting each stream to mono makes mixing simpler, although keep in mind that AVAudioConverter's idea of converting stereo to mono is to discard the right channel. That is probably a fine simplifying assumption.

  4. see point 3.

  5. when mixing microphone and system audio, if the microphone can hear the speakers then you get a second copy of the system audio in the result: an echo. As far as I know, macOS doesn't offer a general echo canceller to third-party apps, so I use a chunk of the WebRTC code.
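
Here is the converter sketch mentioned in point 3: one AVAudioConverter per stream, converting whatever ScreenCaptureKit delivers into a common mono float32 format so the two streams can be mixed sample-for-sample. The 16 kHz mono target is an assumption, not a requirement.

import AVFoundation

// Illustrative: build one converter per stream from that stream's input format.
func makeConverter(from inputFormat: AVAudioFormat) -> AVAudioConverter? {
    guard let target = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 16_000,
                                     channels: 1,
                                     interleaved: false) else { return nil }
    return AVAudioConverter(from: inputFormat, to: target)
}

// Convert a single buffer (e.g. the result of createPCMBuffer) to the
// converter's output format, resampling and downmixing as needed.
func convert(_ buffer: AVAudioPCMBuffer, with converter: AVAudioConverter) -> AVAudioPCMBuffer? {
    let ratio = converter.outputFormat.sampleRate / buffer.format.sampleRate
    let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: converter.outputFormat,
                                        frameCapacity: capacity) else { return nil }

    var consumed = false
    var conversionError: NSError?
    let status = converter.convert(to: output, error: &conversionError) { _, inputStatus in
        // Feed the single input buffer once, then report no more data for this call.
        if consumed {
            inputStatus.pointee = .noDataNow
            return nil
        }
        consumed = true
        inputStatus.pointee = .haveData
        return buffer
    }
    return (status == .haveData || status == .inputRanDry) ? output : nil
}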

One slight simplification could be to not mix the two streams at all and instead use Deepgram's multichannel feature, putting a mono version of .audio in the left channel and a mono version of .microphone in the right channel. This plus diarization may be useful to you. It's only a slight simplification, because you still need to (maybe) synchronize, convert sample rates and formats, and deal with echoes.
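
If you go the multichannel route, here is a hypothetical sketch of building the two-channel payload, assuming you already have equal-length mono Int16 sample arrays for each source and that the receiving service expects interleaved little-endian frames:

import Foundation

// Hypothetical: interleave two mono Int16 streams into a single two-channel
// little-endian PCM16 payload (system audio left, microphone right).
func interleave(systemAudio: [Int16], microphone: [Int16]) -> Data {
    let frameCount = min(systemAudio.count, microphone.count)
    var data = Data(capacity: frameCount * 2 * MemoryLayout<Int16>.size)
    for i in 0..<frameCount {
        // One frame = left sample followed by right sample.
        withUnsafeBytes(of: systemAudio[i].littleEndian) { data.append(contentsOf: $0) }
        withUnsafeBytes(of: microphone[i].littleEndian) { data.append(contentsOf: $0) }
    }
    return data
}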

Good luck!
