feat: add async audio transcription pipeline #4
No description provided.
🤖 AI Code Review
Code Review: Async Audio Transcription Pipeline
🔴 Critical
Command Injection Vulnerability in FFmpegAudioExtractor.cs (Lines 27-29)
User-controlled videoFilePath is directly interpolated into a shell command. An attacker could supply a filename like "; rm -rf / #" to execute arbitrary commands.
Fix:
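One possible shape for this fix is sketched below, assuming the extractor launches ffmpeg through System.Diagnostics.Process. The class name FfmpegArguments, the method Build, and the audioOutputPath parameter are hypothetical and do not come from the PR. Each argument is passed separately through ProcessStartInfo.ArgumentList with UseShellExecute disabled, so the path is never parsed by a shell.

```csharp
using System.Diagnostics;

// Hypothetical helper; not part of the PR.
public static class FfmpegArguments
{
    // Builds a ProcessStartInfo that passes every argument as its own entry,
    // so special characters in the path are not interpreted by a shell.
    public static ProcessStartInfo Build(string videoFilePath, string audioOutputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "ffmpeg",           // assumption: ffmpeg is on PATH
            UseShellExecute = false,       // start the process directly, no shell
            RedirectStandardError = true,  // ffmpeg writes its log to stderr
            CreateNoWindow = true,
        };

        // ArgumentList quotes and escapes each entry individually.
        startInfo.ArgumentList.Add("-i");
        startInfo.ArgumentList.Add(videoFilePath);
        startInfo.ArgumentList.Add("-vn");            // drop the video stream
        startInfo.ArgumentList.Add(audioOutputPath);
        return startInfo;
    }
}
```

The result can be handed to Process.Start(startInfo). Resolving both paths with Path.GetFullPath beforehand also keeps a file name that begins with "-" from being read as an ffmpeg option.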
🟡 Important
Race Condition in TranscriptionBackgroundService.cs (Lines 36-75)
Multiple service instances or restarts could lead to duplicate processing. The video is fetched, updated, and processed without transactional guarantees. If the service crashes between status update and transcript storage, the video is lost.
Fix: Consider using database-level locking or optimistic concurrency:
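The idea behind an atomic claim can be illustrated with an in-memory stand-in. This is only a sketch: VideoStatus, InMemoryVideoStatusStore, and TryClaim are hypothetical names, and a real fix would run the equivalent conditional UPDATE (or a row-version check) against the database rather than a dictionary.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical status values; the PR's actual enum may differ.
public enum VideoStatus { Pending, Processing, Completed, Failed }

// In-memory stand-in for the video table, used only to show the claim step.
public sealed class InMemoryVideoStatusStore
{
    private readonly ConcurrentDictionary<Guid, VideoStatus> _statuses = new();

    public void Add(Guid videoId) => _statuses[videoId] = VideoStatus.Pending;

    // Compare-and-swap: the status changes only if it is still Pending,
    // so exactly one caller wins the claim.
    // Rough SQL equivalent (table and column names assumed):
    //   UPDATE Videos SET Status = 'Processing'
    //   WHERE Id = @id AND Status = 'Pending'   -- check rows affected == 1
    public bool TryClaim(Guid videoId) =>
        _statuses.TryUpdate(videoId, VideoStatus.Processing, VideoStatus.Pending);
}
```

A worker that gets false back from TryClaim skips the video, because another instance already owns it. Recovering videos stuck in Processing after a crash still needs a separate mechanism, such as a timeout-based reset.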
Temporary File Leak in FFmpegAudioExtractor.cs (Lines 27-45)
Extracted audio files are never cleaned up, causing disk space exhaustion over time.
Fix:
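A minimal sketch of one cleanup approach follows, assuming the extracted audio only needs to live for the duration of the transcription call. TempAudioFile and UseAsync are hypothetical names, and the .wav extension is an assumption. The try/finally block guarantees deletion whether the work succeeds or throws.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper; not part of the PR.
public static class TempAudioFile
{
    // Creates a unique temp path, passes it to the caller's work,
    // and deletes the file afterwards even if the work throws.
    public static async Task<T> UseAsync<T>(Func<string, Task<T>> work)
    {
        var path = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
        try
        {
            return await work(path);
        }
        finally
        {
            // File.Delete does not throw when the file does not exist.
            File.Delete(path);
        }
    }
}
```

The extractor would write its output to the supplied path and the transcription call would read it inside the same delegate, so no caller has to remember to delete anything.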
In-Memory Queue Persistence (InMemoryTranscriptionQueue.cs)
The queue is lost on service restart, potentially losing pending transcriptions.
🔵 Optional
Missing Error Details Logging (TranscriptionBackgroundService.cs, Line 73)
Only the exception message is stored, not the full stack trace for debugging.
Fix:
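As a rough illustration, the stored value could come from Exception.ToString(), which includes the exception type, message, stack trace, and any inner exceptions. ErrorDetails, Describe, and the maxLength cap are hypothetical. The cap is only there on the assumption that the error column has a size limit.

```csharp
using System;

// Hypothetical helper; not part of the PR.
public static class ErrorDetails
{
    // ex.Message holds only the message text. ex.ToString() also carries
    // the exception type, stack trace, and inner exceptions.
    public static string Describe(Exception ex, int maxLength = 4000)
    {
        var details = ex.ToString();
        // Truncate so the value fits an assumed column size limit.
        return details.Length <= maxLength ? details : details.Substring(0, maxLength);
    }
}
```

If the service already has an ILogger, passing the exception object itself to LogError (rather than only its message) preserves the same details in the logs.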
Placeholder Implementation (GenericTranscriptionService.cs, Lines 21-23)
The transcription logic is a TODO placeholder without proper abstraction for real providers.
Fix: Consider a strategy pattern for multiple transcription providers:
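The strategy pattern suggested here might look something like the sketch below. ITranscriptionProvider, EchoTranscriptionProvider, and TranscriptionProviderSelector are hypothetical names introduced for illustration. The signature of the PR's ITranscriptionService is not shown in this review, so the sketch does not assume it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical strategy interface: one implementation per backend.
public interface ITranscriptionProvider
{
    string Name { get; }
    Task<string> TranscribeAsync(string audioFilePath, CancellationToken cancellationToken);
}

// Stand-in provider so the sketch runs without a real backend.
public sealed class EchoTranscriptionProvider : ITranscriptionProvider
{
    public string Name => "echo";

    public Task<string> TranscribeAsync(string audioFilePath, CancellationToken cancellationToken) =>
        Task.FromResult($"transcript of {audioFilePath}");
}

// Picks a provider by name, for example from a configuration value.
public sealed class TranscriptionProviderSelector
{
    private readonly Dictionary<string, ITranscriptionProvider> _providers;

    public TranscriptionProviderSelector(IEnumerable<ITranscriptionProvider> providers) =>
        _providers = providers.ToDictionary(p => p.Name, StringComparer.OrdinalIgnoreCase);

    public ITranscriptionProvider Get(string name) =>
        _providers.TryGetValue(name, out var provider)
            ? provider
            : throw new InvalidOperationException($"No transcription provider named '{name}' is registered.");
}
```

With dependency injection, each provider would be registered as an ITranscriptionProvider and the selector would receive all of them through its constructor.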
Missing Input Validation (IAudioExtractor.cs / FFmpegAudioExtractor.cs)
No validation for file size, duration, or format before processing.
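A pre-flight check along these lines could reject obviously bad inputs before ffmpeg is started. This is a sketch only: VideoInputValidator, the 2 GiB limit, and the extension list are assumptions, not values from the PR. Duration is not covered because it would require probing the media file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical validator; limits below are illustrative assumptions.
public static class VideoInputValidator
{
    private const long MaxBytes = 2L * 1024 * 1024 * 1024; // assumed 2 GiB cap

    private static readonly HashSet<string> AllowedExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".mp4", ".mov", ".mkv", ".webm" };

    // Returns true when the file passes all checks; otherwise sets a reason.
    public static bool TryValidate(string videoFilePath, out string error)
    {
        error = "";
        var file = new FileInfo(videoFilePath);

        if (!file.Exists) error = "File does not exist.";
        else if (!AllowedExtensions.Contains(file.Extension)) error = $"Unsupported format '{file.Extension}'.";
        else if (file.Length == 0) error = "File is empty.";
        else if (file.Length > MaxBytes) error = "File exceeds the size limit.";

        return error.Length == 0;
    }
}
```

An extension check is only a first filter. The container format reported by the media tool is the more reliable signal.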
✅ Positives
Interfaces (ITranscriptionQueue, ITranscriptionService, IAudioExtractor) and dependency injection
async/await throughout and CancellationToken support
💡 Recommendations
Add Unit Tests for FFmpegAudioExtractor with edge cases (special characters in filenames, missing files)
Implement Circuit Breaker for ITranscriptionService to handle backend failures gracefully
Add Metrics/Telemetry to track transcription success rate, duration, and queue depth
Consider Background Job Library (e.g., Hangfire, Quartz) instead of custom queue for better persistence and monitoring
Add Health Check for transcription service availability:
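A health check for this could be sketched with the IHealthCheck interface from Microsoft.Extensions.Diagnostics.HealthChecks, assuming the service runs on ASP.NET Core. TranscriptionHealthCheck and its probe delegate are hypothetical. How availability is actually determined (a ping endpoint, a cheap API call) depends on the backend and is left to the delegate.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical health check; the probe delegate stands in for whatever
// availability test the real transcription backend supports.
public sealed class TranscriptionHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task<bool>> _probe;

    public TranscriptionHealthCheck(Func<CancellationToken, Task<bool>> probe) => _probe = probe;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            return await _probe(cancellationToken)
                ? HealthCheckResult.Healthy("Transcription backend is reachable.")
                : HealthCheckResult.Unhealthy("Transcription backend reported unavailable.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // A failing probe is reported as unhealthy instead of crashing the endpoint.
            return HealthCheckResult.Unhealthy("Transcription backend check failed.", ex);
        }
    }
}
```

An instance can be registered with services.AddHealthChecks().AddCheck("transcription", instance) and exposed through app.MapHealthChecks("/health"), where the check name and route are placeholders.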
Generated by [poolside/laguna-xs.2:free] via AI Code Review Action