feat: add async audio transcription pipeline #4

Merged
twerner merged 2 commits from feature/video-ingestion into dev 2026-05-19 14:13:13 +00:00
Collaborator
No description provided.
feat: add async audio transcription pipeline with FFmpeg extraction and background processing
All checks were successful
AI Code Review / ai-review (pull_request) Successful in 31s
f04184d501
Merge branch 'dev' of https://forgejo.to-wer.de/twerner/Video2Recipe into feature/video-ingestion
All checks were successful
AI Code Review / ai-review (pull_request) Successful in 27s
e7f0899584

🤖 AI Code Review

Code Review: Async Audio Transcription Pipeline

🔴 Critical

Command Injection Vulnerability in FFmpegAudioExtractor.cs (Lines 27-29)

var args = $"-i \"{videoFilePath}\" -vn -acodec pcm_s16le -ar 16000 -ac 1 \"{outputPath}\"";

User-controlled videoFilePath is directly interpolated into a shell command. An attacker could supply a filename like "; rm -rf / #" to execute arbitrary commands.

Fix:

// Validate and sanitize the path
var fullPath = Path.GetFullPath(videoFilePath);
if (!File.Exists(fullPath))
    throw new FileNotFoundException($"Video file not found: {videoFilePath}");

// Use ProcessStartInfo with proper escaping or pass arguments as array
var startInfo = new ProcessStartInfo
{
    FileName = _ffmpegPath,
    Arguments = $"-i \"{fullPath.Replace("\"", "\\\"")}\" -vn -acodec pcm_s16le -ar 16000 -ac 1 \"{outputPath.Replace("\"", "\\\"")}\"",
    // ... rest of config
};

🟡 Important

Race Condition in TranscriptionBackgroundService.cs (Lines 36-75)
Multiple service instances or restarts could lead to duplicate processing. The video is fetched, updated, and processed without transactional guarantees. If the service crashes between status update and transcript storage, the video is lost.

Fix: Consider using database-level locking or optimistic concurrency:

// Add row version/timestamp for optimistic concurrency
public byte[] RowVersion { get; set; } // in UploadedVideo

// Use Update with concurrency check
context.UploadedVideos.Update(video);
try 
{
    await context.SaveChangesAsync(cancellationToken);
}
catch (DbUpdateConcurrencyException)
{
    // Handle concurrent modification
}

Temporary File Leak in FFmpegAudioExtractor.cs (Lines 27-45)
Extracted audio files are never cleaned up, causing disk space exhaustion over time.

Fix:

public async Task<string> ExtractAudioAsync(string videoFilePath)
{
    // ... extraction logic ...
    return outputPath;
}

// Add cleanup method or use IAsyncDisposable
public async ValueTask DisposeAsync()
{
    // Clean up old temp files
    foreach (var file in Directory.GetFiles(_tempFolder, "audio_*.wav"))
    {
        try { File.Delete(file); } catch { /* log */ }
    }
}

In-Memory Queue Persistence (InMemoryTranscriptionQueue.cs)
The queue is lost on service restart, potentially losing pending transcriptions.

🔵 Optional

Missing Error Details Logging (TranscriptionBackgroundService.cs, Line 73)
Only the exception message is stored, not the full stack trace for debugging.

Fix:

video.TranscriptionError = $"{ex.Message}\n{ex.StackTrace}";

Placeholder Implementation (GenericTranscriptionService.cs, Lines 21-23)
The transcription logic is a TODO placeholder without proper abstraction for real providers.

Fix: Consider strategy pattern for multiple transcription providers:

public interface ITranscriptionProvider
{
    Task<string> TranscribeAsync(string audioPath, CancellationToken ct = default);
}

// Register specific providers and inject IEnumerable<ITranscriptionProvider>

Missing Input Validation (IAudioExtractor.cs / FFmpegAudioExtractor.cs)
No validation for file size, duration, or format before processing.


Positives

  1. Clean Architecture: Good separation of concerns with interfaces (ITranscriptionQueue, ITranscriptionService, IAudioExtractor) and dependency injection
  2. Proper Async Patterns: Uses async/await throughout and CancellationToken support

💡 Recommendations

  1. Add Unit Tests for FFmpegAudioExtractor with edge cases (special characters in filenames, missing files)

  2. Implement Circuit Breaker for ITranscriptionService to handle backend failures gracefully

  3. Add Metrics/Telemetry to track transcription success rate, duration, and queue depth

  4. Consider Background Job Library (e.g., Hangfire, Quartz) instead of custom queue for better persistence and monitoring

  5. Add Health Check for transcription service availability:

public class TranscriptionHealthCheck : IHealthCheck
{
    private readonly ITranscriptionService _service;
    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken ct = default)
    {
        var available = await _service.IsAvailableAsync();
        return available ? HealthCheckResult.Healthy("Transcription service available") 
                       : HealthCheckResult.Degraded("Transcription service unavailable");
    }
}

Generated by [poolside/laguna-xs.2:free] via AI Code Review Action

## 🤖 AI Code Review ## Code Review: Async Audio Transcription Pipeline ### 🔴 Critical **Command Injection Vulnerability in `FFmpegAudioExtractor.cs`** (Lines 27-29) ```csharp var args = $"-i \"{videoFilePath}\" -vn -acodec pcm_s16le -ar 16000 -ac 1 \"{outputPath}\""; ``` User-controlled `videoFilePath` is directly interpolated into a shell command. An attacker could supply a filename like `"; rm -rf / #"` to execute arbitrary commands. **Fix:** ```csharp // Validate and sanitize the path var fullPath = Path.GetFullPath(videoFilePath); if (!File.Exists(fullPath)) throw new FileNotFoundException($"Video file not found: {videoFilePath}"); // Use ProcessStartInfo with proper escaping or pass arguments as array var startInfo = new ProcessStartInfo { FileName = _ffmpegPath, Arguments = $"-i \"{fullPath.Replace("\"", "\\\"")}\" -vn -acodec pcm_s16le -ar 16000 -ac 1 \"{outputPath.Replace("\"", "\\\"")}\"", // ... rest of config }; ``` ### 🟡 Important **Race Condition in `TranscriptionBackgroundService.cs`** (Lines 36-75) Multiple service instances or restarts could lead to duplicate processing. The video is fetched, updated, and processed without transactional guarantees. If the service crashes between status update and transcript storage, the video is lost. **Fix:** Consider using database-level locking or optimistic concurrency: ```csharp // Add row version/timestamp for optimistic concurrency public byte[] RowVersion { get; set; } // in UploadedVideo // Use Update with concurrency check context.UploadedVideos.Update(video); try { await context.SaveChangesAsync(cancellationToken); } catch (DbUpdateConcurrencyException) { // Handle concurrent modification } ``` **Temporary File Leak in `FFmpegAudioExtractor.cs`** (Lines 27-45) Extracted audio files are never cleaned up, causing disk space exhaustion over time. **Fix:** ```csharp public async Task<string> ExtractAudioAsync(string videoFilePath) { // ... extraction logic ... return outputPath; } // Add cleanup method or use IAsyncDisposable public async ValueTask DisposeAsync() { // Clean up old temp files foreach (var file in Directory.GetFiles(_tempFolder, "audio_*.wav")) { try { File.Delete(file); } catch { /* log */ } } } ``` **In-Memory Queue Persistence** (`InMemoryTranscriptionQueue.cs`) The queue is lost on service restart, potentially losing pending transcriptions. ### 🔵 Optional **Missing Error Details Logging** (`TranscriptionBackgroundService.cs`, Line 73) Only the exception message is stored, not the full stack trace for debugging. **Fix:** ```csharp video.TranscriptionError = $"{ex.Message}\n{ex.StackTrace}"; ``` **Placeholder Implementation** (`GenericTranscriptionService.cs`, Lines 21-23) The transcription logic is a TODO placeholder without proper abstraction for real providers. **Fix:** Consider strategy pattern for multiple transcription providers: ```csharp public interface ITranscriptionProvider { Task<string> TranscribeAsync(string audioPath, CancellationToken ct = default); } // Register specific providers and inject IEnumerable<ITranscriptionProvider> ``` **Missing Input Validation** (`IAudioExtractor.cs` / `FFmpegAudioExtractor.cs`) No validation for file size, duration, or format before processing. --- ### ✅ Positives 1. **Clean Architecture**: Good separation of concerns with interfaces (`ITranscriptionQueue`, `ITranscriptionService`, `IAudioExtractor`) and dependency injection 2. **Proper Async Patterns**: Uses `async/await` throughout and `CancellationToken` support --- ### 💡 Recommendations 1. **Add Unit Tests** for `FFmpegAudioExtractor` with edge cases (special characters in filenames, missing files) 2. **Implement Circuit Breaker** for `ITranscriptionService` to handle backend failures gracefully 3. **Add Metrics/Telemetry** to track transcription success rate, duration, and queue depth 4. **Consider Background Job Library** (e.g., Hangfire, Quartz) instead of custom queue for better persistence and monitoring 5. **Add Health Check** for transcription service availability: ```csharp public class TranscriptionHealthCheck : IHealthCheck { private readonly ITranscriptionService _service; public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken ct = default) { var available = await _service.IsAvailableAsync(); return available ? HealthCheckResult.Healthy("Transcription service available") : HealthCheckResult.Degraded("Transcription service unavailable"); } } ``` --- *Generated by [poolside/laguna-xs.2:free] via AI Code Review Action*
twerner deleted branch feature/video-ingestion 2026-05-19 14:13:13 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
twerner/Video2Recipe!4
No description provided.