#JustShipIt: Back to the (excali)drawing board

Photo by Kelly Sikkema / Unsplash

In my first JustShipIt post I described my very first project idea: a real-time transcription app for Zoom and Google Meet meetings, specifically for meetings in Spanish. The idea came from when I used to have weekly Spanish classes on Google Meet and wanted a tool to record notes from my classes so I could review them later.

I abandoned the transcription idea back then because the transcription tools I tried were either not very accurate for non-English audio or too expensive for a side project.

However, last week OpenAI open sourced Whisper, an automatic speech recognition system that works pretty well with Spanish and other non-English speech. I have been digging around with it ever since, and I am pretty geeked about its different capabilities.

I have spent the majority of my days since then hacking and playing around with Whisper. I am a language geek and I have been fascinated by audio AI tools for a while. I have also been complaining, like, forever about not seeing a lot of audio AI advancements.

Note: I know there's lots of academic research and advancement, but I wanted cool things to play around with, like DALL-E and GPT-3 but for audio.

When Whisper came out I was very tempted to scrap my podcuts project and just focus on the meeting transcription project using Whisper. However, I didn't want to lose all the progress I have made working with WebAssembly and ffmpeg.

So instead I looked for a middle ground: I decided to integrate Whisper into my podcuts project and changed my MVP to incorporate it. The initial idea for podcuts was a website that cuts long YouTube podcasts into short-form videos so they can be posted on TikTok.

I still wanted to keep editing media content as the core focus, but shift it from editing video to editing audio via the text generated by Whisper. The good thing is that the tech stack stays mostly the same, and I get to play around with audio tools, which I've been wanting to do for a while. And if the project progresses well, I can always integrate the video part later.

Once I decided to change the focus of my project from video to audio, my life became so much easier. Working with video was a frustrating pain in the butt, especially on the frontend, and my progress was very slow.

Most of what I have shipped are mini tools to test different ideas within my new project scope and to help reshape my MVP.

What was shipped over "this sprint":

  1. A website that converts a video to a GIF (this was before my project change and was for some WebAssembly/ffmpeg practice)
  2. A dictaphone
  3. A website that takes a YouTube URL and returns a transcript of the video

My new MVP:

A website where you can record or upload an audio file, get it transcribed, and then edit snippets of the audio using the transcribed text.
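To make "editing audio via text" a bit more concrete, here is a rough sketch of one way it could work. It assumes the transcript comes back as timestamped segments with `start`, `end`, and `text` fields; the sample data, the `id` field, and the `keep_ranges` helper are all made up for illustration and are not part of the actual project. The idea is that deleting a piece of text marks its time range for removal, and whatever is left becomes the list of audio ranges to keep.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transcript segments (times in seconds). The field names are an
# assumption about what a timestamped transcription result might look like.
SEGMENTS: List[Dict] = [
    {"id": 0, "start": 0.0, "end": 2.5, "text": "Hola a todos."},
    {"id": 1, "start": 2.5, "end": 4.0, "text": "Eh, este..."},
    {"id": 2, "start": 4.0, "end": 7.5, "text": "Hoy hablamos del subjuntivo."},
    {"id": 3, "start": 7.5, "end": 9.0, "text": "Gracias."},
]


def keep_ranges(segments: List[Dict], deleted_ids: Set[int]) -> List[Tuple[float, float]]:
    """Return the (start, end) audio ranges that survive after some
    transcript segments are deleted in the text editor."""
    ranges: List[Tuple[float, float]] = []
    for seg in segments:
        if seg["id"] in deleted_ids:
            continue  # the user removed this text, so drop its audio too
        if ranges and abs(ranges[-1][1] - seg["start"]) < 1e-6:
            # Back-to-back segments are merged into one continuous range.
            ranges[-1] = (ranges[-1][0], seg["end"])
        else:
            ranges.append((seg["start"], seg["end"]))
    return ranges


if __name__ == "__main__":
    # Deleting the filler segment leaves two ranges to stitch together.
    print(keep_ranges(SEGMENTS, {1}))
```

Each range that comes out could then be handed to whatever does the actual cutting (ffmpeg in my case), but that part is left out of this sketch.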
