CogDogBlog

One More Descript Thing

By: cogdog — June 5th 2023 at 15:43

People still read blogs. Well, maybe a few of them. I was happy to see others get intrigued and interested in my sharing of the ways Descript had really revolutionized my way of creating podcast audio.

Beyond likes and reposts, there is not much more positive an effect than capturing Jon Udell’s interest, as happened on Mastodon, and as he shared (aka blogged) about an IT Conversations episode he re-published.

And as often happens, Jon’s example showed me a portion of the software I was unaware of. This was, as I remember, one of the most evident things I found in the 1990s when I started using this software called Photoshop: each little bit I learned made me realize how much of its total potential I did not know, like it was infinite software.

You see, I made use of Descript to much more efficiently edit my OEG Voices podcasts, but my flow was exporting the audio and posting it to my WordPress-powered site. Jon’s post pointed to an interesting aspect when audio is published to a Descript.com shareable link.

Start with my most recent episode, published to our site, with audio embedded and a link to the transcript Descript creates.

If you access the episode via the shared link to Descript, when you click the play button in the lower left, the transcript highlights each word, in a kind of read along fashion. That’s nifty, because you might want to stop to perhaps copy a sentence, or look something up.
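Read-along highlighting like this generally depends on word-level timestamps coming out of the transcription step; the player just looks up which word’s start time was passed most recently. A minimal sketch of that lookup (the words and times here are invented for illustration, not Descript’s actual data):

```python
from bisect import bisect_right

# Hypothetical word-level timestamps (word, start time in seconds),
# the kind of data a transcription engine emits alongside the text
WORDS = [("Welcome", 0.0), ("to", 0.4), ("OEG", 0.6), ("Voices", 1.1)]
STARTS = [start for _, start in WORDS]

def active_word(playback_time):
    """Return the word to highlight at playback_time, read-along style."""
    # bisect_right counts how many words have started by now;
    # the last of those is the one currently being spoken
    i = bisect_right(STARTS, playback_time) - 1
    return WORDS[i][0] if i >= 0 else None
```

Calling `active_word(0.7)` lands on “OEG”: the third word has started by then but the fourth has not.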

Descript audio playback where the transcript shows the text of the audio being played back.

Even more interestingly, you can highlight a portion of text, use a contextual menu, and provide a direct link to that portion of audio. Woah. Try this link to hear/read Sarah’s intro from the screenshot above.

Yes, Descript provides addressable links to portions of audio (note: I have found that Descript is not jumping down to the linked location; maybe that’s my setup, and I did post a request in their Discord bug report).
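Descript’s share links use its own scheme, but the open-web cousin of this idea is the W3C Media Fragments convention, where appending `#t=start,end` to a plain media URL tells a browser player to seek to (and stop at) that span. A small sketch of building such a link (the URL is a placeholder):

```python
def time_fragment(url, start, end=None):
    """Append a Media Fragments temporal fragment (#t=start[,end])
    to a media URL, making a portion of the audio addressable."""
    frag = f"#t={start:g}"
    if end is not None:
        frag += f",{end:g}"
    return url + frag

link = time_fragment("https://example.com/episode.mp3", 83.5, 112.0)
```

That yields `https://example.com/episode.mp3#t=83.5,112`; browsers honor the fragment when playing the media file directly, though a site’s own player has to implement the seeking itself.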

But wait, there’s more. You can also add comments (perhaps annotation style) to portions of the transcript/audio.

You do have to create an account to comment, so you might not appreciate that. It looks like it’s aimed more at comments for production notes, but why can’t it be more annotation-like?

Anyhow, this was nifty to discover, and I would not have known about it had Jon not shared his own efforts with a link.

This is how the web works; well, my web works this way. And it was refreshing to explore some technology without the din of AI doomsday or salvation day reverb (there is a use of AI in Descript for transcription, but it’s at a functional level, not a shove-it-in-your-face level).

I am confident, as always, that there is more about Descript I do not know than what I do know (I need to learn the Overdub tool).


Featured Image: There’s always that one thing…

Curly's Law
Curly’s Law flickr photo by cogdogblog shared under a Creative Commons (BY) license


Getting a Fill of Generative Fill

By: cogdog — June 2nd 2023 at 15:53

While there is plenty of academic undergarment wadding over AI generative text (and please stop referring to it all as ChatGPT), I was first interested, and still am, in the generation of images (a year ago Craiyon was the leading edge; now it looks like a sad nub of burnt sienna).

Get ready for everything to get upturned with Adobe Photoshop’s Generative Fill, now in beta. I spotted it, and some jaw-dropping examples, in PetaPixel’s Photoshop’s New ‘Generative Fill’ Uses AI to Expand or Change Photos, but was drawn in more by a followup post, So, Who Owns a Photo Expanded by Adobe Generative Fill? This gets into even muddier, messier, and also interesting (time curse like?) waters.

That latter article has some really fabulous pieces of Extended Album Covers, found originally in the Twitter stream of Alexander Dobrokotov. I’d post the tweets here for you to see, but Twitter broke the capability to embed tweets.

The concept is rather DS106-ish: a central image of an actual album cover is embedded into a much larger imagined scene (see the PetaPixel post for the examples) where all the imagery around it was created with this new Adobe Photoshop beta feature.

I have seen this many times with AI: you see these jaw-dropping examples that imply someone just typed a phrase in a box, clicked the magic bean button, and it popped out. Most of the time, if you can find where the “making of” is shared, you will find it took hours of prompt bashing and, more likely, extra post-processing in regular Photoshop.

Hence my attempts usually look awful (?)

Now I could just share, say, an image (like the Katy Perry cover of her sleeping in soft material that turns out to be a giant cat) and say, this is cool! But I always want to try things myself. So I downloaded (overnight) the beta version of Photoshop.

The way it works is you use the crop tool to create space around a source image. This fills with just white. But then you select all that blank space along with an edge portion of the seed photo, and watch something emerge. In many ways it’s impressive.
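That select-the-blank-space-plus-an-edge step is easy to describe geometrically. A toy sketch of the bookkeeping for expanding to the left (the pixel numbers and the 32px overlap strip are my own illustration, not anything Adobe documents):

```python
def outpaint_left(width, height, expand_left, overlap=32):
    """Compute the new canvas size and the selection rectangle for
    generative fill when expanding a canvas to the left. The selection
    covers the blank area plus an `overlap` strip of the original
    photo so the generated pixels have an edge to blend from.
    Rectangle is (x, y, w, h) in new-canvas coordinates."""
    new_size = (width + expand_left, height)
    selection = (0, 0, expand_left + overlap, height)
    return new_size, selection

size, sel = outpaint_left(1600, 1200, expand_left=800)
```

Here the canvas grows to 2400×1200, and the selection spans the 800 blank pixels plus a 32px sliver of the seed photo.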

I started with my iconic Felix photo, the one I took on his first days with me in 2016, the one I use often as an icon.

2016/366/98 "Did Someone Say Go for a Ride?"
2016/366/98 “Did Someone Say Go for a Ride?” flickr photo by cogdogblog shared into the public domain using Creative Commons Public Domain Dedication (CC0)

In Photoshop Beta, I enlarged the canvas a lot to the left, and a little above, and let the magic go to work. Perhaps this is not the best example, since my truck in the background is blurred from a depth of field effect.

Not quite magic.

Generated fill attempt 1 (click to see full size)

That’s a rather awkward vehicle there. And since AI has no concept of a porch rail, it would likely extend those posts Felix is peeking through into the stratosphere.

I decided to try again, and added a prompt to the generative gizmo saying “Red truck towing a camper”

Generative fill 2 attempt with prompt of “red truck towing camper” (click for full image)

Well, that looks awkwarder too. But it generates something.

I took another stab, thinking how it might take on extending a wide landscape that is well known. This is tricky because, if one knows something of geology, the canyon to either side extends to a broad plateau.

2018/365/80 Grand is an Understatement
2018/365/80 Grand is an Understatement flickr photo by cogdogblog shared into the public domain using Creative Commons Public Domain Dedication (CC0)

I did one first where I went about 50% wider on each side.

Grand Canyon Generative Fill 1 (click for full size)

It certainly continues the pattern, and is not all that weird. You do get three variations; this one is about the same:

Grand Canyon Generative Fill 2 (click for full size)

It’s odd, but not really too far from pseudo-reality. I riffed off of this version, adding again another chunk of empty space on either side. Now it’s getting the geology pretty messed up and messy.

Grand Canyon Generative Fill of a Generative Fill

These are just quick plays, and there are also the other features in the mix to add and remove elements.

This definitely is going to change up a lot of things for photographers and digital artists, and what is real and what is generated is getting so inter-tangled that thinking you can separate them is as wise as teetering off that canyon edge.

But getting back to that PetaPixel headline, “So, Who Owns a Photo Expanded by Adobe Generative Fill?”, oh my, is ownership, copyright, and licensing going to get mashed up too. So all of those creative album cover expansions? They start with copyrighted material. But the algorithmic extension: is that changed enough to raise a fair use flag? Heck, I have no idea.

At least if you start with an open license image, you stand on slightly less squishy ground.

I’m going back to my shed to tinker (that’s for Martin).


Featured Image: 100% free of AI!

Fill 'er Up
Fill ‘er Up flickr photo by cogdogblog shared under a Creative Commons (BY) license


Changing Up, “Decripting” My Podcast Methods, Eh, Ai? Eh?

By: cogdog — May 29th 2023 at 16:56

You know you’ve been around this game a grey-haired long time if you remember that podcasting had something to do with this thing called RSS. I found shreds of workshops I did back at Maricopa in 2006, “Podcasting, Schmodcasting…. What’s All the Hype?”, and smiled that I was using this web audio tool called Odeo, whose founder went on to lay a few technical bird droppings.

I digress.

This post is about a radical change in my technical tool kit, relearning what I was pretty damned comfortable doing, and to a medium degree, appreciating for a refreshing change, something that Artificial Intelligence probably has a hand in. Not magically transforming, but helping.

I’ve had this post in my brain draft for a while, but there is a timely nature, since this coming Friday I am hosting for OE Global a new series I have been getting off the ground, OEG Live, which is a live-streamed, unstructured, open conversation about open education and some tech stuff… really the format is to gather some interesting people and just let them talk together. Live.

This week’s show came as a spin-off from a conversation in our OEG Connect community, starting with a request for ideas about creating audiobook versions of OER content, but it went down a path that included interesting ideas about how new AI tools might make this easier to produce. Hence our show, live streamed to YouTube Friday, June 2: OEG Live: Audiobook Versions of OER Textbooks (and AI Implications).

I wanted to jot down some things I have been using and experimenting with for audio production, where AI likely has a place, but is by no means the entire enchilada. So this tale is more about changing out some old tech ways for new ones.

Podcasting Then and Now

Early on I remember using apps like WireTap Pro to snag system audio recorded in Skype calls, and a funky little portable iRiver audio recorder for in-person sessions. My main audio editing tool of choice was Audacity, still something I recommend for its features and open source heritage. I not only created a ton of resources for it in the days of teaching DS106 audio, I used it for pretty much every media project I did over the last maybe 17, 18 years. Heck, Audacity comes up 105 times in my blog (this post will make it hit the magic number, right?).

Audacity is what I used for the first two years of editing the OEG Voices podcast. Working in waveforms was pretty much second nature, and I was pretty good at bringing in audio recorded in Zoom or Zencastr (where each speaker’s audio lands on a separate track), layering in the multivoice intros and Free Music Archive music tracks.

This was the editing space:

The multitrack editing in Audacity, waveforms for music, intros, separate speakers.

After editing, to generate a transcript I used tools like Otter.ai and Rev.ai, and cleaning up the results required another listening pass. This was time consuming, and for a number of episodes we paid for human transcriptions (~$70/episode), which still needed some cleanup.

Might AI Come in?

Via a tweet? A Mastodon post? From Paul Privateer I found an interesting tool from Modal Labs offering free transcription using OpenAI Whisper tech. Just by entering “OEG Voices” it bounced back with links for all the episodes. With a click for any episode, and some time for processing, it returned a not-bad transcript that would take some text editing to use, but it gives a taste that AI has a useful place in transcribing audio.

Gardner Campbell tuned me in to MacWhisper as a nifty means to use that same AI ______ (tool? machine? gizmo? magic black box) for audio transcription. You can get a good taste with the free version; the bump for the advanced features might be worth it. There is also Writeout, which does transcription via a web interface, plus translation (“even Klingon”). And likely a kazillion more services, sprouting every day with a free demo and a link to pay for more. Plus other tools for improving audio; my pal Alex Emkerli has been nudging the new Adobe tools.
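Whisper-style tools generally hand back timestamped segments, and turning those into a caption or transcript file is mostly formatting. A minimal sketch of the SubRip (SRT) conversion (the segment data is invented; real Whisper output has the same start/end/text shape):

```python
def to_srt(segments):
    """Format (start, end, text) segments, times in seconds,
    as a SubRip (SRT) caption file."""
    def stamp(t):
        # SRT timestamps look like HH:MM:SS,mmm
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((t - int(t)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{stamp(start)} --> {stamp(end)}\n{text}\n")
    return "\n".join(blocks)

srt = to_srt([(0.0, 3.2, "Welcome to OEG Voices."),
              (3.2, 7.5, "Today we talk about open education.")])
```

The result can be saved as `episode.srt` and loaded into most players and editors; the cleanup listening pass still happens, just against nicely timestamped text.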

There is not enough time in a day to try them all, so I rely on trusted recommendations and lucky hunches.

Descript was a lucky hunch that panned out.

Something Different: Descript

Just by accident, as it seems to go, something I saw in passing, in this case boosted by someone in the fediverse, triggered my web spidey sense.

I gave Descript a try, starting with the first 2023 OEG Podcast with Robert Schuwer. It’s taken some time to hone, but It. Has. Been. A. Game. Changer.

This is an entirely new approach for my audio editing. I upload my speaker audio tracks (no preprocessing needed to convert, say, .m4a to .wav, nor jumping to the Levelator to even out levels), and it chugs a few minutes to transcribe. I can apply a “Studio Sound” effect that cleans up the sound.

But it’s the editing that is different. Transcribing the audio means most (but not all) editing is done via text: removing words or moving sound around is done by working with the text. The audio is tied to the text.

Editing podcasts in Descript

I can move to any point via the text or the waveform. It manages the separate audio tracks as one, so if I delete a word or nudge something in the timeline (say to increase or decrease a gap), it modifies all tracks. But if I have a blip in one track, I can jump into the multitrack editor and replace it with a silence gap.
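The way a text deletion becomes an audio edit can be sketched with a toy model: each word carries a start/end time, and removing words leaves a set of audio spans to keep, applied identically across every track (the word timings here are invented for illustration):

```python
def keep_spans(words, deleted):
    """Given words as (text, start, end) and a set of indices to delete,
    return the merged (start, end) spans of audio that survive the edit.
    An editor would render these spans from every track."""
    spans = []
    for i, (_, start, end) in enumerate(words):
        if i in deleted:
            continue
        if spans and spans[-1][1] == start:
            spans[-1] = (spans[-1][0], end)  # contiguous: extend last span
        else:
            spans.append((start, end))
    return spans

words = [("um,", 0.0, 0.4), ("welcome", 0.4, 0.9), ("to", 0.9, 1.1),
         ("the", 1.1, 1.3), ("show", 1.3, 1.8)]
cuts = keep_spans(words, deleted={0})  # drop the filler word
```

Deleting the “um,” leaves one clean span, 0.4s to 1.8s, to render from all tracks.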

And because I am working with both the transcript and the audio, when I am done editing, both are final. I’m not showing everything, like inserting music, doing fades, invoking ducking. It took maybe 4 or 5 episodes of fumbling to train myself, but Descript has totally changed my podcast ways (don’t worry, Audacity lovers, I still use it for other edits).

You can get a decent sense of Descript with their free plan, but with the volume of episodes, we went with the $30/month Pro plan for up to 30 transcription hours per month (a multitrack episode of, say, 4 voices for 50 minutes incurs 200 minutes of that). That’s much less than paying for decent human transcription (sorry, humans, AI just took your grunt work).
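The quota arithmetic is worth spelling out, using the numbers quoted in this post (plan details may well change): each voice track is transcribed separately, so a multitrack episode eats minutes fast.

```python
def episodes_per_month(voices, episode_minutes, plan_hours=30):
    """How many multitrack episodes fit in a monthly transcription
    quota, given that every voice track is transcribed separately?"""
    cost = voices * episode_minutes   # e.g. 4 voices * 50 min = 200 min
    return (plan_hours * 60) // cost

n = episodes_per_month(voices=4, episode_minutes=50)
```

The 30-hour quota is 1800 minutes, so at 200 minutes per episode that works out to 9 episodes a month.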

And I am maybe at about the 20% level of understanding all Descript does, but that’s enough to keep my pod going.

But it’s not just drop something in a magic AI box and out pops a podcast; this is still me, Alan, doing the editing.

Yet, if you like Magic stuff, read on.

Magic Podcast Production

Editing podcasts is work enough, but then there is all that work writing up show notes and summaries and creating social media posts; maybe there is some kind of magic.

Well, a coffee meetup in Saskatoon with JR Dingwall dropped me into Castmagic: “Podcast show notes & content in a click. Upload your MP3, download all your post production content.”

That’s right, just give AI your audio, and let the magic churn.

I gave it a spin for a recent podcast episode of OEG Voices, number 56 with Giovanni Zimotti, a really interesting open educator at the University of Iowa (you should check it out). It generates potential titles (none I liked), keywords, highlights, key points, even the text for social media posts (see all it regurgitated).

On one hand, what it achieves and produces is impressive. Woah, is AI taking away my podcast production? Like most things AI, if you stand back from the screen and squint, it looks legit. But up close, I find it missing key elements, and wrongly emphasizing what I know are not the major points. I was there in the conversation.

I’d give it a 7 for effort, but I am not ready to drop all I do for some magic AI beans.

Ergo AI

I’m not a Debbie Downer on AI, just skeptical. I am more excited here about a tool, Descript, that has really transformed my creation process. It’s not because of AI; frankly, I have no idea what AI is really doing in any of these improbable machines, but it is maybe aided by AI.

This stuff is changing all the time. And likely you out there, random or regular reader, are doing something interesting with AI and audio, so let me know! My human brain seeks more random potential neurons to connect. And please drop in for our OEG Live show Friday to hash out more about OER, audio, and AI swirling together.

Meanwhile, I have some more Descript-ing to do. You?

Updates:

I got downsed!

Alan: The new OLDaily’s here! The new OLDaily’s here!
Felix: Well I wish I could get so excited about nothing.
Alan: Nothing? Are you kidding?! Post 7275, CogDogBlog! I’m somebody now! Millions of people look at this site every day! This is the kind of spontaneous publicity, your name on the web, that makes people. I’m on the web! Things are going to start happening to me now.

with apologies to a scene from The Jerk

I also got Jon Udell interested too…

And from Jon’s post I discovered more exciting features:


Featured Image: Mine! No Silly MidjournalStableConfusingDally stuff.

Improbable Machine
Improbable Machine flickr photo by cogdogblog shared under a Creative Commons (BY) license


Not Quite the Round Trip You Were AI-ing For

By: cogdog — February 11th 2023 at 01:56

Current hyped mythology includes the idea that from the outside of one of the MagicalAI machines you can somehow conjure up what produced it. Or can you loop back?

Promises, promises.

But the ride might be interesting.

As he does frequently, my friend and colleague Grant Potter slides me so many interesting sites and tools that I mostly end up re-sharing them through my Pinboard #cooltech (posted as #cogdogcooltech to Twitter, while it lasts, and to Mastodon).

Like just today:

Sure I could tag, bag, and boost, but I prefer running these things through the paces.

The premise of Img2Prompt is that from an image created/spawned/spit out by Stable Diffusion, it can go in and magically suggest/produce a prompt that could create it. Then you can run it and test: can you make a full round trip?

To give it a try, I went to Stable Diffusion Online (free, no login, can it last?) and started trying to make a suitable image. But instead I reached for a suggested prompt, “an insect robot preparing a delicious meal”, and thus got these:

Stable Diffusion Playground with prompt entered "an insect robot preparing a delicious meal". 4 image results below of mechanical like insects, only 2 really look like they are preparing a meal
Yeah, robot insects cooking?

Sure, the images are better quality than those quaint Craiyon ones, but I don’t find them really great. They have the same degree of… sameness. The ones on the left are not really preparing food, so I chose the top right one; it looks like a metal insect standing in a bowl of taco mixings.

When uploaded to Img2Prompt, what did I get?

a robot – sized bee, salad. it’s hard to believe the power of a machine – made from it. i think this machine is the real one.

prompt generated from an image, rather chatty, eh?

I cannot say I was impressed, but I wanted to try the full loop, and had it generate a new image from this prompt concocted from the first image.

Img2Prompt with uploaded Stable Diffusion image of a mechanical-like insect standing in a bowl of what looks like salad.

The predicted prompt "a robot - sized bee, salad. it's hard to believe the power of a machine - made from it. i think this machine is the real one." was then used to generate a different image, more of a  bee hovering behind a dandelion.

Well yes, I got a bee, not robotic, not salad, not real.

Some round trip.

Well, the site suggested trying an image from something not generated by Stable D. I have a few; I took this one to toss in the machine:

I Have An Alibi
I Have An Alibi flickr photo by cogdogblog shared into the public domain using Creative Commons Public Domain Dedication (CC0)

I tossed Felix and the busted toy into Img2Prompt, which yielded… stand back for the wows….

a large dog in the middle of a room, with green slime and some brown stains on the floor, hyper realistic 4k

some kind of prompt?
Img2Prompt tried with uploaded photo of a brown and white dog looking innocent, standing next to a chewed-up green toy.

The predicted prompt was "a large dog in the middle of a room, with green slime and some brown stains on the floor, hyper realistic 4k"

It then generated a photo of a green colored dog in the middle of a wood floor room

I am not sure where green slime came from, but this image seems to have slimed the dog.

At least the wood floor is shiny and has some realistic light sources.

It’s interesting, but…


Featured Image: Historic Columbia River Highway – The Rowena Loop by Brett Hansen placed into the public domain as a US Government work.

The Historic Columbia River Highway loops around beneath the Rowena Crest Viewpoint. Byway travelers descend from the viewpoint to the loops below.