I have an AI policy page on my website: https://reillyspitzfaden.com/ai-policy/

screwlisp

@reillypascal I have a general question, though I don't know if you want to clarify it: do you consider speech2text or language translation (probably via LLMs now) to be in the category of generative AI?

Myself, I can barely muscle through German articles, and probably not French ones, with my own brain, but I feel guilty about using LLM translation and reading as a fake polyglot.

Reilly Spitzfaden (they/them)

@screwlisp it's a good question. As far as calling it generative, I'm not sure what my answer would be. I have heard anecdotally that Google Translate used to be better before it moved from more traditional machine translation to LLMs.

Both in terms of my values and potential quality issues, I would definitely prefer to avoid using the current crop of LLM tools for either of these tasks, although I'm not opposed to the tasks themselves. I've been using Transcribro for speech to text on my phone (because of joint pain issues), and this is reminding me that I should check what technology that uses.

At any rate, for my website, I haven't used either speech to text or machine translation.

screwlisp

@reillypascal like you point out, I would not want people who medically benefit from speech2text to have that taken away from them because the current deep learning models were made in an unethical way. And I think that taking away translation is similar. Maybe the workable principle would be to just run them locally and independently (e.g. whisper.cpp and w/e).

Morten Mosgaard

@reillypascal what a great approach! Thanks for sharing!

Reilly Spitzfaden (they/them)

@screwlisp update: looks like Transcribro uses OpenAI's Whisper model. I don't want my words going through their stuff (and it sounds like Whisper makes things up: https://www.baldurbjarnason.com/2024/openai-whisper-risks/), so I got rid of it.

Sayboard uses Vosk, and if someone else wants an Android transcription tool, I guess they could use that, but I don't know how that's trained and I probably wouldn't find it much more ethical, so I'm going to try going without.

Heliboard has swipe-typing, which I haven't used before and could be nice. It uses Google's proprietary library to handle it, so I guess I'll look into how that was trained/how it functions, but that could be acceptable to me. At any rate, I think a big thing to do is use my laptop more so I don't need transcription.

screwlisp

@reillypascal
just run https://github.com/ggml-org/whisper.cpp locally for your speech2text. I did look into it: you can also compile it into your own locally running Android app (is that still allowed?).

speech2text is medically / quality-of-life important to so many people. I think it might be appropriate to try running local-only speech2text rather than losing it because all the modern services have transitioned to objectionable web services. Lots of vulnerable people are in that pickle.

screwlisp

@reillypascal oh, that article is really bad. But there is more than one source of speech2text models iirc.

Reilly Spitzfaden (they/them)

@screwlisp I definitely don't mean to make a prescription for others — this is solely a personal choice because I _can_ make things work without s2t.

Transcribro already runs locally using whisper.cpp, and you can get it through the Accrescent store (rather than the Play store) so I'm fully satisfied with the privacy of that. I just don't like what I hear about Whisper, both in terms of performance, and in terms of knowing that my words are going through OpenAI's tools every time I text or post from my phone.

Reilly Spitzfaden (they/them)

@screwlisp by “bad” do you mean poorly written/sourced, or that the information looks bad for Whisper?

screwlisp

@reillypascal I meant that the experience the article reported of racist words being wrongly transcribed sounds bad!

I am a bit confused because it sometimes sounds as if whisper.cpp itself is producing the speech2text.

Whisper.cpp is just the program that applies a chosen model to a chosen audio file, resulting in that model's speech2text.

It is the model that does the speech2text. You can choose different models of different sizes made by different people to call with whisper.cpp.
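
To make the program/model split concrete, here is a rough sketch in Python of how one whisper.cpp run is put together. It assumes a recent build whose command-line binary is called `whisper-cli` (older builds named it `main`), and the model and audio paths are hypothetical placeholders, so treat it as an illustration rather than a recipe.

```python
import shlex


def whisper_command(model: str, audio: str, binary: str = "whisper-cli") -> list[str]:
    """Build the argument list for one whisper.cpp run.

    The binary name is an assumption (recent builds ship `whisper-cli`);
    pass a different `binary` if your build uses another name.
    """
    # -m picks the model file, -f picks the audio file to transcribe.
    # Swapping the -m path changes which model does the speech2text;
    # the program itself stays the same.
    return [binary, "-m", model, "-f", audio]


# Hypothetical paths: use whichever model file and recording you actually have.
cmd = whisper_command("models/ggml-base.en.bin", "memo.wav")

# Print a copy-pasteable shell command instead of running it here.
print(shlex.join(cmd))
```

Pointing `model` at a different file is all it takes to try a model of another size or from another source, which is the point being made above.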

Reilly Spitzfaden (they/them)

@screwlisp yeah then I don't know. I'd have to look into it more.
