Voice Activated Recording for the Nokia N900 / Linux Desktop



Wizzup?
07-26-2010, 09:37 PM
I had been looking for a long time for a program that records continuously but only saves the frames whose sound is "louder" than a given threshold. I searched, but found no open source program that does this. There weren't any closed source ones that run on Linux or my N900 either, only a few heavy, closed source programs for Windows.

Why do I want this? Very often I have a great idea, or just something plain I want to remember, while I'm walking in the forest, sitting in the bath, or cycling in the city, and then I can't just write it down. (It also works when I'm just sitting in my room; music playing doesn't bother it.) So this is a kind of memo application, but instead of typing, you just say whatever you want to say and it records that. It obviously has to be on all the time, but it only saves the useful samples.

So... I knew my phone could record, because it has a "recorder" app. It works quite well, but does not have this particular feature. I found the source, which is written in C. It seems to use GStreamer for recording, in combination with PulseAudio. (My phone uses PulseAudio.)

I messed around with GStreamer a bit, but it was overkill and didn't send any WAVE headers either. So I fell back to a standard Linux program that is probably on your desktop too, if you run Linux: arecord. This program simply dumps the recording as raw .wav data to standard output. So my Python script starts it as a subprocess and pipes all the wave output into my program. My program reads 8000 samples at a time (exactly 1 second at 8000 samples per second) and then processes them. If that second contains at least one sample above the upper threshold and one below the lower threshold, it writes the entire second to the output file. Additionally, it always records one extra second after a second that contains a loud sound. If that extra second also contains loud sounds, it captures the next second as well, and so on.

Getting this to work took quite a while, because at first I didn't take a good look at the data and started doing Fast Fourier Transforms, which I didn't need at all.

Run the program either by making it executable (chmod +x the_file) or by calling it like "python the_file".
Obviously it will need a GUI and a system to manage all the output files, as it currently overwrites the output of the last session. It should probably not write to the file on every loud second as it does now... More buffering would be more flash card friendly. ;)

Some credits go to Benland100 for flaming/teasing me when I was using FFTs to do this. He put me on the right track. ;)

E: Obviously this will work on any system that has arecord and Python.


#!/usr/bin/env python

import subprocess
import wave

#p = subprocess.Popen(['gst-launch-0.10', 'pulsesrc ! wavenc ! fdsink fd=1'], \
#                     stdout=subprocess.PIPE)

# arecord just dumps the raw wav to stdout. We will use this
# to read from with our wave module.
p = subprocess.Popen(['arecord'], stdout=subprocess.PIPE)

# Open the pipe.
f = wave.open(p.stdout)

# Open file we will write to.
o = wave.open('test.wav', 'w')
o.setparams((1, 1, 8000, 0, 'NONE', 'not compressed'))

high, lasthigh = False, False

# Print audio set up
print(f.getparams())

while True:
    # Read 1 second
    a = f.readframes(8000)
    if not a:
        # arecord stopped delivering data.
        break

    # Unsigned 8-bit samples as integers (works on Python 2 and 3).
    b = bytearray(a)
    _min, _max = min(b), max(b)

    # Print bounds
    print('min %d' % _min)
    print('max %d' % _max)

    # TODO: The gate should obviously be configurable.
    if _max > 135 and _min < 120:
        high = True
    else:
        high = False

    # Write always if either is True.
    if lasthigh or high:
        o.writeframes(a)

    lasthigh = high

f.close()
o.close()
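
The gate can also be pulled out of the recording loop so the thresholds are configurable and the logic can be tried without a microphone. This is only a sketch for Python 3: the name gate_seconds and its low/high parameters are made up for illustration, and the chunks here are tiny stand-ins for one-second reads of unsigned 8-bit audio.

```python
def gate_seconds(chunks, low=120, high=135):
    """Yield the chunks worth keeping.

    A chunk counts as loud when it has at least one sample above `high`
    and one below `low` (unsigned 8-bit audio rests near 128). The chunk
    right after a loud one is kept as well.
    """
    last_loud = False
    for chunk in chunks:
        samples = bytearray(chunk)
        if not samples:
            break  # end of input
        loud = max(samples) > high and min(samples) < low
        if loud or last_loud:
            yield chunk
        last_loud = loud


# Tiny stand-ins for one-second reads.
quiet = bytes([128] * 8)
loud = bytes([128, 200, 50, 128] * 2)

kept = list(gate_seconds([quiet, loud, quiet, quiet, loud, loud, quiet]))
print(len(kept))
```

Each loud chunk drags the following chunk along, so only the first quiet chunk and the second of the two consecutive quiet chunks are dropped.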

mastaraymond
07-26-2010, 09:49 PM
Is this the answer of life?

i luffs yeww
07-26-2010, 10:07 PM
My phone is a joke (http://reviews.cnet.com/cell-phones/samsung-sgh-t229-red/4505-6454_7-33090816.html), but this really excited me. :3 I'd like to have a phone up to today's standards and be able to experiment with this.

Great job! And it sounds really interesting. ^^


if max(b) > 135 and min(b) < 120:
    high = True
else:
    high = False

#Could simply be

high = max(b) > 135 and min(b) < 120

#no?

Awkwardsaw
07-26-2010, 10:09 PM
i luffs yeww wrote:
> My phone is a joke (http://reviews.cnet.com/cell-phones/samsung-sgh-t229-red/4505-6454_7-33090816.html), but this really excited me. :3 I'd like to have a phone up to today's standards and be able to experiment with this.
>
> Great job! And it sounds really interesting. ^^

i have the same phone, cept its black :p

its legit, does every thing i need

Craig`
07-26-2010, 10:10 PM
nice.

i luffs yeww
07-26-2010, 10:13 PM
:p Awkward, mine's black, too!

/spam

Wizzup?
07-26-2010, 10:25 PM
i luffs yeww wrote:
> if max(b) > 135 and min(b) < 120:
>     high = True
> else:
>     high = False
>
> #Could simply be
>
> high = max(b) > 135 and min(b) < 120
>
> #no?


Yes. And the *.close() calls on f and o can be removed as well, since the only way to exit the program currently is to press Ctrl+C, which stops the program immediately. (Python will call .close though.) The loop should catch the Ctrl+C exception and then break, though.
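
A rough sketch of what catching Ctrl+C could look like, for Python 3. The helper copy_until_interrupted and the fake_source generator are hypothetical; fake_source stands in for f.readframes(8000) and simulates Ctrl+C by raising KeyboardInterrupt, and the output goes to an in-memory buffer instead of test.wav.

```python
import io
import wave


def copy_until_interrupted(read_second, out):
    """Write chunks from read_second() to `out` until EOF or Ctrl+C.

    Returns the number of chunks written. `out` is closed in every case,
    so the WAV header ends up with the final length.
    """
    written = 0
    try:
        while True:
            chunk = read_second()
            if not chunk:
                break
            out.writeframes(chunk)
            written += 1
    except KeyboardInterrupt:
        pass  # Ctrl+C: stop recording and fall through to the cleanup
    finally:
        out.close()
    return written


def fake_source():
    """Stand-in for f.readframes(8000): two seconds, then a simulated Ctrl+C."""
    yield bytes([128] * 8000)
    yield bytes([128] * 8000)
    raise KeyboardInterrupt


buf = io.BytesIO()
out = wave.open(buf, 'wb')
out.setparams((1, 1, 8000, 0, 'NONE', 'not compressed'))

source = fake_source()
seconds = copy_until_interrupted(lambda: next(source), out)
print(seconds)
```

Putting the close() in a finally block means the file is finished properly whether the loop ends normally or is interrupted.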

i luffs yeww
07-26-2010, 10:29 PM
:) Thanks for the info.

Wizzup?
07-26-2010, 10:47 PM
Also, I'm not sure if arecord always records at 8000 samples per second. Probably not, so I'll have to change that as well. Will test tomorrow with my laptop.

R0b0t1
07-27-2010, 12:05 AM
import magic

moser
07-27-2010, 04:49 PM
Thanks for this. How do you change the recording parameters? Other than the "8000" in the line below which is obviously the sample rate, what do the other numbers control? I want to be able to change the bit depth to 16 per sample and the channels to 2. I'm no stranger to arecord, but I don't understand python.

o.setparams((1, 1, 8000, 0, 'NONE', 'not compressed'))

moser
07-28-2010, 05:32 AM
I found this http://docs.python.org/library/wave.html.

But when I change the line o.setparams((1, 1, 8000, 0, 'NONE', 'not compressed')) to o.setparams((1, 2, 8000, 0, 'NONE', 'not compressed')), the audio just becomes garbled.

Also, changing the 8000 in both places in the script doesn't work, it just makes the audio sound like the chipmunks.

Wizzup?
07-28-2010, 09:12 AM
moser wrote:
> I found this http://docs.python.org/library/wave.html.
>
> But when I change the line o.setparams((1, 1, 8000, 0, 'NONE', 'not compressed')) to o.setparams((1, 2, 8000, 0, 'NONE', 'not compressed')), the audio just becomes garbled.
>
> Also, changing the 8000 in both places in the script doesn't work, it just makes the audio sound like the chipmunks.

Hmmm yeah. Keep in mind that if you have two channels you'll probably need to either select one channel or read from both, I suppose.

If you can wait a bit (two days or so) I can make it more flexible. :)

Also, the 1 you are changing to 2 is the sample width.


Wave_write.setparams(tuple)
The tuple should be (nchannels, sampwidth, framerate, nframes, comptype, compname), with values valid for the set*() methods. Sets all parameters.
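
The garbled and chipmunk results are probably what happens when the header written by setparams no longer describes the bytes that arecord actually delivers: the data stays 8-bit at 8000 samples per second, only the label changes. One way around that, sketched here for Python 3 under the assumption that the output should simply mirror the input, is to copy the layout from the reader instead of hard-coding it. The helper open_matching_writer is a made-up name, and the in-memory 16-bit stereo input is a stand-in for the arecord pipe.

```python
import io
import wave


def open_matching_writer(reader, target):
    """Open `target` for writing with the same layout as `reader`."""
    writer = wave.open(target, 'wb')
    writer.setnchannels(reader.getnchannels())
    writer.setsampwidth(reader.getsampwidth())
    writer.setframerate(reader.getframerate())
    return writer


# Build one second of 16-bit stereo silence at 16 kHz as a stand-in input.
src = io.BytesIO()
w = wave.open(src, 'wb')
w.setparams((2, 2, 16000, 0, 'NONE', 'not compressed'))
w.writeframes(b'\x00\x00' * 2 * 16000)
w.close()
src.seek(0)

reader = wave.open(src, 'rb')
dst = io.BytesIO()
writer = open_matching_writer(reader, dst)

# One second is always getframerate() frames, whatever the rate is.
one_second = reader.readframes(reader.getframerate())
writer.writeframes(one_second)
writer.close()
```

Reading reader.getframerate() frames per iteration also removes the second hard-coded 8000 from the loop.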

moser
07-28-2010, 04:02 PM
Thanks for responding. I understand what the 2 did. But I don't understand why changing to 16 bits/sample causes the recorded audio to go garbled. I also don't understand how to increase the sample rate. Frankly I also don't understand where arecord fits into this! Oh well, I will wait until you make more changes. Thanks very much!

Wizzup?
07-31-2010, 10:47 PM
Well if I decide to work on this any time soon, you can always find the latest version here: http://git.wizzup.org/?p=vacr.git;a=summary

For now, it would probably be easier to specify a sample rate of 8000 in arecord. (Just pass "-r" and "8000" as extra arguments.)

eg: p = subprocess.Popen(['arecord', '-r', '8000'], stdout=subprocess.PIPE)

moser
08-01-2010, 05:15 PM
Thanks very much. I will follow this at your repo.

How does one change the number of bits per sample? Right now it records using unsigned 8 bit, what would I do to change this to signed 16 bit? Putting -f S16_LE in the arecord line just causes an error like this:


arecord: main:495: wrong extended format ' S16_LE'

Wizzup?
08-01-2010, 07:12 PM
moser wrote:
> Thanks very much. I will follow this at your repo.
>
> How does one change the number of bits per sample? Right now it records using unsigned 8 bit, what would I do to change this to signed 16 bit? Putting -f S16_LE in the arecord line just causes an error like this:
>
> arecord: main:495: wrong extended format ' S16_LE'

I was thinking about -r 16000. (Try arecord -r 16000 > test.wav) Ctrl+C it after a few seconds, and then play it with something that shows the rate.
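
As for the format error quoted above: the leading space in ' S16_LE' suggests that the flag and its value reached arecord as one argument, which is what happens when '-f S16_LE' is a single element of the Popen list. A sketch of building the list with every flag and value as its own element follows. The helper arecord_command is a hypothetical name; -t, -f, -r and -c are regular arecord options.

```python
def arecord_command(rate=8000, sample_format='U8', channels=1):
    """Build an argument list for arecord, one list element per word."""
    return [
        'arecord',
        '-t', 'wav',           # container type
        '-f', sample_format,   # e.g. 'U8' or 'S16_LE'
        '-r', str(rate),       # sample rate in Hz
        '-c', str(channels),   # number of channels
    ]


cmd = arecord_command(rate=16000, sample_format='S16_LE', channels=2)
print(cmd)
```

The result can be handed to subprocess.Popen(cmd, stdout=subprocess.PIPE) in place of ['arecord'].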

moser
08-02-2010, 01:13 PM
Yes, changing the sampling rate works fine, I also changed the number in the 2 other places in the script so the processing would make sense. But what about the bits/sample issue? I can't seem to pass that info to arecord, nor change it on the python line.
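
Switching to 16 bits per sample also changes what the loudness test has to look at: the script's min/max check works on single bytes, while a signed 16-bit sample is two bytes and rests at 0 instead of 128. A possible sketch of the check for S16_LE data, in Python 3; the function names and the threshold of 1000 are assumptions chosen for illustration, not values from the script.

```python
import struct


def peak_s16le(frames, channel=0, nchannels=1):
    """Return the largest absolute sample of one channel of S16_LE data."""
    count = len(frames) // 2
    # '<h' is a little-endian signed 16-bit integer.
    samples = struct.unpack('<%dh' % count, frames[:count * 2])
    # Interleaved audio: every nchannels-th sample belongs to one channel.
    selected = samples[channel::nchannels]
    return max(abs(s) for s in selected) if selected else 0


def is_loud(frames, threshold=1000, channel=0, nchannels=1):
    """True when the chosen channel exceeds the (assumed) threshold."""
    return peak_s16le(frames, channel, nchannels) > threshold


# Two stereo frames: left channel is (0, -20), right channel is (5000, 100).
stereo = struct.pack('<4h', 0, 5000, -20, 100)
print(peak_s16le(stereo, channel=0, nchannels=2))
print(peak_s16le(stereo, channel=1, nchannels=2))
```

The channel argument covers the case of picking one channel out of a stereo recording.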

Wizzup?
08-27-2010, 07:39 PM
I'll get to this in a bit... :)

Wizzup?
02-12-2011, 04:06 PM
I've recreated (https://github.com/MerlijnWajer/SNARP) the repository at GitHub, since I'm not going to buy a new phone anytime soon and I want to resume work on this.

Also, for people who were waiting for a phone that will be as tweaker friendly as the N900: I'd give up waiting and just buy an N900. It's receiving community updates, and Nokia is soon going to remove it from its product line, I think. But there won't be another phone like the N900 anytime soon now that they've teamed up with Microsoft, I suppose.

i luffs yeww
02-13-2011, 06:13 AM
Unless Microsoft starts doing more open stuff like they've been doing. That could end up pretty sweet.