[epistemic status: assorted thoughts on something I’ve not heard discussed, nothing high confidence]
In the moderately near future we may be able to save all notable audio from our mobile phones at very low cost (likely freemium). Two things make this plausible: machine learning can automatically filter and save only the parts where voices are detected, and data transmission and storage costs keep falling (storage has had a Moore’s law equivalent, which was interrupted by the 2011 floods in Thailand). Uploading parts in real time will also be an option for those concerned about device theft or destruction. A little further off, perhaps not until wearables or mini-drones are commonplace, continuous video recording becomes an option, again with uninteresting sections heavily compressed by machine learning trained to extract important features and keep the full details only of particularly significant events.
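To make the "only save the parts with voices" idea concrete, here is a minimal sketch of the filtering step. It is my own illustration rather than how any real app works: it uses a crude loudness threshold as a stand-in for a trained voice detector, and the names `frame_rms` and `select_voiced_frames`, the frame length, and the threshold are all assumptions.

```python
import math


def frame_rms(frame):
    """Root-mean-square loudness of one frame of samples (floats in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def select_voiced_frames(samples, frame_len=160, threshold=0.1):
    """Return (start, end) sample ranges worth keeping.

    Assumption: loudness stands in for "a voice is present". A real system
    would replace the threshold test with a trained voice-activity model.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= threshold:
            end = start + len(frame)
            if kept and kept[-1][1] == start:
                # Extend the previous range when frames are adjacent.
                kept[-1] = (kept[-1][0], end)
            else:
                kept.append((start, end))
    return kept


# Silence, then a loud stretch, then silence again.
audio = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
segments = select_voiced_frames(audio)
```

Only the returned ranges would need to be stored or uploaded, which is where the storage and bandwidth savings would come from.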
The implications for society of cheap, effective, automatic personal surveillance seem likely to be very significant, but it’s something I’ve rarely heard discussed (unlike, for example, mass surveillance by states). I’ve been thinking about this for a while, and have done some cursory googling (unsuccessful, due to the volume of information about other types of surveillance and my not knowing the right search terms). I’ll lay out a few of my uncertainties and predictions for once this becomes widespread, though these are preliminary thoughts rather than things I am highly confident about:
1. Events/Gossip becomes verifiable
Rather than just hearing things on the grapevine, in some situations it will be possible to actually hear a conversation for yourself (especially if an accusation is made). There would likely be strong and complex social stigmas around this in at least some subcultures, but I suspect that at least *defensively* using this will be somewhat common.
The social implications here are likely some of the most significant, but also some of the hardest for me to predict.
2. Many crimes are much more risky
Muggings are easier to trace (especially if microdrones are available to respond very rapidly, and are requested automatically). The same goes for trespassing, recreational drug use, and similar crimes, since only one person has to have the app running to incriminate the whole group. Social pressure against recording is effective (especially using “you’re incriminating yourself”), but there’s always a risk that someone just forgets to turn it off. Repeatedly verifying that an app is not running would be socially costly in at least some circles, and automatic verification may or may not be easily available.
3. Serious/controlling abuse is easier to commit, but abuse in general is easier to uncover
Abusers in relationships or family structures would be able to monitor the behaviour of victims more closely in order to control them (or give a stronger illusion of doing so, to create a personal panopticon). On the other hand, if the victim has or gains access to the recordings, the abuse could be uncovered along with strong proof.
Workplaces will have to respond. I could imagine some embracing it as a way to verify employee and customer conduct (perhaps automatically), while others ban it (to avoid exposing bad working conditions, or to avoid outcry about humans being monitored by machines).
5. At some point it will be easy to fake recordings
Once we can autogenerate/restyle audio undetectably, you start to run into forgery problems. This could be mitigated by recording defensively and having all recordings cryptographically signed.
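As a rough illustration of what "cryptographically signed" could mean here, the sketch below chains a keyed tag over each chunk of a recording, so that editing, dropping, or reordering chunks is detectable. This is an assumption-laden toy: it uses HMAC from Python’s standard library (a shared-secret scheme) as a stand-in for the public-key signatures and trusted timestamps a real system would likely need, and the function names are hypothetical.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # Fixed starting value for the chain (an assumption).


def sign_chunk(key, prev_tag, chunk):
    """Tag a chunk together with the previous tag, linking it to its history."""
    return hmac.new(key, prev_tag + chunk, hashlib.sha256).digest()


def sign_recording(key, chunks):
    """Return one tag per chunk; each tag depends on every earlier chunk."""
    tags = []
    prev = GENESIS
    for chunk in chunks:
        prev = sign_chunk(key, prev, chunk)
        tags.append(prev)
    return tags


def verify_recording(key, chunks, tags):
    """Check that the chunks match the tags, in order, with none altered."""
    if len(chunks) != len(tags):
        return False
    prev = GENESIS
    for chunk, tag in zip(chunks, tags):
        expected = sign_chunk(key, prev, chunk)
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag
    return True


key = b"device-secret"  # Hypothetical per-device key.
recording = [b"chunk-1", b"chunk-2", b"chunk-3"]
tags = sign_recording(key, recording)
```

Note that this only shows a recording was not altered after tagging. It does nothing to prove the audio was genuine when it was captured, which is the harder part of the forgery problem.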
Police body cams and mobile phone recordings are the early steps, but in a world where everything gets recorded by default, many legal proceedings would be simplified in ways which change the incentives around different classes of action.
For example, alibis would come to rest largely on recordings. Parental custody hearings would get to hear the actual disputes.
7. Maybe none of this happens because of outcry?
Some of the social changes are moderately significant. I can imagine there being quite a bit of pressure against it, though I find it hard to imagine a generally successful way of preventing audio recording apps.
Q: How low can you get power use?
If there’s no low-power way to run this, perhaps it would have an unacceptable cost in terms of battery life?
Q: How accurately can you identify humans by voice using neural nets or similar?
Face recognition is already 70% accurate from photos. Perhaps voices are too similar to pin an identity to, or doing so requires very high audio quality. This claims that voices are captured at airports, which is mild evidence that it can be done, but maybe they only narrow people down a bit. I would be interested in relevant links/papers.
Overall it seems like a likely future change which gets insufficient attention considering the possible scale of effects. It’s probably not in the top 10 technological changes of the next 10 years, but it’s still notable enough that I’m surprised I’ve literally never heard of anyone talking about it or found an article on it.
Though perhaps I’m just reading the wrong places, and someone can give me search terms for some existing analysis of this?