Why SIRI Is Not Entirely To Blame

I’ve heard from some purchasers of the iPhone 4S that SIRI doesn’t respond well to certain accents, such as Singaporean ones. Most people, being typical customers, blame Apple for not accounting for varying accents, but is it really so easy for a computer system to understand spoken language? You might think there’s nothing challenging about deciphering speech, but try listening to French or Tamil when you don’t understand the language. Can you even pick out individual words?

Speech differs from writing in several ways. Firstly, word boundaries aren’t clearly marked in the sound stream. Secondly, nobody pronounces everything perfectly and clearly; we slur and mangle our words, producing things like “Ahve heard from some purchaserzof the iPhone 4S that SIRI doesn respond well to somaccents”. Thirdly, speech simply varies from person to person: show this same sentence to anyone, and they will say it somewhat differently, stress different words, intone differently, and so on. So how can SIRI cope with all these variations?

Perhaps the human brain’s excellence at comprehending the spoken word comes with a trade-off. As we grow older, we apparently start to perceive sounds in categories rather than as the continuous spectrum they actually are. For instance, the Mandarin words 波 and 坡 are romanized as “bo” and “po” respectively, but the Mandarin “bo” isn’t really like how you would say “bo” in English. Someone who knows English but not Mandarin cannot register or pronounce a Mandarin “bo” without a lot of practice; most just lump it into the “po” category, unable to distinguish between the Mandarin “bo” and “po”. Okay, maybe this isn’t the best example, but you know what I mean!

The spy saw the cop with the binoculars but the cop did not see him.

Did you need to re-read that sentence for it to make sense? It’s structurally ambiguous: the sentence’s structure determines its meaning. You can read it either as the spy using the binoculars to see the cop, or as the spy seeing the cop who has the binoculars. Which meaning came to you first? Most people prefer linking the binoculars to the cop, treating “the cop with the binoculars” as one block phrase.
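A phrase-structure grammar makes this kind of ambiguity precise: under a small grammar, the first clause of that sentence admits exactly two parse trees, one attaching “with the binoculars” to the verb and one attaching it to “the cop”. Here’s a minimal sketch that counts parses with CYK-style dynamic programming; the toy grammar is my own invention for illustration, and real speech systems of course use far richer models:

```python
from collections import defaultdict

# A toy grammar in Chomsky normal form (invented for illustration).
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # "the cop [with the binoculars]"
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # "saw the cop [with the binoculars]"
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "the": ["Det"], "spy": ["N"], "cop": ["N"], "binoculars": ["N"],
    "saw": ["V"], "with": ["P"],
}

def count_parses(words):
    """Count the distinct parse trees for `words` under the toy grammar."""
    n = len(words)
    # chart[i][j] maps each nonterminal spanning words[i:j] to its tree count.
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for nt in LEXICON[w]:
            chart[i][i + 1][nt] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for left, lc in chart[i][k].items():
                    for right, rc in chart[k][j].items():
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[i][j][parent] += lc * rc
    return chart[0][n]["S"]

print(count_parses("the spy saw the cop with the binoculars".split()))  # 2
```

The two counted trees correspond exactly to the two readings: PP attached to the verb phrase (the spy uses the binoculars) or to the noun phrase (the cop has them).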

There are several kinds of sentence ambiguity.

1. Lexical

This is where a word has more than one meaning.

We’re going to the bank.
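One standard way programs tackle lexical ambiguity is the (simplified) Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding context. Here’s a sketch with a two-sense inventory for “bank”; the sense names and glosses are my own invention, not taken from any real lexicon:

```python
# Invented two-sense inventory for the word "bank".
SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land along the edge of a river or lake",
    }
}

def lesk(word, context):
    """Simplified Lesk: choose the sense whose gloss overlaps the context most."""
    context_words = set(context.lower().split())
    def overlap(sense):
        return len(set(SENSES[word][sense].split()) & context_words)
    return max(SENSES[word], key=overlap)

print(lesk("bank", "the bank accepts deposits and lends money"))  # bank.finance
print(lesk("bank", "we sat on the sloping bank of the river"))    # bank.river
```

With richer context the same idea extends to many senses, though bag-of-words overlap is famously brittle, which is part of why lexical ambiguity is hard for software.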

2. Surface

These sentences can be differentiated based on phrase structure, much like the binoculars example above.

Although Hannibal sent troops over a week ago.

3. Underlying

These sentences are differentiated based on meaning rather than phrase structure.

Flying planes can be dangerous.

In this case, under both readings, “flying planes” is the phrase you’ve got to single out. Only when you understand that “flying planes” can refer to the act of piloting planes as well as planes that are in flight can you unravel the cryptic cipher!
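A classic observation about this example is that subject–verb agreement can pull the two readings apart: “flying planes is dangerous” forces the piloting reading, while “flying planes are dangerous” forces the planes-in-flight reading. A toy sketch of that disambiguation cue (the gloss strings are my own paraphrases):

```python
# Using subject-verb agreement as a disambiguation cue for "flying planes".
READINGS = {
    "is":  "the activity of piloting planes is dangerous",  # gerund reading
    "are": "planes that are in flight are dangerous",       # participle reading
}

def disambiguate(sentence):
    """Pick a reading of 'flying planes ...' from the agreeing verb form."""
    padded = f" {sentence} "
    for verb, gloss in READINGS.items():
        if f" {verb} " in padded:
            return gloss
    return "ambiguous: both readings are possible"  # e.g. with the modal "can"

print(disambiguate("flying planes is dangerous"))
print(disambiguate("flying planes are dangerous"))
print(disambiguate("flying planes can be dangerous"))
```

The modal “can” in the original sentence carries no number agreement, which is precisely why that version stays ambiguous.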

So as you can see, SIRI can’t work wonders, because language processing in our brains is just so complex and difficult to explain to our own handphones. And well, the iPhone works just as well without SIRI, doesn’t it?

