Classification Update
kentaylor edited this page 13 years ago


Test Results

Chris provided some feedback on the [previous analysis|Home] and found much of it “spot on”. He also provided some useful commentary. The high accuracy he reports was obtained using data exclusively from himself and a few volunteers, **all of whom were told to put the phone in a trouser pocket**. Only a specific set of known activities was included in this sample as well.

This bit of additional information makes a big difference, so I did some more testing and thought through the issues further, with the following results.

| Activity Recorded | Actual Activity | Comments |
| --- | --- | --- |
| Walking for 8 minutes | Walking for 8 minutes | Breast pocket. Flat ground. Perfect result. |
| Sitting down or travelling by bus | Standing up looking at screen or making notes on phone | This is using phone orientation and lack of acceleration to classify activity. If the phone were in a hip pocket it would have this orientation when sitting down. |
| Walking 5 minutes; walking upstairs less than 1 minute; walking 14 minutes | Walking, some rough ground | Mostly right, but rough ground looks to have been detected as walking up stairs. Phone in top pocket. |
| Walking for 6 minutes | Walking for 6 minutes | No errors. Phone carried in left hand, long side of phone facing forward/back. |
| Walking for 11 minutes, with walking up stairs now and then | Walking, flat ground | Phone in hip pocket. Gets higher accelerations in hip pocket than breast pocket, which it sometimes interprets as stairs. |
| Standing 20 minutes | Travelling by car on freeway | Phone vertical, −z axis in direction of travel. If the phone is in a hip pocket it is not vertical when travelling seated, therefore travel is not detected. Lower accelerations than walking are classified as standing. |
| Travelling by car total 16 minutes; travelling by bus 8 times, mostly less than 1 minute | Travelling by car in suburban streets | Phone lying flat on lap. Can't tell bus from car but otherwise always correct. Not much traffic, so only short stops at traffic lights. |
| Travelling by bus total 17 minutes; sitting down 10 times, mostly less than 1 minute, once for 3 minutes; travelling by car total 25 minutes | Travelling by car on freeway | Phone lying flat on lap. Freeway smoother than suburban streets. Bus and car with some sitting suggests higher accelerations are classified as car, medium as bus and lower as sitting. |
| Travelling by bus for 40 minutes; unknown less than 1 minute | Travelling by car on freeway | Phone in hip pocket so that the phone is nearly on its side edge, with the length of the phone facing forward/back. |

Reinterpreting The Tests

If you view the above results in the right way they are excellent. Reinterpret them using the following rules:-

  • Keep the phone in your hip pocket; don’t take it out and look at it.
  • Combine walking and stair climbing as walking.
  • Combine travelling by car and bus as travelling.
  • Don’t undertake activities other than those known about.

The results would then become:-

| Activity Recorded | Actual Activity | Comments |
| --- | --- | --- |
| Walking for 8 minutes | Walking for 8 minutes | Perfect result. |
| Walking 36 minutes | Walking 36 minutes | Perfect result. |
| Travelling total 16 minutes | Travelling by car | Perfect result. Travelling in suburban streets. |
| Travelling for 40 minutes; unknown less than 1 minute | Travelling by car | Nearly perfect result. Travelling on freeway. |

The results may not be the same for others, but they are better than Chris suggested in his feedback: “For real-world accuracy, the most “optimistic” figure is around 75% for activities which are known to be supported. i.e., if everything not supported (sleeping, phone left on desk, etc, etc) is ignored, ¾ samples are classified as would be expected and ¼ samples weren’t. When you factor in activities which the analyser doesn’t know anything about this figure obviously drops quite a lot.”

It’s Partly Psychology

So when I first experimented without any background knowledge, my perception was that it is wrong a lot. This is partly psychology: I judge that I have been doing the same thing for quite some time, but there are a lot of less-than-1-minute classifications, mostly wrong. There are also some long periods when the results are right, but the perception of these is overwhelmed by the sub-minute classifications. This perception could be overcome with an interface like this.

Activity Screen Layout

When producing the summary, leave out all activities shorter than 1 minute. This interface conveys a lot of information, and a few minutes of incorrect classification doesn’t change the summary much.
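
As a minimal sketch of that summarising rule (assuming a hypothetical record format of `(activity, duration_minutes)` tuples, which is not specified in the original apps):

```python
from collections import defaultdict

def summarise(records, min_minutes=1.0):
    """Aggregate classified activity records into a summary,
    dropping any classification shorter than min_minutes."""
    totals = defaultdict(float)
    for activity, duration in records:
        if duration >= min_minutes:
            totals[activity] += duration
    return dict(totals)

# A noisy trace: long correct stretches plus short misclassifications.
records = [
    ("walking", 8.0), ("travelling", 0.5), ("walking", 11.0),
    ("sitting", 0.8), ("travelling", 16.0),
]
print(summarise(records))  # {'walking': 19.0, 'travelling': 16.0}
```

The two sub-minute misclassifications vanish from the summary, so the long correct stretches dominate what the user sees.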

Some misclassifications are much more concerning than others. The types of errors (Consolvo et al. (1)) can be categorised as:-

  • Make an error in the start time.
  • Make an error in the duration.
  • Confuse an activity it was trained to infer with another it was trained to infer.
  • Confuse an activity it was not trained to infer with one it was trained to infer.
  • Fail to detect an activity it was trained to infer.
  • Fail to detect an activity it was not trained to infer.
  • Detect an activity when none occurred.

Consolvo et al. (1) found that two of these were ones people found frustrating, and this “frustration often led to participants questioning if the device was malfunctioning”. These were:-

  • Failing to detect an activity it was trained to infer.
  • Detecting an activity when none occurred.

Seeing various activities being logged while the phone is sitting on the desk looks very unconvincing, and being recorded as travelling on a bus when you never do is similarly frustrating, so these are the errors which should be avoided where possible. Getting correct classifications is the ideal, but where this can’t be achieved, the psychologically important errors can be much reduced by degrading the system according to the usual measures of accuracy. This is achieved by:-

  • Keep the number of activities detected small. Fewer alternatives mean less to get wrong and a higher likelihood of being right.
  • Don’t try to detect activities that can’t be detected with high reliability. Much better to restrict detection to walking than to include walking on stairs but get it wrong a lot of the time.
  • Phones spend a lot of time charging and not being carried. These periods are not important for a person to know about, but they are important to recognise so that no activities are detected then. Charging and stationary should be easy to detect.
  • When an activity is detected but there is significant uncertainty about what it is, just record that an activity has occurred and give some extra information. Better to leave it unclassified than to get it wrong.
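
The last point above can be sketched as a fallback rule on top of the k-nearest-neighbour vote. This is an illustrative sketch, not the app's actual code; the vote format and the 75% agreement threshold are assumptions:

```python
def label(neighbour_votes, min_agreement=0.75):
    """Turn k-nearest-neighbour votes (a dict of activity -> vote count)
    into a label, falling back to a generic 'activity' label when the
    neighbours disagree too much."""
    total = sum(neighbour_votes.values())
    best, count = max(neighbour_votes.items(), key=lambda kv: kv[1])
    if count / total >= min_agreement:
        return best
    return "activity (unclassified)"

print(label({"walking": 7, "stairs": 1}))         # walking
print(label({"bus": 3, "car": 3, "sitting": 2}))  # activity (unclassified)
```

Recording the ambiguous sample as a generic activity avoids the frustrating “detected an activity that never occurred” error while still showing that something happened.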

Lots of the current classifications depend on the phone being in the hip pocket so that orientation can be used as an indicator. Standing vs sitting is a useful classification and is also used to infer that a person who is sitting is likely to be travelling. Parkka et al. (2) placed accelerometers on rucksack straps and on wrists and found it was not possible to separate sitting and standing from each other using accelerometers at these locations. They tried different approaches for detecting these activities; for example, it was assumed that the direction of a test person’s body would stay more stable during sitting than during standing, but the recorded data did not show such behaviour. This means we have a choice: ignore sitting vs standing, or tell users to keep their phone in their hip pocket if they wish to use this application. It is not clear which is the better choice. Anyone who ignores the instruction will be disappointed with the results, but is the instruction too burdensome a requirement for the application to be considered useful? Should it be a set-up option along the lines of:-

  • I carry my phone in my hip pocket,
  • I carry my phone in a breast pocket,
  • I carry my phone in a handbag,

with different algorithms depending on the user’s selection? This would result in detecting different activities depending on the option chosen, which would make comparing people’s activities problematic. Even the same person changing options would render temporal activity comparisons problematic.
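
The comparability problem can be made concrete. Assuming a hypothetical mapping from carry-location option to the activities reliably detectable in that configuration (the sets below are illustrative guesses, not measured results):

```python
# Hypothetical: which activities each carry location supports.
# Sitting/standing need hip-pocket orientation, per the discussion above.
DETECTABLE = {
    "hip pocket":    {"walking", "standing", "sitting", "travelling"},
    "breast pocket": {"walking", "travelling"},
    "handbag":       {"walking", "travelling"},
}

def comparable_activities(option_a, option_b):
    """Activities that can be compared between two users (or the same
    user at different times) who chose different carry locations."""
    return DETECTABLE[option_a] & DETECTABLE[option_b]

print(sorted(comparable_activities("hip pocket", "handbag")))
# ['travelling', 'walking']
```

Any comparison across options is limited to the intersection of the detectable sets, which is why per-option algorithms make comparisons problematic.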

So with this focussed experimentation and a better understanding of how it works, I’m amazed by the accuracy achieved from such simple measurements, but that wasn’t the initial reaction. This experience makes it impossible to see the results as a new user would. It’s important to get the psychology right or the new user will not continue after an early disappointment.

Algorithm

Chris in his feedback points out that all of the recorded data comes from the “‘Sensor Logger’ app, which records multiple samples over a period of about a minute and then selects the modal result. The ‘Activity Recorder’ app which just takes one sample a minute and tries to use the aggregation algorithm to adjust for uncertainty and rogue results… this doesn’t work as well (although it saves on battery life quite significantly).” There is a trade-off between battery life and accuracy. Sampling about 10% of the time is producing surprisingly accurate results. It’s not clear what the optimum sampling percentage is, but it’s likely to be different depending on the activity. So if the phone is detected as stationary, i.e. not being carried, it should be possible to sample for a lot less than 6.5 seconds just to check the orientation hasn’t changed. If walking is detected it may be necessary to sample more often, particularly if steps are to be counted.
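
Activity-dependent sampling could be expressed as a duty-cycle table. The durations below are illustrative assumptions, not measured optima:

```python
# Hypothetical duty-cycle schedule: (sample_seconds, sleep_seconds)
# for the next accelerometer reading, keyed by the last activity.
SCHEDULE = {
    "stationary": (1.0, 120.0),  # quick orientation check, long sleep
    "walking":    (6.5, 30.0),   # longer window, e.g. if counting steps
    "travelling": (6.5, 60.0),
}
DEFAULT = (6.5, 60.0)

def next_sample(last_activity):
    """Return (sample_seconds, sleep_seconds) for the next duty cycle."""
    return SCHEDULE.get(last_activity, DEFAULT)

def duty(activity):
    sample, sleep = next_sample(activity)
    return sample / (sample + sleep)

print(f"stationary duty cycle: {duty('stationary'):.1%}")  # 0.8%
print(f"walking duty cycle: {duty('walking'):.1%}")        # 17.8%
```

A stationary phone is sampled far less than a walking one, which is where most of the battery saving would come from.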

Currently it is using the k-nearest neighbour algorithm. For walking this is really good: if we ignore stairs and classify them as walking, it is nearly always right whether the phone is in a breast pocket, a hip pocket or even carried in a swinging hand. It only uses maximum minus minimum acceleration, so it seems walking produces much larger maximum accelerations than anything else. A bump or jolt at any point in the 6.5 second sampling interval would look like walking, but in this limited testing that didn’t happen, as walking wasn’t detected when it wasn’t occurring. Standard deviation would be more robust to the occasional bump. For vehicle testing it is almost always wrong when the phone is vertical and almost always correct when the phone is on its edge. So it seems that when the average y-axis acceleration is low and the average x-axis acceleration is low, i.e. the phone is on its side, the classification will be sitting when the maximum minus minimum acceleration on the x axis is low, and travelling when it is larger.

Once the model is described it can be implemented in logic, and it can be predicted when it will fail.
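
The inferred model above could be written out as explicit rules. This is a sketch of that logic, not the app’s actual classifier; the thresholds (15 m/s² combined range for walking, 3 m/s² mean for “on its side”, 1 m/s² x-axis range for sitting vs travelling) are illustrative guesses:

```python
def classify(ax, ay, az):
    """Rule-based sketch of the inferred model. Inputs are lists of
    accelerations (m/s^2) per axis over one ~6.5 s sampling window."""
    rng = lambda xs: max(xs) - min(xs)
    mean = lambda xs: sum(xs) / len(xs)

    # Walking produces a much larger max-minus-min acceleration
    # than anything else, wherever the phone is carried.
    if rng(ax) + rng(ay) + rng(az) > 15.0:
        return "walking"
    # Low average x and y acceleration: phone on its side.
    if abs(mean(ax)) < 3.0 and abs(mean(ay)) < 3.0:
        # Low x-axis range means sitting; larger means travelling.
        return "travelling" if rng(ax) > 1.0 else "sitting"
    return "unknown"

print(classify([-5, 5], [-6, 6], [0, 10]))            # walking
print(classify([0.1, 0.2], [0.2, 0.3], [9.7, 9.9]))   # sitting
print(classify([-1, 1.5], [0, 0.5], [9, 10]))         # travelling
```

Writing the rules out like this also makes the predicted failures visible: a single jolt inflates the walking range, and a vertical phone falls straight through to “unknown”.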

(1) Sunny Consolvo, David W. McDonald, Tammy Toscos, Mike Y. Chen, Jon Froehlich, Beverly Harrison, Predrag Klasnja, Anthony LaMarca, Louis LeGrand, Ryan Libby, Ian Smith, James A. Landay; Activity Sensing in the Wild: A Field Trial of UbiFit Garden; CHI 2008 Proceedings, Personal Health; April 5-10, 2008, Florence, Italy

(2) Juha Parkka, Miikka Ermes, Panu Korpipää, Jani Mäntyjärvi, Johannes Peltola, and Ilkka Korhonen; Activity Classification Using Realistic Data From Wearable Sensors; IEEE Transactions on Information Technology in Biomedicine, Vol. 10, No. 1, January 2006