Researching Usability


As the project embarks on usability testing with mobile devices, it is important to evaluate mobile-specific research methods and understand the key differences between desktop and mobile usability testing. The most important thing to be aware of when designing and testing for mobile devices is that it IS different from traditional testing on desktop computers. Further differences are listed below:

  • You may spend hours seated in front of the same computer, but mobile context is ever-changing. This impacts (amongst other things) the users’ locations, their attention, their access to stable connectivity, and the orientation of their devices.
  • Desktop computers are ideal for consumption of lengthy content and completion of complex interactions. Mobile interactions and content should be simple, focused, and should (where possible) take advantage of unique and useful device capabilities.
  • Mobile devices are personal, often carrying a wealth of photos, private data, and treasured memories. This creates unique opportunities, but privacy is also a real concern.
  • There are many mobile platforms, each with its own patterns and constraints. The more you understand each platform, the better you can design for it.
  • And then there are tablets. As you may have noticed, they’re larger than your average mobile device. We’re also told they’re ideal for reading.
  • The desktop is about broadband, big displays, full attention, a mouse, keyboard and comfortable seating. Mobile is about poor connections, small screens, one-handed use, glancing, interruptions, and (lately), touch screens.

~ It’s About People Not Devices by Stephanie Rieger and Bryan Rieger (UX Booth, 8th February 2011)

Field or Laboratory Testing?

As our interaction with mobile devices happens differently to our interaction with desktop computers, it seems logical that the context of use matters if we are to observe realistic behaviour. Brian Fling states in his book that you should “go to the user, don’t have them come to you” (Fling, 2009). However, testing users in the field has its own problems, especially when trying to record everything going on during tests (facial expressions, screen capture and hand movements). While contextual enquiries using diary studies are beneficial, they also have drawbacks, as they rely on the participant to provide an accurate account of their own behaviour, which is not always easy to achieve, even with the best intentions. Carrying out research in a coffee shop, for example, provides the real-world environment which maximises external validity (Madrigal & McClain, Usability for Mobile Devices). However, for those for whom field studies are impractical, simulating a real-world environment within a testing lab has been adopted; researchers believe such simulations can provide the external validity which traditional lab testing cannot (Madrigal & McClain, 2011). A variety of techniques have been attempted to achieve this, some of which are listed below:

Participant on a treadmill (image from Kjeldskov & Stage, 2004)

  • Playing music or videos in the background while a participant carries out tasks
  • Periodically inserting people into the test environment to interact with the participant, acting as a temporary distraction
  • Distraction tasks, such as asking participants to stop what they are doing, perform a prescribed task and then return to the original task (e.g. “Whenever you hear the bell ring, stop what you are doing and write down the time in this notebook.”) (Madrigal & McClain, 2010)
  • Having participants walk on a treadmill while carrying out tasks (continuous speed and varying speed)
  • Having participants walk at a continuous speed on a course that is constantly changing (such as a hallway with fixed obstructions)
  • Having participants walk at varying speeds on a course that is constantly changing (Kjeldskov & Stage, 2004)

Although realism and context of use would appear important to the validity of research findings, previous research has challenged this assumption. A study comparing the usability findings of a field test with those of a realistic laboratory test (where the lab was set up to recreate a realistic setting, such as a hospital ward) found that there was little added value in taking the evaluation into the field (Kjeldskov et al., 2004). The research revealed that lab participants on average experienced 18.8% of the usability problems found, compared to 11.8% for field participants. In addition, 65 man-hours were spent on the field evaluation compared to 34 man-hours for the lab evaluation, roughly half the time.

Subsequent research has provided additional evidence to suggest that lab environments are as effective as field studies in uncovering usability issues (Kaikkonen et al., 2005). In this study, researchers did not attempt to recreate a realistic mobile environment, instead comparing their field study with a traditional usability laboratory set-up. They found that the same issues were uncovered in both environments, although laboratory tests found more cosmetic or low-priority issues than the field tests, and the frequency of findings in general varied (Kjeldskov & Stage, 2004). The research did find benefits of conducting a mobile evaluation in the field. It was able to inadvertently evaluate the difficulty of tasks by observing participant behaviour: participants would stop, often look for a quieter spot, and ignore outside distractions in order to complete a task. This is something that would be much more difficult to capture in a laboratory setting. The research also found that the field study provided a more relaxed setting which influenced how much verbal feedback participants provided; however, this is contradicted by other studies which found the opposite to be true (Kjeldskov & Stage, 2004).

Both studies concluded that laboratory tests provided sufficient information to improve the user experience, in one case without trying to recreate a realistic environment. Both found field studies to be more time-consuming, which also makes them more expensive and more resource-intensive to carry out. It’s fair to say that running a mobile test in the lab will provide results similar to running the evaluation in the field. If time, money and/or access to equipment is an issue, testing in a lab or an empty room with appropriate recording equipment certainly won’t be a limitation. Many user experience practitioners will agree that any testing is better than none at all. However, there will always be exceptions where field testing is more appropriate: a geo-based mobile application, for example, will be easier to evaluate in the field than in the laboratory.

Capturing data

Deciding how to capture data is something UX2 is currently thinking about. Finding the best way to capture all relevant information is trickier on mobile devices than on desktop computers. Various strategies have been adopted by researchers, a popular one being the use of a sled which the participant can hold comfortably, with a camera positioned above it to capture the screen. In addition, it is possible to capture the mobile screen using specialised software specific to each platform. If you are lucky enough to have access to Morae usability recording software, it has a specific setting for testing mobile devices which allows you to record from two cameras simultaneously: one to capture the mobile device and the other to capture body language. Other configurations include a lamp-cam, which clips to a table with the camera positioned in front of the light. This set-up does not cater for an additional camera to capture body language and would require a separate camera set up on a tripod. A more expensive solution is the ELMO document camera, which is stationary and requires the mobile device to remain static on the table. This piece of kit is more likely to be found in specialised research laboratories, which can be hired for the purpose of testing.

Lamp-cam configurations (image courtesy of Barbara Ballard)


Based on the findings from previous research, the limitations of the project and the current stage of mobile service development, it seems appropriate for the UX2 project to conduct its initial mobile testing in a laboratory. Adapting a meeting room with additional cameras and using participants’ own mobile devices (recruiting for a specific device where required) will provide the best solution and should uncover as many usability issues as testing in the field would. A subsequent blog post will provide more details of our own test methods, with reflections on their success.


Fling, B. (2009). Mobile Design and Development. O’Reilly, Sebastopol, CA, USA.

Kaikkonen, A., Kallio, T., Kekäläinen, A., Kankainen, A. and Cankar, M. (2005). Usability Testing of Mobile Applications: A Comparison between Laboratory and Field Testing. Journal of Usability Studies, 1(1).

Kjeldskov, J. and Stage, J. (2004). New techniques for usability evaluation of mobile systems. International Journal of Human-Computer Studies, 60.

Kjeldskov, J., Skov, M.B., Als, B.S. and Høegh, R.T. (2004). Is It Worth the Hassle? Exploring the Added Value of Evaluating the Usability of Context-Aware Mobile Systems in the Field. In Proceedings of the 5th International Mobile HCI 2004 Conference, Udine, Italy. Springer-Verlag.

Roto, V., Oulasvirta, A., Haikarainen, T., Kuorelahti, J., Lehmuskallio, H. and Nyyssönen, T. (2004). Examining Mobile Phone Use in the Wild with Quasi-Experimentation. Helsinki Institute for Information Technology Technical Report.

Tamminen, S., Oulasvirta, A., Toiskallio, K. and Kankainen, A. (2004). Understanding mobile contexts. Personal and Ubiquitous Computing, 8.

When carrying out usability studies on search interfaces, it’s often better to favour interview-based tasks over pre-defined ‘scavenger-hunt’ tasks. In this post I’ll explain why this is the case and why you may have to sacrifice capturing metrics in order to achieve this realism.

In 2006, Jared Spool of User Interface Engineering wrote an article entitled Interview-Based Tasks: Learning from Leonardo DiCaprio, in which he explains that it often isn’t enough to create test tasks that ask participants to find a specific item on a website. He calls such a task a Scavenger-Hunt task. Instead, he introduces the idea of interview-based tasks.

When testing the search interface for a library catalogue, a Scavenger Hunt task might read:

You are studying Russian Literature and you will be reading Leo Tolstoy soon. Find the English version of Tolstoy’s ‘War and Peace’ in the library catalogue.

I’ll refer to this as the Tolstoy Task in this post. Most of your participants (if they’re university students) should have no trouble understanding the task. But it probably won’t feel real to any of them. Most of them will simply type ‘war and peace’ into the search and see what happens.

Red routes

The Tolstoy Task is not useless; you’ll probably still witness things of interest. So it’s better than having no testing at all.

But it answers only one question: when users know the title of the book, its author and how to spell both correctly, how easy is it to find the English version of Leo Tolstoy’s War and Peace?

A very specific question like this can still be useful for many websites. For example, a car insurance company could ask: when users have all of their vehicle documents in front of them, how easy is it to get a quote from our website?

Answering this question would give them a pretty good idea of how well their website was working, because getting a quote is probably the most important journey on the site. Most websites have what Dr David Travis calls Red Routes – the key journeys on a website. When you measure the usability of a website’s red routes, you effectively measure the usability of the site.

However, many search interfaces, such as that of a university library catalogue, don’t have one or two specific tasks that are more important than all others. It’s possible to categorise tasks, but difficult to introduce them into a usability test without sacrificing a lot of realism.

Interview-based tasks

The interview-based task is Spool’s answer to the shortfalls of the Scavenger Hunt task. This is where you create a task with the input of the participant and agree what successful completion of the task will mean before they begin.

When using search interfaces, people often develop search tactics based upon the results they are being shown. As a result they can change tactics several times. They can change their view of the problem based upon the feedback they are getting.

Whilst testing the Aquabrowser catalogue for the University of Edinburgh, participants helped me to create tasks that I’d never have been able to come up with on my own. Had we not done this, I wouldn’t have been able to observe their true behaviour.

One participant used the search interface to decide her approach to an essay question. Together we created a task scenario where she was given an essay to write on National identity in the work of Robert Louis Stevenson.

She had decided that the architecture in Jekyll and Hyde, whilst set in London, reminded her more of Edinburgh. She searched for sources that referred to Edinburgh’s architecture in Scottish literature, opinion on architecture in Stevenson’s work, and opinion on architecture in national identity.

The level of engagement she had in the task allowed me to observe behaviour that a pre-written task would never have elicited.

It also made no assumptions about how she used the interface. In the Tolstoy Task, I’d be assuming that people arrive at the interface with a set amount of knowledge. In an interview-based task I can establish how much knowledge they would have about a specific task before they use the interface: I simply ask them.

Realism versus measurement

The downside to using such personalised tasks is that it’s very difficult to report useful measurements. When you pre-define tasks you know that each participant will perform the same task. So you can measure the performance of that task. By doing this you can ask “How usable is this interface?” and provide an answer.
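To make the contrast concrete, here’s a minimal sketch in Python of the kind of measurement a pre-defined task allows. The results are made up for a hypothetical five-participant run of the Tolstoy Task:

    from statistics import mean

    # (completed?, seconds taken) for each participant on the same task
    results = [(True, 95), (True, 210), (False, 300), (True, 140), (True, 75)]

    completion_rate = sum(1 for done, _ in results if done) / len(results)
    mean_time = mean(t for done, t in results if done)

    print(f"Completion rate: {completion_rate:.0%}")           # 80%
    print(f"Mean time on task (successes): {mean_time:.0f}s")  # 130s

Because every participant attempts the identical task, numbers like these are comparable across participants and across rounds of testing.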

With interview-based tasks this is often impossible because the tasks vary in subject and complexity. It’s then often inappropriate to use them to provide an overall measure of usability.

Exposing issues

I believe that usability testing is more reliable as a method for exposing issues than it is at providing a measure of usability. This is why I favour using interview-based tasks in most cases.

It’s difficult to say how true to life the experience you’re watching is. If they were sitting at home attempting a task then there’d be nobody watching them and taking notes. Nobody would be asking them to think aloud and showing interest in what they were doing. So if they fail a task in a lab, can you be sure they’d fail it at home?

But for observing issues I feel it’s more reliable. If participants misunderstand something about the interface in a test, you can be fairly sure that people at home will misunderstand it in the same way.

And it can never hurt to make something more obvious.

In Lorraine’s last blog post she described the data-gathering methods used to obtain representative data from users of Edinburgh University’s Library services, the purpose of which was to identify patterns in user behaviours, expectations and motivations to form the basis of our personas. Raw data can be difficult to process, and it is impossible to jump from raw notes to finished persona in one step, hence our six-step guide.

There is no one right way to create personas; it depends on a lot of things, including how much effort and budget you can afford to invest. There are lots of articles on the web detailing various approaches, and after much reading we decided to rely on two main sources of information which we felt best suited our needs.

One resource was the Fluid Project Wiki which is an open, collaborative project to improve the user experience of community source software and provides lots of useful guidance as well as sample personas. The other resource which we heavily relied on throughout the whole process was Steve Mulder’s book The User Is Always Right: A Practical Guide to Creating and Using Personas for the Web, which contains lots of great advice as well as step-by-step coverage on user segmentation.

There are three primary approaches to persona creation, based on the type of research and analysis performed:

  1. Qualitative personas
  2. Qualitative personas with quantitative validation
  3. Quantitative personas

There are a number of important steps to go through in order to get from raw data to personas, and I will now explain the tools and methods we used to generate our segments and personas, for anyone who wishes to follow in our footsteps.

The first thing we did was to plan out a schedule of work which consisted of the following:

  1. Review and refine interview notes in the project wiki and flesh out user goals
  2. Write summaries for each of the participants
  3. Do a two-by-two comparison to identify key similarities/differences
  4. Identify segments
  5. Write the personas
  6. Review personas

Step 1: Review/refine notes

We spent a day reviewing our notes in the wiki and fleshing out goals, referring to the written notes taken during each interview and checking the audio recordings where necessary. Working as a team was beneficial, as we had both been present for each interview and therefore had a good grasp of all the data in front of us. Once we were happy with our set of notes, we printed out each participant’s interview notes and attached them to the whiteboard to make it easier to review all the data grouped together.

Step 2: Summarise participants

The next step was to summarise each of our 17 participants (to figure out who these people are), based on the following four categories:

  • Practical and personal goals
  • Information seeking behaviour
  • How they relate to library services
  • Skills, abilities and interests

We used different coloured post-it notes to denote each of the above categories. Once we had gone through this process for each participant, our whiteboard had been transformed into a colourful collage of notes.

We were now ready to start a two-by-two comparison of participants.

Step 3: Two-by-Two Comparisons

The next step utilised the two-by-two comparison method, a technique advocated by Jared Spool of User Interface Engineering. This works by reading two randomly chosen participant summaries and listing the attributes that make the participants similar and different. We then replaced one of the summaries with another randomly chosen one and repeated the process until all the summaries had been read.
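For anyone who wants the rotation spelled out, here’s a minimal sketch in Python, with the summaries reduced to the labels P1–P17; in practice we did the comparisons by hand on paper:

    import random

    # Our 17 participant summaries, reduced here to labels.
    summaries = [f"P{i}" for i in range(1, 18)]
    random.shuffle(summaries)

    current, unread = summaries[0], summaries[1:]
    while unread:
        other = unread.pop()
        # For each pair, list the attributes that make the two
        # participants similar and different.
        print(f"Compare {current} with {other}")
        current = other  # swap one summary in, repeat until all are read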

Below is a list of some of the distinctions identified between our participants, using this method:

  • Type of library user
  • Years at Edinburgh University
  • Use of Edinburgh University library resources (digital and physical)
  • Use of external resources
  • System Preference (Classic or Aquabrowser)
  • Attitude to individual systems
  • Information seeking behaviour

We then created a scale for each distinction identified during the two-by-two comparison and determined its end points. Doing so allowed us to place each participant on the scale and compare them directly. Most variables can be represented as ranges with two ends. It doesn’t matter whether a participant is a 7 or a 7.5 on a scale; what matters is where they appear relative to other participants. The image below provides an example of our 12 scales mapped for each of our 17 participants.

Step 4: Identify Segments

Now that we had all our participants on the scales, we colour coded each individual to make it easier to identify groupings of participants on each of the scales. We looked for participants who were grouped closely together across multiple variables; once we found a set of participants clustering across six or eight variables, we treated this as a major behaviour pattern which could form the basis of a persona (see the sketch below).
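As an illustration, here’s a minimal sketch of that clustering step in Python. The scale names, scores and closeness threshold are all hypothetical (only three of the 12 scales are shown); we actually did this by eye on the whiteboard rather than in code:

    from itertools import combinations

    # Each participant is scored 0-10 on every scale from the
    # two-by-two comparison.
    participants = {
        "P1": {"library_use": 8, "external_resources": 2, "years_at_uni": 7},
        "P2": {"library_use": 7, "external_resources": 3, "years_at_uni": 6},
        "P3": {"library_use": 1, "external_resources": 9, "years_at_uni": 2},
    }

    def shared_scales(a, b, threshold=1.5):
        """Return the scales on which two participants sit close together."""
        return [s for s in a if abs(a[s] - b[s]) <= threshold]

    # Flag pairs that cluster on most variables; with the full 12 scales
    # we looked for agreement on six or eight before calling it a pattern.
    min_shared = 2  # would be ~6 with all 12 scales
    for (name_a, a), (name_b, b) in combinations(participants.items(), 2):
        shared = shared_scales(a, b)
        if len(shared) >= min_shared:
            print(f"{name_a} and {name_b} cluster on: {', '.join(shared)}")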

After quite a bit of analysis, we identified six major groupings, each representing an archetype or persona. We gave each one a brief description on paper, outlining its characteristics and unique attributes.

After reviewing each description we realised that group 6 was very similar to group 4, so we merged these two sets together, leaving five groups at the end of this step.

When carrying out this step, it is important to remember that your groups should:

  • explain key differences you’ve observed among participants
  • be different enough from each other
  • feel like real people
  • be described quickly
  • cover all users

Step 5: Write the Personas

We were now ready to write up our five personas. For each group we added detail around the behavioural traits based on the data we had gathered, describing their goals, information-seeking behaviours and system usage, amongst other things. We also talked about frustrations and pain points, as well as listing some personal traits to make them feel more human.

We gave each persona a name and a photo which we felt best suited their narrative. We tried to add parts of participants’ personalities without going overboard, as this would make the persona less credible. We kept the detail to one page, based on a template provided by the Fluid Project wiki. It’s important to keep persona details to one page so they can be referred to quickly during any discussions. Remember that every aspect of the description must be tied back to real data, or else it shouldn’t be included in the persona.

Some people prefer to keep their persona details in bullet points, but we felt that a narrative would be far more powerful in conveying each of our personas’ attitudes, needs and problems. We also added a scale to each persona, detailing their behaviour and attitudes, which serves as a visual summary of the narrative and its main points. It may be useful to refer to the Fluid Persona Format page for examples of these templates.

Step 6: Review the Personas

Once our personas were written, we reviewed them to ensure they had remained realistic and true to our research data. We felt that two personas in particular had more similar behaviours and goals than differences, so we merged them into one complete persona. This left us with four library personas representing the students and librarians who were interviewed:

  • Eve the e-book reader: “I like to find excerpts of books online which sometimes can be enough. It saves me from having to buy or borrow the book.”
  • Sandra the search specialist: “In a quick-fire environment like ours we need answers quickly”
  • Pete the progressive browser: “Aquabrowser and Classic, it’s like night and day”
  • Baadal the search butterfly: “Classic is simple and direct but Aquabrowser’s innovative way of browsing is also good for getting inspiration.”

A full description of the personas can be found on the persona profiles page of our project wiki.

Research has shown that a large set of personas can be problematic, as the personas tend to blur together. Ideally, you should have only the minimum number of personas required to illustrate key goals and behaviour patterns, which is what we ended up with. Finally, to ensure we had a polished product, we asked a colleague who was not involved in the persona creation to review the personas for accuracy in spelling and grammar.


From my experience, I would say that the most difficult step of the process was getting from step 3 (the two-by-two comparison) to step 4 (identifying segments). Although we had initially planned to spend three days creating our personas, in the end it took us more than five. If we were to repeat this exercise, I would allocate adequate time directly after each interview to write up detailed notes on the interviewee, covering their specific goals, behaviours, attitudes and information-seeking behaviour, rather than waiting until a later date to review all the notes together, as described in Step 2. That said, there are various approaches which can be taken when creating personas, and we would be very interested to learn what other researchers might do with the same data.

In the concluding part of this blog series, “User Research and Persona Creation Part 3: Introducing the personas”, Lorraine will discuss how we plan to keep the personas relevant and current in the future.
