Conducting Usability Testing: a case of just following the task script?

Canonical

on 11 November 2013

This article is more than 11 years old.


In the previous post, we talked about how to design effective user testing tasks to evaluate the usability of an interface. This post continues with this topic by highlighting a number of key strategies that you may need to use when conducting a formative user testing, whose main aim is to identify usability problems and propose design solutions, rather than compare quantitative metrics (summative testing), eg. task completion time and mouse clicks. It is unlikely that the prepared task script could be strictly applied without any changes, since the testing situation tends to be dynamic and often unpredictable. To get useful data, you need to be able to manipulate your task script with flexibilities, while also maintaining consistency.

Changing task orders during testing

Normally, to avoid order effect, the issuing of the tasks should be randomised for each participant. Order effect refers to the effect in which the tasks are presented, which can affect the results, specifically: users may perform better in later tasks as their familiarity with the interface increases; or users may perform worse as they become more fatigued. However, as discussed in the previous post, the tasks are often contextual and dependent on each other, so you need to carefully consider which tasks could be shuffled. For example, it is a good practice to mark on the script to indicate dependent tasks, so that you know these tasks should not be reordered and separated from each other. In other words, the dependent tasks must always be moved together. It is worth noting that the randomisation of task orders may not always be possible for a testing, for example, when the tasks are procedurally related, such as a testing focusing on payment flow.

Sometimes you may need to change the task orders by considering their levels of difficulty. This is useful in the following two scenarios: when you notice a participant appears to be slightly nervous before the testing has started, provide a simple first task to put him/her at ease; or when you observe a participant has failed to solve several tasks in a row, provide one or two easy tasks to reduce the frustration and stress, and boost confidence.

Another type of changing task order is made in response to users’ solicited goals that are associated with a coming task. For example, in one phone testing, after a participant checked the battery level, s/he spontaneously expressed a desire to know if there was a way to switch off some running apps to save battery. In this case, we jumped to the task of closing running apps, rather than waiting until later. This makes the testing feel more natural.

Remove tasks during testing

There are typically two situations that require you to skip tasks:

  • Time restriction

  • Questions being answered with previous tasks

Time restriction: user testing normally has a time limit. Participants are paid for certain lengths of time. Ideally, all the tasks should be carried out by all the participants. However, sometimes they take longer to solve tasks. Or you may discover areas that require more time for investigation. In this case, not all the tasks could be performed by a participant within the given time. Subsequently, you need to be able to quickly decide which tasks should be abandoned for this specific participant. There are two ways to approach this:

  • Omit tasks that are less important: it is always useful to prioritise the tasks in terms of their importance – what are the most important areas that have key questions that need to be answered and require feedback; what could be left for the next testing, if not covered this time?

  • Omit tasks that have already received abundant feedback: skip the tasks from which you have already gathered rich and useful information from other participants.

Questions were answered with previous tasks: Sometimes questions associated with a specific task would be answered while a participant was attempting to solve the previous task – in this case, you could skip this task.

In one of our phone testings, we asked a participant to send a text to a non-contact (a plumber). During the task-solving process, s/he decided to save the number to the contact book first and then send a text. In this case, we skipped the task of ‘saving a number to contact book’.

However sometimes you should not skip a task, even if it might seem repetitive. For example, if you want to test the learnability and memorability of a feature, having the participant perform the same task (with slightly different descriptions) for the second time (after a certain time duration) could afford useful insights.

Add tasks during testing

There are three contexts in which you could consider adding tasks:

  • Where the user formulates new goals

  • Re-examinations

  • Giving the user a second chance

The added task must be relevant to the aim of the testing, and should only be included if the testing time permits.

User formulates new goals: you could add tasks based on user-formulated goals in the task solving process.

For example, in one phone testing, one participant wondered if s/he could customise the tiles on the Windows phone’s home screen. We made this an added task for her/him. Adding tasks based on user articulated new goals follows their thought process and make the testing more natural. It also provides opportunities for us to discover new information.

Re-examinations: sometimes the users may succeed in a task accidently, without knowing how s/he did it. In this case, the same task (with a slightly changed description) could be added to re-assess the usability.

For example, in one phone testing, we had one task: “You want to call you sister Lisa to say thank you for the phone”. One participant experienced great difficulties in performing this task, and only completed it after a long time and by accident. In this case, we added another task to re-evaluate the ease of making a phone call:

“Your call is cut off while you are talking to your sister, so now you want to call her again.”

Similarly in the Gallery app testing, where participants managed to add a picture into a selected album accidently, we asked them to add another picture into a different album.

Re-examination allows us to judge accurately the impact of a problem, as well as to understand the learnability of interface – the extent to which users could detect and learn interaction patterns (even by accident), and apply the rules later.

Giving the user a second chance: in the majority of the user testing, participants used the evaluated interface for the first time. It could be very demanding for them to solve the tasks successfully in their first attempt. However, as the testing progresses, participants might discover more things, such as features and interaction patterns (although possibly by accident). Consequently, their knowledge of the interface may increase. In this case, you could give them another chance to solve the task that they failed earlier in the tests. Again, this helps you to test the learnability of the interface, as well as assess the impact of a problem.

For example, in a tablet testing, one participant could not find the music scope earlier in the testing, but later s/he accidentally discovered the video scope. To test if s/he now understood the concept of dash scopes, we asked the participant to find the music scope again after several other tasks.

Change task descriptions (slightly) during testing

Information gathered from pre-testing brief interview and participants’ testing verbal data could often be used to modify the task description slightly to make the task more realistic to the users. This also gives the user the impression that you are an active listener and interested in their comments, which helps to build a rapport with them. The change should be minor and limited to the details of the scenario (not the aim of the task). It is important that the change does not compromise the consistency with other participants’ task descriptions.

For example, in a tablet testing, where we needed to evaluate the discoverability of the HUD in the context of photo editing, we had this task: “You want to do some advanced editing by adjusting the colour of the picture.” One participant commented that s/he often changed pictures to ‘black and white’ effect. In response to this, we changed the task to “You mentioned that you often change a picture to black and white, and now you want to change this picture to ‘black and white’”. The task change here does not change the aim of the task, nor the requirements for solving the task (in this case, the access to the HUD), but it becomes more relatable to the participant.

Another example is from a phone testing. We changed the task of “you want to go to Twitter” to “you want to go to Facebook” after learning the participant uses Facebook but not Twitter. If we continued to request this participant to find Twitter, it would make the testing become artificial, which would result in invalid data. The aim of the task is to evaluate the ease of navigation in finding an app, therefore changing Twitter to Facebook does not change the nature of the task.

Conclusions

This post outlines a number of main strategies you could use to modify your task script to deal with typical situations that may occur in a formative user testing. To sum up:

Changing tasks orders: randomise tasks for each participant if possible, and move the dependent tasks as a whole; consider the difficulties of the task and issue an easy task to start with if you feel participant is nervous, or provide an easy task if participants failed several tasks in a row. Allow them to perform a later task if they verbalise it as a goal/strategy for solving the current task.

Remove tasks: if time is running out with a particular participant, omit certain tasks. This could be tasks with low priorities; tasks that already have enough feedback from other participants; or tasks the participant has already covered while attempting a previous task.

Add tasks: if time permits, allow users to perform a new task if it is a user initiated goal and is relevant to the testing; repeat a task (with slightly different wording and at an appropriate time) if the user succeeds in a task accidently, or has failed this task earlier, or if the aim is to test the learnability of the system.

Change task description: slightly amend the details of the task scenario (not the aim of the task) based on users’ verbal data to make it more relatable and realistic to the user. This will improve the reliability of the data.

If you have other ways to maneuver the tasks during the testing session, or have situations you are unsure about, feel free to share your experience and thoughts.

Talk to us today

Interested in running Ubuntu in your organisation?

Newsletter signup

Get the latest Ubuntu news and updates in your inbox.

By submitting this form, I confirm that I have read and agree to Canonical's Privacy Policy.

Related posts

Designing Canonical’s Figma libraries for performance and structure

How Canonical’s Design team rebuilt their Figma libraries, with practical guidelines on structure, performance, and maintenance processes.

Visual Testing: GitHub Actions Migration & Test Optimisation

What is Visual Testing? Visual testing analyses the visual appearance of a user interface. Snapshots of pages are taken to create a “baseline”, or the current...

Let’s talk open design

Why aren’t there more design contributions in open source? Help us find out!