
Usability and accessibility report concerning the design and redesign of a website (Group 4)

Ann Mortensen amor@itu.dk Lasse Højer-Pedersen lhoj@itu.dk Maja Studsgaard mstu@itu.dk Kristoffer Rosenmeier krro@itu.dk

Frozen site: http://itu.dk/stud/projekter_e2009/Usability_project_frozen/Group 4 Revised site: http://itu.dk/stud/projekter_e2009/Usability_project/Group 4/Revised


ABSTRACT

This paper presents a comparative analysis of usability and accessibility studies of the design and redesign of a web shop, based on tests and user trials.
Keywords

Usability, accessibility, user trials, heuristic analysis, personas, screen reader, online accessibility testing tools, exclusive calculation.

INTRODUCTION

The work presented in this paper discusses the usability and accessibility studies of an online gift shop for companies and their employees. The site exists in two versions:

1. The frozen site, which was developed at the beginning of the process without taking into account any usability theory, accessibility testing tools or pilot studies.

2. The revised site, which is the result of testing and refining the frozen version with the above-mentioned tools.

Both websites are based on the concept of an online gift shop offering the same 62 British products. The user is granted up to 10 stars that can be used for shopping. The discussion is a comparison of the two sites in terms of usability and accessibility, the latter primarily concerning blindness. The attached glossy report gives a visual overview of the two sites.
Hypotheses

To evaluate the usability and accessibility of the two websites, we used the following research questions, posited as six hypotheses:

H1: It is easier to find a specific product on the revised site than on the frozen site. (Effectiveness, efficiency)
H2: The concept of the stars is more understandable on the revised site than on the frozen site. (Effectiveness)
H3: It is easier to hit the buttons on the revised site. (Effectiveness)
H4: The revised site works better in terms of accessibility. (Efficiency, effectiveness)
H5: Errors are less likely to occur on the revised site than on the frozen site. (Efficiency, effectiveness)
H6: The revised site gives the user a more satisfying experience. (Satisfaction)

As shown in the round brackets, each hypothesis has been categorised according to the ISO 9241 definition, according to which usability consists of the following parameters: effectiveness, efficiency and satisfaction [1]. In this paper, effectiveness is defined as the extent to which the user is able to find the desired information and accomplish tasks. Efficiency is measurable, e.g. whether the user is able to accomplish tasks quickly. Satisfaction is the user's subjective satisfaction, assessed quantitatively or qualitatively. We believe that all three factors are of equal importance when measuring usability. Accessibility is defined here as the aspect of something being accessible to people with impaired vision, colour blindness, blindness, cognitive impairments and/or motor impairments.

BACKGROUND

The design of both websites is based on the following pages: 1. Login, 2. Welcome, 3. Shop, 4. Delivery, 5. Confirmation, 6. Congratulation. On both sites the user has to log in, read the welcome message, choose the products, enter delivery information and confirm the order to fulfil the purpose of the website.

The frozen site was designed with the stars as the main focus. All granted stars had to be used, and consequently the frozen site was designed to offer sorting by stars. No clustering of the products was offered. The frozen site was designed to be simple, intuitive and easy to use. All pages were white, red and black with a Christmas theme. The breadcrumb was visible at all times, and on the product selection page the basket was given a fixed position regardless of scrolling. The revised site was the result of testing and refining the frozen site. In addition to the usability aspects, accessibility was also an important parameter.
METHOD

Accessibility

To improve the accessibility of the site, we used the following tools to test accessibility continuously on the revised site while making improvements: Jaws (screen reader), CynthiaSays (web content accessibility report), Wave (online validation tool for graphical accessibility issues in HTML), W3C's HTML validator, tab-testing (manual tracking of the tab sequence), exclusive calculations (calculation of the number of people excluded from using the site) and VisCheck (test for colour-blindness). Later on, the same tests were performed on the frozen site in order to determine and compare any differences and improvements from the frozen to the revised site.
User trials

The construction of the revised site comprised a series of methods, mainly usability and accessibility tests, and the final user trials were based on both. The following experiments were performed in order to develop the revised site:

Usability tests | Heuristic analysis, Personas
Accessibility checkers | Jaws, CynthiaSays, Wave, W3C's Validator, Tab testing, Exclusive calculations, VisCheck
User trials | Test protocol, Pilot test, Final test protocol, Final user trials

Table 1: Performed tests

As mentioned above, the test protocol was based on the six hypotheses: all tasks in the protocol were designed to answer one or more hypotheses. To test the protocol we performed a pilot test, which provided inspiration for the guidelines for the final user trials and test protocol. We discovered several important aspects; in particular, these findings underlined that all tests should be performed by the same moderator and in the same browser to achieve the most comparable results. Relevant theory supports these findings [6]. Mozilla Firefox was used as the test browser for all user trials, except for the blind user, who preferred Internet Explorer due to his screen reader installation. The following table gives an overview of the participants in the user trials:
User | F/M | Age | Computer experience | Initial test | Profession
A | F | 24 | Intermediate | Revised | Math student
B | M | 30 | Intermediate | Revised | Project assistant
C | M | 24 | Advanced | Revised | Research assistant
D | M | 31 | Advanced | Frozen | Head of Section
E | F | 26 | Below average | Frozen | E-business student
F | M | 65 | Below average | Frozen | CEO
G | M | 43 | Expert | Frozen | IT programmer

Table 2: Background information of the users


Usability

In order to improve the frozen site, two discount usability methods were applied: personas [2] and Jakob Nielsen's 10 heuristics [3]. To achieve the most realistic personas, our focus was on a normal working environment in a company, from the head of the firm down to the student assistant. The personas were based on experiences from heuristic testing both on existing sites on the Internet in general and on the frozen site. The heuristics helped develop the possible situations that potential users could take part in, and on the basis of these a set of hypotheses was developed. These hypotheses provided the basis for both the pilot study and the final test protocol used for our user trials, and consequently they became the focal point of the project. Finally, the test persons were selected on the basis of the personas: we chose six users who each represent aspects of one of the personas. As seen in the table, we selected an equal number of users for each starting site to balance the test [4]. With six users, the majority of errors will be discovered [5]. Please note that User G is blind and does not figure in the balanced tests.


The final test protocol was designed with time taking in mind, on the assumption that the more measurable the tasks, the more comparable the results. The protocol contained 13 tasks, of which 11 were timed. Of these, eight tasks were comparable across both sites, as some of the tasks could only be completed on the revised site. An overview of the tasks can be seen in table 3. For each site, six supplementary questions were asked. The same protocol was used when the blind user tested the revised site, except for one task and one question concerning visual aspects.
No. | Task | Time taking, Fro. site | Time taking, Rev. site
T1 | Find and add a 4-star product | x | x
T2 | Find and add Pampers Nappies (Bleer) | x | x
T3 | Find and add Calpol | x | x
T4 | Can you tell us what is in your cart? | x | x
T5 | Find and add Hamleys Limited Edition Teddy Bear Christopher (Bamse) | x | x
T6 | Find out more information about the Teddy Bear (Bamse) | n/a | x
T7 | You have decided that you want to choose something completely different. Now clear your cart | x | x
T8 | Count how many alcohol products are available in the web shop (answer = 8) | x | x
T9 | Now you have decided to fill your cart. Please use your normal shopping behavior. Afterwards click Proceed | n/a | n/a
T10 | The filled-in address is your company's address. Please fill in your own address and click Proceed | n/a | n/a
T11 | You receive a call from a friend. He recommends Kelloggs Frosties. Please return to the shop and choose this product | x | x
T12 | You are uncertain if it is possible to send the gifts to your friend. Show how you can find the answer to this question | n/a | x
T13 | Find the phone number of the service provider | n/a | x

Table 3: Tasks performed on frozen site (Fro. site) and revised site (Rev. site) respectively. x = time taken, n/a = not timed or not applicable.

Before running the user trials, we set up a scenario for the users. They were told to imagine that they were employed by a company which had decided to grant all of its employees a number of stars to spend in an online gift shop.
RESULTS

In the following, an overview of the most relevant and interesting results and observations from the user trials is presented as quantitative and qualitative results respectively.

Quantitative results

Time taking

Time taking was a very important aspect of the user trials, and it was therefore essential to obtain results that were as measurable as possible. At every task the word "go" was used to let the user know exactly when to start, and the user was asked to say when the task was considered completed. Time taking was performed with all users on the majority of tasks on both sites. Having six users made it possible to set up a balanced design. The results are displayed in the following tables along with the mean values and a standard one-tailed t-test. The t-test was calculated in order to tell whether the differences in the data sets are statistically significant.

Selected tasks

Overall, the majority of the tasks were performed faster on the revised site than on the frozen site. In the following, some of the most interesting findings are presented. Table 4 displays the results of the task "Find and add Pampers Nappies":

 | A | B | C | D | E | F | Mean | p in %
Frozen | 20 | 33 | 18 | 26 | 26 | 33 | 26,0 | 0,041
Revised | 12 | 22 | 12 | 11 | 10 | 20 | 14,5 |
1st site | 12 | 22 | 12 | 26 | 26 | 33 | 21,8 | 28,453
2nd site | 20 | 33 | 18 | 11 | 10 | 20 | 18,7 |

Table 4: Task: "Find and add Pampers Nappies". (Times in seconds.)

The t-test shows that the difference between the mean values is highly significant (0,041% < 1%), and the mean values can consequently be used for comparison. In this case the task was performed nearly twice as fast on the revised site as on the frozen site. Looking at data rows three and four, the t-test also shows that it makes no difference whether the user tests the frozen or the revised site first (28,453% > 5%). These mean values are therefore not significant, and the Power Law of Practice [7] does not have to be taken into account. Here, an actual improvement from the frozen to the revised site can be seen.

Table 5 displays the results of the task "How many alcohol products are available in the web shop?":

 | A | B | C | D | E | F | Mean | p in %
Frozen | 33 | 27 | 37 | 39 | 31 | 26 | 32,2 | 0,029
Revised | 13 | 8 | 7 | 10 | 12 | 14 | 10,7 |
1st site | 13 | 8 | 7 | 39 | 31 | 26 | 20,7 | 44,325
2nd site | 33 | 27 | 37 | 10 | 12 | 14 | 22,2 |
Answer frozen | 3 wrong, 3 correct (50% correct) | n/a
Answer revised | 6 correct (100% correct) |

Table 5: Task: "How many alcohol products are available in the web shop?". Users marked as A, B, C...; answers marked as wrong or correct. (Times in seconds.)

The situation is the same in this case, as the task was performed approximately three times faster on the revised site than on the frozen site. The t-test again shows that the result is highly significant (0,029% < 1%).

Rows five and six collect the answers to this question. Only half of the users gave a correct answer when testing the frozen site, compared to all users answering correctly on the revised site. Not only did the users perform the task faster on the revised site; a higher percentage of the users also answered the question correctly.
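The reported significance can be reproduced from table 4. As a sketch, assuming the one-tailed t-test was computed as a paired test over the six users (which matches the balanced within-subjects design), the per-user differences between the frozen and revised times give:

$$ d_i = t_i^{\text{frozen}} - t_i^{\text{revised}} = (8,\ 11,\ 6,\ 15,\ 16,\ 13), \qquad \bar{d} = 11.5, \qquad s_d = \sqrt{\tfrac{77.5}{5}} \approx 3.94 $$

$$ t = \frac{\bar{d}}{s_d/\sqrt{6}} \approx 7.16, \qquad df = 5 \;\Rightarrow\; p_{\text{one-tailed}} \approx 0.0004, $$

in line with the 0,041% reported in table 4.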
The time taking of the blind user

The time takings of the blind user are analyzed separately, as these timings are not comparable to the other users' time-taking results. Table 6 shows all comparable tasks completed by the blind user:
User G | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8
Frozen | 52 | 38 | 15 | 15 | 127 | 55 | 283 | 182
Revised | 115 | 96 | 60 | 46 | 114 | 70 | 77 | 94
1st site | 52 | 38 | 15 | 15 | 127 | 55 | 283 | 182
2nd site | 115 | 96 | 60 | 46 | 114 | 70 | 77 | 94
Diff. % | +121 | +153 | +300 | +207 | -10 | +27 | -73 | -48

Table 6: The blind user's time takings on tasks (in seconds) and the percentagewise difference between the frozen and the revised site respectively.
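For reference, the difference row follows directly from the two time rows; for T1, for instance:

$$ \text{Diff.} = \frac{115 - 52}{52} \times 100\% \approx +121\% $$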

Some tendencies can be noted: with a few exceptions, the blind user spent considerably more time completing the tasks on the revised site than on the frozen site. Table 7 displays the results of two of the tasks:
"Find and add Pamp ersNappies". Frozen Revised 1 site 2 site
nd st

"How many alcohol products are availableonthesite Frozen Revised 1 site 2 site Answerfrozen Answerrevised
nd st

38 96 38 96

283 77 283 77 Wrong Correct Figure 2: Ratings by the blind user. Question 1-5 (Q6 about graphical design not applicable) Errors

Table 7: Results of user trial performed by the blind user. Ratings

We asked the users to rate six questions on a scale of 1-7, where 7 is the highest, as seen in figure 1. The questions were: Q1: How satisfied are you with the site? Q2: How did you like the navigation? Q3: How easy was it to find the specific products? Q4: How reliable did you find the site? Q5: How sure are you that your products will be delivered? Q6: How did you like the graphical design? The revised site was rated higher in all six cases. The questions were asked immediately after the test of each site. Half of the users were asked to test the sites in reversed order to balance the results.

Figure 1: Ratings, questions 1-6. The mean values of the six user responses. (Ratings scaled 1-7.)

The ratings given by the blind user generally showed a higher satisfaction with the revised site. Especially in Q3, "How easy is it to find a specific product?", a remarkable difference is clear:

Figure 2: Ratings by the blind user, questions 1-5. (Q6 about graphical design not applicable.)

Errors

The number of errors caused by the interaction between the users and the websites gives an overview of the overall error tendencies on each site. The errors are divided into two categories: cognitive and technical. Where an error is marked as both cognitive and technical in tables 8 and 9 below, we assume that it was caused by wrong navigation: the user expects something from the system which is understandable under present web conventions but not possible on either of the websites. Because of the variety of errors on the two sites, the errors themselves are not directly comparable. It is, however, possible to compare the number of errors caused by users: such errors occurred 19 times on the frozen site compared with 11 times on the revised site. To compare the overall severity of the errors, a weighting column has been added: the number of users who caused an error is multiplied by a severity score on a 1-5 scale, where 5 is the most severe error. This is done to compare the two sites while taking both the number of users and the severity of the errors into account.
Errors on frozen site | A | U | S | W | Cat
Count wrong number of alcohol products | 3 | 3 | 1 | 3 | C/T
Using breadcrumb to navigate | 6 | 5 | 4 | 20 | C/T
Right click to navigate | 2 | 2 | 5 | 10 | C
Fails to add a product (show stopper) | 1 | 1 | 5 | 5 | C/T
Chooses the wrong item (in acc. to task) | 1 | 1 | 1 | 1 | C
Fails to understand the concept of stars | 1 | 1 | 4 | 4 | C
Does not notice that a product is not added | 1 | 1 | 3 | 3 | C/T
Using browser back button to go back | 3 | 2 | 1 | 2 | C/T
Does not notice the 'return button' at page 3 | 1 | 1 | 4 | 4 | C/T
Trying to click on items to view full cart | 1 | 1 | 4 | 4 | C/T
Browser shows 404 | 1 | 1 | 5 | 5 | T
Total | 21 | 19 | 3,4 | 61 |

Table 8: Errors found on the frozen site. A = amount of errors, U = users (the number of users that have encountered the error), S = severity of the error, W = weighting of the error (U*S), Cat = category: C = cognitive, T = technical.

Errors on revised site | A | U | S | W | Cat
Cannot find FAQ | 3 | 3 | 1 | 3 | C/T
Trouble understanding the delivery procedure | 1 | 1 | 4 | 4 | C/T
Cannot tell what is in cart (does not use 'view full cart') | 2 | 2 | 3 | 6 | C
Clicks on 'Delivery' in breadcrumb to get info, not FAQ | 3 | 3 | 3 | 9 | C/T
Clicks on image to add to cart | 1 | 1 | 1 | 1 | C
Does not understand the concept of greyed out | 1 | 1 | 3 | 3 | C
Total | 11 | 11 | 2,5 | 26 |

Table 9: Errors found on the revised site. (Legend as in table 8.)

The weighting sum for the frozen site is 61, compared to 26 on the revised site; the revised site thus seems to contain fewer and less severe errors. The spread of error types seems to be the same on both websites: both cognitive and technical errors are represented several times.
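As a worked example of the weighting used in the tables, the breadcrumb error on the frozen site was encountered by U = 5 users with severity S = 4:

$$ W = U \times S = 5 \times 4 = 20, \qquad \sum W_{\text{frozen}} = 61, \qquad \sum W_{\text{revised}} = 26 $$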

Critical path

With regard to time taking and errors, the revised site has significantly fewer errors than the frozen site. One of the causes seems to be that the revised site offers more paths to complete a task, contrary to the frozen site, which only has one path towards the goal. User F, for instance, had some cognitive difficulties with the delete item icon (the small x) on the revised site. Even though the critical path was categorized as "delete the product directly via the icon", User F chose to click on "view cart" and delete the product from the cart overview. The obvious deviation from the critical path, which also led to a solution of the task, was therefore not necessarily a poorer choice. The frozen site offered only one path, which caused problems in this situation. An observation could be that it is of little use to focus only on making the user follow the critical path.
CynthiaSays

CynthiaSays performs an online accessibility test which singles out and sorts the relevant errors that conflict with the WCAG guidelines [8]. The URL of the product selection page on both sites was tested. The result from CynthiaSays is based on the number of accessibility checkpoint errors on each site, divided by priority:
Accessibility | Priority 1 | Priority 2 | Priority 3
Frozen site | 1 | 1 | 0
Revised site | 1 | 1 | 2

Table 10: Accessibility checkpoint errors

On the frozen site, the priority 1 failure is missing alternate text, because only title tags are used. The same failure was repeated on the revised site, though the number of occurrences was reduced. The priority 2 failure is missing metadata; this failure also exists on both sites. The priority 3 failures, regarding language elements and anchor elements, are only found on the revised site.
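A minimal sketch of markup that addresses these checkpoints (the attribute values are illustrative, not the sites' actual content):

```html
<!DOCTYPE html>
<!-- Priority 3: declare the document language -->
<html lang="en">
<head>
  <!-- Priority 2: provide descriptive metadata -->
  <meta name="description" content="Online gift shop for company employees">
  <title>Gift shop</title>
</head>
```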
Wave

Wave was used as a supplement to CynthiaSays and showed the same accessibility errors. Wave thereby verified the problems that were found using CynthiaSays.
W3C's HTML Validator


All pages were tested in the W3C's HTML Validator, and the results can be seen in tables 11 and 12. On both sites, one of the primary types of errors concerns missing alt tags, as also shown by CynthiaSays. These alt tags would have improved accessibility and helped the blind user. So far, the focus had been on title tags in order to increase the cognitive understanding of the sites; alt tags should consequently have been added as a supplement. Another common validation error is the repetition of an already defined div id used to define styling elements for the CSS. These should have been placed in a div class element instead.
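The two recurring validation errors can be illustrated with a minimal sketch (element names and file names are hypothetical, not the sites' actual markup):

```html
<!-- Invalid: an id may only be defined once per page -->
<div id="product">...</div>
<div id="product">...</div>

<!-- Valid: a class may be reused as a styling hook -->
<div class="product">
  <!-- title alone is not announced reliably by screen readers;
       alt carries the accessible name of the image -->
  <img src="calpol.jpg" title="Calpol" alt="Calpol">
</div>
```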

Page | Errors | Warnings
Welcome | 11 | 0
Product selection | 969 | 155
Delivery information | 20 | 0
Confirmation | 14 | 0
Total | 1014 | 155

Table 11: The W3C's HTML Validator results on the frozen site.

Page | Errors | Warnings
Welcome | 3 | 12
Product selection | 335 | 136
Delivery information | 6 | 16
Confirmation | 3 | 16
Total | 347 | 180

Table 12: The W3C's HTML Validator results on the revised site.

Tab-testing


Tab-testing on the frozen site indicated a need for product sorting: for instance, it took 62 tab presses to get through all the products before being able to navigate to the shopping cart.
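A common remedy for this (a sketch of the general technique, not necessarily what the revised site implemented) is a skip link placed first in the tab order:

```html
<!-- First focusable element: lets keyboard and screen-reader users
     jump past the product grid instead of tabbing through 62 links -->
<a href="#basket">Skip to shopping basket</a>
...
<div id="basket">...</div>
```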
Exclusive calculations

Exclusive calculation tests performed on the frozen site show that the largest exclusion is dexterity, which is due to the quite small target areas. Intellectual functioning and vision are both noticeable issues. In order to compare the two sites, data from both the frozen and the revised site are placed into one chart, presented in two figures divided into age groups:

Figure 3: Exclusive calculations of the frozen and revised site (16-102 years). Loc = Locomotion, Res = Reach and Stretch, Dex = Dexterity, Vis = Vision, Hea = Hearing, Com = Communication, Int = Intellectual functioning.

Figure 4: Exclusive calculations of the frozen and revised site (75-102 years). (Legend as in figure 3.)

Regarding the age comparison, the two figures show that the same issues are relevant for both age groups, and that there is a notable increase in the overall number of people excluded in the 75-102 age group. Furthermore, the number of elderly people with vision impairments has increased noticeably.

Exclusive calculations performed on the revised site point out the same issues as on the frozen site, though considerably reduced.

Qualitative results

Heuristics

The first qualitative result is based on Nielsen's 10 heuristics. All 10 heuristics were applied to both websites in order to make an individual evaluation and to compare the sites. The most interesting aspects are error prevention (heuristic 5) and flexibility and efficiency of use (heuristic 7). There is no error prevention on the frozen site: if the user does not understand the intended use (cognitive/technical), the site offers no dialogue or FAQ. In contrast, the revised site offers an FAQ page as well as helping alternate text to prevent wasted clicks [Glossy, pp. 3, pic. 3]. Nor does the frozen site offer error prevention when over- or underspending stars: if the user clicks on a certain product without having enough stars, nothing happens. The revised site has a 3-step error prevention to avoid cognitive errors. When the user does not have enough stars to choose a certain product, it turns grey. If the user hovers over the product, the alternate text explains the situation. If the user nevertheless clicks on the product, a dialogue box explains that the number of stars available is too low [Glossy, pp. 3, pic. 7].
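A minimal sketch of how the three steps can be realised in markup (names, file names and wording are illustrative, not the site's actual code):

```html
<!-- Step 1: the product is greyed out; step 2: hover text explains why;
     step 3: a click triggers an explanatory dialogue instead of failing silently -->
<img src="teddybear.jpg" alt="Teddy Bear" class="greyed-out"
     title="You do not have enough stars left for this product"
     onclick="alert('You do not have enough stars left for this product.');">

<style>
  .greyed-out { filter: grayscale(100%); opacity: 0.5; }
</style>
```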

Flexibility and efficiency of use is also a heuristic where the two sites differ but still have some similarities. The frozen site has no accelerators to speed up the process; the user has to follow the intended use at all times. The revised site offers sorting by category, stars and popularity. Furthermore, the revised site has a pre-filled address form to speed up the process. Neither site has a search feature; this deliberate choice is addressed in the discussion section of this paper.

Observations and interviews

Our observations during the user trials were accompanied by questions from the moderator. These interviews revealed a general demand for a search feature. Another frequent observation was the attempt to navigate by means of the breadcrumb. On the whole, a great coherence was observed between the observations, the interviews, the task completion times and the ratings given at the end of the test.
Online testing tools

For the blind user, Jaws was used during User G's user trial. This revealed that some of the design ideas were not appropriate, as we had chosen to abbreviate all product names in the basket and add three full stops. This caused an unfortunate incident in Jaws, as all the names were read aloud including the dots, e.g. "Calpol dot dot dot". Testing with Jaws consequently influenced the design of the product selection on the revised site. To attain the best use of Jaws, the site was divided into three sections (Sort, Choose, Order) to achieve a much better overview; this especially helps the blind user navigate the site. To further improve the revised site, we also tested how users with colour-blindness would experience it, using VisCheck. Testing for deuteranope, protanope and tritanope colour vision showed that the colour schemes of the two websites worked well and no redesign was necessary.
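One way to avoid the "dot dot dot" problem (a sketch of the general technique, not the fix actually implemented) is to truncate visually with CSS while keeping the full product name in the markup for the screen reader:

```html
<!-- Literal dots are read aloud by Jaws: "Calpol dot dot dot" -->
<span>Calpol...</span>

<!-- CSS truncation: Jaws reads the full name, sighted users see an ellipsis -->
<span style="display: inline-block; max-width: 8em; white-space: nowrap;
             overflow: hidden; text-overflow: ellipsis;">
  Hamleys Limited Edition Teddy Bear Christopher
</span>
```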
DISCUSSION


In order to discuss the results and evaluate the six hypotheses, the following discussion is structured around the hypotheses.
Hypothesis 1: It is easier to find a specific product on the revised site than on the frozen site.

To assess whether one of the sites is easier to use, timing the tasks is one way to go. In tables 4 and 5, the mean values of the time takings are calculated. In the first task, the revised site is twice as fast as the frozen site, and in the second task the factor is nearly 3:1. We are able to compare the mean values because of the high significances in the t-test results. In order to explain this difference between the mean values, a look at the navigation is needed. There is a fundamental difference in the design of the sites: the frozen site offers a long grid of all products and requires scrolling to view them all, while on the revised site the products have been clustered and sorted into categories. Especially the task of counting the alcohol products took a long time on the frozen site, because the user was forced to count the alcohol products one by one while scrolling through the grid of all products. On the revised site, a click on "Alcohol" gives the user a view of all alcohol products available on the site. In this case, the difference in ease of use between the two sites is reflected in the difference between the two designs. Of course, users rarely need to count all the products in a given category, but this task shows that the general overview is better on the revised site.

The Power Law of Practice says that the more times the user completes a certain task, the faster the user is able to perform it. In order to address whether the Power Law of Practice has influenced the results, it is necessary to look at the t-test results from both the first and the second site tested by each user. The results show no statistical significance in the difference in time takings between the first and the second site, regardless of the order in which the sites were tested, while the users nevertheless perform the tasks faster on the revised site than on the frozen site despite the balanced design. The Power Law of Practice therefore does not explain the improvement the second time a task is performed. Moreover, 50 percent of the users answered differently in their second try, which points to the clustering on the revised site as the likely reason for the correct answers and the faster completion times.

The users were asked to rate six questions about the sites; the ratings are shown as mean values in figure 1. The users rated every question higher for the revised site. In question 3 (Q3), "How easy was it to find the specific products?", the revised site was rated nearly 46% higher than the frozen site (5.83 compared to 4). These ratings support the general tendency that it is easier to find a specific product on the revised site, which is supported by both the time takings and the user statements in general. It is interesting that the clustering of the products on the revised site might force the user to perform an extra click to get to a certain product, compared to the frozen site's grid view of all the products, when defining scrolling as a non-click action. But even though it might take an extra click, the users confirm that the revised site is easier to use when it comes to finding a specific product.

Regarding the navigation, the users were asked if they had any ideas that could facilitate the use of the site. Numerous suggestions were received, but one that came up several times was the idea of a search field. Jakob Nielsen defines a search field as an accelerator and finds it obvious to place a search feature on a website. It was nevertheless a deliberate decision not to use a search field on the websites, for the following reasons. Firstly, both websites offer only 62 products each, a limited number that restricts the challenge of finding a specific product. Secondly, it is rarely the case that a user visiting the shop for the first time knows exactly what she is looking for and searches for this product; the tasks given were forcing this need. Despite the users' clear wish for a search field, the need for this feature is therefore not necessarily supported, as the users were placed in an unnatural situation with regard to the use of the site.

In the event of an expansion of the web shop, there is no doubt that a search option would be a good idea; it would also secure scalability should the number of products increase. To sum up, the time-taking results indicate that the revised site is more than twice as fast to use as the frozen site, and the reliability of the results is supported by the fact that the Power Law of Practice is not applicable in this case. The ratings support the tendency from the time-taking results. A search field is an option, but not a necessity in this web shop, although it would be an obvious priority in case of an expansion. For the above-mentioned reasons, hypothesis 1 can be confirmed.
Hypothesis 2: The concept of the stars is more understandable on the revised site than on the frozen site.

It is difficult to address this hypothesis. The observations show that the users understood the concept of the stars relatively quickly on both sites. However, no data, such as time takings, exists to support a confirmation of the hypothesis. Furthermore, the concept is the same on both sites: when a user has grasped the idea of the stars on the first site, it is also comprehensible on the second site. This prevents useful time taking, making it impossible to measure any progression. It is also conceivable that some users would have been embarrassed if their failure to understand the concept had been exposed. To sum up, it is difficult to establish exactly when the user understands the concept of the stars, and if the user understands the concept while interacting with the first site, it is clear on the second site from the beginning. Consequently, it has not been possible to confirm hypothesis 2.
Hypothesis 3: It is easier to hit the buttons on the revised site

When considering the possibilities of hitting the buttons on a website, two things are important: dexterity support and visual clarity. The exclusive calculations under Results are based on impaired people's potential conflicts when interacting with the websites. Figures 3 and 4 show that dexterity is an area in which many people are affected by the lack of accessibility on the frozen site; the revised site shows a substantial reduction in the number of people excluded from use of the site. Dexterity support was especially a problem on the frozen site, particularly as a consequence of the small "add to basket" text and "delete item" icon. On the revised site, all buttons have been enlarged in accordance with Fitts's Law, which predicts that the time required to move to a target area is connected to the distance to and the size of the target [9]. The enlargement of the buttons is one way of making it easier, and at the same time faster, to hit the buttons. The revised site also offers an "Add" button instead of text, and if you click the "View cart" button there is a "Remove" button that is 2-3 times larger than the delete icons on the frozen site, which are the only way to delete items there [Glossy, pp. 3, pic. 1, pp. 4 and 5]. Figures 3 and 4 also show that the vision issue has been optimized, reducing the number of people excluded from use of the site. By making the icons visually clearer, the target areas have obtained a higher definition. The more defined the target areas, the faster the users understood their functionality. Conversely, it was a problem when target areas were not defined well enough: the colour of the "View cart" button on the revised site was too similar to the background colour and therefore quite indistinguishable [Glossy, pp. 3, pic. 1 and 5]. The same problem occurred on the Delivery information page with the "Go to shop" button. A more contrasting colour scheme might have countered this. To sum up, both the dexterity support and the definition of the buttons have been improved on the revised site, and the number of people excluded for dexterity reasons has been reduced.
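Fitts's Law is commonly stated in the Shannon formulation, where T is the movement time, D the distance to the target, W the target width, and a, b empirically fitted constants; enlarging the buttons increases W and thus lowers T:

$$ T = a + b \log_2\!\left(\frac{D}{W} + 1\right) $$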
Hypothesis 4: The revised site works better in terms of accessibility

To determine whether the revised site is actually better in terms of accessibility, an elaboration of the results is necessary. Here, the main focus is on User G, the blind user. The exclusive calculations made on the sites suggest the exclusion factors influencing the use of the two sites; the general tendency is that the revised site excludes fewer people. As mentioned under hypothesis 3, dexterity support is primarily a problem on the frozen site, and the visibility aspect is also affected by the size issues. Another exclusion factor is intellectual functioning. The frozen site offers no help here, while the revised site has a 3-step error prevention for cognitive problems, consequently resulting in a lower number of excluded people. Other aspects besides the exclusive calculations include validation problems and the ability to tab through the site. The relatively high number of errors in the W3C Validator is caused by missing alternate text and repeated div ids; the same result occurs on the revised site. This is a significant problem, but it is also quite simple to correct on a dynamically built website. The title element was chosen as a part of the 3-step cognitive error prevention, but the missing alternate texts made it difficult for User G to complete some specific tasks as a result of the lack of accessibility. If these were corrected, the accessibility would improve immensely. The tab testing made it clear that the frozen site had to be restructured: the concept of showing all products on one page prolonged tasks, as it was necessary to tab through all products. When User G was asked to find a specific product and did not know how to spell it, the browser search (CTRL+F) was not an option.


Instead, User G had to let Jaws read through all the products. Consequently, the time takings on these tasks were longer than might have been expected. In contrast, the navigation on the revised site provides an overview of the menu before browsing the products in each category, subsequently improving the accessibility. As shown in table 6, the blind user completed the tasks considerably slower on the revised site except for two tasks. Even though User G performed the tasks slower on the revised site, it was more satisfying to use, judging by the ratings given by, and the observation of, the user. This was because User G was more aware of the navigation possibilities on the revised site. However, User G suggested the implementation of an even more dialogue-based navigation that, for example, told the user when a product was correctly added or deleted. To sum up, the exclusive calculations show that the 3-step error prevention makes a positive difference with regard to intellectual functioning. Many, though easily corrected, errors were found through the online validation; these errors were important to the blind user and need to be reduced. The tabbing as well as the clustering made the revised site work better, though the single-page layout made the browser's built-in search feature more useful on the frozen site. Generally, the blind user performed the tasks slower on the revised site but was at the same time much more satisfied with using the site. According to these results, hypothesis 4 can be confirmed.
Hypothesis 5: Errors are less likely to occur on the revised site than on the frozen site

When looking at errors, it makes sense to distinguish between cognitive and technical errors, resulting in the following definitions: errors concerning the interaction between the user and the websites are defined as cognitive errors, while errors concerning technical reliability are defined as technical errors. The errors shown in the accessibility test section of the Results chapter are for this reason not applicable in this context.

Cognitive errors

Both sites have a built-in feature to restrict the number of stars a user can add to the basket. However, the primary difference between the two sites is that the revised site features the 3-step error prevention, which makes the user more aware of potential problems and errors occurring on the revised site. This 3-step error prevention instructs the user to make sure all stars are used before proceeding, contrary to the frozen site, where no error prevention occurs [Glossy, pp. 3, pic. 6 and 7]. Consequently, it can be argued that fewer errors will occur on the revised site. This argument is supported by the tables in the Errors section of the Results chapter, which show a very clear difference in the number of errors combined with their severity; this could be due to the 3-step error prevention. Furthermore, the revised site contained significantly more information about the features on the site, including an FAQ and contact information. These features offered support when the user needed help.
Technical errors

Regarding the technical errors, the situation seems to be the same. One reason seems to be that most of the technical errors occur in connection with cognitive errors. On the frozen site, one of the cognitive/technical errors was that all six users tried to use the breadcrumb to navigate. The intention of the breadcrumb was only to provide a visual overview, but it turned out to confuse the users. On the revised site, the breadcrumb was made backwards clickable. This solved some of the cognitive/technical errors, meaning that the revised site took the cognitive problems into consideration and offered a technical solution. This was highly necessary due to the inconsistency in the design of the buttons on the revised site's Confirmation page. The frozen site was the only one to provoke a 404 error; this kind of error should not be possible to trigger through the normal use of a website. To sum up, the errors on the two sites show a tendency for the users to make primarily cognitive errors, and one can say that the cognitive errors occur because of the technical aspects of the sites. It is clear that both sites should provide better cognitive support. Even though there is room for improvement, hypothesis 5 is confirmed.
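A sketch of the breadcrumb change (the page names follow the site flow; the file names are hypothetical):

```html
<!-- Frozen site: a purely visual overview, not clickable -->
<div class="breadcrumb">Login &gt; Welcome &gt; Shop &gt; Delivery</div>

<!-- Revised site: earlier steps are backwards clickable -->
<div class="breadcrumb">
  <a href="login.html">Login</a> &gt;
  <a href="welcome.html">Welcome</a> &gt;
  <a href="shop.html">Shop</a> &gt;
  Delivery
</div>
```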
Hypothesis 6: The revised site gives the user a more satisfying experience


Satisfaction concerns the user's feeling of pleasure when using the site; the users form their sense of satisfaction through their general experience of the sites. When asked directly about the level of satisfaction (Q1), the revised site ranked clearly above the frozen site, as seen in figure 1. Even though all aspects of using the sites seem to influence how the users rated the overall satisfaction, some aspects seem more important than others. For instance, it seemed more important to the users that they had a positive experience of the navigation, even if it was not necessarily time-saving, as seen with the blind user under hypothesis 4. Reliability was also an important issue. Both sites scored high on the questions about reliability (Q4 and Q5), but the users made it clear that the scenario that had been set up was very important for their evaluation of the sites' reliability. The fact that it was a company that had chosen the gift shop for their employees' Christmas shopping was perceived as a quality stamp by the users; as a random shop found on the Internet, the gift shop would have appeared unreliable. The graphical design also affected their experience of reliability: the more refined and thorough the design, the more professional it was perceived to be. To sum up, the revised site was rated higher on all questions, consequently offering higher satisfaction to the users and confirming hypothesis 6.


CONCLUSION

In the light of the hypotheses listed above, this conclusion subjects the results of trying to increase usability on the revised site to a critical examination. The results from all the performed tests and user trials indicate that the revised site is noticeably easier to use. This is especially supported by the fact that some tasks on the revised site were performed more than twice as fast as on the frozen site. As a result, we can conclude that when it comes to navigation, the revised site is easier to use, which contributes to an overall higher degree of usability.

The revised site is generally easier to understand and for that reason easier to use. Though it is difficult to establish at exactly what point the user understands the concept of the stars, it is clear that the revised site provides an overall more intuitive use due to, among other things, improved navigation. The effectiveness of the revised site has thus been improved immensely compared to the frozen site.

Accessibility has been one of the focal points when revising the site. Both the dexterity support and the definition of the buttons have been improved on the revised site, and the number of people excluded due to dexterity- and vision-related reasons has been reduced. In terms of accessibility, the revised site works better, although there is still room for improvement: many, but easily corrected, errors mattered a lot to the blind user and need to be reduced. The tabbing as well as the clustering raised the level of accessibility on the revised site, which was manifested in the user trial with the blind user. Generally, the blind user performed tasks slower on the revised site but was at the same time much more satisfied with this site. Furthermore, the 3-step error prevention on the revised site results in fewer people being excluded.

The summation of all errors on the two sites shows a tendency towards the users making primarily cognitive errors. It could be claimed that these errors occur mainly as a result of the technical aspects of the sites. It is clear that both sites should provide better cognitive support. When comparing the two sites, the revised site gives the users the highest satisfaction; all aspects from navigation to graphical design matter for the users' perception of satisfaction. All tests and user trials have contributed to improvements and to the importance of seeing usability as a necessary aspect of websites in order to create a unified user experience.
FURTHER WORK

To increase knowledge of how the users explore the site, eye-tracking could be a rich supplement. The tracking offers a possibility to follow the user's way through the product range and, especially, to find areas of the site that are not used as intended. This would give an interesting perspective on results otherwise based on self-reporting and observations.

Lucky-pick is a feature that gives the user a one-click-buy solution which automatically adds a selection of products to the cart, inspired by the top 10 sellers. The total cost of the products is equal to the number of stars available, so the Proceed button shows up immediately. The lucky-pick could be a good idea for meeting the needs of users who want to go through the website even more quickly and easily.

Areas with room for improvement are the graphical design and the overall satisfaction with the site. Satisfaction is an important part of usability but is often not the first thing to be examined in depth. As a qualitative tool, focus groups could be a way of gaining a more nuanced insight into these areas.
REFERENCES

1. Hornbæk, Kasper. Usability: What is it, How to measure it. PowerPoint slides, 20.11.2009: http://www.eng.cam.ac.uk/inclusivedesign/index.php?section=data&page=exclusion_calc

2. Arvola, Mattias and Blomquist, Åsa (2002). Personas in Action: Ethnography in an Interaction Design Team. NordiCHI, Short Papers, 197-200.

3. Jakob Nielsen's 10 heuristics: http://www.useit.com/papers/heuristic/heuristic_list.html

4. Balanced design. PowerPoint slides, 23.10.2009 and 28.10.2009: http://www.itu.dk/people/awfr/e2009usability_slides/Slides_23_oct.ppt and http://www.itu.dk/people/awfr/e2009usability_slides/LSK_posted_ITU_lecture_13_281009.ppt

5. Bevan, Nigel, et al. (2003). The magic number 5: is it enough for web testing? CHI: New Horizons, 698-699.

6. Hollingsed, Tasha and Novick, David G. (2007). Usability Inspection Methods after 15 Years of Research and Practice. University of Texas at El Paso, El Paso, TX, 249-253.

7. The Power Law of Practice. PowerPoint slides, 28.10.2009: http://www.itu.dk/people/awfr/e2009usability_slides/LSK_posted_ITU_lecture_13_281009.ppt

8. Web Content Accessibility Guidelines: http://www.w3.org/WAI/intro/wcag.php

9. Fitts's Law. PowerPoint slides, 30.10.2009: http://www.itu.dk/people/awfr/e2009usability_slides/Handout_301009_lecture14.doc
