Software Qual J DOI 10.1007/s11219-010-9108-5
Metrics-based evaluation of learning object reusability
Javier Sanz-Rodriguez • Juan Manuel Dodero • Salvador Sanchez-Alonso
© Springer Science+Business Media, LLC 2010
Abstract This paper aims to help in the selection of reusable educational materials from repositories on the web by developing an indicator of the reusability of learning objects. For this purpose, our research is carried out in three stages. The first, based on previous studies in this area, determines those aspects that influence reusability. The second defines a set of metrics that measure those aspects using metadata. The third proposes different methods of aggregation in order to obtain a single resulting value, and evaluates the efficiency of the model by analyzing a significant set of learning objects obtained from the eLera and Merlot repositories. The results obtained suggest that the proposed indicator could provide useful information when searching for learning objects in repositories. This reusability measurement could constitute an indicator of quality, which would allow search results to be ordered, with those with the greatest possibility of being reused taking priority. Furthermore, the proposed reusability indicator could be calculated automatically or in an assisted way if metadata elements satisfy the minimum quality requisites identified.
Keywords Reusability · Learning objects · Metadata · Metrics
1 Introduction
As with the development of open source software in projects such as Linux or Apache, in education there is a trend towards the development of quality open educational resources, with suitable user rights that enable users to reuse them and modify them to fit their
J. Sanz-Rodriguez (✉)
University Carlos III of Madrid, Av. Universidad 30, 28911 Leganes, Madrid, Spain
e-mail: javier.sanz.rodriguez@uc3m.es
J. M. Dodero
University of Cadiz, C/Chile, s/n, 1003 Cadiz, Spain
e-mail: juanma.dodero@uca.es
S. Sanchez-Alonso
University of Alcala de Henares, Ctra. Barcelona km. 33600, 28871 Alcala de Henares, Madrid, Spain
e-mail: salvador.sanchez@uah.es
context. While they represent tremendous opportunities, open education programs also face novel challenges and new anxieties. Perhaps the most obvious is the quality assurance of the open materials. Manually reviewing materials is laborious, and the quantity of educational resources is enormous and growing by the day, so we need novel modes of reviewing, assessing and sharing evaluations (Kelty et al. 2008). These open educational resources include learning objects, which differ in that their definition contains no explicit mention of their open nature and in that they are associated with further technological characteristics such as whether they are digital, modular, self-contained or reusable (Friesen 2009). However, this concept is controversial and different definitions currently coexist (IEEE 2002; Polsani 2003; Wiley 2002), which may be synthesized as follows: any educational material that is independent, self-contained, digital, identified by metadata and that may be reused in different educational contexts. The concept of reusability constitutes the main reason behind the technologies associated with learning objects. Developing quality educational materials is costly in terms of time and resources, which is why being able to reuse already-existing quality materials will generate pedagogical and economic benefits (Campbell 2003; Koper 2003). While the reuse of learning objects is an empirical and observable fact, Sicilia (2004) affirms that reusability is an intrinsic attribute of the object that provides an a priori measure of quality, which may be confirmed by a posteriori reuse data. Reusability may be defined as the degree to which a learning object can work efficiently for different users in different digital environments and in different educational contexts over time.
It should always be borne in mind that there are different technical, educational and social factors that will affect reuse (Palmer and Richardson 2004). In most situations, it will be necessary to carry out certain types of modification (modularization, adaptation and aggregation) in order to be able to reuse the learning object (Zimmermann et al. 2007). Currently, most initiatives taken to improve reuse have attempted to define standards by which learning objects may be used in different platforms without interoperability problems (Duval 2004). However, we find ourselves in a situation in which the potential benefits of reuse remain out of reach; there are insufficient studies of reusability indicators, and the design criteria to guarantee it are lacking (Sicilia 2004). Ochoa and Duval (2008, 2009) carried out a quantitative analysis of the reuse of learning objects in real-world settings. The scope of the study includes objects of differing granularity and different types of repository. It concluded that only 20% of objects stored in repositories are actually reused. In addition, the problems that the reuse of learning objects must overcome are similar to those of other shared resources in repositories, such as images, software libraries or APIs. For this reason, the reuse of learning objects is not intrinsically any easier or more difficult than that of other types of component. Ochoa and Duval affirm that the theories used for the reuse of other types of component may be applied to the reuse of learning objects. They also indicate that, although the reuse of educational materials is currently ongoing, even without a technological framework that favors it, an effort must be made to overcome these deficiencies in order to increase the degree of reuse. In the current situation, any search for learning objects in a repository could return an enormous list of results.
With no quality indicator to shine a light on this information, looking for learning objects can become a waste of time and effort (Kumar et al. 2005). As happens with any search engine, it is desirable to have a filtering process so that the information supplied satisfies the needs of the user in the best way possible. This lack of information is the motivation behind this paper, the aim of which is to propose an automatic or assisted method for estimating reusability which, by using the
IEEE LOM standard metadata (IEEE 2002), is able to provide an a priori measure of quality that can help in the selection of educational materials. In carrying out this research, Glass’s (1995) characterization of research in computational sciences and Etzkorn et al.’s (2001) methodology for measuring the reusability of object-oriented software have been used. The result is a model for evaluating the reusability of learning objects, which comprises the following stages:
1. A study on the state of the question. In Sect. 2, different approaches to evaluating the reusability of learning objects are analyzed. In Sect. 3, a study is made of those features of learning objects that could influence reusability, as well as proposals from existing reusability indicators.
2. Formulation of the model. In Sect. 4, a number of reusability indicators are proposed that measure the different factors that determine reusability in accordance with the metadata. A model is formulated that aggregates the metrics depending on how significant they are in determining reusability. Different ways of aggregating are studied, such as the weighted mean, the Choquet integral and multiple linear regression.
3. Evaluation of the model. In Sect. 5, the effectiveness of the model is evaluated by analyzing a significant set of objects and comparing the results with the reusability data provided by the evaluations carried out by the experts at the eLera (www.elera.net) and Merlot (www.merlot.org) repositories, as Nesbit et al. (2006) propose that the data from the evaluations carried out at eLera may be used to verify the validity and reliability of tools and models for evaluating learning objects.
The paper closes with a discussion on the applicability of the proposed model (Sect. 6) along with conclusions and future lines of research (Sect. 7).
2 The current situation with regard to evaluating the reusability of learning objects
As occurs in the development process of any software product, it is necessary to evaluate learning objects in order to determine their quality. The main reason Nesbit and Belfer (2004) give to justify evaluation is the need to help users search for and select learning objects. Evaluation is necessary in order to guarantee that the potential benefits of reuse and of e-learning systems are reached, thus improving quality and reducing the costs needed for their development. There are currently numerous initiatives under way that are aimed at evaluating learning objects and providing an estimation of their quality. Tzikopoulos et al. (2007) found that 23 out of the 59 repositories they studied offered several mechanisms for evaluating the educational materials. The most frequently used approach is to provide a final evaluation of a learning object. Various summative formats have been used, including general impressions gathered using informal interviews or surveys, measuring frequency of use and assessing learning outcomes. The ultimate goal of this kind of evaluation has been to get an overview of whether participants valued the use of learning objects and whether their learning performance was altered (Kay and Knaack 2007). The eLera repository provides the Learning Object Review Instrument (LORI), which allows nine features to be evaluated: content quality, learning goal alignment, feedback and adaptation capacity, motivation, presentation design, interaction usability, accessibility, reusability and standards compliance. Each feature is evaluated on a scale of 1–5, with the possibility of some of them not being evaluated at all (Nesbit et al. 2006). In the Merlot
repository, learning objects are evaluated by experts to guarantee their quality. Three dimensions are evaluated: content quality, ease of use and effectiveness as a learning tool (Vargo et al. 2003). To sum up, the evaluation method used by most repositories consists of gathering the opinions of users and experts on different aspects of learning objects, with manual inspection being the evaluation tool used. However, there are exceptions to this method, such as that put forward by Ochoa and Duval (2007), who propose a set of metrics that, using information concerning use, context and metadata, sort learning objects according to their relevance. Zimmermann et al. (2007) remind us that in order to reuse a learning object that was created for a specific scenario, it is frequently necessary to adapt it for use in a new scenario, and propose evaluating the effort needed to adapt it. To do so, they propose measuring the similarity between the metadata that describe the ideal learning object searched for and the metadata of the learning objects available. Unlike these initiatives, this paper proposes an a priori reusability evaluation that incorporates all the factors affecting reuse and is based on the metadata that describe the object. In order to compute it automatically, the metadata must be correctly filled in with non-descriptive values that can be compared. So that the evaluation model can later be checked against expert evaluations, the possible values that each of the metrics may take are standardized within the interval [1, 5], the same scale as that used in the evaluations carried out in eLera and Merlot. This evaluation could be of help when searching for more easily reusable learning objects.
3 The relationship between reusability and the characteristics of learning objects
The factors that determine the reusability of a learning object (Palmer and Richardson 2004; Daniel and Mohan 2004; Huddlestone and Pike 2005; Pitkanen and Silander 2004) can be classified as structural or contextual issues. From a structural viewpoint, reusable learning objects must be as follows:
• Self-contained: a learning object should make sense by itself; references to other resources could decrease reusability; the more prerequisites it needs, the more difficult it will be to adapt it to other contexts. In addition, a learning object is a complete and standalone unit that contains all information and resources needed by learners to complete it (Chang 2006). Furthermore, there is a consensus that a learning object must be designed with reusability in mind and therefore be self-contained (Duval et al. 2001).
• Modular: a learning object must be combinable with other objects to form composite structures such as lessons and courses.
• Properly grained: a proper size and a proper learning objective for a learning object will facilitate its reuse.
• Traceable: a learning object should be easily identifiable and traceable through the correct metadata.
• Modifiable: a learning object should be modifiable, allowing it to be reformulated within a context different from that for which it was originally designed.
• Usable: just as users reuse and recommend virtual learning environments (VLE) if they are easy to use (Omosule et al. 2008), a reusable learning object must be easy to use and the interactive interface elements it contains should be intuitive.
• Standardized: a reusable learning object must be compliant with a shared specification or standard.
From a contextual viewpoint, the more context-dependent and context-specific a learning object is, the more limited its reusability will be. Contextual factors can be dealt with in the following dimensions: technological, educational and social.
• The technological dimension of context includes platform dependencies and the software needed to run the learning object, as well as representation issues (reusable learning objects should separate content and format issues).
• The social and educational contexts require the following features: learning objects must be generic, i.e. independent from a given subject or discipline; they must be prepared for use on different education and assessment levels; they must be pedagogically neutral, i.e. not involve a specific pedagogical method; they must lack institutional, legal, social and cultural dependencies; they must be independent of the time and location in which they are run.
We should mention that, in order to achieve the highest degree of reusability, some of the factors described above cannot be taken to the extreme; for instance, a generic, discipline-independent learning object is more reusable than a discipline-specific one, but clearly it is not usable, as it has to commit to the learning objectives for which it is intended, and these objectives are always subject-specific. A different question is, for instance, whether a learning object dealing with statistics is more reusable if it does not include examples from a given discipline (e.g. mechanical engineering) that hinder its inclusion in another context (e.g. a biology course). Similar issues can be discussed about the pedagogical neutrality or time-independence features, to mention just a few. Designers tend to produce objects with multiple dependencies to enrich the learning process, in contrast to independent and self-contained objects that contribute little significant knowledge.
This situation presents a challenge for designers to design cohesive, uncoupled objects that contain both structural and contextual aspects that do not jeopardize reusability (Boyle 2003).
4 Reusability evaluation model
Firstly, the reusability metrics are defined so that different aggregation methods can later be proposed for them.
4.1 Learning object reusability metrics
While some authors suggest that object-oriented theory has little to offer to the definition and understanding of learning objects (Sosteric and Hesemeier 2002), Downes (2001) proposes designing learning objects using, as a reference, the design model for object-oriented software, in which components may be cloned and adapted for reuse in different contexts. This affirmation has served to inspire the use of software reusability measures as a reference for defining measures of learning object reusability (Cervera et al. 2009). Some of these software reusability metrics work with concepts such as dependencies or complexity, which have a correlate in learning objects (Cuadrado-Gallego and Sicilia 2005). Traditionally, software engineering has used principles such as cohesion and coupling, which allow for the development of easily maintainable software that can be easily adapted to new requirements (Boyle 2003). In addition, the reuse of learning objects will be related to maintainability, as in most situations it will be necessary to carry out certain types of modification in order to be able
to reuse the learning object (Zimmermann et al. 2007). Software maintainability is defined as the ease with which a software system or component can be modified to correct faults, improve performance or other attributes, or adapt to a changed environment (IEEE 1993). Metrics related to application size, complexity and coupling were the most commonly used maintainability predictors (Riaz et al. 2009). Drawing inspiration from these principles and starting from the fact that learning objects are designed to be reused and reformulated, we study how their capacity for reuse can be determined. Apart from the cohesion and coupling of learning objects, we analyze other reusability factors such as portability, size and complexity.
4.1.1 Cohesion
Cohesion analyses the relationships between the elements within a module. A module, which can be different things depending on the language (a class, a package, etc.), must perform a single task to be maximally cohesive. Greater cohesion usually implies greater reusability (Vinoski 2005). Cohesion is a software quality indicator which, applied to learning objects, is reflected in the following elements:
• A learning object involves a number of concepts (LOM 9 Classification category). The fewer the concepts, the greater the module cohesion (Yang and Yang 2005).
• A learning object should have a single and clear learning objective (Boyle 2003). The more learning objectives it has, the less cohesive it will be considered. Information about learning objectives is covered by the educational objective in LOM 9.1 Purpose.
• The Semantic density (LOM 5.4, Educational category) shows how concise a learning object is. It may be estimated in terms of its size, span or, in the case of self-timed resources such as audio or video, duration (IEEE 2002). The semantic density of a learning object could be defined as a measure of its effectiveness compared with its size and duration (Richards 2007).
More concise objects may indicate greater cohesiveness.
• A learning object must be self-contained to be highly cohesive (Yang and Yang 2005). The LOM 7 Relation category defines the relationship instances the learning object has. For some types of relationship, such as references or requirements, we can say that the more relationship instances a learning object has, the less self-contained and, therefore, the less cohesive it is. Moreover, the LOM 1.8 Aggregation level element summarizes the aggregation level of a learning object as ranging from 1 for single resources to 4 for a set of related courses. The lower the level of aggregation, the more cohesive the object.
• Structure (LOM 1.7) indicates the organizational structure of a learning object. It can be Atomic, Collection, Networked, Hierarchical or Linear. We observed that there is a relationship between the aggregation level of an object and its structure, e.g. an object with an atomic structure will have an aggregation level of 1, whereas the other types of structure have values ranging from 2 to 4 (IEEE 2002).
We can conclude that learning object cohesion depends on semantic density, the number of relationships, aggregation level, the number of concepts dealt with and the number of learning objectives covered. These metadata elements can be a valid source for estimating the reusability of a learning object.
4.1.2 Coupling
Coupling measures interdependencies between software modules and must be minimized (Vinoski 2005). A module must communicate with the minimum number of modules and
must exchange as little information as possible, in order to minimize the impact caused by changes in other modules. Learning object coupling describes interrelationships between distinguishable objects: if an object has dependencies on others, reusability could be compromised, depending on the nature of the relations in question (Boyle 2003), so less coupling predicts greater reusability (Yang and Yang 2005). The LOM 7 Relation category indicates the number of objects related to a given learning object, so we conclude that coupling is directly proportional to the number of relationships present in that category. However, the LOM 7 Relation category was already covered in the cohesion measure as a source of information. For this reason, use of this metric would be redundant, providing no new reusability information. Coupling is therefore eliminated as a measure of learning objects’ reusability.
4.1.3 Size and complexity
Software size and complexity can be measured by several methods, e.g. lines of code or McCabe’s cyclomatic complexity. Program size affects reusability (the bigger the program, the less reusable it is), and low module complexity also improves reusability (Poulin 1996). Other authors, however, maintain that what affects reusability in object-oriented software is not the number of attributes or methods, nor size or complexity, but rather the qualities of the interface offered by the object class (Barnard 1998). Size and complexity would affect reusability when modifications are necessary to reuse the object. The size of a learning object indicates its granularity and, in general terms, granularity provides clear information on learning object reusability, since fine-grained objects are more easily reusable (Wiley 2002). Learning object granularity depends on the following LOM elements:
• LOM 4.2 Size: the number of bytes of a learning object.
These data should be weighted depending on the learning object format, as size can be interpreted differently depending on the type of content; while 2 MB of plain text would be considered huge, the same size for a video would be considered small. In addition, when measuring the size of multimedia elements their resolution must be taken into account.
• LOM 4.7 Duration: the estimated time to run the learning object.
• LOM 5.2 Resource type: the specific kind of learning object (exercise, simulation, etc.).
• LOM 5.9 Typical Learning Time: approximate or typical time it takes to work with or through this learning object for the typical intended target audience (IEEE 2002). This is the most reliable indicator for estimating the size of a learning object, although it depends on the student characteristics.
4.1.4 Portability
In the field of portability, metrics measure the ability to transfer software from one system to another. These metrics are based on the analysis of modularity and hardware/software context independence (Poulin 1996). Learning object portability can be measured as context dependence at technological and socio-educational levels. The fewer the dependencies found, the more portable the learning object.
Technical portability
The following LOM values can be analyzed when considering portability at a technical level:
• LOM 4.1 Format: it determines the delivery format of the learning object components, such as video/mpeg, application/x-toolbook and text/html. Some formats are more readily portable (e.g. text/html is more widespread than application/x-toolbook). Furthermore, while the use of teaching resources in various formats (multimedia) stimulates different sensory perception pathways and improves learning, the formats used should be easily reusable in any possible context of use (Rodríguez-Ardura et al. 2009). Using a very specific format that is difficult to represent will limit reusability and means portability can be considered null.
• LOM 4.4 Requirements: it involves the hardware and software required to run the object. The more complex the requirements, the less portable the object is.
Educational portability
With regard to educational portability, we can deal with vertical or horizontal portability (Currier and Campbell 2002). Vertical portability means the possibility of a learning object being used and reused across different educational levels. In contrast, horizontal portability determines the interdisciplinarity of the object. We have considered the following IEEE LOM metadata elements:
• LOM 5.6 Context: potential educational contexts in which a learning object can be used (i.e. school, high school, higher education and professional training). Educational portability is greater for those objects that can be used and reused in a number of different educational contexts.
• LOM 5.7 Typical age range: potential age range of the users who could benefit from using the object. Educational portability increases as the number of ranges grows.
• LOM 1.3 Language: the human languages supported by the object. The more languages available and the more widely used the languages, the more reusable the object will be.
• LOM 9 Classification: information used to classify a learning object within the discipline it belongs or is related to. The more specific the classification scheme, the less reusable the learning object.
We must remember that the classification of educational material using the standard LOM elements described here is not a simple task. It is also influenced by the subjectivity of the evaluator making this classification. This section is summarized in Fig. 1, which shows the factors affecting learning object reusability, the metrics defined to measure them and the metadata elements containing information to quantify the metrics.
Fig. 1 Relationships between reusability factors, metrics and LOM metadata elements
Reusability factors: self-contained, modular. Metric: Cohesion. LOM metadata elements: 1.7 Structure, 1.8 Aggregation level, 5.4 Semantic density, 7 Relation, 9.1 Educational objective.
Reusability factors: properly grained. Metric: Size. LOM metadata elements: 4.2 Size, 4.7 Duration, 5.2 Resource type, 5.9 Typical learning time.
Reusability factors: generic, different educational levels, different academic disciplines. Metric: Educational portability. LOM metadata elements: 1.3 Language, 5.6 Context, 5.7 Typical age range, 9 Classification.
Reusability factors: hardware dependencies, software dependencies, format dependencies. Metric: Technological portability. LOM metadata elements: 4.1 Format, 4.4 Requirement.
4.2 Aggregation methods
Once the reusability metrics have been defined, a description of the different aggregation methods is given, which will allow the information contributed by each metric to be combined into a resultant reusability value.
4.2.1 Weighted mean
Given the set of criteria $C = \{c_1, \ldots, c_n\}$, each learning object will have evaluation values $(x_1, \ldots, x_n)$, one for each criterion, where $x_i \in \{1, 2, 3, 4, 5\}$. The weighted mean that gives us the aggregate evaluation comes from the formula $M_w(x) = \sum_{i=1}^{n} w_i x_i$, where $\sum_{i=1}^{n} w_i = 1$ and $w_i \ge 0$ for every criterion $c_i \in C$. The weights shown in Table 1, which indicate the contribution of each of the metrics to the final value, were determined by the authors from pairwise comparison matrices, which represent the relative importance of each metric compared to each of the others (Barzilai 1997).
4.2.2 Choquet’s integral
In view of the possibility that there may be some interaction between the chosen metrics, Choquet’s integral is the ideal candidate for modeling the aggregation process, as it may be used as a generalization of the weighted arithmetic mean that takes into account interaction between criteria (Marichal 2000). Choquet’s integral allows us to represent the different interactions that exist between the criteria to be aggregated.
• Correlation: two criteria $c_i, c_j \in C$ are correlated if there is a linear relationship between their values. This would introduce a certain degree of redundancy into the model.
• Substitutiveness: two criteria $c_i, c_j \in C$ are substitutive when the satisfaction of only one causes almost the same effect as the satisfaction of both.
• Complementarity: two criteria $c_i, c_j \in C$ are complementary when the satisfaction of one contributes very little meaning in relation to the satisfaction of both.
Table 1 Weighting of each metric
Metric                       Weight
Cohesion                     0.3
Technological portability    0.3
Educational portability      0.3
Size                         0.1
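As an illustration of the weighted mean of Sect. 4.2.1, the following minimal Python sketch applies the Table 1 weights to one learning object. The dictionary keys, the function name `weighted_mean` and the example scores are assumptions introduced here for illustration; only the weights themselves come from Table 1.

```python
# Weights taken from Table 1; the key names are illustrative, not from the paper.
TABLE1_WEIGHTS = {
    "cohesion": 0.3,
    "technological_portability": 0.3,
    "educational_portability": 0.3,
    "size": 0.1,
}


def weighted_mean(scores, weights=TABLE1_WEIGHTS):
    """Aggregate per-metric scores (each on the 1..5 scale) into one value."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same metrics")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # M_w(x) = sum_i w_i * x_i
    return sum(weights[metric] * scores[metric] for metric in weights)


# Hypothetical learning object: these scores are invented for illustration only.
example_scores = {
    "cohesion": 4,
    "technological_portability": 5,
    "educational_portability": 3,
    "size": 2,
}
reusability = weighted_mean(example_scores)
print(round(reusability, 3))
```

Because the weights are non-negative and sum to 1, the aggregate always stays within the range of the input scores, so it remains on the same 1–5 scale as the expert evaluations.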
The general expression of the integral given in the following formula is a specific instance of the general form of the discrete aggregation operator on the real domain, $M_v : \mathbb{R}^n \to \mathbb{R}$, which takes an input vector $x = (x_1, \ldots, x_n)$ and yields a single real value.
$$C_v(x) = \sum_{i=1}^{n} x_{(i)} \left[ v\left(\{ j \mid x_j \ge x_{(i)} \}\right) - v\left(\{ j \mid x_j \ge x_{(i+1)} \}\right) \right]$$
where $x' = (x_{(1)}, \ldots, x_{(n)})$ is a non-decreasing permutation of the input n-tuple $x$ and, by convention, the set $\{ j \mid x_j \ge x_{(n+1)} \}$ is empty. The integral is expressed in terms of the Choquet capacity $v$. This measure, applied to a set $X$, is a monotonic set function $v : 2^X \to [0, 1]$, thus fulfilling $v(S) \le v(T)$ when $S \subseteq T$, which allows Choquet’s capacity to assign weights not only to each criterion but also to each subset of criteria.
4.2.3 Multiple linear regression
For our proposal, the reusability evaluation obtained from the eLera repository will be taken as the dependent variable and the metric reusability evaluations (Cohesion, Size, Educational Portability and Technological Portability) will be the independent or predicting variables. A calculation will be made of the coefficients that give the best fit between the reusability calculated by the equation and the reusability obtained from eLera, representing the resulting function as the linear combination of the metrics that explains the dependent variable.
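To make the Choquet aggregation of Sect. 4.2.2 more concrete, the following Python sketch evaluates the discrete Choquet integral for one learning object. It is only a sketch under stated assumptions: the function names, the example scores and the non-additive capacity `squared_capacity` are hypothetical and are not taken from the paper. The additive capacity is built from the Table 1 weights to show that, in that case, the integral reduces to the weighted mean.

```python
WEIGHTS = {  # Table 1 weights; key names are illustrative
    "cohesion": 0.3,
    "technological_portability": 0.3,
    "educational_portability": 0.3,
    "size": 0.1,
}


def choquet_integral(x, capacity):
    """Discrete Choquet integral of the scores x (dict: criterion -> value).

    capacity is a function from a frozenset of criteria to [0, 1] that is
    monotonic, with capacity(empty set) = 0 and capacity(all criteria) = 1.
    """
    # Order the criteria so that their scores are non-decreasing.
    # Ties may be ordered arbitrarily without changing the result.
    ordered = sorted(x, key=lambda criterion: x[criterion])
    total = 0.0
    for i, criterion in enumerate(ordered):
        at_least_current = frozenset(ordered[i:])    # {j | x_j >= x_(i)}
        at_least_next = frozenset(ordered[i + 1:])   # empty for the last term
        total += x[criterion] * (capacity(at_least_current) - capacity(at_least_next))
    return total


def additive_capacity(subset):
    """Capacity with no interaction: the sum of the individual weights."""
    return sum(WEIGHTS[c] for c in subset)


def squared_capacity(subset):
    """Hypothetical capacity in which criteria reinforce each other."""
    return additive_capacity(subset) ** 2


# Hypothetical scores, invented for illustration only.
scores = {
    "cohesion": 4,
    "technological_portability": 5,
    "educational_portability": 3,
    "size": 2,
}
print(round(choquet_integral(scores, additive_capacity), 3))
print(round(choquet_integral(scores, squared_capacity), 3))
```

With the second capacity, a subset of criteria is worth less than the sum of its parts, so the aggregate is pulled towards the lowest scores; this is one way of expressing complementarity between criteria.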
5 Evaluation
To evaluate the efficiency of the model proposed for estimating reusability, an experiment was carried out in which a significant number of objects from the Merlot and eLera repositories were studied, with data from the evaluations of different metric aggregation methods being compared.
5.1 Scenario
In terms of the choice of data sample, we should differentiate between the situations in the two repositories, Merlot and eLera. In the eLera repository, all objects registered in the repository that had been evaluated at least once were considered. This set comprised 120 objects at the time of the study. On examining each object in detail, it was necessary to rule out 20, either because they were not currently available or because their evaluation details were not complete. Also ruled out were those objects whose content quality evaluation was lower than 2.5. This was to avoid atypical objects that did not reach a minimum quality and which might introduce distortion into the evaluation of the model. The study finally consisted of 95 objects. In the Merlot repository, the sample consisted of 141 objects. To be precise, this set of materials is the result of a query performed on 1 October 2009 to include all the materials stored in the repository between 2005 and 2008 that had been evaluated by experts and had associated user comments. The aim was to have a significant number of objects evaluated by users and experts.
Table 2 Characterization of eLera evaluators
Evaluator group                        Evaluators   Percent
Primary teachers                       5            3.76
Secondary teachers                     16           12.03
University teachers                    8            6.01
E-learning researchers                 23           17.29
Other educational community members    18           13.53
Students                               13           9.78
Unknown                                50           37.60
The characteristics of the reviewers also differ between eLera and Merlot. In eLera, as Table 2 shows, the group of evaluators was made up of primary, secondary and higher education teachers or lecturers as well as researchers in the field of e-learning and other members of the educational community. In Merlot, to guarantee the validity of evaluations, the objects were examined by teachers who use learning objects in their academic work and who are experts in the subject dealt with by each object, with the review process being led by at least two university lecturers who specialize in pedagogy (Cafolla 2006). As for the rating of the reusability metrics, in order to obtain the quantitative values that result from applying the metrics to the objects examined, the authors of the article analyzed the metadata of each object, completing the metadata description in detail where anything was missing. The data relative to the evaluation of each object are registered in a database, with the global object reusability indicator being calculated automatically by means of different aggregation methods. Finally, we should point out that to evaluate the accuracy of an estimation model we will use techniques proposed by Fenton and Pfleeger (1997):
• Average absolute error. This analyses the difference, as an absolute value, between the predicted value and the real value. The smaller this error, the greater the accuracy of the model.
$$AAE = \frac{1}{n} \sum_{i=1}^{n} \left| \mathrm{EstimatedReusability}_i - \mathrm{eLeraReusability}_i \right|$$
• Average relative error. This analyses the difference, as an absolute value, between the predicted value and the real value, divided by the real value. The smaller this error, the greater the accuracy of the model.
$$ARE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \mathrm{EstimatedReusability}_i - \mathrm{eLeraReusability}_i \right|}{\mathrm{eLeraReusability}_i}$$
• Correlation between the real value and the estimated value. The correlation is used as a measure to quantify the efficiency of the prediction (Fenton and Pfleeger 1997). Specifically, Kendall’s Tau index and Spearman’s Rho index will be used to determine the level of correlation between the real values and the values predicted by the model, as they can be applied without the need for the data to follow a normal distribution.
• Quality of the prediction. This measure provides an indication of the fit of the model based on the magnitude of the relative error. It is the quotient of the number of cases in which the estimate lies within the relative limit l of the real value and the total number of cases (Conte et al. 1989).

\[ \mathrm{pred}(l) = \frac{i}{n} \]

where l is the magnitude of the relative error selected as a limit, i is the number of data points whose relative error is smaller than or equal to l, and n is the number of data points in the sample.

5.2 eLera results

For the set of 95 objects obtained from eLera, the reusability evaluations carried out by experts with the LORI instrument are compared with the estimates obtained by applying a variety of aggregation methods to the proposed metrics. The data were analyzed using the SPSS Statistics and Statgraphics tools.

Firstly, using the weighted mean as an aggregation process, we obtain an average absolute error of 0.647, which represents the distance between the estimated reusability and the reusability given by the experts. The average relative error is 0.222. There is a significant correlation at a level of 0.01 between the calculated reusability and that obtained from the eLera evaluations, with Kendall's Tau index being 0.278 and Spearman's Rho being 0.366. The quality of the prediction is pred (0.25) = 0.726.

To analyze these results and to detect possible redundancy between the contributions made by different metrics, Tables 3 and 4 show the result of a correlation analysis of the Cohesion, Size, Educational Portability and Technological Portability metrics.
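The accuracy measures defined in Section 5.1 and reported above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' tooling (the study reports using SPSS Statistics and Statgraphics); the function names and the sample ratings are hypothetical.

```python
def average_absolute_error(estimated, actual):
    """AAE: mean of |estimated - actual| over all objects."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def average_relative_error(estimated, actual):
    """ARE: mean of |estimated - actual| / actual (actual must be non-zero)."""
    return sum(abs(e - a) / a for e, a in zip(estimated, actual)) / len(actual)


def pred(estimated, actual, limit=0.25):
    """pred(l): share of objects whose relative error is at most `limit`."""
    within = sum(1 for e, a in zip(estimated, actual) if abs(e - a) / a <= limit)
    return within / len(actual)


# Hypothetical ratings for four objects (not data from the study).
estimated = [3.0, 4.0, 2.0, 5.0]
expert = [4.0, 4.0, 4.0, 4.0]

print(average_absolute_error(estimated, expert))  # 1.0
print(average_relative_error(estimated, expert))  # 0.25
print(pred(estimated, expert, limit=0.25))        # 0.75
```

Lower AAE and ARE and a higher pred value all indicate a closer fit, which is how the three aggregation methods are compared in this section.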
Table 3 Spearman's correlation

                            Cohesion   Size      Educational portability   Technological portability
Cohesion                    1.000      0.388**   -0.010                    -0.054
Size                        0.388**    1.000     -0.143                    -0.008
Educational portability     -0.010     -0.143    1.000                     0.115
Technological portability   -0.054     -0.008    0.115                     1.000

** Correlation is significant at a level of 0.01
Table 4 Kendall's correlation

                            Cohesion   Size      Educational portability   Technological portability
Cohesion                    1.000      0.362**   -0.010                    -0.050
Size                        0.362**    1.000     -0.132                    -0.008
Educational portability     -0.010     -0.132    1.000                     0.105
Technological portability   -0.050     -0.008    0.105                     1.000

** Correlation is significant at a level of 0.01
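Rank correlations such as those in Tables 3 and 4 do not require normally distributed data. The sketch below implements Spearman's Rho and Kendall's Tau in plain Python. It uses the tau-b variant, which corrects for ties; which variant the authors' statistical packages applied is an assumption here, and the metric values are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's Tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both factors
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + tied_y_only  # pairs not tied on x
    untied_y = concordant + discordant + tied_x_only  # pairs not tied on y
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical metric values for five objects (not data from the study).
cohesion = [1, 2, 3, 4, 5]
size = [2, 1, 4, 3, 5]
print(spearman_rho(cohesion, size))   # 0.8
print(kendall_tau_b(cohesion, size))  # 0.6
```

A pair of metrics with a high rank correlation, as Cohesion and Size show in Tables 3 and 4, carries partly overlapping information, which is the redundancy the aggregation method has to account for.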
Table 5 Choquet's capacities table

Cohesion   Technological portability   Educational portability   Size   m(S)
x                                                                       0.3
           x                                                            0.3
                                       x                                0.3
                                                                 x      0.1
x          x                                                            0.6
x                                      x                                0.6
x                                                                x      0.3
           x                           x                                0.6
           x                                                     x      0.4
                                       x                         x      0.4
x          x                           x                                0.9
x          x                                                     x      0.6
x                                      x                         x      0.6
           x                           x                         x      0.7
x          x                           x                         x      1
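A capacities table such as Table 5 assigns an importance m(S) to every combination of criteria, and the discrete Choquet integral uses those values to aggregate the four metric scores into one number. The following is a minimal sketch, not the authors' implementation: the mapping of Table 5's values to specific subsets assumes the rows list the subsets in column order (singletons, then pairs, then triples), and the metric scores are hypothetical.

```python
def choquet_integral(scores, capacity):
    """Discrete Choquet integral.

    scores:   dict mapping criterion name -> score
    capacity: dict mapping frozenset of criterion names -> importance m(S)
    """
    ordered = sorted(scores, key=scores.get)  # criteria by ascending score
    total, previous = 0.0, 0.0
    for position, criterion in enumerate(ordered):
        # Coalition of all criteria scoring at least as high as this one.
        coalition = frozenset(ordered[position:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total


C, T, E, S = "cohesion", "tech_portability", "edu_portability", "size"

# Assumed reading of Table 5 (subset assignment is an assumption).
CAPACITY = {
    frozenset([C]): 0.3, frozenset([T]): 0.3,
    frozenset([E]): 0.3, frozenset([S]): 0.1,
    frozenset([C, T]): 0.6, frozenset([C, E]): 0.6, frozenset([C, S]): 0.3,
    frozenset([T, E]): 0.6, frozenset([T, S]): 0.4, frozenset([E, S]): 0.4,
    frozenset([C, T, E]): 0.9, frozenset([C, T, S]): 0.6,
    frozenset([C, E, S]): 0.6, frozenset([T, E, S]): 0.7,
    frozenset([C, T, E, S]): 1.0,
}

# Hypothetical metric scores for one learning object.
scores = {C: 5.0, T: 2.0, E: 3.0, S: 4.0}
print(round(choquet_integral(scores, CAPACITY), 3))  # 3.2
```

Under this reading, the pair {cohesion, size} has a capacity of 0.3, which is less than the sum of the two singleton capacities (0.3 + 0.1), so an object that scores well on both correlated metrics is not credited twice for the overlapping information.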
It can be seen that the cohesion and size metrics show a statistically significant correlation, which introduces a certain level of redundancy into the model. To limit this effect, we use Choquet's integral as a method of aggregation. We found no substitutiveness or complementarity relationship between the criteria. Choquet's capacities table was constructed as shown in Table 5, where each row gives the importance m(S) of one combination of criteria and a mark in a criterion's column indicates that the criterion is present in the combination. The capacities of the different criteria should satisfy certain restrictions that depend on the interactions detected between them. In this case, where only correlation between criteria is present, the following must hold for two correlated criteria i and j: \( v(\{i, j\}) < v(\{i\}) + v(\{j\}) \).

By carrying out the estimation with Choquet's integral, we obtain a slight improvement in the result of the model. The improvement is only slight, despite the interrelationship between the cohesion and size metrics, because the weighting of the latter is small, so the level of redundancy introduced is minimal. The average absolute error is 0.636, and the average relative error is now 0.217. There is a significant correlation at a level of 0.01 between the calculated reusability and that obtained from the eLera evaluations, with Kendall's Tau correlation index being 0.330 and Spearman's Rho index being 0.428. The quality of prediction is pred (0.25) = 0.747.

Finally, we studied the use of multiple linear regression as an aggregation method. According to Tabachnick and Fidell (1996), applying the multiple linear regression model requires the sample size N to satisfy the inequality \( N > 50 + 8 \times M \), where M represents the number of independent variables.
For our model, with 4 predicting variables (Cohesion, Size, Educational Portability and Technological Portability), the inequality becomes \( N > 50 + 8 \times 4 \), so the sample size N must be greater than 82, a condition that our sample of 95 objects satisfies. Table 6 shows the results of fitting the multiple linear regression model. The coefficients reflect the relative importance of each criterion: educational portability is the metric that contributes the most, followed by cohesion, while the contribution of technological portability to the model is minimal. This is due to the fact that most of
Table 6 Multiple linear regression coefficients

Parameter                   Estimate   Standard error   t statistic   p-Value
Constant                    -0.196     0.745            -0.262        0.793
Cohesion                    0.447      0.127            3.516         0.000
Size                        0.032      0.111            0.290         0.771
Educational portability     0.585      0.113            5.171         0.000
Technological portability   0.004      0.097            0.044         0.965
the objects analyzed use technologies that are accessible to all users: HTML, Java, Flash, JavaScript, etc. It can also be seen that the contribution of size is minimal, which fits in with Ochoa and Duval's (2008) observation that the granularity of a learning object influences its reuse in accordance with the granularity of the context in which it is to be reused. That is to say, when writing a course it is more likely that lessons will be reused, while when writing a complete curriculum it is courses that will be reused. With this aggregation method, an average absolute error of 0.539 is obtained. The average relative error is now 0.152. There is a significant correlation at a level of 0.01 between the calculated reusability and that obtained from the eLera evaluations, with Kendall's Tau index being 0.371 and Spearman's Rho index being 0.498. The quality of the prediction is pred (0.25) = 0.842.

To conclude the experiment on the eLera repository, a comparative summary of the aggregation methods used is shown in Table 7 and in Fig. 2. Both show the progressive improvement in the behavior of the prediction model through the different adjustments applied to the aggregation method. According to the classification of estimation models by accuracy proposed by Conte et al. (1989), we can grade the weighted mean model and Choquet's integral as quite good and the linear regression model as very good.

5.3 Merlot results

To continue analyzing the efficiency of reusability estimation, the model that behaved best on the eLera objects, multiple linear regression, will be applied to 141 objects obtained from the Merlot repository. Although in this repository there is no explicit evaluation of reusability, we do have available some expert evaluations in the dimensions
Table 7 Comparison of results

Method                Kendall's Tau   Spearman's Rho   Pred (0.25)   ARE     AAE
Weighted mean         0.278**         0.366**          0.726         0.222   0.647
Choquet's integral    0.330**         0.428**          0.747         0.217   0.636
Multiple regression   0.371**         0.498**          0.842         0.152   0.539

** Correlation is significant at a level of 0.01
Fig. 2 Comparison of results (bar chart comparing the weighted mean, Choquet's integral and multiple regression methods on Kendall's Tau, Spearman's Rho, pred (0.25), ARE and AAE; vertical axis from 0 to 0.9)
Table 8 Correlation between estimated reusability and the Merlot evaluations

Correlation      Content quality   Effectiveness   Ease of use
Kendall's Tau    0.301**           0.300**         0.279**
Spearman's Rho   0.396**           0.402**         0.363**

** Correlation is significant at a level of 0.01

Table 9 Correlation between estimated reusability and personal collections

Correlation      Personal collections
Kendall's Tau    0.240**
Spearman's Rho   0.337**

** Correlation is significant at a level of 0.01
of content quality, effectiveness as a learning tool and ease of use. We can therefore study the degree of correlation between the prediction of reusability and these evaluations, since reusability is an intrinsic attribute of the object, one that constitutes an a priori measure of quality (Sicilia and Garcia 2003). The results of the correlation analyses shown in Table 8 indicate a statistically significant correlation between estimated reusability and the dimensions evaluated by the experts.

In Merlot, personal collections are compilations of learning objects that members can easily access and use for specific purposes, courses or learning topics. The bookmarking of learning objects in a personal collection is a potential predictor of quality (Garcia and Sicilia 2009), so we can use this kind of referential data as another way to check the reusability estimations. We applied a correlation analysis to quantify the relationship, and Table 9 indicates that the number of times a resource appears in a personal collection is positively correlated with the reusability estimation.
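Applying the fitted regression model to new objects, as is done here for the Merlot sample, amounts to a weighted sum of the four metrics plus a constant. The sketch below uses the estimates from Table 6 and the sample-size rule of Tabachnick and Fidell (1996); the function names and the metric values passed in are hypothetical, and the scale of the inputs is assumed to match the scale on which the model was fitted.

```python
# Coefficient estimates taken from Table 6.
TABLE_6_COEFFICIENTS = {
    "constant": -0.196,
    "cohesion": 0.447,
    "size": 0.032,
    "educational_portability": 0.585,
    "technological_portability": 0.004,
}


def sample_size_is_adequate(n_objects, n_predictors):
    """Rule of thumb N > 50 + 8 * M for multiple linear regression."""
    return n_objects > 50 + 8 * n_predictors


def estimate_reusability(metrics, coefficients=TABLE_6_COEFFICIENTS):
    """Linear prediction: constant plus the coefficient-weighted metric values."""
    return coefficients["constant"] + sum(
        coefficients[name] * value for name, value in metrics.items()
    )


# The eLera sample: 95 objects and 4 predictors, so 95 > 82 holds.
print(sample_size_is_adequate(95, 4))  # True

# Hypothetical metric values for one object (not data from the study).
example = {
    "cohesion": 3.0,
    "size": 3.0,
    "educational_portability": 3.0,
    "technological_portability": 3.0,
}
print(round(estimate_reusability(example), 3))  # 3.008
```

Because the estimates for size and technological portability are close to zero, the prediction is driven almost entirely by educational portability and cohesion.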
6 Discussion

6.1 On the deficiencies found in the use of metadata

The quality of metadata records is critical for finding learning objects in repositories. Unfortunately, the metadata obtained in our study of the Merlot and eLera repositories raise a number of problems that illustrate how metadata are currently used in learning object repositories. Among the deficiencies detected are problems of reliability, as some metadata are incorrectly completed. In our study, the metadata provided by the Merlot reviewers presented fewer problems and described the learning object with greater accuracy. Another problem is that the application profiles defined in Merlot and eLera define metadata elements that only partially cover the elements defined in LOM, making it necessary, on occasions, to resort to manual inspection of the object in order to complete the missing information and so calculate the reusability metrics. Another difficulty is that, although some metadata elements representing the same concept share the same possible values, such as Type resource in LOM and Context in eLera, the majority use different sets of possible values and structure the information differently. To make the
search for learning objects in repositories easier, it would be better if metadata that represent the same concepts used the same value spaces. Furthermore, in order to calculate our reusability metrics, all the metadata need to be filled in, although, as can be seen, some are optional. In eLera, for example, it is not obligatory to complete Description, Resource type and Educational context, while in Merlot, in Material Detail, there is no obligation to complete Technical Requirements and Technical Format.

6.2 On the applicability

A number of initiatives to lessen the shortcomings found in the metadata are currently under development. One of them, provided by ASPECT (Adopting Standards and Specifications for Educational Content) and sponsored by the European Union's eContentplus programme, is the use of standards that allow mapping between vocabularies from different application profiles (Massart 2009). At the same time, the MELT project (Van Assche et al. 2009), financed by the European Commission, aims to enrich the metadata of the huge quantity of educational resources currently available. It also considers concepts such as automatic generation of metadata, which is currently a realistic proposition, with automation initiatives available to help generate metadata (Benneker 2006).

In addition, for the indicator to work correctly, the metadata records will need to meet a certain degree of completeness to ensure that the objects are adequately described. Metadata completeness could be measured according to a fine-grained metric system, which takes into account the effect of multiple values in multi-valued fields and can be fully customized to reflect the needs and preferences of its users (Margaritopoulos et al. 2009). We could use this completeness indicator to ensure that we have the information necessary to make a recommendation.
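A completeness check of this kind can be sketched as a weighted share of filled metadata fields, used as a gate before the reusability indicator is computed. This is a deliberately simplified illustration, not the fine-grained metric of Margaritopoulos et al. (2009): the field names, weights and the 0.8 threshold are all hypothetical.

```python
def has_value(value):
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(record, weights):
    """Weighted share of the listed metadata fields that carry a value."""
    total = sum(weights.values())
    filled = sum(w for field, w in weights.items() if has_value(record.get(field)))
    return filled / total


def can_estimate_reusability(record, weights, threshold=0.8):
    """Hypothetical gate: only estimate when enough metadata is present."""
    return completeness(record, weights) >= threshold


# Hypothetical weights for three fields that eLera leaves optional.
WEIGHTS = {"description": 1.0, "resource_type": 1.0, "educational_context": 1.0}

record = {
    "description": "Interactive introduction to fractions",
    "resource_type": "",
    "educational_context": ["secondary education"],
}
print(completeness(record, WEIGHTS))              # two of three fields filled
print(can_estimate_reusability(record, WEIGHTS))  # False
```

Raising the weight of a field makes its absence count more heavily, which is one simple way to reflect that some elements matter more than others for computing the metrics.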
6.3 Limitations of the experiment

There are various limitations to the study carried out. The main one arises from the deficiencies mentioned above in the use of metadata in repositories. There is also a lack of information concerning eLera's evaluators, more than a third of whom (37.60% in Table 2) do not indicate their professional profile. In order to guarantee the quality of the observations, it would be desirable to be able to characterize the evaluators. Another limitation is that a large proportion of the objects in eLera have been evaluated only once. Evaluations would be more reliable if each object were evaluated more often. It should also be stated that the evaluations present in eLera and Merlot are not always carried out by the same experts, which could produce a certain variability in the data.
7 Conclusions

The similarity between the estimated reusability of each object and the ratings given by eLera's evaluators provides evidence in favor of the proposed reusability estimation model. Moreover, the results obtained from analyzing the Merlot repository reinforce this idea. Estimating reusability would thus provide useful information when it comes to selecting reusable objects and would be an aid to their development, improving both the productivity and
quality of e-learning systems. When searching for learning objects in repositories, this measure of reusability could constitute an indicator of quality that would allow search results to be ordered, with those that have the greatest possibility of being reused taking priority. Finally, the estimation of reusability could be calculated automatically or in an assisted way using metadata that satisfy the minimum quality requisites identified in this study.

Future work will include a correlation study between the values that result from applying the metrics to the learning objects, the explicit evaluations carried out by users or experts, and the implicit usage data obtained from repositories. Based on this study, we would like to propose a measure of overall quality incorporating all the available quality indicators, thus completing the information on which the recommendation is based. In addition, to lessen weaknesses in the metadata and ensure that learning objects are adequately described, metadata completeness could be measured to determine whether the reusability indicator can be calculated correctly. Another future line of research will be to develop an experiment that allows a sample of users to determine the reusability metrics important to them in their respective organizations. Taking users' opinions into account might help to increase the likelihood of user acceptance of reusability measures.
References
Barnard, J. (1998). A new reusability metric for object-oriented software. Software Quality Journal, 7, 35–50.
Barzilai, J. (1997). Deriving weights from pairwise comparison matrices. The Journal of the Operational Research Society, 48(12), 1226–1232.
Benneker, F. (2006). A quick scan on possibilities for automatic metadata generation. Technical report, Utrecht, The Netherlands: Stichting Digitale Universiteit. http://hdl.handle.net/1820/802. Accessed 14 May 2010.
Boyle, T. (2003). Design principles for authoring dynamic, reusable learning objects. Australian Journal of Educational Technology, 19(1), 46–58.
Cafolla, R. (2006). Project Merlot: Bringing peer review to web-based educational resources. Journal of Technology and Teacher Education, 14(2), 313–323.
Campbell, L. (2003). Engaging with the learning object economy. In A. Littlejohn (Ed.), Reusing online resources: A sustainable approach to e-learning (pp. 35–45). London: Kogan Page.
Cervera, J. F., Lopez, M. G., Fernandez, C., & Sanchez-Alonso, S. (2009). Quality metrics in learning objects. In F. Sartori, M. A. Sicilia, & N. Manouselis (Eds.), Metadata and semantics research (pp. 135–141). Berlin, Heidelberg: Springer.
Chang, K. (2006). Learning objects: Draft quality criteria and quality assurance approach for learnalberta.ca and the society of advancement of excellence in education, futured consulting education futurists Inc. Technical report, http://www.futured.com/QualityStandardsforLearningObjects.pdf.pdf. Accessed 14 May 2010.
Conte, S. D., Dunsmore, H. E., & Shen, V. Y. (1989). Software engineering metrics and models. San Francisco: Benjamin Cummings.
Cuadrado-Gallego, J. J., & Sicilia, M. A. (2005). Learning objects reusability metrics: Some ideas from software engineering. In Proceedings of the international conference on internet technologies and applications. Wrexham, UK: North East Wales Institute.
Currier, S., & Campbell, L. (2002). Evaluating learning resources for reusability: The DNER and learning objects study. In Proceedings of the Australasian society for computers in learning in tertiary education, Auckland, New Zealand.
Daniel, B., & Mohan, P. (2004). A model for evaluating learning objects. In Proceedings of the IEEE international conference on advanced learning technologies (pp. 50–60).
Downes, S. (2001). Learning objects: Resources for distance education worldwide. International Review of Research in Open and Distance Learning, 2(1).
Duval, E. (2004). Learning technology standardization: Making sense of it all. International Journal on Computer Science and Information Systems, 1(1), 33–43.
Duval, E., Warkentyne, K., Haenni, F., Forte, E., Cardinaels, K., Verhoeven, B., et al. (2001). The ARIADNE knowledge pool system. Communications of the ACM, 44(5), 72–78.
Etzkorn, L. H., Hughes, W. E., & Davis, C. G. (2001). Automated reusability quality analysis. Information and Software Technology, 43(5), 295–308.
Fenton, N. E., & Pfleeger, S. L. (1997). Software metrics (3rd ed.). Boston, Massachusetts: International Thomson Publishing.
Friesen, N. (2009). Open educational resources: New possibilities for change and sustainability. International Review of Research in Open and Distance Learning, 10(5).
Garcia, E., & Sicilia, M. A. (2009). Preliminary explorations on the statistical profiles of highly-rated learning objects. In F. Sartori, M. A. Sicilia, & N. Manouselis (Eds.), Metadata and semantics research (pp. 108–117). Berlin, Heidelberg: Springer.
Glass, R. L. (1995). A structure-based critique of contemporary computing research. Journal of Systems and Software, 28(1), 3–7.
Huddlestone, J., & Pike, J. (2005). Learning object reuse: A four tier model. In the IEE and MOD HFI DTC symposium on people and systems: Who are we designing for, London.
IEEE. (1993). Std. 610.12–1990. Standard glossary of software engineering terminology. Los Alamitos, CA, USA: IEEE Computer Society Press.
IEEE Learning Technology Standards Committee. (2002). Standard for learning object metadata, http://ltsc.ieee.org/wg12. Accessed 1 October 2009.
Kay, R., & Knaack, L. (2007). Evaluating the learning in learning objects. Open Learning: The Journal of Open and Distance Learning, 22(1), 5–28.
Kelty, C. M., Burrus, C. S., & Baraniuk, R. G. (2008). Peer review anew: Three principles and a case study in postpublication quality assurance. Proceedings of the IEEE Special Issue on Educational Technology, 96(6), 1000–1011.
Koper, R. (2003). Combining reusable learning resources and services to pedagogical purposeful units of learning. In A. Littlejohn (Ed.), Reusing online resources: A sustainable approach to elearning (pp. 46–59). London: Kogan Page.
Kumar, V., Nesbit, J. C., & Han, K. (2005). Rating learning object quality with distributed Bayesian belief networks: The why and the how. In Fifth IEEE international conference on advanced learning technologies (pp. 685–687).
Margaritopoulos, T., Margaritopoulos, M., Mavridis, I., & Manitsaris, A. (2009). A fine grained metric system for the completeness of metadata. In Proceedings of third international conference metadata and semantic research, MTSR (pp. 83–94). Milan, Italy.
Marichal, J. L. (2000). An axiomatic approach of the discrete Choquet integral as a tool to aggregate interacting criteria. IEEE Transactions on Fuzzy Systems, 8(6), 800–807.
Massart, D. (2009). Adopting standards and specifications for educational content. Technical report, European Commission's eContentplus programme, http://aspect-project.org/.
Nesbit, J. C., & Belfer, K. (2004). Collaborative evaluation of learning objects. In R. McGreal (Ed.), Online education using learning objects. London: Routledge/Falmer.
Nesbit, J. C., Li, J., & Leacock, T. (2006). Web-based tools for collaborative evaluation of learning resources. Journal on Systemics, Cybernetics and Informatics, 3(5), 102–112.
Ochoa, X., & Duval, E. (2007). Relevance ranking metrics for learning objects. In Proceedings of the second European conference on technology enhanced learning (Vol. 4753, pp. 262–276). Springer.
Ochoa, X., & Duval, E. (2008). Measuring learning object reuse. In Proceedings of the 3rd European conference on technology enhanced learning: Times of convergence, Springer-Verlag, Vol. 5192, Lecture Notes in Computer Science, pp. 322–325.
Ochoa, X., & Duval, E. (2009). Quantitative analysis of learning object repositories. IEEE Transactions on Learning Technologies, 2(3), 226–238.
Omosule, S., Shoniregun, C., & Preston, D. (2008). A framework for culture influence virtual learning environments trust. Third international conference on digital information management, ICDIM (pp. 411–416). London, UK.
Palmer, K., & Richardson, P. (2004). Learning object reusability: Motivation, production and use. In 11th international conference of the association for learning technology. Devon, England: University of Exeter.
Pitkanen, S. H., & Silander, P. (2004). Criteria for pedagogical reusability of learning objects enabling adaptation and individualised learning processes. In Proceedings of the IEEE international conference on advanced learning technologies (pp. 246–250). Joensuu, Finland.
Polsani, P. R. (2003). Use and abuse of reusable learning objects. Journal of Digital Information, 3(4).
Poulin, J. (1996). Measuring software reuse: Principles, practices, and economic models. Boston, Massachusetts: Addison-Wesley Longman Publishing.
Riaz, M., Mendes, E., & Tempero, E. D. (2009). A systematic review of software maintainability prediction and metrics. In Proceedings of the third international symposium on empirical software engineering and measurement, ESEM (pp. 367–377). Florida, USA.
Richards, G. (2007). Writing to be read: Readability indices for open educational resources. In First international workshop on learning object discovery and exchange. http://fire.eun.org/lode2007/lode12.pdf. Accessed 14 May 2010.
Rodríguez-Ardura, I., Jiménez-Zarco, A. I., Ammetller-Montes, G., & Pacheco-Berna, M. C. (2009). Improving hypermedia teaching resources–new designs for e-learning environments. International Journal of Technology Enhanced Learning, 1(4), 286–296.
Sicilia, M. A. (2004). Reusability and reuse of learning objects, myths, realities and possibilities. In Proceedings of the first pluri-disciplinary symposium on design, evaluation and description of reusable learning contents.
Sicilia, M. A., & Garcia, E. (2003). On the concepts of usability and reusability of learning objects. International Review of Research in Open and Distance Learning, 4(2).
Sosteric, M., & Hesemeier, S. (2002). When is a learning object not an object. The International Review of Research in Open and Distance Learning, 3(2).
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: HarperCollins.
Tzikopoulos, A., Manouselis, N., & Vuorikari, R. (2007). An overview of learning object repositories. In Learning objects for instruction: Design and evaluation (pp. 44–64). Idea Group Publishing.
Van Assche, F., Ayre, J., Baumgartner, P., Duval, E., Hartinger, S., Mesdom, F., et al. (2009). Melt final report. Technical report, eContentplus, http://info.melt-project.eu/shared/data/melt/MELT_1_3_Final_Project_Report.pdf. Accessed 14 May 2010.
Vargo, J., Nesbit, J. C., Belfer, K., & Archambault, A. (2003). Learning object evaluation: Computer-mediated collaboration and inter-rater reliability. International Journal of Computers and Applications, 25(3).
Vinoski, S. (2005). Old measures for new services. IEEE Internet Computing, 9(6), 72–74.
Wiley, D. A. (2002). Connecting learning objects to instructional design theory: A definition, a metaphor and a taxonomy. In D. A. Wiley (Ed.), The instructional use of learning objects (pp. 3–24). Bloomington, Indiana: Agency for Instructional Technology and Association for Educational Communications and Technology.
Yang, D., & Yang, Q. (2005). Customizable distance learning: Criteria for developing learning objects and learning model templates. In Proceedings of the 7th international conference on electronic commerce, Xi'an (China) (pp. 765–770). ACM.
Zimmermann, B., Meyer, M., Rensing, C., & Steinmetz, R. (2007). Improving retrieval of reusable learning resources by estimating adaptation effort. In Proceedings of the first international workshop on learning object discovery and exchange, Vol. 311, pp. 46–53.
Author Biographies
Javier Sanz Rodriguez received a Computer Science degree from the Polytechnic University of Madrid in 1998, a master's degree in Computer Science and Technology from the University Carlos III of Madrid in 2007 and a Ph.D. from the University of Alcalá in 2010. He has worked as a consultant for Deloitte & Touche and Everis and as an analyst for Telefonica. He is currently an adjunct professor at the University Carlos III. In recent years he has been involved in projects related to ICT-based learning environments funded by the Comunidad de Madrid.
Juan Manuel Dodero holds a Computer Science degree from the Polytechnic University of Madrid (1994) and a Ph.D. from the University Carlos III of Madrid (2002). He has worked as an R&D Engineer for iSOCO S.A. and as a lecturer for the University Carlos III of Madrid. He is currently an associate professor at the University of Cadiz. His research interests include Web Engineering, Computer-supported Collaborative Work and Technology-Enhanced Learning. He received the IEEE Learning Technology Systems Committee young researcher award in 2005 and is co-founder of the Spanish chapter of ACM SIGCSE.
Dr. Salvador Sanchez-Alonso is an assistant professor in the Computer Science Department of the University of Alcalá (Spain) and a senior member of the Information Engineering research unit of the same university. He previously worked as an associate professor at the Pontifical University of Salamanca for 7 years during different periods and also as a software engineer at a software solutions company during 2000 and 2001. He earned a Ph.D. in Computer Science at the Polytechnic University of Madrid in 2005 with research on learning object metadata design for better machine "understandability". His current research interests include Learning technologies, Software and Web Engineering, Semantic Web and Computer science education.