
Assessing time use in long-term institutional care: development, validity and inter-rater reliability of the Groningen Observational instrument for Long-Term Institutional Care (GO-LTIC)

Abstract

Background

Limited research has examined what nursing staff actually do in the process of care in long-term institutional care. The instruments applied in previous studies used different terminologies, and their psychometric properties were inadequately described. This study aimed to develop and test an observational instrument, using a standardized language, to identify nursing interventions and examine the amount of time spent on them in long-term institutional care.

Methods

The Groningen Observational instrument for Long-Term Institutional Care (GO-LTIC) is based on the conceptual framework of the Nursing Interventions Classification. The development, validation, and reliability stages of the GO-LTIC comprised: 1) item generation to identify potential setting-specific interventions; 2) examining content validity with a Delphi panel, selecting relevant interventions by calculating the item content validity index; 3) testing feasibility with trained observers observing nursing assistants; and 4) calculating inter-rater reliability using (non) agreement and Cohen’s kappa for the identification of interventions and an intraclass correlation coefficient for the amount of time spent on interventions. Bland-Altman plots were used to visualize the agreement between observers. A one-sample Student’s t-test verified whether the difference between observers differed significantly from zero.

Results

The final version of the GO-LTIC comprised 116 nursing interventions categorized into six domains. Substantial to almost perfect kappas were found for interventions in the domains of basic (0.67–0.92) and complex (0.70–0.94) physiological care. For the behavioral, family, and health system domains, kappas ranged from fair to almost perfect (0.30–1.00). Intraclass correlation coefficients for the amount of time spent on interventions ranged from fair to excellent for the physiological domains (0.48–0.99) and from poor to excellent for the other domains (0.00–1.00). Bland-Altman plots indicated that the clinical magnitude of differences in minutes was small. No statistically significant differences between observers were found (p > 0.05).

Conclusions

The GO-LTIC shows good content validity and acceptable inter-rater reliability to examine the amount of time spent on nursing interventions by nursing staff. This may provide managers with valuable information to make decisions about resource allocation, task allocation of nursing staff, and the examination of the costs of nursing services.


Background

Confronted with the increasing dependency levels of frail residents and limited budgets, managers of long-term institutional care (LTIC) search for optimal staffing, that is, an appropriate number of nursing staff and mix of staff levels, to enhance or maintain quality of care standards while reducing costs [1].

To gain insight into quality of care, Donabedian’s conceptual model [2] indicates that information on structure (e.g., number and type of nurses), process, and outcomes (e.g., pressure ulcers) is needed. The total number of nursing staff in LTIC appears to be associated with better quality of care [3, 4]. However, reviews show mixed results concerning the relationship between the type of nursing staff (e.g., nurses, nursing assistants) and quality of care outcomes [3–5]. Because most studies relied on secondary survey data, the interventions performed by nursing staff in the process of care remained unclear, and so did their contribution to quality of care outcomes [3–5].

Arling et al. [6] contend that the amount of time spent with a resident has a great impact on quality of care. What is done, how much, by whom, and how all influence residents’ care [3]. This increases the importance of how nursing staff are deployed in the provision of care [7]. Identifying nurses’ interventions and the amount of time spent on them may clarify their contribution to quality of care and support the allocation of tasks to types of nursing staff according to their specific scope of practice.

According to Donabedian, process is defined as what is actually done in providing and receiving care, and it can be assessed by direct observation [2]. Observational studies addressing the process of care in LTIC provide insight into the time use of registered nurses [8, 9] and health care aides [8, 10, 11]. Psychometric properties of the applied instruments were either missing or only briefly described, and the instruments varied in the content and categorization of nursing activities, which made it difficult to compare study results.

Compared with instruments that use colloquial terms, instruments based on an internationally known standardized nursing language allow for data aggregation and analysis across settings [12]. A widely used standardized language that defines and categorizes nursing interventions is the Nursing Interventions Classification (NIC). The NIC describes a nursing intervention as any treatment, based on the clinical judgment and knowledge of a nurse, that aims to enhance the recipient’s care outcomes [13]. The NIC provides labels and definitions of interventions and categorizes them into classes and domains. For each intervention, a list of activities describes the specific behaviors or actions of nurses [13]. An advantage of the NIC is that it provides estimates of the time required to perform each intervention along with the type of nursing staff needed to deliver it.

Studies have employed the NIC as a framework for identifying interventions for groups of patients in hospitals [14], ambulatory nursing [15], parish nursing [16] and advanced nursing practice [17]. A number of studies used the NIC to describe the amount of time spent on interventions in order to examine workload [18, 19] or personnel staffing [20]. However, no such studies were found in LTIC.

The aim of the current study was to develop and test the content validity and inter-rater reliability of an observational instrument using the NIC as a conceptual framework in order to identify and examine the amount of time spent on nursing interventions in LTIC.

Methods

Several stages were completed to develop and test the observational instrument, following recommendations by Streiner et al. [21, 22]: 1) item generation; 2) examining content validity; 3) testing feasibility; and 4) inter-rater reliability assessment.

Population, setting and sampling

The population consisted of nursing staff working in LTIC. Purposive sampling was used to ensure a diversity of facilities, units, and personnel. In total, four nursing homes, two care centers (combined residential care and nursing home), and three residential care homes in the north of the Netherlands consented to participate. Nursing staff working in different types of units (somatic, psycho-geriatric, and residential care) were recruited in cooperation with facility managers. The inclusion criterion was at least one year of working experience in LTIC.

Data collection

Stage 1 Item generation

The NIC described 542 interventions classified into 30 classes and seven domains [23]. Potential setting-specific nursing interventions were identified by observing nursing staff during day shifts. Five bachelor nursing students in their final year of education and the principal investigator (AT), further referred to as the research team, all with expertise in long-term care (average working experience of two years) and knowledge of the NIC, conducted the observations without a predefined list of activities. Afterwards, the observed care activities were linked to NIC interventions, which resulted in an initial inventory of interventions that was presented to a Delphi panel.

Stage 2 Content validity

A two-round postal Delphi survey was conducted to reach consensus on the relevance of the initial inventory. Nine experts, five registered nurses and four nursing assistants from the participating facilities, agreed to contribute. Experience with the NIC was not a prerequisite. The survey comprised the concept labels and definitions of each NIC intervention. In the first Delphi round, experts were asked to rate the relevance of each intervention by its frequency of occurrence in their facility on a 5-point Likert scale (1 = never; 2 = rarely, less than once per week; 3 = sometimes, more than once per week but less than daily; 4 = often, once every day; and 5 = very often, more than once per day). An additional column was included for comments.

The second Delphi round comprised the interventions for which no consensus had been reached on inclusion in or exclusion from the observational instrument. This time, experts were asked to rate each intervention as 1 = “relevant, could have occurred in the last three weeks” or 2 = “not relevant”.

Stage 3 Feasibility

The feasibility test was performed to support the Delphi results and to test the intended data collection method (structured continuous observations) [24]. As part of this method, five observers (the nursing students of the research team), who had gained basic knowledge of the NIC through their professional education, were trained during three two-hour sessions. They individually mapped the interventions performed by nurses in video fragments to NIC interventions. In this mapping procedure, an observed intervention, comprising specific nursing activities, was linked to the most accurate NIC intervention by comparing the relevant intervention labels and definitions. Discrepancies between observers were discussed until consensus was reached on the most appropriate NIC intervention, and a log of these decisions was kept. The duration of each intervention was recorded by noting its start and end times using a stopwatch. The mapping procedure was subsequently tested in a residential care home and a nursing home, where two observers simultaneously and continuously observed one nursing assistant during a day shift.

Stage 4 Inter-rater reliability

Continuous observations of nursing staff took place in two care centers, two residential care homes, and one nursing home. Different types of nursing staff were observed during day shifts in different types of units. Observations were conducted by four of the five observers, working in pairs whose composition alternated. Observers independently linked their observations to NIC interventions according to the mapping procedure.

Statistical analyses

Stage 2 Examining content validity

Descriptive statistics were used to present the characteristics of the Delphi experts. Based on the experts’ ratings, content validity was computed at the item level for each NIC intervention with the item content validity index (I-CVI) and at the scale level for each NIC domain with the scale content validity index (S-CVI) [24], using Microsoft Excel® 2010 (Microsoft Corp., Redmond, WA). The I-CVI was computed as the number of experts rating an intervention 3, 4, or 5 divided by the total number of experts, that is, the proportion of agreement on relevance per intervention [24]. The S-CVI was computed with the averaging method (S-CVI/Ave): the sum of the I-CVIs of the items in a domain divided by the number of items, which equals the proportion of items rated relevant, averaged across the experts. An I-CVI of at least 0.80 was considered acceptable [24], in which case the intervention was included in the observational instrument. An S-CVI/Ave of at least 0.90 was considered acceptable [24].
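
The two indices can be illustrated with a minimal Python sketch. It assumes that first-round ratings are stored per intervention as lists of 1–5 scores; the ratings, intervention labels and function names are hypothetical, and the study itself performed these calculations in Excel.

```python
from statistics import mean

# Ratings 3 ("sometimes"), 4 ("often") and 5 ("very often") count as relevant.
RELEVANT = {3, 4, 5}


def i_cvi(ratings):
    """Item CVI: proportion of experts who rated the intervention as relevant."""
    return sum(r in RELEVANT for r in ratings) / len(ratings)


def s_cvi_ave(domain):
    """Scale CVI, averaging method: mean of the I-CVIs of all items in a domain."""
    return mean(i_cvi(ratings) for ratings in domain.values())


# Hypothetical first-round ratings (seven experts) for a three-item domain.
domain = {
    "intervention A": [5, 5, 4, 5, 3, 4, 5],
    "intervention B": [4, 5, 2, 5, 3, 4, 5],
    "intervention C": [2, 1, 4, 3, 2, 5, 1],
}

for label, ratings in domain.items():
    value = i_cvi(ratings)
    status = "acceptable" if value >= 0.80 else "below cut-off"
    print(f"{label}: I-CVI = {value:.2f} ({status})")
print(f"S-CVI/Ave = {s_cvi_ave(domain):.2f}")
```

With these made-up ratings the sketch reports I-CVIs of 1.00, 0.86 and 0.43 and an S-CVI/Ave of 0.76: the third item falls below the 0.80 item cut-off and pulls the domain below the 0.90 S-CVI/Ave target.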

Stage 4 Inter-rater reliability assessment

The durations of interventions in minutes were entered into IBM SPSS Statistics 19 (Armonk, NY: IBM Corp). Interventions were categorized into the NIC domains, and inter-rater reliability was computed for each observer pair per domain. Inter-rater agreement for the identification of interventions, that is, the extent to which observers mapped observed activities to the same NIC interventions, was calculated as (non) agreement percentages with 95 % confidence intervals (CI). To this end, the time recordings (ratio scale) were dichotomized per intervention (0 = time noted, 1 = no time noted). The (non) agreement was calculated to determine whether observers agreed when care did or did not occur [25]. Because percentage agreement does not account for agreement by chance and may therefore overestimate agreement, an unweighted Cohen’s kappa with a 95 % CI was also calculated. A kappa (K) value of 0–0.20 was considered slight agreement; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; and 0.81–1 almost perfect agreement [26].
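
As an illustration, the sketch below dichotomizes hypothetical time recordings of one observer pair and computes percentage agreement and unweighted Cohen’s kappa for binary codes. It is not the SPSS procedure used in the study, it omits the 95 % confidence intervals, and all data and function names are assumptions.

```python
def dichotomize(minutes):
    """Code each intervention as in the text: 0 = time noted, 1 = no time noted."""
    return [0 if m > 0 else 1 for m in minutes]


def percent_agreement(a, b):
    """Proportion of interventions on which both observers gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two observers and binary (0/1) codes."""
    n = len(a)
    p_o = percent_agreement(a, b)
    p_a, p_b = sum(a) / n, sum(b) / n  # each observer's proportion of 1-codes
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_e == 1:
        return float("nan")  # undefined: both observers used one and the same code throughout
    return (p_o - p_e) / (1 - p_e)


def landis_koch(k):
    """Verbal label using the bands listed above (negative values labelled 'poor')."""
    k = round(k, 2)
    if k < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if k <= upper:
            return label
    return "almost perfect"


# Hypothetical minutes recorded by two observers for the same ten interventions.
obs1 = [2.5, 0.0, 4.0, 1.5, 1.0, 0.0, 3.0, 3.5, 0.0, 6.0]
obs2 = [3.0, 0.0, 4.0, 1.5, 1.0, 0.0, 3.0, 3.0, 0.0, 0.0]
a, b = dichotomize(obs1), dichotomize(obs2)
k = cohen_kappa(a, b)
print(f"agreement = {percent_agreement(a, b):.2f}, kappa = {k:.2f} ({landis_koch(k)})")
```

Here the observers disagree only on the last intervention, giving 0.90 agreement but a kappa of about 0.78 (‘substantial’). Differences in the number of minutes do not affect these statistics; only whether time was noted at all matters.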

To assess the inter-rater reliability of the time spent on interventions, an intraclass correlation coefficient (ICC) was computed using a two-way random effects model with absolute agreement. Single measures with a 95 % CI are reported. Values below 0.40 were considered poor; between 0.40 and 0.59 fair; between 0.60 and 0.74 good; and between 0.75 and 1.0 excellent [27].
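
The ICC described above can be sketched from two-way ANOVA mean squares using the single-measures, absolute-agreement formula (ICC(2,1) in Shrout and Fleiss’s notation). This minimal sketch assumes that each row holds the minutes recorded by the observers for the same intervention; it returns only the point estimate (no CI), and the data are hypothetical.

```python
def icc_a1(data):
    """Two-way random effects, absolute agreement, single-measures ICC.

    data: one row per observed intervention, one column per observer (minutes).
    Point estimate only; the confidence interval is not computed here.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between interventions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between observers
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical minutes noted by two observers for four interventions.
pairs = [[1, 2], [3, 3], [5, 6], [2, 2]]
print(f"ICC(A,1) = {icc_a1(pairs):.3f}")
```

For these four hypothetical pairs the estimate is 0.925. Because the absolute-agreement form includes the between-observer mean square in its denominator, a systematic offset between observers lowers the ICC, whereas a consistency-type ICC would ignore such an offset.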

Bland-Altman plots were used to visualize and quantify agreement between all paired observations per domain. The mean difference and the 95 % limits of agreement were calculated, providing a visual judgement of how well observers agreed on the amount of time spent on a domain; a narrower range between the upper and lower limits indicates better agreement. The 95 % limits of agreement are defined as the mean difference (bias) ± 1.96 standard deviations (SD) of the differences [28, 29]. A one-sample Student’s t-test was performed to examine whether the mean difference between observers differed significantly from zero, which would indicate fixed bias. The statistical significance level was set at p < 0.05.
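
The Bland-Altman quantities and the one-sample t statistic can be computed with Python’s standard library, as in the minimal sketch below. The paired minutes and the function name are hypothetical; the sketch returns the t statistic only, and a p-value would be read from a t distribution with n − 1 degrees of freedom (for example, SciPy’s `scipy.stats.ttest_1samp(diffs, 0.0)` returns both).

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(obs1, obs2):
    """Mean difference (bias), SD of the differences, 95% limits of agreement,
    and the one-sample t statistic for H0: mean difference = 0."""
    diffs = [x - y for x, y in zip(obs1, obs2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator); assumes differences vary
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    t = bias / (sd / sqrt(len(diffs)))  # compare with a t distribution, df = n - 1
    return bias, sd, limits, t


# Hypothetical minutes two observers recorded for the same six interventions.
obs1 = [3.0, 5.5, 2.0, 8.0, 1.5, 4.0]
obs2 = [2.5, 6.0, 2.0, 7.0, 1.5, 4.5]
bias, sd, (lower, upper), t = bland_altman(obs1, obs2)
print(f"bias = {bias:.2f} min, SD = {sd:.2f}, LoA = {lower:.2f} to {upper:.2f}, t = {t:.2f}")
```

The sketch assumes the differences are not all identical, since a zero SD would leave the t statistic undefined.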

Ethical considerations

This study was conducted in accordance with the guidelines of Good Clinical Practice [30], whose principles originate in the Declaration of Helsinki [31]. Approval was obtained from the Medical Ethics Review Board of the University Medical Center Groningen, The Netherlands. Informed consent was obtained from the residents or their legal representatives to allow observers to enter residents’ rooms. Facility managers did not allow two observers to enter psycho-geriatric units at the same time, as this was considered too disruptive for these residents with cognitive impairments.

Results

The results are presented in the chronological order of the four stages. A flowchart of the instrument’s development is provided in Fig. 1.

Fig. 1

Flowchart of instrument development

The initial observations of nurses’ activities were linked to 281 (out of 542) potentially setting-specific NIC interventions resulting in an inventory that was forwarded to the nine experts of the Delphi panel in the first round.

Seven experts responded in the first round. Their median age was 32 years (interquartile range [IQR] 25) and their median working experience five years (IQR 17.5) (Table 1). The experts agreed on 75 interventions that frequently occur in LTIC (I-CVI ≥ 0.86) (Fig. 1). Their written comments suggested the inclusion of another 91 interventions with an I-CVI of 0.57 or 0.71. These 91 interventions were sent again to the seven experts in the second round, to which six experts responded (median age 27 years, IQR 26; median working experience four years, IQR 15.6) (Table 1). Subsequently, 19 interventions with an I-CVI ≥ 0.83 were added to the observational instrument (Fig. 1). The 19 interventions with an I-CVI of 0.50 or 0.67 were then critically reviewed by the research team, who, drawing on their experience in long-term care, considered them relevant and included them (Fig. 1). With this inclusion, the observational instrument comprised 113 interventions (Fig. 1) in 24 classes and six domains (Table 2). The S-CVI/Ave of the domains ranged from 0.79 to 0.93. An overview of the included NIC domains and classes with examples of interventions is provided in Table 2.
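
Because the I-CVI is a proportion of experts, only a few values are attainable with seven and six respondents, which explains the specific values reported above. The small sketch below enumerates them and re-adds the intervention counts given in the text; the function name is hypothetical.

```python
def possible_icvi(n_experts):
    """All I-CVI values attainable with n_experts raters, rounded to two decimals."""
    return [round(k / n_experts, 2) for k in range(n_experts + 1)]


print("7 experts:", possible_icvi(7))  # first Delphi round
print("6 experts:", possible_icvi(6))  # second Delphi round

# Counts of interventions as reported in the text
first_round = 75   # I-CVI >= 0.86 in round 1
second_round = 19  # I-CVI >= 0.83 in round 2
team_review = 19   # I-CVI 0.50 or 0.67, retained after review by the research team
feasibility = 3    # added after the feasibility test
after_delphi = first_round + second_round + team_review
print("after Delphi:", after_delphi, "| final GO-LTIC:", after_delphi + feasibility)
```

With seven experts the attainable values closest to the 0.80 cut-off are 0.71 and 0.86, and with six experts 0.67 and 0.83, so in both rounds an acceptable I-CVI required all experts but at most one to rate the intervention as relevant. The counts sum to 113 interventions after the Delphi rounds and 116 after the feasibility test.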

Table 1 Expert characteristics and response to Delphi rounds
Table 2 Included NIC domains and classes with two examples of interventions per class

The feasibility test revealed three additional interventions that frequently occurred in practice: spiritual support (praying), circulatory care: venous insufficiency (e.g., compression therapy), and airway management (e.g., teach usage of prescribed inhalers). This resulted in a final observational instrument of 116 interventions – the GO-LTIC (Groningen Observational instrument for Long-Term Institutional Care).

Concerning the mapping procedure, the labels and definitions of NIC interventions were not always clear enough to assign an observation unambiguously; for instance, it was unclear when to classify an intervention as ‘dressing’ or as ‘self-care assistance’. In such cases, a consensus discussion among all observers determined the most accurate fit, and such discussions continued during inter-rater reliability testing when necessary. The usability of the GO-LTIC was improved by ordering the NIC classes by frequency of occurrence. It was also decided to round time recordings to units of 30 s.
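
A minimal sketch of turning stopwatch start and end times into rounded durations follows. It assumes times are noted as HH:MM:SS within one shift and that rounding to 30 s means rounding to the nearest multiple of 30 s with ties rounded up; both are assumptions, as the text does not specify the rounding rule, and the function name is hypothetical.

```python
from datetime import datetime
from math import floor

TIME_FORMAT = "%H:%M:%S"  # assumed clock format of the noted start/end times


def duration_minutes(start, end, step_s=30):
    """Duration between start and end (same day), rounded to the nearest
    multiple of step_s seconds (ties rounded up), returned in minutes."""
    elapsed = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    seconds = elapsed.total_seconds()
    rounded = floor(seconds / step_s + 0.5) * step_s
    return rounded / 60


print(duration_minutes("09:14:05", "09:21:50"))  # 7 min 45 s -> 8.0
print(duration_minutes("10:02:10", "10:05:20"))  # 3 min 10 s -> 3.0
```

Under this assumed rule, an intervention lasting less than 15 s would be recorded as 0 minutes. Times that cross midnight are not handled, which is only acceptable for day-shift observations.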

Regarding inter-rater reliability, four nursing assistants, two primary caregivers (nursing assistants with additional training in coordinating care), and one registered nurse were observed during seven day shifts. They performed interventions for 108 residents in four somatic units (n = 44) and three residential care units (n = 62); the unit of two residents was unknown. The residents’ average age was 87.1 years, and most were female (n = 81). Of the 116 interventions, 55 were identified by the observers and the amount of time spent on them was registered (Table 3). Unobserved interventions mainly concerned the safety and behavioral domains.

Table 3 Overview of identified interventions and number of observations

The inter-rater agreement for the identification of interventions ranged from 0.93 to 1.00, except for interventions in the family domain (Table 4). When corrected for chance, substantial to almost perfect agreement was found within the domains of basic physiological care (K = 0.67, CI: 0.54–0.81 to K = 0.92, CI: 0.84–0.99) and complex physiological care (K = 0.70, CI: 0.42–0.99 to K = 0.94, CI: 0.82–1.00) (Table 3). Agreement was fair to almost perfect in the behavioral domain (K = 0.40, CI: 0.00–1.00 to K = 1.00, CI: 1.00), the family domain (K = 0.40, CI: 0.12–0.77 to K = 1.00, CI: 0.74–1.00), and the health system domain (K = 0.30, CI: 0.00–0.77 to K = 0.76, CI: 0.62–0.90). Interventions in the safety domain were often not identified, resulting in few time recordings; therefore, kappa could not be calculated.

Table 4 Point estimates of inter-rater reliability tests per NIC domain

Inter-rater reliability for the time spent on interventions was good to excellent for the basic physiological care domain (ICC from 0.64 [CI: 0.14–0.89] to 0.99 [CI: 0.99–1.00]) and fair to excellent for the complex physiological care domain (ICC from 0.48 [CI: 0.07–0.76] to 0.93 [CI: 0.81–0.98]). Values ranged from poor to excellent for the behavioral domain (ICC from 0.00 [CI: −0.40 to 0.40] to 0.99 [CI: 0.95–1.00]), the family domain (ICC from 0.24 [CI: −0.18 to 0.60] to 1.00 [CI: not reported]), and the health system domain (ICC from 0.03 [CI: −0.38 to 0.46] to 0.96 [CI: 0.85–0.99]), and were poor for the safety domain (ICC from 0.00 [CI: −0.40 to 0.40] to 0.29 [CI: −0.33 to 0.74]).

Bland-Altman plots illustrated the differences between observers’ paired observations. The mean differences per domain were: basic physiological 0.53 min (SD 4.34), complex physiological 0.02 min (SD 2.16), behavioral 0.16 min (SD 0.99), safety 0.03 min (SD 0.29), family −0.25 min (SD 1.81), and health system 0.15 min (SD 5.25) (Fig. 2). The one-sample Student’s t-test indicated no significant differences between observers (p > 0.05).

Fig. 2

Bland-Altman plots with mean differences (solid lines) and 95 % confidence intervals (dashed lines) in minutes

Discussion

This study shows that the GO-LTIC has good content validity and acceptable inter-rater reliability to identify nursing interventions and the amount of time spent on these in LTIC. Based on the conceptual framework of the NIC, the instrument comprises 116 interventions categorized into 24 classes and six domains.

Although the content validity of the GO-LTIC was good (I-CVI ≥ 0.80) for most interventions (n = 94), a limited number of interventions (n = 19) had a value below the cut-off point of 0.80. A low I-CVI can indicate that experts were not sufficiently proficient [32]; only working experience was an inclusion criterion. The experts’ identification of interventions may also have been complicated because the terms of a standardized nursing language such as the NIC do not fully align with the terms that nurses use in their daily practice [33].

With the exception of interventions in the family domain, inter-rater agreement on the identification of interventions ranged from 0.93 to 1.00, which is in line with the observational LTIC studies of Dellefield et al. [9] (0.82–0.85) and Munyisia et al. [34] (0.90). To claim adequate inter-rater reliability, agreement should be at least 0.90 [35]. When corrected for chance, inter-rater reliability ranged from ‘substantial’ to ‘almost perfect’ for the physiological domains (K = 0.67–0.94) and from ‘fair’ to ‘almost perfect’ for the other domains (K = 0.30–1.00). This is lower than in the study of Cardona et al. [36], who found a Cohen’s kappa of 0.88. One explanation may be that Cardona et al. [36] used work sampling as a data collection technique, whereas this study used structured continuous observations, which are labor-intensive [37]; data collector fatigue may therefore have resulted in less accurate recordings. However, continuous observation should be considered in time studies because it is more accurate, especially when results can affect policy decisions concerning, for example, task allocation [37]. In this study, no data were obtained in psycho-geriatric units, which may have resulted in fewer observations, especially in the safety and behavioral domains (e.g., elopement precautions, behavior management). Because Cohen’s kappa is influenced by prevalence [38], here how often an intervention was observed, this may explain the lower values in these domains.
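
The prevalence effect mentioned above (the ‘high agreement but low kappa’ paradox described by Feinstein and Cicchetti [38]) can be shown with two hypothetical observer pairs that agree on the same share of observation slots. The slot-based coding (1 = intervention recorded), the function name and all data are illustrative assumptions.

```python
def agreement_and_kappa(a, b):
    """Observed agreement and unweighted Cohen's kappa for binary codes (1 = recorded)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement from the marginals
    return p_o, (p_o - p_e) / (1 - p_e)


# Both hypothetical observer pairs agree on 95 of 100 observation slots.
# Common intervention: observer A records it in 50 slots, observer B in 49.
common_a = [1] * 50 + [0] * 50
common_b = [1] * 47 + [0] * 3 + [0] * 48 + [1] * 2
# Rare intervention: observer A records it in 3 slots, observer B in 4.
rare_a = [1] * 3 + [0] * 97
rare_b = [1] * 1 + [0] * 2 + [0] * 94 + [1] * 3

for label, (a, b) in {"common": (common_a, common_b), "rare": (rare_a, rare_b)}.items():
    p_o, kappa = agreement_and_kappa(a, b)
    print(f"{label}: agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

Both pairs agree on 95 % of slots, yet kappa is about 0.90 for the frequently observed intervention and about 0.26 for the rare one, because chance agreement on ‘not recorded’ is very high when an intervention seldom occurs.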

In addition, the observational instrument of Cardona et al. [36] comprised 24 interventions specifically for use in a locked unit where residents exhibited disruptive behavior, whereas the GO-LTIC comprises 116 interventions for examining the time use of nursing staff in different types of units. Ferketich [39] contends that instruments should be of minimal length and represent a specific population and purpose while achieving acceptable support for their reliability and validity. Because the GO-LTIC showed good content validity and acceptable inter-rater reliability, it was decided not to exclude any interventions. Furthermore, it has been argued that a larger set of activities is feasible in time studies when data are collected by continuous observation, because each observer observes only one subject [37].

The inter-rater reliability for the amount of time spent on interventions varied: ICCs ranged from fair to excellent for the physiological domains (0.48–0.99) and from poor to excellent for the other domains (0.00–1.00). Bland-Altman plots indicated that the clinical magnitude of most differences in minutes was small. Only in the basic physiological and health system domains did the standard deviation of the differences (4.34 and 5.25 min, respectively) exceed the acceptable range set a priori (mean bias ± 1.96 SD). In addition, a one-sample Student’s t-test showed no statistically significant differences between observers.

Structured observations require trained observers with knowledge of the phenomena under investigation, pretested instruments, and a category system for classifying observations [24]. In this study, observers with a nursing background were recruited and trained to map activities performed by nursing staff to the most accurate NIC intervention. This training, followed by the feasibility test, contributed to the reliability. An advantage of the GO-LTIC is that it is based on a standardized language, whereby the work of staff is uniformly represented. This may increase the comparability of studies and could promote benchmarking of LTIC facilities at local, regional, national, and international levels [33]. The instrument shows good content validity and acceptable reliability in the Dutch LTIC context. As instruments are continually used in different circumstances and with other groups of people, establishing reliability and validity is a never-ending process [22].

Conclusions

This study describes the potential of the GO-LTIC for examining the interventions on which nursing staff spend their time during the process of care. The instrument demonstrates good content validity in the Dutch LTIC context. When observations are conducted by adequately trained observers with a nursing background, the instrument shows acceptable inter-rater reliability. The value of the GO-LTIC lies in identifying the nursing interventions performed for a specific population, which could also increase the visibility of nursing staff’s contribution to quality of care outcomes. Furthermore, by showing who is doing what and how much time this takes, the GO-LTIC has the potential to support managers’ decisions regarding task allocation of nursing staff according to their specific scope of practice, resource allocation, and the examination of the costs of services. Finally, because it uses a standardized nursing language, the GO-LTIC may be valuable for analysis across settings and may promote benchmarking of LTIC facilities at local, regional, national, and international levels.


  1. 1.

    Organization for Economic Co-operation and Development/European Commission. A good life in old age? Monitoring and improving quality in long-term care. OECD health policy studies. Paris: OECD Publishing; 2013.

    Google Scholar 

  2. 2.

    Donabedian A. An introduction to quality assurance in health care. New York: Oxford University Press; 2003.

    Google Scholar 

  3. 3.

    Castle NG. Nursing home caregiver staffing levels and quality of care: a literature review. J Appl Gerontol. 2008;27:375–405.

    Article  Google Scholar 

  4. 4.

    Spilsbury K, Hewitt C, Stirk L, Bowman C. The relationship between nurse staffing and quality of care in nursing homes: a systematic review. Int J Nurs Stud. 2011;48:732–50.

    Article  PubMed  Google Scholar 

  5. 5.

    Backhaus R, Verbeek H, van Rossum E, Capezuti E, Hamers JPH. Nurse staffing impact on quality of care in nursing homes: a systematic review of longitudinal studies. J Am Med Dir Assoc. 2014;15:383–93.

    Article  PubMed  Google Scholar 

  6. 6.

    Arling G, Kane RL, Mueller C, Bershadsky J, Degenholtz HB. Nursing effort and quality of care for nursing home residents. Gerontologist. 2007;47:672–82.

    Article  PubMed  PubMed Central  Google Scholar 

  7. 7.

    Organization for Economic Co-operation and Development. Health at a glance 2013: OECD indicators. Paris: OECD Publishing; 2013.

    Google Scholar 

  8. 8.

    Munyisia EN, Yu P, Hailey D. How nursing staff spend their time on activities in a nursing home: an observational study. J Adv Nurs. 2011;67:1908–17.

    Article  PubMed  Google Scholar 

  9. 9.

    Dellefield ME, Harrington C, Kelly A. Observing how RNs use clinical time in a nursing home: a pilot study. Geriatr Nurs. 2012;33:256–63.

    Article  PubMed  Google Scholar 

  10. 10.

    Qian SY, Yu P, Zhang ZY, Hailey DM, Davy PJ, Nelson MI. The work pattern of personal care workers in two Australian nursing homes: a time-motion study. BMC Health Serv Res. 2012;12:305.

    Article  PubMed  PubMed Central  Google Scholar 

  11. 11.

    Mallidou AA, Cummings GG, Schalm C, Estabrooks CA. Health care aides use of time in a residential long-term care unit: a time and motion study. Int J Nurs Stud. 2013;50:1229–39.

    Article  PubMed  Google Scholar 

  12. 12.

    Ozbolt JG, Saba VK. A brief history of nursing informatics in the United States of America. Nurs Outlook. 2008;56:199–205.

    Article  PubMed  Google Scholar 

  13. 13.

    Bulechek M, Butcher HK, Dochterman JM, Wagner CM. Nursing Interventions Classification (NIC). 6th ed. St. Louis: Mosby/Elsevier; 2013.

    Google Scholar 

  14. 14.

    Dochterman J, Titler M, Wang J, Reed D, Pettit D, Mathew-Wilson M, et al. Describing use of nursing interventions for three groups of patients. J Nurs Scholarsh. 2005;37:57–66.

    Article  PubMed  Google Scholar 

  15. 15.

    Figoski MR, Downey J. Facility charging and Nursing Intervention Classification (NIC): the new dynamic duo. Nurs Econ. 2006;24:102–11.

    PubMed  Google Scholar 

  16. 16.

    Solari-Twadell PA, Hackbarth DP. Evidence for a new paradigm of the ministry of parish nursing practice using the nursing intervention classification system. Nurs Outlook. 2010;58:69–75.

    Article  PubMed  Google Scholar 

  17. 17.

    Hahn JE. Using Nursing Intervention Classification in an advance practice registered nurse-led preventive model for adults aging with developmental disabilities. J Nurs Scholarsh. 2014;46:304–13.

    Article  PubMed  Google Scholar 

  18. 18.

    De Cordova PM, Lucero RJ, Hyun S, Quinlan P, Price K, Stone PW. Using the Nursing Interventions Classification as a potential measure of nurse workload. J Nurs Care Qual. 2010;25:39–45.

    Article  PubMed  PubMed Central  Google Scholar 

  19. 19.

    Souza CA, Jericó Mde C, Perroca MG. Nursing intervention/activity mapping at a Chemotherapy Center: an instrument for workload assessment. Rev Lat Am Enfermagem. 2013;21:492–9.

    Article  PubMed  Google Scholar 

  20. 20.

    Bonfim D, Gaidzinski RR, Santos FM, Gonçalves Cde S, Fugulin FMT. The identification of nursing interventions in primary health care: a parameter for personnel staffing. Rev Esc Enferm USP. 2012;46:1462–70.

    Article  PubMed  Google Scholar 

  21. 21.

    Streiner DL, Norman GR, Cairney J. Health measurement scales. A practical guide to their development and use. 4th ed. Oxford: Oxford University Press; 2008.

    Google Scholar 

  22. 22.

    Streiner DL, Kottner J. Recommendations for reporting the results of studies of instrument and scale development and testing. J Adv Nurs. 2014;70:1970–9.

    Article  PubMed  Google Scholar 

  23. 23.

    Bulechek M, Butcher HK, Dochterman JM, Wagner CM. Nursing Interventions Classification (NIC). 5th ed. St. Louis: Mosby; 2008.

    Google Scholar 

  24. 24.

    Polit DF, Beck CT. Nursing Research. Generating and assessing evidence for nursing practice. 9th ed. Philadelphia: Wolters Kluwer Health/Lippincot Williams & Wilkins; 2012.

    Google Scholar 

  25. 25.

    Bailey JS, Burch MR. Research methods in applied behavior analysis. Thousand Oaks: Sage Publications; 2002.

    Google Scholar 

  26. 26.

    Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.

    CAS  Article  PubMed  Google Scholar 

  27. 27.

    Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6:284–90.

    Article  Google Scholar 

  28. 28.

    Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.

    CAS  Article  PubMed  Google Scholar 

  29. 29.

    Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–60.

    CAS  Article  PubMed  Google Scholar 

  30. 30.

    ICH Expert Working Group. ICH Harmonised tripartite guideline. Guideline for good practice E6(R1). Geneva: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; 1996. Accessed 10 October 2015.

    Google Scholar 

  31.

    World Medical Association. Declaration of Helsinki. Ethical principles for medical research involving human subjects. JAMA. 2013;310:2191–4.

  32.

    Polit DF, Beck CT, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health. 2007;30:459–67.

  33.

    Carrington JM. The usefulness of nursing languages to communicate a clinical event. Comput Inform Nurs. 2012;30:82–8.

  34.

    Munyisia E, Yu P, Hailey D. Development and testing of a work measurement tool to assess caregivers’ activities in residential aged care facilities. Stud Health Technol Inform. 2010;160:1226–30.

  35.

    Pelletier D, Duffield C. Work sampling: valuable methodology to define nursing practice patterns. Nurs Health Sci. 2003;5:31–8.

  36.

    Cardona P, Tappen R, Terrill M, Acosta M, Eusebe M. Nursing staff time allocation in long-term care: a work sampling study. J Nurs Adm. 1997;27:28–36.

  37.

    Finkler SA, Knickman JR, Hendrickson G, Lipkin Jr M, Thompson WG. A comparison of work-sampling and time-and-motion techniques for studies in health services research. Health Serv Res. 1993;28:577–97.

  38.

    Feinstein AR, Cicchetti DV. High agreement but low Kappa: I. The problems of two paradoxes. J Clin Epidemiol. 1990;43:543–9.

  39.

    Ferketich S. Focus on psychometrics. Aspects of item analysis. Res Nurs Health. 1991;14:165–8.

Acknowledgements

The authors would like to thank all of the nurses and nursing assistants who either provided expert review for the development of the instrument or allowed us to observe them. We extend our thanks to the Bachelor of Nursing students who contributed to the data collection. We thank Wim P. Krijnen for statistical advice and Jenny Hill of American Pen for providing language help. This work was supported by Stichting Innovatie Alliantie, Regional Attention and Action for Knowledge circulation (SIA RAAK-PRO), project number: pro-1-035. To qualify for the grant and meet the aims of the SIA RAAK-PRO, nursing students were deployed to collect data. The SIA RAAK-PRO monitored the progress of the project and had no involvement in the conduct of the study or its results.

Author information



Corresponding author

Correspondence to Astrid Tuinman.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AT was responsible for the conception and design, instrument development, acquisition of data, analysis, interpretation of data, and manuscript drafting. MHG contributed to the design, analysis, and interpretation of data. RN and WP contributed to the interpretation of data. PR contributed to the conception and design as well as to the interpretation of data. All authors made critical revisions to the paper for important intellectual content. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Tuinman, A., de Greef, M., Nieweg, R. et al. Assessing time use in long-term institutional care: development, validity and inter-rater reliability of the Groningen Observational instrument for Long-Term Institutional Care (GO-LTIC). BMC Nurs 15, 13 (2016).

Keywords

  • Classification
  • Content validity
  • Instrument development
  • Inter-rater reliability
  • Long-term care
  • Nursing intervention
  • Nursing home
  • Nursing staff
  • Observation