The CeHLS-D was developed and evaluated in four phases in this study: conceptualization, item generation, content validation, and a field survey for quantitative psychometric testing.
Phase I: conceptualization
The initial step in developing the new scale was to conceptualize the construct being measured, with consideration of the target population for whom the scale is intended. Since the first definition of eHealth literacy by Norman and Skinner in 2006, many definitions have been introduced, but without consensus. Griebel et al. recently defined eHealth literacy as “a dynamic and context-specific set of individual and social factors as well as technology constraints in the use of digital technologies to search, acquire, comprehend, appraise, communicate, apply and create health information in all contexts of healthcare with the goal of maintaining or improving the quality of life throughout the lifespan” (p. 433), based largely on the meta-definition proposed by Bautista, but with additional aspects from the definitions of others [28, 29].
The target population for the scale developed in the present study was adults diagnosed with type 2 diabetes. This group encounters or needs health information specific to their disease, treatment, and complex self-management to prevent the onset and progression of complications and to improve quality of life [30, 31]. Based on those perspectives, eHealth literacy was conceptualized in the present study as the abilities and skills to search, acquire, comprehend, appraise, communicate, apply, and create health information specific to diabetes and its treatment and self-management in internet environments using digital devices, with the goals of improving or maintaining health and preventing complications to improve health-related quality of life. Internet environments in the present study refer not only to the read-only web but also to participatory social media. The digital devices considered included personal computers, mobile phones, and tablets.
Phase II: item generation
For item generation during the development of the new scale, it was important to pool all attributes reflecting the construct being measured. A literature review and semistructured interviews were used as the sources of the attributes in this study. For the comprehensive literature review, a matrix table was constructed based on the above-mentioned eHealth literacy conceptualization. The top row of the matrix contained the posited abilities and skills (search, acquire, comprehend, appraise, communicate, apply, and create). The left column of the matrix listed the posited internet environments: static search portals (e.g., Google and NAVER), email/mobile text messengers (e.g., Gmail, NAVER Mail, KakaoTalk, and WhatsApp), and social network/media sharing (e.g., Facebook, Twitter, and YouTube). The cells at the intersections of the rows and columns were then filled with attributes identified in the literature review regarding information on diabetes and its treatment and self-management.
Semistructured interviews were conducted by a trained interviewer (a nursing PhD candidate) in a small room at an outpatient clinic in June 2021. The inclusion criteria for the participants were being at least 19 years old, diagnosed with type 2 diabetes, and an internet user. The appropriate sample size in a qualitative interview study is determined by data (attribute) saturation, the point at which collecting more data yields no new information. In this study the interviews initially included 20 participants, a commonly recommended sample size for research involving qualitative interviews. Those who agreed to participate in the interviews were asked to sign an informed-consent form. Each interview was conducted based on the above matrix table, and was recorded and transcribed verbatim. One researcher extracted the eHealth literacy-related attributes using the interviewees’ actual words and filled in the matrix table. These processes were confirmed and discussed with another expert on eHealth and diabetes care.
Phase III: content validity
Content validity refers to the degree to which each item reflects the construct being measured. A panel of five experts on eHealth literacy, measurement properties, and diabetes care participated in the content validation. They were asked to rate how relevant each item was on a four-point Likert scale (1 = “not relevant,” 2 = “somewhat relevant,” 3 = “quite relevant,” and 4 = “very relevant”).
Analysis of content validity
Content validity was assessed using the item-level content validity index (I-CVI). The I-CVI was calculated as the proportion of experts who answered “quite relevant” or “very relevant.” If I-CVI > 0.78, the item was considered sufficiently relevant to the eHealth literacy construct. The expert panel was also asked open-ended questions to ascertain comprehensiveness (whether any key aspects of the construct were missing), comprehensibility (reading level, jargon, and ambiguity), the suitability of the item response format (a five-point Likert scale ranging from 0 [“not at all”] to 4 [“very much”]), and the clarity of the instructions on how to respond to the items.
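The I-CVI computation described above can be sketched as follows; all expert ratings shown here are hypothetical.

```python
# I-CVI sketch: each item is rated by the five-expert panel on the 4-point
# relevance scale; the I-CVI is the proportion of experts rating the item
# 3 ("quite relevant") or 4 ("very relevant"). Items with I-CVI > 0.78
# are considered sufficiently relevant.

def i_cvi(ratings):
    """Item-level content validity index for one item's expert ratings."""
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)

# Hypothetical ratings for three candidate items
panel_ratings = {
    "item_1": [4, 4, 3, 4, 3],   # 5 of 5 relevant -> I-CVI = 1.00
    "item_2": [4, 3, 2, 4, 3],   # 4 of 5 relevant -> I-CVI = 0.80
    "item_3": [2, 3, 1, 2, 4],   # 2 of 5 relevant -> I-CVI = 0.40
}

retained = {k: v for k, v in panel_ratings.items() if i_cvi(v) > 0.78}
print(sorted(retained))   # ['item_1', 'item_2']
```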
Phase IV: field survey
A cross-sectional survey was conducted to evaluate the internal consistency, measurement invariance, and structural, convergent, and known-groups validities of the CeHLS-D.
Sample and data collection
A convenience sample of 453 participants was recruited from outpatient clinics in multiple hospitals in South Korea from August to December 2021. The inclusion criteria were being at least 19 years old, diagnosed with type 2 diabetes, experienced in using digital devices (personal computers, mobile phones, or tablets), and fluent in Korean. Trained research assistants met potential participants at the outpatient clinics and provided them with the study information. Those who agreed to participate were asked to sign an informed-consent form and then to complete the questionnaires. All participants were offered remuneration for their participation.
For convergent validity, eHealth literacy was expected to be moderately correlated with health literacy, based on previous studies [35, 36]. The Diabetes Health Literacy Scale (DHLS) was administered in this study as a comparator instrument to assess the convergent validity of the CeHLS-D. The DHLS was developed to measure diabetes-specific health literacy, and comprises 14 items scored on a 5-point Likert scale from 0 to 4. The scale score is the average of all items, with higher scores indicating better health literacy. The DHLS yielded good psychometric properties for content validity, structural validity (χ2/df = 2.41, RMSEA = 0.07, SRMR = 0.04, and CFI = 0.95), convergent validity, criterion validity, internal consistency (Cronbach’s alpha = 0.91), and test–retest reliability (intraclass correlation coefficient = 0.89). Cronbach’s alpha of the scale in the present study was 0.94.
The following question was asked about the frequency of internet use: “How often do you use the internet to seek health information?” There were four response options: “almost no use,” “approximately 1 day a week,” “several days a week,” and “almost every day.” This item was administered to assess the known-groups validity of the CeHLS-D, because people who use the internet more frequently have higher eHealth literacy than those who use it less frequently. If the mean CeHLS-D score increased with the frequency of internet use, the scale was considered to have satisfactory known-groups validity.
According to a systematic review of existing eHealth literacy instruments, measurement invariance has been tested for only a few instruments, across groups such as demographic (gender and age), cultural, and physical-activity-frequency groups. Accordingly, the measurement invariance of the CeHLS-D was tested across gender, age, and glycemic control status groups: male vs. female, ≥ 60 vs. < 60 years old, and glycated hemoglobin A1c (HbA1c) ≤ 6.5% vs. HbA1c > 6.5%. HbA1c values were collected from medical records from within the previous 3 months.
The data were analyzed using SPSS for Windows (version 25), AMOS software (version 25), and the R statistical environment. Missing data were replaced using regression imputation. Mean and standard-deviation values of the items were computed using descriptive statistics. An interitem correlation matrix of all items was computed, and weakly correlated (r < .30) or redundant (r > .80) items were removed.
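A minimal sketch of this interitem screening step, using hypothetical item scores and the cutoffs stated above:

```python
# Compute all pairwise Pearson correlations among items and flag items that
# correlate weakly (|r| < .30) with every other item, or redundantly
# (|r| > .80) with at least one other item. All scores are hypothetical.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_items(items, low=0.30, high=0.80):
    """Return the names of items to review for removal. In practice only
    one item of a redundant pair would actually be dropped."""
    names = list(items)
    flagged = set()
    for a in names:
        rs = [pearson(items[a], items[b]) for b in names if b != a]
        if all(abs(r) < low for r in rs):
            flagged.add(a)            # weakly related to all other items
        if any(abs(r) > high for r in rs):
            flagged.add(a)            # redundant with another item
    return flagged

items = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [3, 1, 4, 2, 5],
    "q3": [1, 2, 3, 4, 6],   # near-duplicate of q1 -> redundant pair
    "q4": [5, 1, 1, 5, 3],   # unrelated to the rest -> weak
}
flagged = flag_items(items)
print(sorted(flagged))   # ['q1', 'q3', 'q4']; q2 survives the screen
```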
For the cross-validation of structural validity, the total sample was split into two subsamples using the SPSS random assignment function. Subsample 1 (n = 231) was used for exploratory factor analysis (EFA) and exploratory graph analysis (EGA), and subsample 2 (n = 222) was used for confirmatory factor analysis (CFA). The size of each subsample satisfied the requirements of at least 7 cases per item for EFA and at least 200 cases for CFA [40, 41].
To determine whether EFA was appropriate for the subsample 1 data, the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test of sphericity were conducted. EFA with varimax rotation was conducted to reduce the number of items and determine their underlying structure. Factors with eigenvalues > 1 were retained, and the result was considered satisfactory when the factors explained at least 50–60% of the variance. Factor loadings higher than 0.70 were considered significant for capturing the essence of a factor. The dimensionality and the patterns of items clustering together in the EFA were further assessed using EGA, a new approach to identifying the dimensions of constructs based on network psychometrics. The EGA depicts a network of nodes (test items) connected by edges (links) representing the internode strengths (i.e., partial correlations). The EGA was conducted using the graphical least absolute shrinkage and selection operator from the EGAnet package.
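The two suitability checks can be sketched directly from their definitions; the correlation matrix and sample size below are toy values, not study data, and this merely mirrors the statistics that SPSS reports.

```python
# KMO and Bartlett's test of sphericity from a precomputed item
# correlation matrix R and sample size n (toy values).
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test: chi-square statistic and degrees of freedom for
    H0: the correlation matrix is an identity matrix."""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    # anti-image (partial) correlations, other variables held constant
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)

# Toy 3-item correlation matrix from hypothetical n = 231 respondents
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
chi2, df = bartlett_sphericity(R, n=231)
print(round(chi2, 1), df, round(kmo(R), 2))
```

A large chi-square (relative to df) rejects sphericity, and a KMO above about 0.60 is conventionally taken as adequate for factoring.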
CFA was performed on subsample 2 using maximum-likelihood estimation. The CFA model fit was determined using multiple indices: normed χ2 (χ2/df < 3), comparative fit index (CFI) > 0.95, standardized root-mean-square residual (SRMR) < 0.08, and root-mean-square error of approximation (RMSEA) < 0.08. As a supplement to the CFA, the heterotrait-monotrait ratio of correlations (HTMT) was calculated to determine whether each pair of factors (subscales) derived from the CFA was distinct. An HTMT of < 0.85 suggested that a pair of factors was discriminant.
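The HTMT for one pair of factors is the mean of the between-factor (heterotrait) item correlations divided by the geometric mean of the average within-factor (monotrait) correlations. A sketch with a hypothetical item correlation matrix and a hypothetical two-items-per-factor assignment:

```python
# HTMT ratio for one pair of factors, given pairwise item correlations
# stored under (item_i, item_j) tuple keys (hypothetical values).
from math import sqrt
from itertools import combinations

def r(corr, i, j):
    """Look up a correlation regardless of key order."""
    return corr.get((i, j), corr.get((j, i)))

def htmt(corr, factor_a, factor_b):
    hetero = [r(corr, i, j) for i in factor_a for j in factor_b]
    mono_a = [r(corr, i, j) for i, j in combinations(factor_a, 2)]
    mono_b = [r(corr, i, j) for i, j in combinations(factor_b, 2)]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(hetero) / sqrt(mean(mono_a) * mean(mono_b))

corr = {("a1", "a2"): 0.64, ("b1", "b2"): 0.49,
        ("a1", "b1"): 0.30, ("a1", "b2"): 0.28,
        ("a2", "b1"): 0.32, ("a2", "b2"): 0.30}

value = htmt(corr, ["a1", "a2"], ["b1", "b2"])
print(round(value, 3))   # 0.536 -> below 0.85, so the factors are discriminant
```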
For the internal consistency analysis, the traditional Cronbach’s alpha was assessed, with acceptable values ranging from 0.70 to 0.95. As a more robust index, McDonald’s omega (ω) was also computed, with a criterion value of > 0.70.
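Both indices follow directly from their standard formulas; the item scores and standardized factor loadings below are hypothetical illustrations, not study values.

```python
# Cronbach's alpha from raw item scores, and McDonald's omega from
# one-factor standardized loadings and error variances (hypothetical data).
def cronbach_alpha(items):
    """items: one score list per item, aligned across respondents.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    def var(xs):                       # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(col) for col in zip(*items)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

def mcdonald_omega(loadings, error_vars):
    """omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of errors)."""
    s = sum(loadings)
    return s * s / (s * s + sum(error_vars))

item_scores = [[0, 1, 3, 4],           # 3 items x 4 respondents
               [1, 1, 3, 4],
               [0, 2, 4, 2]]
print(round(cronbach_alpha(item_scores), 3))   # 0.889

loadings = [0.7, 0.8, 0.6]
errors = [1 - l ** 2 for l in loadings]        # 1 - lambda^2 per item
print(round(mcdonald_omega(loadings, errors), 3))   # 0.745
```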
Measurement invariance across the gender, age, and glycemic control status groups was analyzed using multigroup CFA (MGCFA). The requirement of at least 100 cases in each of the gender, age, and glycemic control status groups was satisfied for the MGCFA. The MGCFA was tested in the following successive phases using AMOS software: the configural invariance model (a baseline model for comparing subsequent invariance tests), the metric invariance model (all factor loadings constrained to be equal, also called the measurement weights model), the structural covariances model (factor loadings, factor variances, and covariances constrained to be equal), and the measurement residuals model (factor loadings, factor variances, factor covariances, and error variances constrained to be equal). The first two models receive the most attention in practice, since the others are considered excessively stringent tests that are often not satisfied. Configural and metric invariance models were therefore tested in the present study. A decrease in CFI (ΔCFI) of less than 0.010, supplemented by either an increase in RMSEA (ΔRMSEA) of less than 0.015 or an increase in SRMR (ΔSRMR) of less than 0.030, indicated invariance in the metric invariance model test. The χ2 difference test is a traditional method for measurement invariance decisions, but it has the limitation of being sensitive to large samples, and was therefore not used in this study.
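The decision rule above can be sketched as a small comparison of fit indices between the configural and metric models; all index values here are hypothetical.

```python
# Metric-invariance decision from the cutoffs given above, applied to
# hypothetical fit indices of configural and metric MGCFA models.
def metric_invariance(fit_configural, fit_metric):
    """fit_*: dicts with 'cfi', 'rmsea', 'srmr'. Invariance holds when CFI
    drops by less than .010 and either RMSEA rises by less than .015 or
    SRMR rises by less than .030."""
    d_cfi = fit_metric["cfi"] - fit_configural["cfi"]
    d_rmsea = fit_metric["rmsea"] - fit_configural["rmsea"]
    d_srmr = fit_metric["srmr"] - fit_configural["srmr"]
    return d_cfi > -0.010 and (d_rmsea < 0.015 or d_srmr < 0.030)

# Hypothetical fit indices (e.g., for the gender groups)
configural = {"cfi": 0.962, "rmsea": 0.051, "srmr": 0.046}
metric = {"cfi": 0.958, "rmsea": 0.054, "srmr": 0.049}
print(metric_invariance(configural, metric))   # True: CFI drops only .004
```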
Convergent validity was analyzed using Pearson’s correlation coefficient. Known-groups validity was tested using one-way analysis of variance (ANOVA). The magnitude of the group differences was assessed using the eta-squared (η2) effect size, with values of 0.01, 0.06, and 0.14 indicating small, moderate, and large effects, respectively.
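Eta-squared is the between-group sum of squares divided by the total sum of squares from the one-way ANOVA. A sketch with hypothetical scale scores in three internet-use frequency groups:

```python
# Eta-squared effect size for a one-way ANOVA (hypothetical group scores).
def eta_squared(groups):
    """groups: list of score lists; eta^2 = SS_between / SS_total."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical mean scale scores by increasing internet-use frequency
groups = [[1, 2, 2, 3], [2, 3, 3, 4], [3, 4, 4, 5]]
print(round(eta_squared(groups), 2))   # 0.57 -> a large effect (> 0.14)
```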
The floor and ceiling effects of the scale scores were explored using descriptive statistics; a floor or ceiling effect was considered present if 15% or more of the respondents achieved the lowest or highest possible score on the instrument, respectively.
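This check reduces to the proportion of respondents at the scale minimum or maximum, compared against the 15% threshold; the scores below are hypothetical.

```python
# Floor/ceiling effect check against the 15% threshold (hypothetical scores).
def floor_ceiling(scores, minimum, maximum, threshold=0.15):
    floor = sum(1 for s in scores if s == minimum) / len(scores)
    ceiling = sum(1 for s in scores if s == maximum) / len(scores)
    return {"floor": floor, "ceiling": ceiling,
            "floor_effect": floor >= threshold,
            "ceiling_effect": ceiling >= threshold}

scores = [0, 1, 2, 2, 3, 3, 3, 4, 4, 4]   # hypothetical scores on a 0-4 scale
result = floor_ceiling(scores, minimum=0, maximum=4)
print(result)   # 10% at the floor (no effect), 30% at the ceiling (effect)
```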