SMARTPHONE-BASED EVALUATION OF VOICE QUALITY: PRELIMINARY RESULTS ON THE CLINICAL USE OF THE VOICESCREEN APPLICATION IN TURKISH-SPEAKING PARTICIPANTS
2 Karadeniz Technical University, Department of Otorhinolaryngology, Trabzon
Summary
Aim: This study presents preliminary results on the efficacy of the VoiceScreen smartphone application in differentiating normophonic from dysphonic individuals by measuring the Acoustic Voice Quality Index (AVQI) in Turkish-speaking participants.
Materials and Methods: A total of 30 participants were included in the study: 15 individuals with dysphonia and 15 normophonic participants. Voice recordings were collected with the VoiceScreen application in a clinical setting. As vocal tasks, the participants read a standard text and produced a sustained vowel. The application automatically calculated AVQI scores. In addition, participants were assessed using the Voice Handicap Index-10 (VHI-10) and the GRBAS scale.
Results: VoiceScreen and VHI-10 scores differed significantly between the dysphonic and normophonic groups (p<0.05). Within the dysphonic group, VoiceScreen scores showed significant positive correlations with GRBAS (r=0.694, p=0.004) and VHI-10 (r=0.630, p=0.012).
Conclusions: This study's findings indicate that the VoiceScreen app can distinguish between normophonic and dysphonic individuals in Turkish. The results highlight its potential as an accessible and user-friendly tool for initial voice screening in Turkish-speaking clinical populations.
Introduction
The increasing use of evidence-based practices in voice assessment has made acoustic analysis of voice notably more important[1]. Traditionally, acoustic voice assessment has focused primarily on sustained vowel production. However, relying exclusively on sustained vowels lacks ecological validity and yields weak correlations with auditory-perceptual assessment[2]. Maryn et al.[3] developed the Acoustic Voice Quality Index (AVQI) to assess a continuous speech (cs) sample alongside a sustained vowel (sv) phonation sample. AVQI is a weighted combination of six acoustic measures: smoothed cepstral peak prominence (CPPS), harmonics-to-noise ratio, shimmer local, shimmer dB, the slope of the Long-Term Average Spectrum (LTAS), and the tilt of the trendline through the LTAS. The AVQI output is a single score ranging from zero to ten, with higher scores indicating poorer voice quality. A threshold value of AVQI is used to differentiate between normophonic and dysphonic voices. Because AVQI uses a cs sample, phonetic differences between languages may affect the threshold; several researchers have therefore conducted validation studies to establish the AVQI threshold for different languages[4-7]. While AVQI offers a reliable multiparametric assessment of voice quality, its clinical application has typically relied on specialized recording equipment and acoustically treated environments, which may not always be readily available. Recent research has explored the use of smartphone applications and telemedicine strategies for acoustic voice evaluation[8-10], and some studies have demonstrated that smartphone-based measurements, even with experimentally introduced ambient noise, still show a satisfactory correlation with recordings from reference studio microphones[11,12].
In this context, Grillo et al.[13] introduced an application (VoiceEvalU8) that automatically computes AVQI and various acoustic parameters on iOS and Android devices using Praat[14] algorithms. Similarly, the VoiceScreen application[15] was developed to enable background noise monitoring, voice recording, and automated AVQI calculation. However, evidence on smartphone-based applications for AVQI is still limited, and no study has yet examined their use in Turkish-speaking clinical populations. This gap is especially significant because clinical settings frequently lack high-quality recording equipment and sound-treated spaces, which highlights the need for practical and accessible tools. Therefore, this study aims to present preliminary results on the efficacy of the VoiceScreen application in differentiating between patients diagnosed with voice disorders and normophonic individuals among Turkish speakers.
Methods
This study received ethical approval from the Cappadocia University Ethics Board (No. E64577500-050.99-44359).
Participants
This study was conducted at the Otorhinolaryngology Outpatient Clinic of Karadeniz Technical University, Faculty of Medicine. The study sample consisted of 30 individuals: 15 diagnosed with voice disorders (aged 25-69) and 15 normophonic participants (aged 25-72) with no history of voice disorders. All participants signed a voluntary consent form.
For normophonic participants, the inclusion criteria were no history of laryngeal pathology or voice disorders, no upper respiratory tract infection at the time of voice recording, and a GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) score of zero. The inclusion criterion for dysphonic participants was a diagnosis of a voice disorder by an otorhinolaryngologist. Participants in both groups were required to be older than 18 years and to be literate native speakers of Turkish. The exclusion criteria for both groups were a history of neurological or psychiatric disorders that affect speech or voice, hearing loss, and an inability to follow study instructions due to cognitive or linguistic limitations. The dysphonic participants were recruited from patients who presented to the otolaryngology clinic of the above-mentioned university with voice-related complaints and were diagnosed with a voice disorder by the second author using laryngoscopic examination. Normophonic participants who fulfilled the inclusion criteria were selected from university personnel and personal contacts by purposive sampling.
Data Collection with VoiceScreen Application
The primary voice recording instrument for voice analysis was an Apple iPhone SE that was equipped with the VoiceScreen application. The VoiceScreen application supports 17 languages and functions as a self-assessment tool, providing users with clear, step-by-step instructions. The participants were instructed to choose the Turkish language option; subsequently, the application directed participants to position the smartphone 30 cm away from their mouth.
Upon activation of the "Start Test" button, the application began monitoring background noise. When the noise level exceeded the threshold, the application prevented recording and displayed a warning message advising the participant to move to a quieter environment. Recording was permitted only when the background noise level fell within the permitted range. Two vocal tasks were conducted. First, participants were instructed to sustain the vowel /a/ for at least four seconds, with a timer visible on the screen to assist in completing the task. Second, participants read aloud the standardized Turkish sentence used in the validation study of AVQI version 2[16] ("Serüvenim, resimde gördüğünüz doğa harikası şu dağ köyünde başladı"). Participants were directed to perform both tasks at a comfortable and consistent pitch and loudness level. Upon completion of the recording procedure, the application automatically computed the AVQI version 2 score for each participant. The application reported whether the score was above or below the Turkish cut-off of 2.98[16] and gave automatic suggestions to those whose score exceeded this limit. The score was recorded for statistical analysis.
Two standardized clinical assessments were administered in addition to the VoiceScreen application. To assess the perceived impact of vocal problems on daily life, the Voice Handicap Index-10 (VHI-10)[17] was administered to all participants. The GRBAS scale[18] was used to provide perceptual ratings of voice quality, based on sv and cs samples, on a 4-point ordinal scale, where 0 indicates normal and 3 indicates severe. The GRBAS scale was administered independently and blindly by three clinicians with a minimum of five years of experience in the field of voice disorders. The overall perceptual severity of dysphonia was assessed by summing the scores of the five parameters to derive a total GRBAS score (ranging from 0 to 15), which was subsequently used for statistical analysis. The normophonic group consisted exclusively of participants who scored 0 on all GRBAS parameters.
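The recording gate and the cut-off report described above can be illustrated with a minimal Python sketch. It is not the VoiceScreen implementation: the function names and the `max_noise_db` parameter are hypothetical, and the text does not state the noise threshold, so the caller must supply one. Only the cut-off of 2.98 comes from the text.

```python
TURKISH_AVQI_CUTOFF = 2.98  # cut-off for AVQI version 2 quoted in the text [16]


def can_record(noise_level_db: float, max_noise_db: float) -> bool:
    """Allow recording only while background noise is within the permitted range.

    max_noise_db is a hypothetical parameter: the text gives no threshold value.
    """
    return noise_level_db <= max_noise_db


def classify_avqi(score: float, cutoff: float = TURKISH_AVQI_CUTOFF) -> str:
    """Report whether an AVQI score lies above the cut-off."""
    return "above cut-off" if score > cutoff else "at or below cut-off"


if __name__ == "__main__":
    # Illustrative values only; 35.0 and 40.0 are made-up noise levels.
    print(can_record(noise_level_db=35.0, max_noise_db=40.0))
    print(classify_avqi(5.65))
    print(classify_avqi(2.10))
```

In this sketch a score exactly equal to the cut-off is treated as not exceeding it, which is an assumption about a boundary case the text does not address.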
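As an illustration of the total GRBAS score described above, the following Python sketch sums the five parameter ratings of a single rater. The function name and the dictionary keys are hypothetical, and the text does not state how the ratings of the three clinicians were combined, so that step is deliberately left out.

```python
from typing import Mapping

GRBAS_PARAMETERS = ("grade", "roughness", "breathiness", "asthenia", "strain")


def grbas_total(ratings: Mapping[str, int]) -> int:
    """Sum the five GRBAS ratings (each 0-3) into a total score from 0 to 15."""
    for name in GRBAS_PARAMETERS:
        value = ratings[name]
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3, got {value!r}")
    return sum(ratings[name] for name in GRBAS_PARAMETERS)


if __name__ == "__main__":
    # Made-up ratings for one voice sample, for illustration only.
    example = {"grade": 2, "roughness": 1, "breathiness": 2, "asthenia": 0, "strain": 1}
    print(grbas_total(example))
```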
Statistical analysis
All statistical analyses were performed using IBM SPSS Statistics for Windows, Version 25.0 (IBM Corp., Armonk, NY, USA). Descriptive statistics were used to summarize participant characteristics and voice parameters. Independent samples t-tests were conducted to assess differences between the normophonic and dysphonic groups. Pearson correlation analyses were used to evaluate the associations among age, VoiceScreen scores, VHI-10 scores, and GRBAS scores.
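For readers without access to SPSS, the two tests named above can be sketched in Python using only the standard library. This is an illustrative sketch, not the analysis pipeline used in the study: the function names are hypothetical, the t statistic assumes equal variances (the text does not say which variant was reported), and the data in the usage example are made up.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent samples t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Made-up scores, for illustration only; they are not study data.
    print(pooled_t([4.9, 6.1, 5.5, 7.0], [3.2, 4.0, 3.6, 4.4]))
    print(pearson_r([4.9, 6.1, 5.5, 7.0], [12, 20, 17, 25]))
```

Converting the t statistic to a p-value requires the t distribution, which a statistics package or a t table provides.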
Results
A total of 30 participants were recruited for the study. The dysphonic group consisted of 15 individuals, five females (33.3%) and ten males (66.7%), with a mean age of 53.47 (SD = 12.19) years (range: 25-69 years). The normophonic group also included 15 individuals, five females (33.3%) and ten males (66.7%), with a mean age of 53.93 (SD = 12.36) years (range: 25-72 years). In the dysphonic group, the most prevalent diagnoses were laryngeal cancer (n = 2, 13.3%), vocal fold polyps (n = 2, 13.3%), Reinke's edema (n = 2, 13.3%) and vocal fold cyst. Additional diagnoses, each of which was observed in one participant (6.7%), included acute laryngitis, neck mass, dysplasia, leukoplakic lesion, vocal fold cyst, unilateral vocal fold polyp, history of vocal fold irradiation, and unilateral vocal fold paralysis. Participant characteristics and clinical diagnoses are shown in Table 1.
Table 1: Participant Characteristics and Clinical Diagnoses
Table 2: Descriptive Statistics for Normophonic and Dysphonic Participants
The descriptive statistics for the VoiceScreen and VHI-10 scores of normophonic participants are detailed in Table 3. The mean VoiceScreen score for normophonic participants was 3.81 (SD = 1.58). The mean VHI-10 score was 0.93 (SD = 1.67).
Table 3: Comparison of VoiceScreen and VHI-10 Scores in Normophonic and Dysphonic Participants
Table 4 presents the descriptive statistics for VoiceScreen, VHI-10, and GRBAS scores in dysphonic participants. The mean VoiceScreen score was 5.65 (SD = 2.41), the mean VHI-10 score was 19.87 (SD = 9.14), and the mean GRBAS score was 6.47 (SD = 3.82).
Table 4: Gender-based Analysis of VoiceScreen, VHI-10 and GRBAS Scores
To compare the VoiceScreen and VHI-10 scores of normophonic and dysphonic participants, independent samples t-tests were conducted (Table 5). Both measures differed significantly between the two groups. Dysphonic individuals had higher VoiceScreen scores (M = 5.65, SD = 2.41) than normophonic participants (M = 3.81, SD = 1.58), p = 0.020. VHI-10 scores were also higher in the dysphonic group (M = 19.87, SD = 9.14) than in the normophonic group (M = 0.93, SD = 1.67), p < 0.001.
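The VoiceScreen comparison can be checked from the reported summary statistics alone. The Python sketch below recomputes the independent samples t statistic from the means, standard deviations, and group sizes (n = 15 per group) given in the text. It assumes equal variances, which the text does not confirm, and the function name is hypothetical.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic and degrees of freedom from group summaries."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


if __name__ == "__main__":
    # VoiceScreen scores: dysphonic (M = 5.65, SD = 2.41) vs normophonic (M = 3.81, SD = 1.58)
    t_value, df = t_from_summary(5.65, 2.41, 15, 3.81, 1.58, 15)
    print(f"t({df}) = {t_value:.2f}")
```

With these inputs the statistic is approximately 2.47 on 28 degrees of freedom, which is in line with the reported two-tailed p-value of 0.020.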
Table 6 presents the gender-based analysis of VoiceScreen and VHI-10 scores in normophonic participants. No statistically significant differences were observed in VoiceScreen and VHI-10 scores between genders (p > 0.05).
Table 7 demonstrates the gender-based analysis of VoiceScreen, VHI-10 and GRBAS scores in dysphonic participants. Dysphonic males exhibited significantly higher VoiceScreen scores compared to females (p = 0.016). The gender differences in VHI-10 and GRBAS scores did not reach statistical significance, although GRBAS values were close to significance with a p-value of 0.051.
The correlations between age, VoiceScreen scores, and VHI-10 scores among normophonic participants were analyzed using Pearson correlation analysis. Results are displayed in Table 8. All correlation coefficients among the VHI-10 scores, age, and VoiceScreen scores of normophonic participants were weak and non-significant (p > 0.05). The VoiceScreen and VHI-10 scores of normophonic participants did not show any significant gender differences, although males had slightly higher scores. Male dysphonic participants had significantly higher VoiceScreen scores (p = 0.016); their GRBAS and VHI-10 scores were also higher, but these differences were not statistically significant.
Table 9 displays the Pearson correlation coefficients among age, VoiceScreen scores, VHI-10 scores, and GRBAS scores in participants with dysphonia. There was no significant correlation between age and any of the three voice assessment measures. Conversely, VoiceScreen scores demonstrated significant positive correlations with both VHI-10 (r = 0.630, p = 0.012) and GRBAS scores (r = 0.694, p = 0.004). Furthermore, there was a significant correlation between VHI-10 and GRBAS scores (r = 0.549, p = 0.034). The results indicate that elevated VoiceScreen scores correlate with increased VHI-10 and GRBAS scores.
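The significance of a Pearson coefficient depends on the sample size through a t statistic with n - 2 degrees of freedom. The Python sketch below applies that standard conversion to the coefficients reported above, assuming all 15 dysphonic participants contributed to each correlation. The function name is hypothetical.

```python
import math


def t_from_r(r: float, n: int):
    """t statistic and degrees of freedom for testing a Pearson r against zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df


if __name__ == "__main__":
    reported = [
        ("VoiceScreen vs GRBAS", 0.694),
        ("VoiceScreen vs VHI-10", 0.630),
        ("VHI-10 vs GRBAS", 0.549),
    ]
    for label, r in reported:
        t_value, df = t_from_r(r, n=15)
        print(f"{label}: r = {r:.3f}, t({df}) = {t_value:.2f}")
```

A larger t statistic on the same degrees of freedom corresponds to a smaller p-value, which matches the ordering of the reported p-values (0.004, 0.012, and 0.034).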
Discussion
The significant difference in VoiceScreen scores between normophonic and dysphonic subjects highlights the potential of the application for distinguishing dysphonic voices. Dysphonic participants had significantly higher VoiceScreen scores (5.65) than normophonic participants (3.81), which is consistent with previous research supporting the clinical accuracy of the VoiceScreen app and of smartphone-based acoustic evaluations[12]. However, the normophonic group's mean VoiceScreen score (3.81) was higher than the 2.98 threshold reported in the Turkish validation study of AVQI version 2[16]. Methodological differences might explain this discrepancy. The current study used a smartphone-based application in typical clinical conditions, whereas the study by Yeşilli-Puzella et al.[16] used studio-grade microphones in controlled conditions. Even among healthy speakers, background noise levels, device hardware, and environmental variability can raise AVQI values[15]. The significant difference in VHI-10 scores between normophonic and dysphonic subjects corresponds to expected clinical patterns. Participants with normophonia scored below the Turkish VHI-10 cutoff of 7.5, whereas those with dysphonia scored considerably above it[19]. These findings align with previous research demonstrating that the VHI-10 is an effective instrument for distinguishing between normal and disordered voice function[20,21].
The absence of significant gender differences in VoiceScreen or VHI-10 scores among normophonic subjects may stem from the fact that they have no vocal complaints or known vocal fold pathologies. In normophonic individuals, gender may have no effect on either acoustic or perceptual vocal characteristics. It should also be considered that the small and uneven numbers of participants of each gender restrict the power of these findings. Previous studies on gender effects in voice measurements have also yielded mixed results[22], consistent with the current study.
The lack of correlations between age, VoiceScreen, and VHI-10 scores among normophonic participants suggests that age has no appreciable effect on the acoustic voice quality measured by VoiceScreen or on perceived vocal handicap in normophonic speakers. This is consistent with previous findings that aging does not reliably affect AVQI[23].
The positive correlations of VoiceScreen scores with GRBAS and VHI-10 in the dysphonic group are consistent with previous studies. Pommée et al.[24] reported a comparable relationship, observing moderate correlations between the AVQI and the VHI-10, as well as with the "G" parameter of GRBAS. These findings indicate that higher VoiceScreen scores are linked to greater auditory-perceptual severity and perceived voice impairment. The absence of significant relationships with age in the dysphonic group may suggest that age was not a substantial determinant of voice severity[23].
The heterogeneous dysphonia group included in our study reflects the variety of clinical diagnoses seen in voice clinic settings. This approach aligns with previous AVQI validation methodologies, in which diverse-etiology cohorts were included to assess voice quality across real-world clinical populations[22]. Validation studies by Hosokawa et al.[4] and Uloza et al.[11] used similarly heterogeneous samples to establish AVQI's clinical utility as a screening tool. Our heterogeneous participant group enhances the ecological validity and generalizability of our preliminary findings to typical clinical practice, where screening tools must differentiate dysphonic from normophonic voices regardless of underlying etiology.
The heterogeneity of our dysphonia group has some implications for interpreting our results. First, our findings suggest that VoiceScreen can discriminate between dysphonic and normophonic voices across diverse pathologies, which is the kind of clinical utility required of a screening tool. Second, the significant correlations between VoiceScreen scores and both GRBAS (r=0.694, p=0.004) and VHI-10 (r=0.630, p=0.012) within the heterogeneous dysphonia group suggest that AVQI captures voice quality levels consistently across different etiologies. Third, our results provide preliminary evidence for general screening rather than for etiology-specific diagnostic accuracy.
Limitations
This research provides initial findings and consequently has several limitations. First, the sample size (n=30) was small, potentially limiting the generalizability of the findings. Our sample size is similar to those of preliminary studies of smartphone-based voice assessment[25,26]. While this sample is adequate for establishing preliminary results, future studies with larger samples are needed to establish refined cutoff values for Turkish-speaking populations.
Second, the gender distribution was unbalanced in both the normophonic and dysphonic groups, which may have influenced the interpretation of gender-related differences. Future research should include larger and more diverse samples with balanced gender representation.
Third, the dysphonic group was clinically heterogeneous, and the diagnoses were not evenly represented. Future studies with larger samples classified by etiology are needed to determine whether VoiceScreen performance varies across specific dysphonia types.
Conclusion
This preliminary research assessed the effectiveness of portable technology for voice evaluation using the VoiceScreen application. In clinical environments, technological tools should be practical and reliable. The findings indicate that the VoiceScreen app distinguishes between normophonic and dysphonic individuals and point to its potential as an accessible and user-friendly tool for initial voice screening in Turkish-speaking clinical populations. Further research involving larger and more diverse samples is necessary to validate its clinical utility across a broader spectrum of voice disorders.
Funding
This study did not obtain any specific funding from public, commercial, or non-profit agencies.
Conflict of interest
The authors declare no conflicts of interest.
All authors have read and approved the final manuscript.
References
1) Roy N, Barkmeier-Kraemer J, Eadie T, Sivasankar MP, Mehta D, Paul D, et al. Evidence-based clinical voice assessment: a systematic review. Am J Speech Lang Pathol. 2013 May;22(2):212-26. Available from: http://dx.doi.org/10.1044/1058-0360(2012/12-0014).
2) Maryn Y, Roy N, De Bodt M, Van Cauwenberge P, Corthals P. Acoustic measurement of overall voice quality: a meta-analysis. J Acoust Soc Am. 2009 Nov;126(5):2619-34. Available from: http://dx.doi.org/10.1121/1.3224706.
3) Maryn Y, Corthals P, Van Cauwenberge P, Roy N, De Bodt M. Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels. J Voice. 2010 Sept;24(5):540-55. Available from: http://dx.doi.org/10.1016/j.jvoice.2008.12.014
4) Hosokawa K, Barsties B, Iwahashi T, Iwahashi M, Kato C, Iwaki S, et al. Validation of the Acoustic Voice Quality Index in the Japanese language. J Voice. 2017 Mar;31(2):260.e1-260.e9. Available from: http://dx.doi.org/10.1016/j.jvoice.2016.05.010
5) Uloza V, Petrauskas T, Padervinskis E, Ulozaite N, Barsties B, Maryn Y. Validation of the Acoustic Voice Quality Index in the Lithuanian language. J Voice. 2017 Mar;31(2):257.e1-257.e11. Available from: http://dx.doi.org/10.1016/j.jvoice.2016.06.002
6) Barsties V Latoszek B, Lehnert B, Janotte B. Validation of the acoustic voice quality index version 03.01 and acoustic breathiness index in German. J Voice. 2020 Jan;34(1):157.e17-157.e25. Available from: http://dx.doi.org/10.1016/j.jvoice.2018.07.026
7) Hernandez D, Gómez L, Jiménez NM, Izquierdo A, Latoszek BV. Validation of the acoustic voice quality index version 03.01 and the acoustic breathiness index in the Spanish language. Rhinology & Laryngology. 2018;127:317-26.
8) Kojima T, Fujimura S, Hori R, Okanoue Y, Shoji K, Inoue M. An innovative voice analyzer "VA" smart phone program for quantitative analysis of voice quality. J Voice. 2019 Sept;33(5):642-8. Available from: http://dx.doi.org/10.1016/j.jvoice.2018.01.026
9) Petrizzo D, Popolo PS. Smartphone use in clinical voice recording and acoustic analysis: A literature review. J Voice. 2021 May;35(3):499.e23-499.e28. Available from: http://dx.doi.org/10.1016/j.jvoice.2019.10.006
10) Baki M, Wood M, Alston G, Ratcliffe M, Sandhu P, Rubin G, et al. Reliability of opera VOX against multidimensional voice program (MDVP). Clinical Otolaryngology. 2015;40:22-8.
11) Ulozaite-Staniene N, Petrauskas T, Saferis V, Uloza V. Exploring the feasibility of the combination of acoustic voice quality index and glottal function index for voice pathology screening. Eur Arch Otorhinolaryngol. 2019 June;276(6):1737-45. Available from: http://dx.doi.org/10.1007/s00405-019-05433-5
12) Uloza V, Ulozaite-Staniene N, Petrauskas T, Kregzdyte R. Accuracy of acoustic voice quality index captured with a smartphone-measurements with added ambient noise. Journal of Voice. 2023;37:465-e484.
13) Grillo EU, Wolfberg J. An assessment of different Praat versions for acoustic measures analyzed automatically by VoiceEvalU8 and manually by two raters. J Voice. 2023 Jan;37(1):17-25. Available from: http://dx.doi.org/10.1016/j.jvoice.2020.12.003
14) Boersma P. Praat, a system for doing phonetics by computer. Glot International. 2002;5(9):341-5.
15) Uloza V, Ulozaite-Staniene N, Petrauskas T. An iOS-based VoiceScreen application: feasibility for use in clinical settings-a pilot study. Eur Arch Otorhinolaryngol. 2023 Jan;280(1):277-84. Available from: http://dx.doi.org/10.1007/s00405-022-07546-w
16) Yeşilli-Puzella G, Tadıhan-Özkan E, Maryn Y. Validation and test-retest reliability of Acoustic Voice Quality Index version 02.06 in the Turkish language. J Voice. 2022 Sept;36(5):736.e25-736.e32. Available from: http://dx.doi.org/10.1016/j.jvoice.2020.08.021
17) Kılıç MA, Okur E, Yıldırım İ, Öğüt F, Denizoğlu İİ, Kızılay A, et al. Ses handikap endeksi voice handicap index Türkçe versiyonunun güvenilirliği ve geçerliliği. The Turkish Journal of Ear Nose and Throat. 2008;18:139-47.
18) Hirano M. Clinical examination of voice. 1981st ed. Vienna, Austria: Springer; 1981. (Disorders of Human Communication).
19) Düzenli-Öztürk S, Ünsal-Akkaya EM, Tetik-Hacıtahiroğlu K, Ozkaraalp İS, Tadıhan-Özkan E. Determining the cutoff score of the Turkish version of the Voice Handicap Index-10. Folia Phoniatr Logop. 2025 July 21;1-7. Available from: http://dx.doi.org/10.1159/000547535
20) Forti S, Amico M, Zambarbieri A, Ciabatta A, Assi C, Pignataro L, et al. Validation of the Italian Voice Handicap Index-10. J Voice. 2014 Mar;28(2):263.e17-263.e22. Available from: http://dx.doi.org/10.1016/j.jvoice.2013.07.013
21) Tafiadis D, Helidoni ME, Chronopoulos SK, Kosma EI, Ziavra N, Velegrakis GA. Cross-cultural adaptation and validation of the Greek Voice Handicap Index-10 (GVHI-10) with additional receiver operating characteristic analysis. J Voice. 2020 Mar;34(2):304.e1-304.e8. Available from: http://dx.doi.org/10.1016/j.jvoice.2018.09.009
22) Jayakumar T, Benoy JJ. Acoustic Voice Quality Index (AVQI) in the measurement of voice quality: A systematic review and meta-analysis. J Voice. 2024 Sept;38(5):1055-69. Available from: http://dx.doi.org/10.1016/j.jvoice.2022.03.018
23) Barsties V Latoszek B, Ulozaite-Staniene N, Maryn Y, Petrauskas T, Uloza V. The influence of gender and age on the Acoustic Voice Quality Index and Dysphonia Severity Index: A normative study. J Voice. 2019 May;33(3):340-5. Available from: http://dx.doi.org/10.1016/j.jvoice.2017.11.011
24) Pommée T, Maryn Y, Finck C, Morsomme D. The Acoustic Voice Quality Index, version 03.01, in French and the Voice Handicap Index. J Voice. 2020 July;34(4):646.e1-646.e10. Available from: http://dx.doi.org/10.1016/j.jvoice.2018.11.017

