Because keystroke dynamics is less mature than other established biometrics, publicly available benchmark databases are limited. Although some researchers have taken the initiative to share their homemade data sets, many have chosen to generate in-house data sets because development setups and variables differ widely. Therefore, this section attempts to provide an overview of most of the properties of the data sets employed.

4.1. Data Size

It is generally agreed that experiments that include a large number of subjects better demonstrate the scalability of a study. Regrettably, most of the studies performed involve only a small number of subjects. This is understandable given the various issues and difficulties encountered in the data collection process (to be discussed in the following section).
In general, most research works involve fewer than 50 subjects, and many involve as few as 10 to 20 people. Although some research works reported having involved a large number of users (118 [77] and 250 [78] users), only a portion of the population completed the entire experimental cycle. The frequency distribution of data population sizes is summarized in Figure 3.

Figure 3: Frequency distribution of data size in keystroke dynamics experiments.

4.2. Subject Demographic

Most experimental subjects are people around a researcher’s institute, ranging from undergraduate and postgraduate students [74] and researchers [55] to academicians and supporting staff [18, 76]. Although it may be argued that these populations may not represent the global community, they remain the primary option because they are the closest readily available resource.
Even though several research works have claimed to involve a population with a broad age distribution (20 to 60) [55, 66, 79], emphasis should be placed on a more important aspect: the typing proficiency of these users. Apart from [12], where the whole population consists of skilled typists, other studies involved untrained typists who are familiar with the input device [80, 81]. However, none of the experiments was conducted specifically on users drawn entirely from a low typing proficiency group.

4.3. Data Type

In general, experimental subjects are required to provide either character-based text or purely numerical inputs [82]. The majority of research works use character-based inputs, as illustrated in Figure 4. The input type can be further subdivided into long or short text. Short inputs normally consist of a username [62, 83], a password [84, 85], or a text phrase [61, 86], while long inputs usually refer to paragraphs of text containing 100 words or more [87, 88].

Figure 4: The percentage distribution of various types of input data.

Freedom of input is another determining factor that distinguishes keystroke dynamics research.