Reading note

rated by the same set of judges. For example, 100 job applicants may have completed a cognitive ability test, a personality inventory, a biodata survey, and a structured panel interview. If all applicants were assessed by the same panel of interviewers, one might want to justify aggregating the interviewer scores and then use these aggregate scores as a level 2 predictor of job performance (alongside the level 1 predictors of cognitive ability, personality, and life history information). Aggregating interviewer ratings could be justified by estimating IRA using rWG (i.e., calculating one rWG for each of the 100 job applicants) or by estimating IRR + IRA using a two-way ICC. If the researcher were interested in generalizing to other judges, judges would be treated as a random effects variable, and he or she would calculate the ICC using a two-way random effects ANOVA (where both the target and judge effects are random effects). If the researcher were not interested in generalizing to other judges, judges would be treated as a fixed effects variable, and he or she would calculate the ICC using a two-way mixed effects ANOVA (where the target effect is a random effect and the judge effect is a fixed effect). Both ICCs are estimated as

$$\mathrm{ICC}(A,1) = \frac{\mathrm{MSR} - \mathrm{MSE}}{\mathrm{MSR} + (K-1)\,\mathrm{MSE} + \frac{K}{N}\left(\mathrm{MSC} - \mathrm{MSE}\right)}, \qquad (12)$$

where MSR is the mean square for rows (i.e., targets), MSC is the mean square for columns (i.e., judges), and MSE is the mean square error, all obtained from a two-way ANOVA; K is the number of observations (e.g., ratings or judges) for each of the N targets. ICC(A,1) values are interpreted in the same way as ICC(1) values (i.e., as the reliability of individual judges' ratings or as an estimate of effect size).
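A minimal sketch of the ICC(A,1) formula above in Python, assuming NumPy and a complete N × K ratings matrix (rows are targets, columns are judges, no missing ratings). The function name `icc_a1` is hypothetical. It derives MSR, MSC, and MSE from a two-way ANOVA without replication and plugs them into the formula; the same code covers the random and mixed effects cases, since both ICCs share this estimator.

```python
import numpy as np


def icc_a1(ratings):
    """ICC(A,1) for an N x K matrix: rows are targets, columns are judges.

    Hypothetical helper for illustration; assumes no missing ratings.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # one mean per target
    col_means = x.mean(axis=0)  # one mean per judge

    # Sums of squares for a two-way ANOVA with one rating per cell
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares: divide each sum of squares by its degrees of freedom
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Equation (12): absolute agreement of a single judge's ratings
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


print(icc_a1([[1, 2], [2, 3], [3, 4]]))
```

With two judges whose ratings differ by a constant offset of 1 on three targets, the call above returns 2/3 rather than 1: the judges are perfectly consistent, but the judge (column) effect counts against absolute agreement.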

Classification, a.k.a. pattern recognition

Association between patterns x ∈ X and classes y ∈ Y.

• The pattern space X is unspecified; for instance, X = ℝ^d.

• The class space Y is an unordered finite set.

Examples:

• Binary classification (Y = {±1}): fraud detection, anomaly detection, ...

• Multiclass classification (Y = {C_1, C_2, ..., C_M}): object recognition, speaker identification, face recognition, ...

• Multilabel classification (Y is a power set): document topic recognition, ...

• Sequence recognition (Y contains sequences): speech recognition, signal identification, ...
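To make the finite class spaces concrete, the plain-Python sketch below builds Y for the binary, multiclass, and multilabel cases. The class and topic names are made-up placeholders. For multilabel classification, Y is the power set of the M topics, so it has 2^M elements, and each label is a set of topics (possibly empty).

```python
from itertools import chain, combinations


def power_set(labels):
    """All subsets of `labels`: the class space Y for multilabel classification."""
    items = list(labels)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


# Hypothetical class spaces, for illustration only
binary_Y = {-1, +1}
multiclass_Y = {"C1", "C2", "C3"}
multilabel_Y = power_set(["sports", "politics", "tech"])

print(len(binary_Y), len(multiclass_Y), len(multilabel_Y))  # 2 3 8

# A document about politics and tech carries one label: a set of topics
print(frozenset({"politics", "tech"}) in multilabel_Y)  # True
```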

http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf

We have presented the Data Science Machine: an end-to-end system for doing data science with relational data.

P. Domingos, “A few useful things to know about machine learning,” Communications of the ACM, vol. 55, no. 10, pp. 78–87, 2012

A Few Useful Things to Know about Machine Learning: the most mature and widely used kind of machine learning is classification.

A classifier is a system that inputs (typically) a vector of discrete and/or continuous feature values and outputs a single discrete value, the class.

A learner inputs a training set of examples, and outputs a classifier. The test of the learner is whether this classifier produces the correct output for future examples.
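As an illustration of the learner/classifier distinction, here is a hedged Python sketch built on a 1-nearest-neighbor learner, chosen only because it is short. The function names and the toy data are hypothetical. `learn_1nn` takes a training set and returns a classifier, and `accuracy` applies the test described above by scoring that classifier on examples it did not see during training.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class)


def learn_1nn(training_set: List[Example]) -> Callable[[Sequence[float]], str]:
    """A learner: input a training set, output a classifier (hypothetical name)."""
    stored = list(training_set)  # 1-NN "training" just memorizes the examples

    def classify(x: Sequence[float]) -> str:
        # Output the class of the closest stored example (Euclidean distance)
        _, label = min(stored, key=lambda ex: math.dist(ex[0], x))
        return label

    return classify


def accuracy(classifier: Callable[[Sequence[float]], str], test_set: List[Example]) -> float:
    """Fraction of future (held-out) examples the classifier labels correctly."""
    correct = sum(classifier(x) == y for x, y in test_set)
    return correct / len(test_set)


# Toy data, made up for illustration
train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((1.0, 1.0), "B"), ((0.9, 1.2), "B")]
test = [((0.1, 0.3), "A"), ((1.1, 0.8), "B")]

clf = learn_1nn(train)
print(accuracy(clf, test))  # 1.0
```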

LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION
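One way to read this equation is that a learner is assembled from three independent choices. The Python sketch below separates them for a toy decision-stump learner: the representation is a threshold on a single feature, the evaluation function is accuracy, and the optimizer is a brute-force search over candidate thresholds taken from the training values. These particular choices, the function names, and the data are assumptions made for illustration, not the paper's own example.

```python
from typing import List, Tuple

Example = Tuple[List[float], int]  # (feature vector, label in {-1, +1})
Stump = Tuple[int, float, int]     # (feature index, threshold, sign)


# REPRESENTATION: a decision stump, i.e. a threshold on one feature
def stump_predict(stump: Stump, x: List[float]) -> int:
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


# EVALUATION: accuracy of a candidate stump on labelled data
def evaluate(stump: Stump, data: List[Example]) -> float:
    return sum(stump_predict(stump, x) == y for x, y in data) / len(data)


# OPTIMIZATION: exhaustive search over every (feature, threshold, sign) candidate
def optimize(data: List[Example]) -> Stump:
    n_features = len(data[0][0])
    candidates = [
        (f, x[f], s)
        for f in range(n_features)
        for x, _ in data
        for s in (+1, -1)
    ]
    # max() keeps the first candidate with the highest accuracy
    return max(candidates, key=lambda st: evaluate(st, data))


data = [([1.0, 5.0], -1), ([2.0, 3.0], -1), ([3.0, 4.0], +1), ([4.0, 1.0], +1)]
best = optimize(data)
print(best, evaluate(best, data))  # (0, 2.0, 1) 1.0
```

Swapping any one component (for example, a greedy search in place of the exhaustive one, or a different evaluation function) yields a different learner while the other two stay unchanged.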