K: If you use the Euclidean-distance-to-centroid approach, it would be reasonable to first "explore" how many clusters of data points you actually have. Using cluster analysis, you may find (from the data) that there are NOT four clusters as you claim; there may be only three. A more logical way to use the "Euclidean distance to centroid" concept is therefore to IDENTIFY THE CENTROIDS (i.e., the actual number of clusters) first. Otherwise, reviewers may ask how you know there are four clusters. This question does not arise with the dummy-variable approach, because the four clusters are DEFINED by you. That is why you use a categorization process and dummies.

I: Would it also be feasible to handle this part as follows, i.e., by taking the categorization process into account? First, based on the medians of ISD and IdSD, split the sample into High ISD and High IdSD (G1), High ISD and Low IdSD (G2), Low ISD and High IdSD (G3), and Low ISD and Low IdSD (G4). Second, compute the centroids of G1, G2, G3, and G4: (x1,y1), (x2,y2), (x3,y3), (x4,y4). Finally, test whether the distances among the four centroids differ significantly. Would these steps amount to defining the groups as dummies while keeping the spirit of cluster analysis?

K: Yes, of course you can do that. I raised the point because "centroid" is a concept related to cluster analysis. The problem is less serious when you use dummies, for two reasons: (1) Using dummies is, by definition, a less precise approach to categorization. Please note that it is a GENERAL categorization system. (2) Dummies are a "theoretically assumed" categorization system. People would not ask WHY you use HI-HI, LO-LO, HI-LO, and LO-HI. In other words, FOUR categories is the tradition in the OB field. But once you talk about the centroids of the four groups, people may relate it to cluster analysis. For example, if you do not use cluster analysis, then even if you know there are four groups, you MAY be challenged on how you determined the centroids. E.g., if the median of ISD is 4 on a 5-point scale and the median of IdSD is 3 on a 5-point scale, why DON'T you use (4.5, 4) as the center of the Hi-Hi group (4.5 is the mid-point between 4 and 5; 4 is the mid-point between 3 and 5)?
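
The median-split procedure discussed above (split on the ISD and IdSD medians, compute each group's centroid, compare the centroid distances) can be sketched in Python. This is only an illustration under stated assumptions: the function names, the toy data, and the rule that ties at the median count as "high" are not from the conversation. The significance test itself is left out. The last helper reproduces the alternative "mid-point" center from K's example, to show that the two definitions of a group center need not agree.

```python
from itertools import combinations
from math import dist
from statistics import mean, median


def median_split_centroids(points):
    """points: list of (ISD, IdSD) pairs. Returns {group label: centroid}."""
    isd_med = median(p[0] for p in points)
    idsd_med = median(p[1] for p in points)
    groups = {"G1": [], "G2": [], "G3": [], "G4": []}
    for isd, idsd in points:
        # Assumption: a score equal to the median counts as "high".
        hi_isd, hi_idsd = isd >= isd_med, idsd >= idsd_med
        if hi_isd and hi_idsd:
            label = "G1"  # High ISD, High IdSD
        elif hi_isd:
            label = "G2"  # High ISD, Low IdSD
        elif hi_idsd:
            label = "G3"  # Low ISD, High IdSD
        else:
            label = "G4"  # Low ISD, Low IdSD
        groups[label].append((isd, idsd))
    # Centroid = mean of each coordinate; empty groups are skipped.
    return {
        label: (mean(x for x, _ in members), mean(y for _, y in members))
        for label, members in groups.items()
        if members
    }


def pairwise_centroid_distances(centroids):
    """Euclidean distance between every pair of group centroids."""
    return {
        (a, b): dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


def scale_midpoint_center(isd_median, idsd_median, scale_max=5):
    """Alternative Hi-Hi center: mid-point between each median and the scale maximum."""
    return ((isd_median + scale_max) / 2, (idsd_median + scale_max) / 2)


# Made-up toy responses on a 5-point scale, for illustration only.
toy = [(5, 5), (4, 4), (5, 1), (4, 2), (1, 5), (2, 4), (1, 1), (2, 2)]
centroids = median_split_centroids(toy)
distances = pairwise_centroid_distances(centroids)
```

With medians of 4 and 3, `scale_midpoint_center(4, 3)` gives (4.5, 4.0), the center K mentions. A data-based centroid for the same group will generally differ from it, which is the ambiguity a reviewer could raise.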