Clustering — comparing methodologies from a literature survey (ML, 2022. 8. 12. 13:50)
# Dimensionality Reduction
- Why it's needed: in high dimensions, distance metrics lose their meaning
https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions/
- How: PCA, t-SNE
https://stats.stackexchange.com/questions/12853/when-do-we-combine-dimensionality-reduction-with-clustering
- t-SNE
https://gaussian37.github.io/ml-concept-t_sne/
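A minimal sketch of the two methods above, assuming scikit-learn's `PCA` and `TSNE` on made-up toy data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # toy high-dimensional data

# PCA: linear projection onto the directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: nonlinear, preserves local neighborhoods; mainly used for visualization
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)
```

Both outputs are 2-D embeddings of the same 200 points, but PCA is a deterministic linear map while t-SNE optimizes a neighborhood-preserving objective.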
# PCA & Clustering
https://www.quora.com/What-is-the-difference-between-cluster-analysis-and-principal-component-analysis
PCA: linear; for nonlinear dimensionality reduction --> autoencoder
K-means produces spherical clusters of roughly the same radius, which may easily not match your data. Try other algorithms like Gaussian Mixture Models or DBSCAN.
https://stackoverflow.com/questions/69872024/understanding-principal-component-analysis-with-k-means-clustering
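The spherical-cluster limitation quoted above can be demonstrated on scikit-learn's two-moons toy dataset (a sketch, assuming scikit-learn; `eps=0.3` is an illustrative choice):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

# Two interleaving crescents: non-spherical clusters
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means carves space into spherical regions and splits the moons incorrectly
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups by density and can follow the crescent shapes
db = DBSCAN(eps=0.3).fit_predict(X)

print("K-means ARI:", adjusted_rand_score(y, km))  # low
print("DBSCAN  ARI:", adjusted_rand_score(y, db))  # close to 1
```

Adjusted Rand Index compares each clustering to the true moon labels; density-based DBSCAN recovers the structure that K-means' spherical assumption misses.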
# Factor Analysis
LSA : https://mambo-coding-note.tistory.com/36
FA : http://contents2.kocw.or.kr/KOCW/document/2016/kunsan/jungkangmo/9.pdf
PCA vs FA: https://ai-times.tistory.com/112
FA: http://commres.net/wiki/factor_analysis
FA: https://towardsdatascience.com/factor-analysis-a-complete-tutorial-1b7621890e42
FA - neg: https://stats.stackexchange.com/questions/220384/cfa-negative-factor-loadings
FA - rotation: https://m.blog.naver.com/shoutjoy/221802826087
Number of factors to extract: https://m.blog.naver.com/PostView.naver?isHttpsRedirect=true&blogId=y4769&logNo=220619149297
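One common rule for choosing the number of factors (not necessarily the one used in the linked post) is the Kaiser criterion: keep factors whose eigenvalue of the correlation matrix exceeds 1. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))  # toy data: 100 observations, 6 variables

# Eigenvalues of the correlation matrix, largest first
corr = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]

# Kaiser criterion: retain factors with eigenvalue > 1
n_factors = int((eigvals > 1).sum())
print(eigvals, n_factors)
```

The eigenvalues of a correlation matrix sum to the number of variables, so eigenvalue > 1 means a factor explains more variance than a single original variable. A scree plot is the usual visual alternative.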
K-means vs K-means++: https://itstory1592.tistory.com/19
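In scikit-learn the two initializations from the link above are just the `init` parameter of `KMeans` (a sketch on made-up blob data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

# 'random' picks initial centroids uniformly from the data;
# 'k-means++' spreads them apart, which tends to converge faster
# and avoid bad local minima (n_init=1 to expose the difference).
km_rand = KMeans(n_clusters=5, init="random", n_init=1, random_state=0).fit(X)
km_pp = KMeans(n_clusters=5, init="k-means++", n_init=1, random_state=0).fit(X)

# Lower inertia (within-cluster sum of squares) is better;
# k-means++ is often, though not always, lower for a single run.
print(km_rand.inertia_, km_pp.inertia_)
```

In practice scikit-learn defaults to `init="k-means++"` and multiple restarts (`n_init`), which papers over most bad initializations.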
# PCA vs t-SNE vs UMAP vs LDA
https://sykflyinginthesky.tistory.com/55
- t-SNE: https://ratsgo.github.io/machine%20learning/2017/04/28/tSNE/
When projecting to the low-dimensional space (modeled with a t-distribution), minimize the mismatch with the high-dimensional similarities via the Kullback-Leibler divergence.
- PCA vs t-SNE: https://bcho.tistory.com/1210 (well explained)
https://stats.stackexchange.com/questions/238538/are-there-cases-where-pca-is-more-suitable-than-t-sne
- t-SNE vs autoencoder: https://stats.stackexchange.com/questions/340175/why-is-t-sne-not-used-as-a-dimensionality-reduction-technique-for-clustering-or
t-SNE is mostly used for visualization rather than general dimensionality reduction; for the latter, autoencoders are the more common choice.
# similarity
1) Jaccard similarity: for categorical data; the degree of overlap between two sets
2) Pearson correlation vs cosine similarity: center the vectors (subtract the mean), then compare
Pearson correlation is cosine similarity between centered vectors.
https://stats.stackexchange.com/questions/235673/is-there-any-relationship-among-cosine-similarity-pearson-correlation-and-z-sc
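A quick numeric check of the two claims above, on made-up vectors: Pearson correlation equals cosine similarity of mean-centered vectors, and Jaccard is intersection over union.

```python
import numpy as np

# Jaccard similarity: overlap of two categorical sets
a, b = {1, 2, 3}, {2, 3, 4}
jaccard = len(a & b) / len(a | b)  # 2 shared out of 4 total -> 0.5

# Pearson correlation == cosine similarity of centered vectors
x = np.array([1.0, 3.0, 5.0, 7.0])
y = np.array([2.0, 2.5, 6.0, 8.0])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

pearson = np.corrcoef(x, y)[0, 1]
cos_centered = cosine(x - x.mean(), y - y.mean())

print(jaccard, pearson, cos_centered)  # the last two agree
```

This is why cosine similarity on already-centered features behaves like correlation.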
# Embedding
What I did is equivalent to using CountVectorizer --> TfidfVectorizer.
https://simpling.tistory.com/entry/Embedding-%EC%9D%B4%EB%9E%80-%EB%AC%B4%EC%97%87%EC%9D%B8%EA%B0%80-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0
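The equivalence noted above can be verified directly: scikit-learn documents `TfidfVectorizer` as `CountVectorizer` followed by `TfidfTransformer` (the toy documents are made up).

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer,
)

docs = ["the cat sat", "the dog sat", "the cat ran"]

# Two-step pipeline: raw term counts, then TF-IDF reweighting
counts = CountVectorizer().fit_transform(docs)
tfidf_two_step = TfidfTransformer().fit_transform(counts)

# One-step equivalent
tfidf_direct = TfidfVectorizer().fit_transform(docs)

print(np.allclose(tfidf_two_step.toarray(), tfidf_direct.toarray()))
```

With default parameters the two routes produce identical matrices; the two-step form is useful when you want access to the raw counts as well.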