Introduction
For this LBB (Learning By Building) project on UL (Unsupervised Learning), I use a dataset from Kaggle, available at https://www.kaggle.com/orgesleka/used-cars-database. It contains listings of used cars offered for sale on eBay Kleinanzeigen in Germany, described by 20 variables. I will try to cluster these listings with k-means, combined with PCA, to identify value segments among the used cars.
Data preparation and exploration
Read the dataset and inspect its structure.
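library(dplyr)       # provides %>%, select() and filter()
library(FactoMineR)  # provides PCA() and plot.PCA()
autodata <- read.csv("data_input/autos.csv")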
str(autodata)
## 'data.frame': 189349 obs. of 20 variables:
## $ dateCrawled : Factor w/ 164590 levels "2016-03-05 14:06:22",..: 96234 96080 44613 62122 135465 158918 142369 82743 161561 60299 ...
## $ name : Factor w/ 128113 levels "_____AUDI_A4_S_LINE______VOLLAUSSTATUNG______",..: 43627 2401 50197 44557 95036 15652 80956 118428 34979 119710 ...
## $ seller : Factor w/ 2 levels "gewerblich","privat": 2 2 2 2 2 2 2 2 2 2 ...
## $ offerType : Factor w/ 2 levels "Angebot","Gesuch": 1 1 1 1 1 1 1 1 1 1 ...
## $ price : int 480 18300 9800 1500 3600 650 2200 0 14500 999 ...
## $ abtest : Factor w/ 2 levels "control","test": 2 2 2 2 2 2 2 2 1 2 ...
## $ vehicleType : Factor w/ 9 levels "","andere","bus",..: 1 5 9 6 6 8 4 8 3 6 ...
## $ yearOfRegistration : int 1993 2011 2004 2001 2008 1995 2004 1980 2014 1998 ...
## $ gearbox : Factor w/ 3 levels "","automatik",..: 3 3 2 3 3 3 3 3 3 3 ...
## $ powerPS : int 0 190 163 75 69 102 109 50 125 101 ...
## $ model : Factor w/ 251 levels "","1_reihe","100",..: 119 1 120 119 104 13 9 42 58 119 ...
## $ kilometer : int 150000 125000 125000 150000 90000 150000 150000 40000 30000 150000 ...
## $ monthOfRegistration: int 0 5 8 6 7 10 8 7 8 0 ...
## $ fuelType : Factor w/ 8 levels "","andere","benzin",..: 3 5 5 3 5 3 3 3 3 1 ...
## $ brand : Factor w/ 40 levels "alfa_romeo","audi",..: 39 2 15 39 32 3 26 39 11 39 ...
## $ notRepairedDamage : Factor w/ 3 levels "","ja","nein": 1 2 1 3 3 2 3 3 1 1 ...
## $ dateCreated : Factor w/ 97 levels "2014-03-10 00:00:00",..: 83 83 73 76 90 94 91 80 94 76 ...
## $ nrOfPictures : int 0 0 0 0 0 0 0 0 0 0 ...
## $ postalCode : int 70435 66954 90480 91074 60437 33775 67112 19348 94505 27472 ...
## $ lastSeen : Factor w/ 111190 levels "2016-03-05 14:15:08",..: 107390 106918 92424 25353 100813 104198 94797 50176 88971 71117 ...
The structure shows 189,349 observations of 20 variables. Before starting the analysis, I first check and clean the data.
anyNA(autodata)
## [1] FALSE
There are no NA values. Note, however, that several factor columns (for example vehicleType, gearbox and fuelType) have an empty-string level "", which anyNA() does not count as missing. Next I filter the dataset so that the remaining rows are more plausible: price must be greater than 0 and below 714,000 EUR, and powerPS must be greater than 14. Since 1 PS is about 0.986 hp, the 14 PS threshold corresponds to roughly 13.8 hp.
filterdata <- autodata[(autodata$price > 0),]
filterdata <- filterdata[(filterdata$price < 714000),]
filterdata <- filterdata[(filterdata$powerPS > 14),]
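As a quick check of the unit conversion and the filter logic, here is a minimal sketch on made-up rows. The helper name `ps_to_hp` and the toy data frame are illustrative assumptions and are not part of the original analysis.

```r
# Convert metric horsepower (PS) to mechanical horsepower (hp).
# 1 PS = 735.49875 W and 1 hp = 745.699872 W, so 1 PS is about 0.986 hp.
ps_to_hp <- function(ps) ps * 735.49875 / 745.699872

ps_to_hp(14)  # a little under 14 hp

# Toy data frame (made-up rows) to show the effect of the three filters.
toy <- data.frame(
  price   = c(0, 480, 18300, 999999),
  powerPS = c(75, 0, 190, 120)
)

# Same conditions as the three filter steps, combined into one expression.
kept <- toy[toy$price > 0 & toy$price < 714000 & toy$powerPS > 14, ]
kept
```

Only the third toy row passes all three conditions. Note that nothing here caps powerPS from above, so implausibly large values would also pass.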
Next I create a new object that holds only the non-categorical values, that is, the integer/double fields that can be scaled (normalized) and clustered later: price, yearOfRegistration, powerPS and kilometer. I keep them in a separate object on purpose, because it makes the scaling and clustering steps easier.
dataanalisisusedcar <- filterdata %>%
  select(price, yearOfRegistration, powerPS, kilometer)  # columns 5, 8, 10 and 12
dataanalisis_nz <- scale(dataanalisisusedcar, center = TRUE, scale = TRUE)
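To make the scaling step concrete, the sketch below shows on made-up values that `scale()` with centering and scaling produces ordinary z-scores. The toy data frame and the object names are assumptions for illustration only.

```r
# Made-up values on very different scales, like price and kilometer.
x <- data.frame(
  price     = c(500, 1500, 9800, 18300),
  kilometer = c(150000, 150000, 125000, 30000)
)

scaled <- scale(x, center = TRUE, scale = TRUE)

# The same z-scores computed by hand: (value - mean) / standard deviation.
by_hand <- sapply(x, function(col) (col - mean(col)) / sd(col))

all.equal(as.numeric(scaled), as.numeric(by_hand))
colMeans(scaled)      # approximately 0 for every column
apply(scaled, 2, sd)  # 1 for every column
```

Without this step, kilometer (tens of thousands) would dominate the Euclidean distances that k-means relies on.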
Unsupervised Learning Method
To apply k-means, I first need to choose the number of clusters, K. To do that, I draw a plot of the principal component variances and an elbow plot from the scaled data.
set.seed(100)
hasilauto_pr <- prcomp(dataanalisis_nz)
plot(hasilauto_pr)
wss(dataanalisis_nz)  # helper that draws the elbow plot; its definition is not shown here
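The `wss()` helper is called without its definition being shown. The sketch below is a hypothetical stand-in, not the author's function: `wss_sketch`, its arguments and the toy data are all assumptions. It runs k-means for k = 1 to `max_k` and plots the total within-cluster sum of squares, which is the usual basis of an elbow plot.

```r
# Hypothetical stand-in for the wss() helper (not the original definition).
# Runs k-means for k = 1..max_k and plots the total within-cluster sum of squares.
wss_sketch <- function(data, max_k = 10, seed = 100) {
  set.seed(seed)
  totals <- sapply(seq_len(max_k), function(k) {
    kmeans(data, centers = k, nstart = 5)$tot.withinss
  })
  plot(seq_len(max_k), totals, type = "b",
       xlab = "Number of clusters (k)",
       ylab = "Total within-cluster sum of squares")
  invisible(totals)
}

# Toy data with three well-separated groups (made-up values).
set.seed(1)
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2),
  matrix(rnorm(100, mean = 12), ncol = 2)
)
totals <- wss_sketch(toy, max_k = 6)
```

On data like this toy set, the curve drops steeply up to k = 3 and then flattens; the bend is the "elbow" used to pick K.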
The first plot shows four bars, one per principal component, because four variables were used; together the four components account for all of the variance. In the elbow plot, the curve stops bending at about 4 clusters. Based on these two visualizations, I decided to use K = 4.
usedcar.pr_km4 <- kmeans(dataanalisis_nz, 4)
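Because k-means starts from random centers, the cluster numbers it returns can change between runs unless the seed is set right before the call. The following sketch on made-up data illustrates this; the `nstart` value and all object names are assumptions for illustration.

```r
# Toy data with two groups (made-up values).
set.seed(42)
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 5), ncol = 2)
)

# Setting the same seed immediately before each call gives identical assignments.
set.seed(100); fit_a <- kmeans(toy, centers = 2, nstart = 10)
set.seed(100); fit_b <- kmeans(toy, centers = 2, nstart = 10)
identical(fit_a$cluster, fit_b$cluster)

# Cluster ids are only labels; the sizes and centers carry the meaning.
fit_a$size
fit_a$centers
```

If other random operations run between `set.seed()` and `kmeans()`, the labels may come out in a different order on a later run, so it is safer to interpret clusters by their centers than by their numbers.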
I then store the cluster assignments from k-means in a new column.
filterdata$cluster <- as.factor(usedcar.pr_km4$cluster)
hasilhitungcluster <- table(filterdata$cluster)
The number of rows in each cluster is as follows.
hasilhitungcluster
##
## 1 2 3 4
## 6245 36730 27 121923
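Cluster sizes alone do not show what separates the groups. One possible way to profile them is to summarise the clustering variables per cluster. The sketch below does this on a few made-up rows standing in for `filterdata`; the values and the choice of the median are assumptions for illustration.

```r
# Made-up rows standing in for filterdata; column names follow the dataset.
toy <- data.frame(
  price              = c(39600, 41900, 3600, 6900, 650, 999),
  yearOfRegistration = c(2014, 2014, 2008, 2008, 1995, 1998),
  powerPS            = c(306, 204, 69, 99, 102, 101),
  kilometer          = c(30000, 40000, 90000, 60000, 150000, 150000),
  cluster            = factor(c(1, 1, 2, 2, 4, 4))
)

# Median of each clustering variable per cluster.
profile <- aggregate(
  cbind(price, yearOfRegistration, powerPS, kilometer) ~ cluster,
  data = toy,
  FUN = median
)
profile
```

The median is used here rather than the mean because a handful of extreme prices or power values would otherwise distort the summary.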
Next, I take a quick look at the first rows of each cluster.
Cluster 1
cluster1 <- filterdata %>%
  filter(cluster == 1)
head(cluster1, 30)
## dateCrawled
## 1 2016-03-21 01:59:07
## 2 2016-03-16 16:44:10
## 3 2016-03-31 00:59:04
## 4 2016-03-31 16:57:18
## 5 2016-03-21 10:44:34
## 6 2016-03-29 22:50:53
## 7 2016-04-04 23:48:59
## 8 2016-03-11 13:52:52
## 9 2016-03-22 19:56:01
## 10 2016-03-18 12:48:27
## 11 2016-03-30 07:54:22
## 12 2016-03-20 20:42:07
## 13 2016-03-08 10:49:40
## 14 2016-03-23 15:42:44
## 15 2016-03-10 15:55:03
## 16 2016-03-19 19:46:25
## 17 2016-03-22 12:52:55
## 18 2016-03-08 16:53:28
## 19 2016-03-06 12:38:01
## 20 2016-04-02 22:46:59
## 21 2016-04-01 23:56:33
## 22 2016-04-02 22:57:47
## 23 2016-03-27 20:38:01
## 24 2016-04-02 09:51:15
## 25 2016-04-01 19:58:33
## 26 2016-03-22 10:25:18
## 27 2016-03-17 18:56:07
## 28 2016-03-21 05:36:19
## 29 2016-03-25 12:42:09
## 30 2016-03-29 10:38:04
## name
## 1 BMW_435i_Sport_coupe
## 2 Hyundai_Genesis_Coupe_GT_3.8_V6_Automatik
## 3 Mercedes_Benz_GLK_250_BlueTEC_4Matic_Standhzg_Alcantara_Voll
## 4 Ford_Mustang_V8_390cui
## 5 Porsche_Cayman_2.9___PCM/Navi___Sport_Chrono___TÜV_neu
## 6 Volkswagen_Caravelle_Lang_DSG_Navi_Standheiz._VOLL
## 7 Audi_A6_3.0_TDI_competition_S_Line_LED_BOSE
## 8 Mercedes_Benz_CLS_350_CDI_4Matic_7G_TRONIC
## 9 Mercedes_Benz_ML_350_CDI_4Matic_7G_TRONIC_DPF_BRABUS_AMG
## 10 Corvette_C1_1959_top_Zustand
## 11 Audi_A3
## 12 BMW_X6_xDrive30d_M_Sportpaket_Garantie_bis_2018
## 13 Mercedes_Benz_E350_CGI_Coupé_AMG_|_LEDER_|_SCHIEBEDACH_|_TÜV
## 14 Audi_A7_3.0_TDI_multitronic_V6
## 15 Mercedes_Benz_SL_280
## 16 Audi_S5_Exclusive___TipTronic_4.2L_V8_Quattro_Coupe_+
## 17 Audi_Q5_3.0_TDI_quattro_S_tronic
## 18 Audi_A4_allroad_quattro_3.0_TDI_DPF_S_tronic
## 19 Mercedes_Benz_C_63_AMG
## 20 Mercedes_Benz_SL_320
## 21 Audi_A6_Avant_3.0_TDI_DPF_quattro_tiptronic
## 22 VW_T5_Multivan_Special__AHK__Fahrradtraeger
## 23 BMW_435d_Cabrio_xDrive_Sport_Aut.
## 24 Ford_Escort_MK_1_Cosworth_H_Zulassung
## 25 Porsche_Boxster_PDK
## 26 Ford_Kuga_2.0_TDCi_4x4_Individual__Navi__Xenon__AHK
## 27 Volkswagen_T5_California
## 28 Mercedes_Benz_E_350_CDI_DPF_Cabrio_BlueEFFICIENCY_7G_TRONIC
## 29 Porsche_997_GT3_MKII_EIN_SAMMLERSTÜCK_IN_VOLLAUSSTATTUNG
## 30 BMW_135i_Coupe_Aut.
## seller offerType price abtest vehicleType yearOfRegistration
## 1 privat Angebot 39600 test coupe 2014
## 2 privat Angebot 22999 control coupe 2012
## 3 privat Angebot 41900 control suv 2014
## 4 privat Angebot 25000 control coupe 1968
## 5 privat Angebot 25900 test coupe 2009
## 6 privat Angebot 26999 test bus 2013
## 7 privat Angebot 56900 control limousine 2015
## 8 privat Angebot 48000 test coupe 2014
## 9 privat Angebot 27500 test suv 2009
## 10 privat Angebot 90500 control cabrio 1959
## 11 privat Angebot 29980 control limousine 2013
## 12 privat Angebot 43900 test limousine 2013
## 13 privat Angebot 24900 test coupe 2009
## 14 privat Angebot 27500 test coupe 2010
## 15 privat Angebot 36500 test cabrio 1981
## 16 privat Angebot 30900 control coupe 2011
## 17 privat Angebot 28900 control limousine 2010
## 18 privat Angebot 28499 test kombi 2011
## 19 privat Angebot 41900 test limousine 2012
## 20 privat Angebot 24800 test cabrio 1996
## 21 privat Angebot 24999 control kombi 2011
## 22 privat Angebot 28300 test bus 2012
## 23 privat Angebot 47999 control cabrio 2014
## 24 privat Angebot 34900 control limousine 1968
## 25 privat Angebot 43900 control cabrio 2014
## 26 privat Angebot 25950 test suv 2014
## 27 privat Angebot 44900 control bus 2015
## 28 privat Angebot 30600 test cabrio 2010
## 29 privat Angebot 139997 test coupe 2010
## 30 privat Angebot 26500 control coupe 2010
## gearbox powerPS model kilometer monthOfRegistration fuelType
## 1 automatik 306 andere 30000 7 benzin
## 2 automatik 303 andere 50000 4 benzin
## 3 automatik 204 glk 40000 9 diesel
## 4 automatik 305 mustang 90000 11 benzin
## 5 manuell 265 andere 125000 5 benzin
## 6 automatik 140 transporter 80000 10 diesel
## 7 automatik 326 a6 5000 7 diesel
## 8 automatik 265 andere 80000 10 diesel
## 9 automatik 224 m_klasse 150000 12 diesel
## 10 automatik 295 andere 90000 7 benzin
## 11 manuell 150 a3 50000 9 diesel
## 12 automatik 245 x_reihe 40000 4 diesel
## 13 automatik 292 e_klasse 125000 6 benzin
## 14 automatik 204 andere 100000 11 diesel
## 15 manuell 185 sl 150000 6 benzin
## 16 automatik 354 a5 60000 11 benzin
## 17 automatik 239 q5 125000 10 diesel
## 18 automatik 239 a4 90000 9 diesel
## 19 automatik 457 c_klasse 60000 7 benzin
## 20 automatik 231 sl 50000 8 benzin
## 21 automatik 239 a6 80000 2 diesel
## 22 manuell 114 transporter 40000 8 diesel
## 23 automatik 313 andere 60000 8 diesel
## 24 manuell 220 escort 5000 11 benzin
## 25 automatik 265 boxster 10000 2 benzin
## 26 manuell 163 kuga 30000 7 diesel
## 27 manuell 140 transporter 10000 5 diesel
## 28 automatik 231 e_klasse 80000 5 diesel
## 29 manuell 435 911 20000 3 benzin
## 30 automatik 306 1er 40000 1 benzin
## brand notRepairedDamage dateCreated nrOfPictures
## 1 bmw nein 2016-03-21 00:00:00 0
## 2 hyundai nein 2016-03-16 00:00:00 0
## 3 mercedes_benz nein 2016-03-30 00:00:00 0
## 4 ford nein 2016-03-31 00:00:00 0
## 5 porsche nein 2016-03-21 00:00:00 0
## 6 volkswagen nein 2016-03-29 00:00:00 0
## 7 audi nein 2016-04-04 00:00:00 0
## 8 mercedes_benz nein 2016-03-11 00:00:00 0
## 9 mercedes_benz nein 2016-03-22 00:00:00 0
## 10 chevrolet nein 2016-03-18 00:00:00 0
## 11 audi nein 2016-03-30 00:00:00 0
## 12 bmw nein 2016-03-20 00:00:00 0
## 13 mercedes_benz nein 2016-03-08 00:00:00 0
## 14 audi nein 2016-03-23 00:00:00 0
## 15 mercedes_benz nein 2016-03-10 00:00:00 0
## 16 audi nein 2016-03-19 00:00:00 0
## 17 audi nein 2016-03-22 00:00:00 0
## 18 audi nein 2016-03-08 00:00:00 0
## 19 mercedes_benz nein 2016-03-06 00:00:00 0
## 20 mercedes_benz nein 2016-04-02 00:00:00 0
## 21 audi nein 2016-04-01 00:00:00 0
## 22 volkswagen nein 2016-04-02 00:00:00 0
## 23 bmw nein 2016-03-27 00:00:00 0
## 24 ford nein 2016-04-02 00:00:00 0
## 25 porsche nein 2016-04-01 00:00:00 0
## 26 ford nein 2016-03-22 00:00:00 0
## 27 volkswagen nein 2016-03-17 00:00:00 0
## 28 mercedes_benz nein 2016-03-21 00:00:00 0
## 29 porsche nein 2016-03-25 00:00:00 0
## 30 bmw nein 2016-03-29 00:00:00 0
## postalCode lastSeen cluster
## 1 10435 2016-04-03 23:16:31 1
## 2 88167 2016-04-06 20:18:22 1
## 3 82131 2016-04-06 01:17:24 1
## 4 74547 2016-04-06 10:45:49 1
## 5 93053 2016-03-25 06:17:46 1
## 6 83684 2016-04-06 11:47:14 1
## 7 45525 2016-04-07 04:16:57 1
## 8 96123 2016-04-05 08:47:16 1
## 9 54636 2016-04-06 13:16:23 1
## 10 94032 2016-04-03 21:17:33 1
## 11 59846 2016-04-06 23:46:17 1
## 12 50859 2016-04-07 04:46:53 1
## 13 30453 2016-03-12 01:16:12 1
## 14 49406 2016-03-23 15:42:44 1
## 15 63897 2016-04-05 14:45:46 1
## 16 30159 2016-04-07 07:17:35 1
## 17 89180 2016-04-06 01:44:23 1
## 18 4416 2016-03-11 11:46:33 1
## 19 67657 2016-03-21 11:46:17 1
## 20 53773 2016-04-06 23:45:42 1
## 21 67435 2016-04-06 04:16:23 1
## 22 13467 2016-04-07 03:45:44 1
## 23 46569 2016-04-05 18:45:45 1
## 24 47445 2016-04-04 06:50:17 1
## 25 55128 2016-04-02 15:38:28 1
## 26 57080 2016-04-06 07:44:24 1
## 27 78166 2016-04-07 06:16:53 1
## 28 71229 2016-03-21 09:42:42 1
## 29 71159 2016-04-06 13:45:07 1
## 30 50933 2016-04-05 18:47:08 1
Cluster 2
cluster2 <- filterdata %>%
  filter(cluster == 2)
head(cluster2, 30)
## dateCrawled
## 1 2016-03-31 17:25:20
## 2 2016-04-04 23:42:13
## 3 2016-03-21 12:57:01
## 4 2016-04-01 19:56:48
## 5 2016-03-07 12:51:23
## 6 2016-03-09 11:56:38
## 7 2016-03-25 21:48:47
## 8 2016-03-11 11:50:37
## 9 2016-03-13 15:47:08
## 10 2016-03-17 12:44:43
## 11 2016-03-25 14:40:12
## 12 2016-04-04 10:57:36
## 13 2016-03-22 17:56:12
## 14 2016-03-30 12:49:54
## 15 2016-03-08 12:54:47
## 16 2016-03-07 22:36:54
## 17 2016-03-28 17:41:27
## 18 2016-03-17 09:48:12
## 19 2016-03-11 23:42:53
## 20 2016-03-09 21:46:09
## 21 2016-03-08 13:49:57
## 22 2016-03-10 11:44:54
## 23 2016-03-31 10:53:10
## 24 2016-03-22 17:44:26
## 25 2016-03-12 22:57:58
## 26 2016-03-28 20:45:46
## 27 2016-03-27 20:47:22
## 28 2016-03-23 14:45:57
## 29 2016-03-14 12:54:41
## 30 2016-03-07 12:38:19
## name
## 1 Skoda_Fabia_1.4_TDI_PD_Classic
## 2 Ford_C___Max_Titanium_1_0_L_EcoBoost
## 3 Nissan_Navara_2.5DPF_SE4x4_Klima_Sitzheizg_Bluetooth.Doppelkabine
## 4 Volkswagen_Scirocco_1.4_TSI_Sport
## 5 Honda_Civic_1.4_i_VTEC_Comfort
## 6 Volkswagen_T3_andere
## 7 BMW_325i_Aut.
## 8 Opel_Kadett_E_CC
## 9 Mini_One_Pepper_Scheckheftgepflegt
## 10 Smart_For_two_Klima_regensensor_uSw
## 11 VW_Golf_6___Klima___Alu___Scheckheft_!!!
## 12 Verkaufe_meinen_kleinen_wegen_neu_Anschaffung
## 13 Smart_Cabrio_TÜV_bis_07/17
## 14 Skoda_Fabia_1.2
## 15 Volkswagen_Jetta_1.9_TDI_DSG_DPF_Sportline
## 16 BMW_325_i_Cabrio_wenig_Kilometer
## 17 Opel_Astra_1.4_mit_vielen_Extras!!!!
## 18 Nissan_Micra_1.2_CVT
## 19 Mercedes_Benz_E_250_CDI_Mod.2011_Automatik_NAVI_XENON_Glasdach
## 20 Ford_Escort_CLX
## 21 Fiesta_Titanium_1.25
## 22 Audi_A1_1.2_TFSI_S_Line
## 23 BMW_318d_Aut.__Xenon__Navi__Sportsitze_FESTREIS!
## 24 530d_XDRIVE_235_PS
## 25 Volkswagen_Polo_1.4_FSI_Team
## 26 Fabia_II_Combi_Greenline_1_2_TDI_DPF_mit_Gebrauchtwagengarantie
## 27 POLO_1.2_KILIMA_WIE_NEU
## 28 Kia_Sorento__coole_Farbe_schaut......
## 29 Seat_Ibiza_1.2_12V
## 30 Ford_Mustang_GT_V8_Cabrio_Premium_Neuwagenzustand
## seller offerType price abtest vehicleType yearOfRegistration gearbox
## 1 privat Angebot 3600 test kleinwagen 2008 manuell
## 2 privat Angebot 14500 control bus 2014 manuell
## 3 privat Angebot 17999 control suv 2011 manuell
## 4 privat Angebot 10400 control coupe 2009 manuell
## 5 privat Angebot 6900 test limousine 2008 manuell
## 6 privat Angebot 1990 test bus 1981 manuell
## 7 privat Angebot 18000 test limousine 2007 automatik
## 8 privat Angebot 1600 control andere 1991 manuell
## 9 privat Angebot 6990 test limousine 2007 manuell
## 10 privat Angebot 3900 test kleinwagen 2008 automatik
## 11 privat Angebot 7750 control 2017 manuell
## 12 privat Angebot 1400 control 2016 manuell
## 13 privat Angebot 3000 control cabrio 2006 automatik
## 14 privat Angebot 5500 control kleinwagen 2010 manuell
## 15 privat Angebot 7000 test limousine 2006 automatik
## 16 privat Angebot 14999 test cabrio 2007 manuell
## 17 privat Angebot 10900 test 2017 manuell
## 18 privat Angebot 7999 test kleinwagen 2013 manuell
## 19 privat Angebot 20300 test limousine 2010 automatik
## 20 privat Angebot 600 test limousine 1994 manuell
## 21 privat Angebot 6800 control kleinwagen 2009 manuell
## 22 privat Angebot 14500 test kleinwagen 2013 manuell
## 23 privat Angebot 23490 control limousine 2013 automatik
## 24 privat Angebot 7300 control limousine 2009 automatik
## 25 privat Angebot 9290 control kleinwagen 2010 manuell
## 26 privat Angebot 6990 control kombi 2012 manuell
## 27 privat Angebot 6799 control kleinwagen 2009
## 28 privat Angebot 7500 test suv 2007 automatik
## 29 privat Angebot 3200 test kleinwagen 2004 manuell
## 30 privat Angebot 19750 test cabrio 2006 manuell
## powerPS model kilometer monthOfRegistration fuelType
## 1 69 fabia 90000 7 diesel
## 2 125 c_max 30000 8 benzin
## 3 190 navara 70000 3 diesel
## 4 160 scirocco 100000 4 benzin
## 5 99 civic 60000 11 benzin
## 6 50 transporter 5000 1 benzin
## 7 218 3er 20000 5 benzin
## 8 75 kadett 70000 0
## 9 95 one 100000 8 benzin
## 10 61 fortwo 80000 6 benzin
## 11 80 golf 100000 1 benzin
## 12 55 andere 5000 1
## 13 61 fortwo 80000 1 benzin
## 14 60 fabia 70000 4 benzin
## 15 105 jetta 100000 10 diesel
## 16 218 3er 50000 8 benzin
## 17 101 astra 50000 3
## 18 80 micra 40000 4 benzin
## 19 204 e_klasse 80000 12 diesel
## 20 75 escort 100000 6 benzin
## 21 82 fiesta 60000 12 benzin
## 22 86 a1 60000 4 benzin
## 23 143 3er 40000 6 diesel
## 24 235 5er 100000 3 diesel
## 25 86 polo 40000 5 benzin
## 26 75 fabia 100000 5 diesel
## 27 60 20000 5 benzin
## 28 194 sorento 5000 11 benzin
## 29 64 ibiza 80000 2 benzin
## 30 305 mustang 50000 7 benzin
## brand notRepairedDamage dateCreated nrOfPictures
## 1 skoda nein 2016-03-31 00:00:00 0
## 2 ford 2016-04-04 00:00:00 0
## 3 nissan nein 2016-03-21 00:00:00 0
## 4 volkswagen nein 2016-04-01 00:00:00 0
## 5 honda nein 2016-03-07 00:00:00 0
## 6 volkswagen nein 2016-03-09 00:00:00 0
## 7 bmw nein 2016-03-25 00:00:00 0
## 8 opel 2016-03-11 00:00:00 0
## 9 mini nein 2016-03-13 00:00:00 0
## 10 smart 2016-03-17 00:00:00 0
## 11 volkswagen 2016-03-25 00:00:00 0
## 12 hyundai 2016-04-04 00:00:00 0
## 13 smart nein 2016-03-22 00:00:00 0
## 14 skoda nein 2016-03-30 00:00:00 0
## 15 volkswagen nein 2016-03-08 00:00:00 0
## 16 bmw nein 2016-03-07 00:00:00 0
## 17 opel nein 2016-03-28 00:00:00 0
## 18 nissan nein 2016-03-17 00:00:00 0
## 19 mercedes_benz nein 2016-03-11 00:00:00 0
## 20 ford ja 2016-03-09 00:00:00 0
## 21 ford nein 2016-03-08 00:00:00 0
## 22 audi nein 2016-03-10 00:00:00 0
## 23 bmw nein 2016-03-31 00:00:00 0
## 24 bmw nein 2016-03-22 00:00:00 0
## 25 volkswagen nein 2016-03-12 00:00:00 0
## 26 skoda nein 2016-03-28 00:00:00 0
## 27 volkswagen nein 2016-03-27 00:00:00 0
## 28 kia 2016-03-23 00:00:00 0
## 29 seat nein 2016-03-14 00:00:00 0
## 30 ford nein 2016-03-07 00:00:00 0
## postalCode lastSeen cluster
## 1 60437 2016-04-06 10:17:21 2
## 2 94505 2016-04-04 23:42:13 2
## 3 4177 2016-04-06 07:45:42 2
## 4 75365 2016-04-05 16:45:49 2
## 5 12621 2016-03-26 09:44:53 2
## 6 87471 2016-03-10 07:44:33 2
## 7 39179 2016-04-07 04:45:21 2
## 8 2943 2016-04-07 03:46:09 2
## 9 59174 2016-03-21 17:17:50 2
## 10 21073 2016-03-19 11:46:17 2
## 11 48499 2016-03-31 21:47:44 2
## 12 34454 2016-04-06 12:45:43 2
## 13 12055 2016-03-22 17:56:12 2
## 14 57076 2016-04-07 03:44:51 2
## 15 6242 2016-03-11 17:16:18 2
## 16 1129 2016-03-15 10:17:59 2
## 17 63607 2016-04-06 23:15:52 2
## 18 46145 2016-04-06 07:44:23 2
## 19 51491 2016-04-03 01:26:23 2
## 20 37359 2016-04-05 23:44:25 2
## 21 51065 2016-04-05 19:18:01 2
## 22 31582 2016-04-07 01:15:35 2
## 23 49356 2016-04-06 03:44:40 2
## 24 83022 2016-03-22 17:44:26 2
## 25 35630 2016-03-26 06:17:39 2
## 26 44894 2016-04-03 13:58:00 2
## 27 89077 2016-03-27 20:47:22 2
## 28 34314 2016-04-05 15:47:51 2
## 29 29633 2016-04-05 13:15:27 2
## 30 38350 2016-03-12 20:18:29 2
Cluster 3
cluster3 <- filterdata %>%
  filter(cluster == 3)
head(cluster3, 30)
## dateCrawled
## 1 2016-03-10 22:37:21
## 2 2016-03-27 18:47:59
## 3 2016-03-16 13:46:18
## 4 2016-03-12 10:36:18
## 5 2016-03-11 17:40:36
## 6 2016-03-31 18:54:51
## 7 2016-03-12 09:58:46
## 8 2016-03-25 14:55:49
## 9 2016-03-15 15:53:21
## 10 2016-03-31 13:53:03
## 11 2016-03-24 02:58:01
## 12 2016-03-17 10:37:53
## 13 2016-03-19 11:38:23
## 14 2016-04-02 10:49:27
## 15 2016-03-17 19:50:23
## 16 2016-03-11 13:58:35
## 17 2016-04-04 19:49:19
## 18 2016-03-30 16:38:22
## 19 2016-03-12 08:54:32
## 20 2016-03-06 08:52:49
## 21 2016-03-28 01:56:13
## 22 2016-03-16 15:56:43
## 23 2016-03-30 09:53:11
## 24 2016-04-02 08:55:22
## 25 2016-04-02 15:50:55
## 26 2016-03-25 12:55:36
## 27 2016-03-16 21:48:39
## name seller
## 1 Sommer_Auto privat
## 2 VW_Polo_6N_GTI privat
## 3 Reault_megane__cabrio privat
## 4 VW_to_2.5Tdi privat
## 5 Vw_Vento_1.8l_90ps__wenig_kilometer__gti_Ausstattung_top privat
## 6 KIA_CEED_SW_Vision_weiss privat
## 7 2x_VW_Polo_6N1_zum_Basteln_oder_Schlachten_3_und_4_Tuerer privat
## 8 BMW_E39_5er_525i privat
## 9 Twingo_zum_Schlachten_oder_richten! privat
## 10 Mini_MK_2_Austin_Rover privat
## 11 Vw_polo_6n2_1.4_MP?_60PS_tuev_11/16 privat
## 12 Opel_Agila_Tuev_Neu privat
## 13 Audi_a3_2_0_tdi_s_line_plus_125kW_dsg privat
## 14 Mercedes_Clk_w209_320_Avantgarde_voll_Ausstattung privat
## 15 BMW_E30_touring_325i_Projektaufgabe_325e_viele_Teile_Schnitzer privat
## 16 Audi_A4_Quatrro_1.9_TDI privat
## 17 Opel_Corsa_1.0_Motor_ecotek privat
## 18 Mercedes_C_200_CDI_BE_T_Modell_Silber_TOP_GEPFLEGT! privat
## 19 Vw_Move_Up!_H&R!_16'1Hand_Scheckheft_45TKM!!!Winterreifen privat
## 20 VW_Passat_1.9_Tdi_4motion/syncro_Afn privat
## 21 SOMMERREIFEN_215_40_zr_16_86_W privat
## 22 funktionsfaehigen_Opel_Corsa_TÜV_2018 privat
## 23 Renault_Kangoo_1_2 privat
## 24 Golf_4_1;4 privat
## 25 BMW_520_Touring__mit__Austauschmoter_38tsd_km_Original_von_BMW privat
## 26 Mitsubishi_carisma privat
## 27 Golf_3/3_Tueren privat
## offerType price abtest vehicleType yearOfRegistration gearbox
## 1 Angebot 2500 control cabrio 1998 manuell
## 2 Angebot 2200 control kleinwagen 1999 manuell
## 3 Angebot 1380 control cabrio 2001 automatik
## 4 Angebot 4700 control bus 1997 manuell
## 5 Angebot 1700 control 2017 manuell
## 6 Angebot 5500 control kombi 2010 manuell
## 7 Angebot 350 control kleinwagen 1995 manuell
## 8 Angebot 2100 control kombi 2001 manuell
## 9 Angebot 120 control kleinwagen 1996 manuell
## 10 Angebot 6500 control kleinwagen 1987 manuell
## 11 Angebot 1150 test 2016 manuell
## 12 Angebot 1750 test limousine 2006 manuell
## 13 Angebot 10900 test limousine 2007 automatik
## 14 Angebot 9000 test 2017 automatik
## 15 Angebot 1749 control kombi 1989 manuell
## 16 Angebot 900 test kombi 1997 manuell
## 17 Angebot 1200 test limousine 2001 manuell
## 18 Angebot 15499 test kombi 2012 manuell
## 19 Angebot 6299 control kleinwagen 2012 manuell
## 20 Angebot 2800 test kombi 1998 manuell
## 21 Angebot 125 test kleinwagen 1999
## 22 Angebot 1000 test kleinwagen 1997 manuell
## 23 Angebot 1150 control bus 2001 manuell
## 24 Angebot 950 control kleinwagen 1998 manuell
## 25 Angebot 11900 control kombi 2007 automatik
## 26 Angebot 1300 test limousine 2000 manuell
## 27 Angebot 850 control limousine 1993 automatik
## powerPS model kilometer monthOfRegistration fuelType
## 1 7512 golf 150000 6
## 2 12012 polo 150000 3 benzin
## 3 10710 megane 150000 10
## 4 10522 transporter 150000 0 diesel
## 5 9010 100000 5 benzin
## 6 11509 ceed 150000 9
## 7 5575 polo 150000 1 benzin
## 8 19208 5er 150000 5
## 9 5815 twingo 150000 9 benzin
## 10 6018 cooper 60000 4 benzin
## 11 6011 polo 150000 5
## 12 6010 agila 90000 8 benzin
## 13 17011 a3 5000 7 diesel
## 14 9000 clk 150000 0 benzin
## 15 17019 3er 5000 12 benzin
## 16 11011 a4 150000 5 diesel
## 17 6512 corsa 150000 12 benzin
## 18 13636 c_klasse 125000 4 diesel
## 19 6062 up 50000 11 benzin
## 20 11025 passat 150000 6 diesel
## 21 11111 corsa 125000 11 benzin
## 22 5420 corsa 150000 0 benzin
## 23 7511 kangoo 150000 11 benzin
## 24 7511 150000 3 benzin
## 25 16311 5er 150000 10 diesel
## 26 12512 carisma 150000 9 benzin
## 27 9012 golf 150000 0 benzin
## brand notRepairedDamage dateCreated nrOfPictures
## 1 volkswagen 2016-03-10 00:00:00 0
## 2 volkswagen 2016-03-27 00:00:00 0
## 3 renault nein 2016-03-16 00:00:00 0
## 4 volkswagen 2016-03-12 00:00:00 0
## 5 volkswagen nein 2016-03-11 00:00:00 0
## 6 kia nein 2016-03-31 00:00:00 0
## 7 volkswagen ja 2016-03-12 00:00:00 0
## 8 bmw ja 2016-03-25 00:00:00 0
## 9 renault ja 2016-03-15 00:00:00 0
## 10 mini 2016-03-31 00:00:00 0
## 11 volkswagen nein 2016-03-24 00:00:00 0
## 12 opel nein 2016-03-17 00:00:00 0
## 13 audi 2016-03-19 00:00:00 0
## 14 mercedes_benz 2016-04-02 00:00:00 0
## 15 bmw 2016-03-17 00:00:00 0
## 16 audi nein 2016-03-11 00:00:00 0
## 17 opel 2016-04-04 00:00:00 0
## 18 mercedes_benz nein 2016-03-30 00:00:00 0
## 19 volkswagen nein 2016-03-12 00:00:00 0
## 20 volkswagen 2016-03-06 00:00:00 0
## 21 opel 2016-03-28 00:00:00 0
## 22 opel nein 2016-03-16 00:00:00 0
## 23 renault nein 2016-03-30 00:00:00 0
## 24 volkswagen nein 2016-04-02 00:00:00 0
## 25 bmw nein 2016-04-02 00:00:00 0
## 26 mitsubishi 2016-03-25 00:00:00 0
## 27 volkswagen 2016-03-16 00:00:00 0
## postalCode lastSeen cluster
## 1 68239 2016-04-05 15:17:50 3
## 2 9526 2016-04-01 19:44:55 3
## 3 71282 2016-03-19 10:45:58 3
## 4 87437 2016-03-12 10:36:18 3
## 5 99706 2016-03-20 02:17:13 3
## 6 15907 2016-04-06 13:15:34 3
## 7 29664 2016-03-16 08:17:18 3
## 8 59556 2016-03-30 05:16:18 3
## 9 73635 2016-03-17 10:46:18 3
## 10 91126 2016-04-06 07:15:55 3
## 11 49477 2016-04-06 19:47:39 3
## 12 10969 2016-03-28 09:44:40 3
## 13 12355 2016-03-21 15:20:06 3
## 14 45699 2016-04-06 08:45:05 3
## 15 89542 2016-03-26 00:46:59 3
## 16 82467 2016-03-19 21:44:26 3
## 17 47198 2016-04-06 22:16:46 3
## 18 24983 2016-04-07 07:15:50 3
## 19 58511 2016-03-12 09:47:05 3
## 20 1665 2016-03-29 16:18:05 3
## 21 42107 2016-04-02 12:44:38 3
## 22 49824 2016-03-27 03:18:07 3
## 23 53757 2016-04-07 00:17:44 3
## 24 84339 2016-04-06 07:17:31 3
## 25 33100 2016-04-06 14:45:57 3
## 26 42105 2016-04-06 14:44:53 3
## 27 14482 2016-04-03 09:17:43 3
Cluster 4
cluster4 <- filterdata %>%
  filter(cluster == 4)
head(cluster4, 30)
## dateCrawled
## 1 2016-03-24 10:58:45
## 2 2016-03-14 12:52:21
## 3 2016-03-17 16:54:04
## 4 2016-04-04 17:36:23
## 5 2016-04-01 20:48:51
## 6 2016-03-17 10:53:50
## 7 2016-03-26 19:54:18
## 8 2016-04-07 10:06:22
## 9 2016-03-15 22:49:09
## 10 2016-03-21 21:37:40
## 11 2016-04-01 12:46:46
## 12 2016-03-20 10:25:19
## 13 2016-03-23 15:48:05
## 14 2016-04-01 22:55:47
## 15 2016-03-27 11:38:00
## 16 2016-03-23 14:52:51
## 17 2016-03-12 19:43:07
## 18 2016-03-13 20:40:49
## 19 2016-03-18 21:44:09
## 20 2016-03-10 19:38:18
## 21 2016-03-08 19:55:19
## 22 2016-04-03 15:48:11
## 23 2016-03-29 16:57:02
## 24 2016-03-17 18:55:12
## 25 2016-03-08 07:54:46
## 26 2016-04-01 17:45:07
## 27 2016-03-25 15:50:30
## 28 2016-03-30 20:38:20
## 29 2016-03-24 00:52:09
## 30 2016-03-29 18:57:46
## name seller
## 1 A5_Sportback_2.7_Tdi privat
## 2 Jeep_Grand_Cherokee_Overland privat
## 3 GOLF_4_1_4__3TÜRER privat
## 4 BMW_316i___e36_Limousine___Bastlerfahrzeug__Export privat
## 5 Peugeot_206_CC_110_Platinum privat
## 6 VW_Golf_4_5_tuerig_zu_verkaufen_mit_Anhaengerkupplung privat
## 7 Mazda_3_1.6_Sport privat
## 8 Volkswagen_Passat_Variant_2.0_TDI_Comfortline privat
## 9 VW_Passat_Facelift_35i__7Sitzer privat
## 10 VW_PASSAT_1.9_TDI_131_PS_LEDER privat
## 11 Polo_6n_1_4 privat
## 12 Renault_Twingo_1.2_16V_Aut. privat
## 13 Ford_C_MAX_2.0_TDCi_DPF_Titanium privat
## 14 Mercedes_Benz_A_160_Classic_Klima privat
## 15 BMW_530i_TÜV_7/17_Scheckheftgepflegt_sehr_guter_Zustand privat
## 16 Opel_Meriva_1.Hand_TÜV_3.2018 privat
## 17 Stadtflitzer privat
## 18 MERCEDES_200E__TÜV_04/2016 privat
## 19 BMW_530d_touring_Vollausstattung_NAVI privat
## 20 Citroen_C4_Grand_Picasso. privat
## 21 Fiat_Punto_1.2 privat
## 22 Mercedes_Benz_E_250_D_Original_Zustand_!! privat
## 23 Renault_clio_1.2_TÜV_07/2016 privat
## 24 Mercedes_Benz_E_200_CDI_Automatik_Classic privat
## 25 VW_Golf_3 privat
## 26 Abschleppwagen_Vw_LT_195.000_gruene_Plakette_TÜV_8/2017 privat
## 27 Mercedes_Camper_D407 privat
## 28 E_500_Avantgarde_AMG_Ausstattung privat
## 29 BMW_E60_530XD privat
## 30 Renault_Clio_1.4 privat
## offerType price abtest vehicleType yearOfRegistration gearbox
## 1 Angebot 18300 test coupe 2011 manuell
## 2 Angebot 9800 test suv 2004 automatik
## 3 Angebot 1500 test kleinwagen 2001 manuell
## 4 Angebot 650 test limousine 1995 manuell
## 5 Angebot 2200 test cabrio 2004 manuell
## 6 Angebot 999 test kleinwagen 1998 manuell
## 7 Angebot 2000 control limousine 2004 manuell
## 8 Angebot 2799 control kombi 2005 manuell
## 9 Angebot 999 control kombi 1995 manuell
## 10 Angebot 2500 control kombi 2004 manuell
## 11 Angebot 300 test 2016
## 12 Angebot 1750 control kleinwagen 2004 automatik
## 13 Angebot 7550 test bus 2007 manuell
## 14 Angebot 1850 test bus 2004 manuell
## 15 Angebot 3699 test limousine 2002 automatik
## 16 Angebot 2900 test 2018 manuell
## 17 Angebot 450 test kleinwagen 1997 manuell
## 18 Angebot 500 test limousine 1990 manuell
## 19 Angebot 2500 control kombi 2002 automatik
## 20 Angebot 5555 control 2017 manuell
## 21 Angebot 690 test kleinwagen 2003 manuell
## 22 Angebot 3300 test limousine 1995 automatik
## 23 Angebot 899 control 2016 manuell
## 24 Angebot 3500 control limousine 2004 automatik
## 25 Angebot 350 test 2016 manuell
## 26 Angebot 11900 test andere 2002 manuell
## 27 Angebot 1500 test bus 1984 manuell
## 28 Angebot 7500 control limousine 2002 automatik
## 29 Angebot 12500 test limousine 2006 automatik
## 30 Angebot 590 control kleinwagen 1999 manuell
## powerPS model kilometer monthOfRegistration fuelType brand
## 1 190 125000 5 diesel audi
## 2 163 grand 125000 8 diesel jeep
## 3 75 golf 150000 6 benzin volkswagen
## 4 102 3er 150000 10 benzin bmw
## 5 109 2_reihe 150000 8 benzin peugeot
## 6 101 golf 150000 0 volkswagen
## 7 105 3_reihe 150000 12 benzin mazda
## 8 140 passat 150000 12 diesel volkswagen
## 9 115 passat 150000 11 benzin volkswagen
## 10 131 passat 150000 2 volkswagen
## 11 60 polo 150000 0 benzin volkswagen
## 12 75 twingo 150000 2 benzin renault
## 13 136 c_max 150000 6 diesel ford
## 14 102 a_klasse 150000 1 benzin mercedes_benz
## 15 231 5er 150000 7 benzin bmw
## 16 90 meriva 150000 5 benzin opel
## 17 50 arosa 150000 5 benzin seat
## 18 118 andere 150000 10 benzin mercedes_benz
## 19 193 5er 150000 9 diesel bmw
## 20 125 c4 125000 4 citroen
## 21 60 punto 150000 3 benzin fiat
## 22 113 e_klasse 150000 1 diesel mercedes_benz
## 23 60 clio 150000 6 benzin renault
## 24 122 e_klasse 150000 11 diesel mercedes_benz
## 25 75 golf 150000 4 benzin volkswagen
## 26 129 andere 150000 11 diesel volkswagen
## 27 70 andere 150000 8 diesel mercedes_benz
## 28 306 e_klasse 150000 4 mercedes_benz
## 29 231 5er 150000 11 diesel bmw
## 30 75 clio 125000 8 benzin renault
## notRepairedDamage dateCreated nrOfPictures postalCode
## 1 ja 2016-03-24 00:00:00 0 66954
## 2 2016-03-14 00:00:00 0 90480
## 3 nein 2016-03-17 00:00:00 0 91074
## 4 ja 2016-04-04 00:00:00 0 33775
## 5 nein 2016-04-01 00:00:00 0 67112
## 6 2016-03-17 00:00:00 0 27472
## 7 nein 2016-03-26 00:00:00 0 96224
## 8 ja 2016-04-07 00:00:00 0 57290
## 9 2016-03-15 00:00:00 0 37269
## 10 nein 2016-03-21 00:00:00 0 90762
## 11 2016-04-01 00:00:00 0 38871
## 12 nein 2016-03-20 00:00:00 0 65599
## 13 nein 2016-03-23 00:00:00 0 88361
## 14 nein 2016-04-01 00:00:00 0 49565
## 15 nein 2016-03-27 00:00:00 0 68309
## 16 nein 2016-03-23 00:00:00 0 49716
## 17 nein 2016-03-12 00:00:00 0 9526
## 18 ja 2016-03-13 00:00:00 0 35390
## 19 ja 2016-03-18 00:00:00 0 73765
## 20 nein 2016-03-10 00:00:00 0 31139
## 21 nein 2016-03-08 00:00:00 0 86199
## 22 nein 2016-04-03 00:00:00 0 53879
## 23 2016-03-29 00:00:00 0 37075
## 24 nein 2016-03-17 00:00:00 0 67071
## 25 nein 2016-03-08 00:00:00 0 19386
## 26 nein 2016-04-01 00:00:00 0 10551
## 27 nein 2016-03-25 00:00:00 0 22767
## 28 2016-03-30 00:00:00 0 33649
## 29 2016-03-23 00:00:00 0 46119
## 30 nein 2016-03-29 00:00:00 0 84180
## lastSeen cluster
## 1 2016-04-07 01:46:50 4
## 2 2016-04-05 12:47:46 4
## 3 2016-03-17 17:40:17 4
## 4 2016-04-06 19:17:07 4
## 5 2016-04-05 18:18:39 4
## 6 2016-03-31 17:17:06 4
## 7 2016-04-06 10:45:34 4
## 8 2016-04-07 10:25:17 4
## 9 2016-04-01 13:16:16 4
## 10 2016-03-23 02:50:54 4
## 11 2016-04-01 12:46:46 4
## 12 2016-04-06 13:16:07 4
## 13 2016-04-05 18:45:11 4
## 14 2016-04-05 22:46:05 4
## 15 2016-04-07 06:44:26 4
## 16 2016-03-31 01:16:33 4
## 17 2016-03-21 01:46:11 4
## 18 2016-03-13 20:40:49 4
## 19 2016-03-18 21:44:09 4
## 20 2016-03-16 09:16:46 4
## 21 2016-03-09 11:45:28 4
## 22 2016-04-05 15:16:05 4
## 23 2016-03-29 17:43:07 4
## 24 2016-03-30 15:46:10 4
## 25 2016-03-08 09:44:50 4
## 26 2016-04-05 12:47:30 4
## 27 2016-03-27 03:17:02 4
## 28 2016-04-03 11:44:49 4
## 29 2016-04-04 16:18:19 4
## 30 2016-03-29 18:57:46 4
To support the interpretation of the clustering, I use PCA for visualization.
dataanalisis.pca <- PCA(dataanalisis_nz, graph = FALSE)
plot.PCA(dataanalisis.pca, choix = "var", axes = c(1, 2))
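For readers who want to see what the components capture, here is a minimal sketch that uses base R's `prcomp()` instead of FactoMineR, on made-up data. It computes the share of variance per component and plots the observations on PC1 and PC2, coloured by a k-means cluster id. All object names and values are assumptions for illustration.

```r
# Toy data: two correlated columns and one independent column (made-up values).
set.seed(7)
a <- rnorm(200)
toy <- data.frame(a = a, b = a + rnorm(200, sd = 0.3), c = rnorm(200))

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of total variance carried by each component (sums to 1).
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(var_share, 3)

# Scores on the first two components, coloured by a k-means cluster id.
fit <- kmeans(scale(toy), centers = 2, nstart = 10)
plot(pca$x[, 1:2], col = fit$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2")
```

A plot of individuals like this complements the variable plot: the variable plot shows how the inputs relate to the components, while the score plot shows whether the clusters actually separate in that space.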
Conclusion
The results show that the dataset can be grouped into 4 clusters. Judging from the rows printed above, cluster 1 contains the high-value cars: prices are high (about 23,000 to 140,000 EUR in the first 30 rows), mileage is mostly below 100,000 km, and the cluster also includes classic cars first registered as far back as 1959. Cluster 4 is made up of cars that already have high mileage, mostly 150,000 km. Cluster 2 is more varied, with moderate prices and mileage of at most 100,000 km in the rows shown. Cluster 3 has only 27 rows, all with powerPS values between 5,420 and 19,208; these are implausible for passenger cars and most likely data-entry errors, which suggests that powerPS also needs an upper bound before clustering. In the PCA variable plot, the arrows of the four clustering variables have nearly the same length, so all four appear to be well represented by PC1 and PC2; according to the plot, higher powerPS goes with higher kilometer, while higher price goes with lower kilometer. Based on these results, I conclude that the cars in clusters 1 and 2 are the used cars worth buying for daily use.