

📖 Chapter Introduction

Every data analysis starts with the data itself. Just as a great chef chooses good ingredients, we need to know how to handle the data we will analyze. In this session we cover how to load the convenient built-in datasets that PyCaret provides, and we take an in-depth look at data copyright, which every professional needs to understand. It is easy to overlook, but it is an important topic that will help make you a true professional, so stay focused!


🎯 Chapter Goals


💻 Full Code and Project Structure for This Chapter

Key function in this chapter

💡 With the get_data() function alone, you can browse and load any of the 50+ datasets that PyCaret provides.

# 1. Import the library
from pycaret.datasets import get_data

# 2. List every available dataset
# The result is returned as a pandas DataFrame.
available_datasets = get_data('index')
print(available_datasets)

# 3. Load a specific dataset (juice)
# Load the 'juice' dataset and store it in the juice_df variable.
juice_df = get_data('juice')

# 4. Inspect the loaded data (first 5 rows)
# It can be handled exactly like any ordinary pandas DataFrame.
print(juice_df.head())

Preview of the code output

Output of get_data('index')

The full list of datasets, with information about each one, is printed as a DataFrame.

                  Dataset    Data Types  ... # Attributes  Missing Values
0                 anomaly  Multivariate  ...           10               N
1                  france  Multivariate  ...            8               N
2                 germany  Multivariate  ...            8               N
3                    bank  Multivariate  ...           17               N
4                   blood  Multivariate  ...            5               N
..                    ...           ...  ...          ...             ...
51                   gold  Multivariate  ...          121               N
52                  house  Multivariate  ...           81               Y
53              insurance  Multivariate  ...            7               N
54             parkinsons  Multivariate  ...           22               N
55                traffic  Multivariate  ...            8               N

[56 rows x 8 columns]
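Because the index is an ordinary pandas DataFrame, it can be filtered like any other table to narrow down which dataset to load. The sketch below is only an illustration: `index_df` is a hand-built stand-in containing just the ten rows and the columns visible in the printout above, so it runs without PyCaret. The variable names `index_df`, `with_missing`, and `small` are assumptions made for this example, and the column types in the real index may differ.

```python
import pandas as pd

# Hand-built stand-in for get_data('index'): only the rows and columns
# visible in the printout above (the real index has 56 rows x 8 columns).
index_df = pd.DataFrame(
    {
        "Dataset": ["anomaly", "france", "germany", "bank", "blood",
                    "gold", "house", "insurance", "parkinsons", "traffic"],
        "Data Types": ["Multivariate"] * 10,
        "# Attributes": [10, 8, 8, 17, 5, 121, 81, 7, 22, 8],
        "Missing Values": ["N", "N", "N", "N", "N", "N", "Y", "N", "N", "N"],
    }
)

# Names of datasets flagged as containing missing values.
with_missing = index_df.loc[index_df["Missing Values"] == "Y", "Dataset"].tolist()

# Names of datasets with at most 8 attributes, fewest attributes first.
# A stable sort keeps the original order among ties.
small = (
    index_df[index_df["# Attributes"] <= 8]
    .sort_values("# Attributes", kind="stable")["Dataset"]
    .tolist()
)

print(with_missing)
print(small)
```

With the real index, the same expressions should work on `available_datasets` in place of `index_df`, provided the column names match the printout.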

Output of get_data('juice').head()

The 'juice' dataset loads successfully and its first 5 rows are printed.