[Python.Pandas] Selection and Drop

개발자로 취직하기 2020. 12. 18. 05:00

0. Previous Posts

2020/12/15 - [Python and Machine Learning/Pandas Data Analysis] - [Python.Pandas] Pandas Basics - Understanding DataFrame, Series, and Index
2020/12/16 - [Python and Machine Learning/Pandas Data Analysis] - [Python.Pandas] Extracting a Series from a DataFrame, Understanding loc/iloc
2020/12/17 - [Python and Machine Learning/Pandas Data Analysis] - [Python.Pandas] Understanding Operations Between DataFrame and Series

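1. Selecting Specific Rows: loc and iloc

To select only the rows you want, the most important thing is to understand loc and iloc, which were covered in the previous post.
2020/12/16 - [Python and Machine Learning/Pandas Data Analysis] - [Python.Pandas] Extracting a Series from a DataFrame, Understanding loc/iloc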
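As a quick refresher, here is a minimal sketch of the difference between the two. It assumes a trimmed-down version of the sample data used later in this post (only first_name and age), with index labels 1 to 5; the variable names are just for illustration.

```python
import pandas as pd

# Trimmed-down sample data (an assumption for this sketch), index labels 1..5.
df = pd.DataFrame(
    {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
     "age": [42, 52, 36, 24, 73]},
    index=[1, 2, 3, 4, 5],
)

# loc selects by index label: label 1 is the first row here.
by_label = df.loc[1]

# iloc selects by integer position: position 1 is the second row.
by_position = df.iloc[1]

# A label slice includes the end label; a position slice excludes the end position.
label_slice = df.loc[1:3]       # labels 1, 2, 3
position_slice = df.iloc[1:3]   # positions 1 and 2, i.e. labels 2 and 3

print(by_label["first_name"], by_position["first_name"])
print(list(label_slice.index), list(position_slice.index))
```

Because the index here starts at 1 rather than 0, the same number means different rows for loc and iloc, which is exactly where the two are easy to mix up.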

2. Filtering by Column

In [1]:import pandas as pd
In [2]:raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
                   'last_name' : ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
                   'age' : [42, 52, 36, 24, 73], 
                   'city': ['San Francisco', 'Baltimore', 'Miami', 'Douglas', 'Boston']} 
       df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'city'], 
                         index=[1,2,3,4,5]) 
       df[:5]
Out[2]:
	first_name   last_name   age   city
1	Jason        Miller      42    San Francisco
2	Molly        Jacobson    52    Baltimore
3	Tina         Ali         36    Miami
4	Jake         Milner      24    Douglas
5	Amy          Cooze       73    Boston

In [3]:df[['first_name', 'age']]
Out[3]:
	first_name   age
1	Jason        42
2	Molly        52
3	Tina         36
4	Jake         24
5	Amy          73

Passing a list of column names, as in df[['first_name', 'age']], extracts just those two columns from df.
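One detail worth seeing in code is what the double brackets do. The sketch below, which rebuilds a shortened version of the sample data as an assumption, compares a single column name with a list of column names.

```python
import pandas as pd

# Shortened sample data (an assumption for this sketch).
df = pd.DataFrame(
    {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
     "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
     "age": [42, 52, 36, 24, 73]},
    index=[1, 2, 3, 4, 5],
)

# A single column name returns a Series.
ages = df["age"]

# A list with one column name returns a one-column DataFrame.
ages_frame = df[["age"]]

# A list with several names returns those columns in the order listed.
subset = df[["age", "first_name"]]

print(type(ages).__name__, type(ages_frame).__name__)
print(list(subset.columns))
```

The outer brackets are the indexing operator and the inner brackets are an ordinary Python list, so the column order in the result follows the list, not the original DataFrame.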

3. Dropping Rows

In [4]:df.drop(1)
Out[4]:
	first_name   last_name   age   city
2	Molly        Jacobson    52    Baltimore
3	Tina         Ali         36    Miami
4	Jake         Milner      24    Douglas
5	Amy          Cooze       73    Boston

In [5]:df.drop([1,3,5])
Out[5]:
	first_name   last_name   age   city
2	Molly        Jacobson    52    Baltimore
4	Jake         Milner      24    Douglas

df.drop(1) drops a single row by its index label.
Passing a list of index labels drops several rows at once.
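Two behaviors are easy to miss in the outputs above: drop works on index labels rather than positions, and by default it returns a new DataFrame instead of modifying df. A minimal sketch, assuming a shortened version of the sample data:

```python
import pandas as pd

# Shortened sample data (an assumption for this sketch), index labels 1..5.
df = pd.DataFrame(
    {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
     "age": [42, 52, 36, 24, 73]},
    index=[1, 2, 3, 4, 5],
)

# drop returns a new DataFrame; rows labeled 2 and 4 remain.
dropped = df.drop([1, 3, 5])
remaining_names = list(dropped["first_name"])

# The original is untouched, so it still has all five rows.
original_length = len(df)

# drop looks up labels, not positions: there is no label 0 in this index.
try:
    df.drop(0)
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(remaining_names, original_length, missing_label_raised)
```

To keep the result, assign it back, for example df = df.drop(1). This is also why In [4] and In [5] in the post both start from the full five-row df.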

4. Dropping Columns

In [6]:df.drop('city', axis = 1)
Out[6]:
	first_name   last_name   age
1	Jason        Miller      42
2	Molly        Jacobson    52
3	Tina         Ali         36
4	Jake         Milner      24
5	Amy          Cooze       73

In [7]:df.drop(['city', 'last_name'], axis = 1)
Out[7]:
	first_name   age
1	Jason        42
2	Molly        52
3	Tina         36
4	Jake         24
5	Amy          73

The only difference from dropping rows is that axis=1 must be specified explicitly.
Because axis defaults to 0, it can be omitted when dropping rows.
As with rows, passing a list of column names drops several columns at once.
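The post uses axis=1 throughout. As an alternative, drop also accepts the index= and columns= keywords, which say which axis is meant without a number. The sketch below assumes a shortened version of the sample data and checks that both spellings give the same result.

```python
import pandas as pd

# Shortened sample data (an assumption for this sketch).
df = pd.DataFrame(
    {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
     "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
     "age": [42, 52, 36, 24, 73],
     "city": ["San Francisco", "Baltimore", "Miami", "Douglas", "Boston"]},
    index=[1, 2, 3, 4, 5],
)

# The form used in the post: a column label plus axis=1.
with_axis = df.drop(["city", "last_name"], axis=1)

# The keyword form: columns= implies the column axis.
with_keyword = df.drop(columns=["city", "last_name"])

# Rows and columns can be dropped in one call with index= and columns=.
both = df.drop(index=[1, 3, 5], columns=["city"])

print(with_axis.equals(with_keyword))
print(list(both.index), list(both.columns))
```

Which spelling to use is a matter of taste; the keyword form tends to read more clearly when a call drops both rows and columns.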