데이터분석 전처리 Pandas 필수 함수(iloc, loc, groupby, drop 등) 뽀개기
2021. 1. 16. 05:00ㆍ빅데이터 잡아라/ADP
반응형
- 필수 라이브러리 import
- 데이터 불러오기
In [2]:
import pandas as pd
pd.options.display.max_columns = 20
pd.options.display.max_rows = 10
absolute_path = '../Data/FoodAgricultureOrganization/Food_Agriculture_Organization_UN_Full.csv'
raw_data = pd.read_csv(absolute_path, sep=',')
- 데이터 확인하기
In [3]:
raw_data.head()
Out[3]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
5 rows × 63 columns
- 데이터 타입 확인하기
In [4]:
raw_data.dtypes
Out[4]:
Area Abbreviation object Area Code int64 Area object Item Code int64 Item object ... Y2009 float64 Y2010 float64 Y2011 float64 Y2012 int64 Y2013 int64 Length: 63, dtype: object
- string타입으로 바꾸는 astype()
In [5]:
raw_data['Item Code'] = raw_data['Item Code'].astype(str)
- 데이터 정보 확인하기
In [6]:
raw_data.describe()
Out[6]:
Area Code | Element Code | latitude | longitude | Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 21477.000000 | 21477.000000 | 21477.000000 | 21477.000000 | 17938.000000 | 17938.000000 | 17938.000000 | 17938.000000 | 17938.000000 | 17938.000000 | ... | 21128.000000 | 21128.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21477.000000 | 21477.000000 |
mean | 125.449411 | 5211.687154 | 20.450613 | 15.794445 | 195.262069 | 200.782250 | 205.464600 | 209.925577 | 217.556751 | 225.988962 | ... | 486.690742 | 493.153256 | 496.319328 | 508.482104 | 522.844898 | 524.581996 | 535.492069 | 553.399242 | 560.569214 | 575.557480 |
std | 72.868149 | 146.820079 | 24.628336 | 66.012104 | 1864.124336 | 1884.265591 | 1861.174739 | 1862.000116 | 2014.934333 | 2100.228354 | ... | 5001.782008 | 5100.057036 | 5134.819373 | 5298.939807 | 5496.697513 | 5545.939303 | 5721.089425 | 5883.071604 | 6047.950804 | 6218.379479 |
min | 1.000000 | 5142.000000 | -40.900000 | -172.100000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -169.000000 | -246.000000 |
25% | 63.000000 | 5142.000000 | 6.430000 | -11.780000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
50% | 120.000000 | 5142.000000 | 20.590000 | 19.150000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 6.000000 | 6.000000 | 7.000000 | 7.000000 | 7.000000 | 7.000000 | 7.000000 | 8.000000 | 8.000000 | 8.000000 |
75% | 188.000000 | 5142.000000 | 41.150000 | 46.870000 | 21.000000 | 22.000000 | 23.000000 | 24.000000 | 25.000000 | 26.000000 | ... | 75.000000 | 77.000000 | 78.000000 | 80.000000 | 82.000000 | 83.000000 | 83.000000 | 86.000000 | 88.000000 | 90.000000 |
max | 276.000000 | 5521.000000 | 64.960000 | 179.410000 | 112227.000000 | 109130.000000 | 106356.000000 | 104234.000000 | 119378.000000 | 118495.000000 | ... | 360767.000000 | 373694.000000 | 388100.000000 | 402975.000000 | 425537.000000 | 434724.000000 | 451838.000000 | 462696.000000 | 479028.000000 | 489299.000000 |
8 rows × 57 columns
- columns이 많을때는 describe().T로 transpose해주기!
In [7]:
raw_data.describe().T
Out[7]:
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
Area Code | 21477.0 | 125.449411 | 72.868149 | 1.0 | 63.00 | 120.00 | 188.00 | 276.00 |
Element Code | 21477.0 | 5211.687154 | 146.820079 | 5142.0 | 5142.00 | 5142.00 | 5142.00 | 5521.00 |
latitude | 21477.0 | 20.450613 | 24.628336 | -40.9 | 6.43 | 20.59 | 41.15 | 64.96 |
longitude | 21477.0 | 15.794445 | 66.012104 | -172.1 | -11.78 | 19.15 | 46.87 | 179.41 |
Y1961 | 17938.0 | 195.262069 | 1864.124336 | 0.0 | 0.00 | 1.00 | 21.00 | 112227.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Y2009 | 21373.0 | 524.581996 | 5545.939303 | 0.0 | 0.00 | 7.00 | 83.00 | 434724.00 |
Y2010 | 21373.0 | 535.492069 | 5721.089425 | 0.0 | 0.00 | 7.00 | 83.00 | 451838.00 |
Y2011 | 21373.0 | 553.399242 | 5883.071604 | 0.0 | 0.00 | 8.00 | 86.00 | 462696.00 |
Y2012 | 21477.0 | 560.569214 | 6047.950804 | -169.0 | 0.00 | 8.00 | 88.00 | 479028.00 |
Y2013 | 21477.0 | 575.557480 | 6218.379479 | -246.0 | 0.00 | 8.00 | 90.00 | 489299.00 |
57 rows × 8 columns
- 데이터 전처리 iloc, loc 함수 뽀개기!
- iloc[0]은 행을 뽑아 냅니다. 그런데 별로 안예쁘죠?
In [8]:
raw_data.iloc[0]
Out[8]:
Area Abbreviation AF Area Code 2 Area Afghanistan Item Code 2511 Item Wheat and products ... Y2009 4538 Y2010 4605 Y2011 4711 Y2012 4810 Y2013 4895 Name: 0, Length: 63, dtype: object
- iloc[[0]] 이렇게 괄호 2개를 사용해서 dataframe형식 그대로 뽑아낼 수 있어요
In [9]:
raw_data.iloc[[1]]
Out[9]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
1 rows × 63 columns
- iloc[행,열] 이 표현은 열을 뽑아낼 수 있어요
In [10]:
raw_data.iloc[:,0]
Out[10]:
0 AF 1 AF 2 AF 3 AF 4 AF .. 21472 ZW 21473 ZW 21474 ZW 21475 ZW 21476 ZW Name: Area Abbreviation, Length: 21477, dtype: object
- iloc[행,열] 이 표현은 열을 뽑아낼 수 있어요, 위와 마찬가지로 dataframe처럼 예쁘게 보일려면 [[]] 괄호2개를 넣어주세요!
In [11]:
raw_data.iloc[:,[0]]
Out[11]:
Area Abbreviation | |
---|---|
0 | AF |
1 | AF |
2 | AF |
3 | AF |
4 | AF |
... | ... |
21472 | ZW |
21473 | ZW |
21474 | ZW |
21475 | ZW |
21476 | ZW |
21477 rows × 1 columns
- 복습! 행을 여러개 뽑아보기
In [12]:
raw_data.iloc[0:5]
Out[12]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
5 rows × 63 columns
- 복습! 열을 여러개 뽑아보기
In [13]:
raw_data.iloc[:,0:2]
Out[13]:
Area Abbreviation | Area Code | |
---|---|---|
0 | AF | 2 |
1 | AF | 2 |
2 | AF | 2 |
3 | AF | 2 |
4 | AF | 2 |
... | ... | ... |
21472 | ZW | 181 |
21473 | ZW | 181 |
21474 | ZW | 181 |
21475 | ZW | 181 |
21476 | ZW | 181 |
21477 rows × 2 columns
- 복습! 행과 열을 여러개 뽑아보기
In [14]:
raw_data.iloc[0:5,0:2]
Out[14]:
Area Abbreviation | Area Code | |
---|---|---|
0 | AF | 2 |
1 | AF | 2 |
2 | AF | 2 |
3 | AF | 2 |
4 | AF | 2 |
- loc()함수! 위의 iloc()와 어떤 차이점이 있을지 저도 매우 궁금했어요.
- loc(), iloc() 비교 실습으로 알아봅시다.
- loc여도 행1개를 추출할 수 있어요
In [15]:
raw_data.loc[[1]]
Out[15]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
1 rows × 63 columns
- iloc와 똑같이 뽑을 수 있어요
In [16]:
raw_data.iloc[[1]]
Out[16]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
1 rows × 63 columns
- loc함수는 여러개 행을 범위로 이렇게 뽑을 수 없네요
In [17]:
raw_data.loc[[1:3]]
File "<ipython-input-17-32675df71253>", line 1 raw_data.loc[[1:3]] ^ SyntaxError: invalid syntax
- loc함수는 여러개 행을 index를 지정해서 이렇게 뽑을 수 있어요..
In [18]:
raw_data.loc[[1,3]]
Out[18]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
2 rows × 63 columns
- iloc함수도 마찬가지로 index로 몇개 추출할 수 있습니다.
In [19]:
raw_data.iloc[[1,3]]
Out[19]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
2 rows × 63 columns
- loc함수는 범위로 뽑을려면 [[]] 괄호가 2개가 없어도 됩니다.
- 그런데, [1:3]은 대부분 1,2 까지 뽑아내지만, loc는 1,2,3 까지 뽑아 냅니다.
In [20]:
raw_data.loc[1:3]
Out[20]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
3 rows × 63 columns
- iloc함수로 뽑으면 [1:3]이 1,2로 뽑힐 수 있습니다.
In [21]:
raw_data.iloc[1:3]
Out[21]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
2 rows × 63 columns
- 그럼 loc함수의 장점은 무엇일까요? 이제 나옵니다.
- 열을 뽑을 때 열의 명을 가지고 뽑을 수 있어요. 사실 열의 명으로 열을 뽑는게 더 쉽잖아요! 유용하게 쓸 수 있을 것 같아요
In [22]:
raw_data.loc[[1,3],['Area Code','Area']]
Out[22]:
Area Code | Area | |
---|---|---|
1 | 2 | Afghanistan |
3 | 2 | Afghanistan |
- iloc함수로는 열의 이름으로 뽑을 수 없습니다.
In [23]:
raw_data.iloc[[1,3],['Area Code','Area']]
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) <ipython-input-23-5375d4a0863b> in <module> ----> 1 raw_data.iloc[[1,3],['Area Code','Area']] ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key) 871 # AttributeError for IntervalTree get_value 872 pass --> 873 return self._getitem_tuple(key) 874 else: 875 # we by definition only have the 0th axis ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup) 1441 def _getitem_tuple(self, tup: Tuple): 1442 -> 1443 self._has_valid_tuple(tup) 1444 try: 1445 return self._getitem_lowerdim(tup) ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _has_valid_tuple(self, key) 700 raise IndexingError("Too many indexers") 701 try: --> 702 self._validate_key(k, i) 703 except ValueError as err: 704 raise ValueError( ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_key(self, key, axis) 1361 # check that the key has a numeric dtype 1362 if not is_numeric_dtype(arr.dtype): -> 1363 raise IndexError(f".iloc requires numeric indexers, got {arr}") 1364 1365 # check that the key does not exceed the maximum size of the index IndexError: .iloc requires numeric indexers, got ['Area Code' 'Area']
- loc함수는 열의 명을 범위로 모두 뽑을 수 있더라구요!
- 유용하게 쓸 수 있을 것 같아요
In [24]:
raw_data.loc[[1,3],'Item':'Y2013']
Out[24]:
Item | Element Code | Element | Unit | latitude | longitude | Y1961 | Y1962 | Y1963 | Y1964 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 183.0 | 183.0 | 182.0 | 220.0 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
3 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 237.0 | 237.0 | 237.0 | 238.0 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
2 rows × 59 columns
- 내가 원하는 값 뽑아내기, 이것은 굳이 loc함수를 안써도 되어요
In [25]:
raw_data.loc[raw_data['Item']=='Sugar beet']
Out[25]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | AF | 2 | Afghanistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
103 | AL | 3 | Albania | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 41.15 | 20.17 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
699 | AM | 1 | Armenia | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 40.07 | 45.04 | ... | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 1.0 | 0 | 1 |
832 | AU | 10 | Australia | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | -25.27 | 133.78 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1099 | AZ | 52 | Azerbaijan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 40.14 | 47.58 | ... | 1.0 | 1.0 | 4.0 | 3.0 | 4.0 | 4.0 | 6.0 | 6.0 | 4 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20020 | AE | 225 | United Arab Emirates | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 23.42 | 53.85 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
20676 | UZ | 235 | Uzbekistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 41.38 | 64.59 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
20900 | VE | 236 | Venezuela (Bolivarian Republic of) | 2537 | Sugar beet | 5142 | Food | 1000 tonnes | 6.42 | -66.59 | ... | 17.0 | 20.0 | 23.0 | 21.0 | 21.0 | 21.0 | 30.0 | 35.0 | 20 | 22 |
21135 | YE | 249 | Yemen | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 15.55 | 48.52 | ... | 0.0 | 0.0 | 1.0 | 13.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21374 | ZW | 181 | Zimbabwe | 2537 | Sugar beet | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
66 rows × 63 columns
- 내가 원하는 값 뽑아내기, iloc는 할 수 없어요.
In [26]:
raw_data.iloc[raw_data['Item']=='Sugar beet']
--------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) <ipython-input-26-24cc43d7f2af> in <module> ----> 1 raw_data.iloc[raw_data['Item']=='Sugar beet'] ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key) 877 878 maybe_callable = com.apply_if_callable(key, self.obj) --> 879 return self._getitem_axis(maybe_callable, axis=axis) 880 881 def _is_scalar_access(self, key: Tuple): ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis) 1480 1481 if com.is_bool_indexer(key): -> 1482 self._validate_key(key, axis) 1483 return self._getbool_axis(key, axis=axis) 1484 ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_key(self, key, axis) 1337 if key.index.inferred_type == "integer": 1338 raise NotImplementedError( -> 1339 "iLocation based boolean " 1340 "indexing on an integer type " 1341 "is not available" NotImplementedError: iLocation based boolean indexing on an integer type is not available
- 내가 원하는 값 뽑아내기, 이것은 굳이 loc함수를 안써도 되고 이렇게 그냥 list의 특성으로 뽑아 낼 수 있어요
In [27]:
raw_data[raw_data['Item']=='Sugar beet']
Out[27]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | AF | 2 | Afghanistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
103 | AL | 3 | Albania | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 41.15 | 20.17 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
699 | AM | 1 | Armenia | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 40.07 | 45.04 | ... | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 1.0 | 0 | 1 |
832 | AU | 10 | Australia | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | -25.27 | 133.78 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1099 | AZ | 52 | Azerbaijan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 40.14 | 47.58 | ... | 1.0 | 1.0 | 4.0 | 3.0 | 4.0 | 4.0 | 6.0 | 6.0 | 4 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20020 | AE | 225 | United Arab Emirates | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 23.42 | 53.85 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
20676 | UZ | 235 | Uzbekistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 41.38 | 64.59 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
20900 | VE | 236 | Venezuela (Bolivarian Republic of) | 2537 | Sugar beet | 5142 | Food | 1000 tonnes | 6.42 | -66.59 | ... | 17.0 | 20.0 | 23.0 | 21.0 | 21.0 | 21.0 | 30.0 | 35.0 | 20 | 22 |
21135 | YE | 249 | Yemen | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 15.55 | 48.52 | ... | 0.0 | 0.0 | 1.0 | 13.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21374 | ZW | 181 | Zimbabwe | 2537 | Sugar beet | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
66 rows × 63 columns
- 내가 원하는 값 뽑아내기, 같은 값을 2개 뽑아낼 수도 있어요 loc함수로!
In [28]:
raw_data.loc[raw_data['Item']=='Sugar beet','Area']
Out[28]:
10 Afghanistan 103 Albania 699 Armenia 832 Australia 1099 Azerbaijan ... 20020 United Arab Emirates 20676 Uzbekistan 20900 Venezuela (Bolivarian Republic of) 21135 Yemen 21374 Zimbabwe Name: Area, Length: 66, dtype: object
- 저는 loc함수가 이 대목에서 대박이라 생각합니다. 원하는 값을 뽑아내면서, 열 명으로 원하는 열만 추출하기!
In [29]:
raw_data.loc[raw_data['Item']=='Sugar beet',['Area']]
Out[29]:
Area | |
---|---|
10 | Afghanistan |
103 | Albania |
699 | Armenia |
832 | Australia |
1099 | Azerbaijan |
... | ... |
20020 | United Arab Emirates |
20676 | Uzbekistan |
20900 | Venezuela (Bolivarian Republic of) |
21135 | Yemen |
21374 | Zimbabwe |
66 rows × 1 columns
- list의 특성으로는 안되요
In [30]:
raw_data[raw_data['Item']=='Sugar beet',['Area']]
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-30-d0ec0ffd7749> in <module> ----> 1 raw_data[raw_data['Item']=='Sugar beet',['Area']] ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key) 2904 if self.columns.nlevels > 1: 2905 return self._getitem_multilevel(key) -> 2906 indexer = self.columns.get_loc(key) 2907 if is_integer(indexer): 2908 indexer = [indexer] ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance) 2896 casted_key = self._maybe_cast_indexer(key) 2897 try: -> 2898 return self._engine.get_loc(casted_key) 2899 except KeyError as err: 2900 raise KeyError(key) from err pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() TypeError: '(0 False 1 False 2 False 3 False 4 False ... 21472 False 21473 False 21474 False 21475 False 21476 False Name: Item, Length: 21477, dtype: bool, ['Area'])' is an invalid key
- 저는 loc함수가 이 대목에서 대박이라 생각합니다2. 원하는 값을 뽑아내면서, 열 명으로 원하는 여러 열도 추출하기!
In [31]:
raw_data.loc[raw_data['Item']=='Sugar beet',['Area','Item','latitude']]
Out[31]:
Area | Item | latitude | |
---|---|---|---|
10 | Afghanistan | Sugar beet | 33.94 |
103 | Albania | Sugar beet | 41.15 |
699 | Armenia | Sugar beet | 40.07 |
832 | Australia | Sugar beet | -25.27 |
1099 | Azerbaijan | Sugar beet | 40.14 |
... | ... | ... | ... |
20020 | United Arab Emirates | Sugar beet | 23.42 |
20676 | Uzbekistan | Sugar beet | 41.38 |
20900 | Venezuela (Bolivarian Republic of) | Sugar beet | 6.42 |
21135 | Yemen | Sugar beet | 15.55 |
21374 | Zimbabwe | Sugar beet | -19.02 |
66 rows × 3 columns
- endswith('many')함수는 값뒤에 'many'가 있으면 추출하는 함수입니다.
In [32]:
raw_data.loc[raw_data['Area'].str.endswith('many')]
Out[32]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7532 | DE | 79 | Germany | 2511 | Wheat and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 8006.0 | 11084.0 | 10644.0 | 8993.0 | 10669.0 | 10608.0 | 8189.0 | 9242.0 | 7868 | 7494 |
7533 | DE | 79 | Germany | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 6524.0 | 6931.0 | 6796.0 | 6886.0 | 6868.0 | 7137.0 | 7235.0 | 7204.0 | 6712 | 6900 |
7534 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7.0 | 9.0 | 12.0 | 17.0 | 8.0 | 18.0 | 13.0 | 12.0 | 7 | 7 |
7535 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 206.0 | 212.0 | 226.0 | 244.0 | 251.0 | 257.0 | 243.0 | 271.0 | 275 | 277 |
7536 | DE | 79 | Germany | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7571.0 | 6878.0 | 7845.0 | 6940.0 | 7255.0 | 7592.0 | 7543.0 | 6062.0 | 6316 | 6598 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7674 | DE | 79 | Germany | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 20252.0 | 20878.0 | 21323.0 | 21661.0 | 21091.0 | 22077.0 | 21431.0 | 21171.0 | 21169 | 21401 |
7675 | DE | 79 | Germany | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 11.0 | 11.0 | 32.0 | 51.0 | 28.0 | 25.0 | 29.0 | 25.0 | 6 | 5 |
7676 | DE | 79 | Germany | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 1122.0 | 1177.0 | 1238.0 | 1287.0 | 1226.0 | 1187.0 | 1175.0 | 1188.0 | 1128 | 1039 |
7677 | DE | 79 | Germany | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 5.0 | 1 | 0 |
7678 | DE | 79 | Germany | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
147 rows × 63 columns
- 저는 isin함수를 더 많이 쓰는데, isin(['Germany'])는 Germany 값과 같은 행을 뽑아 냅니다.
In [33]:
raw_data.loc[raw_data['Area'].isin(['Germany'])]
Out[33]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7532 | DE | 79 | Germany | 2511 | Wheat and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 8006.0 | 11084.0 | 10644.0 | 8993.0 | 10669.0 | 10608.0 | 8189.0 | 9242.0 | 7868 | 7494 |
7533 | DE | 79 | Germany | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 6524.0 | 6931.0 | 6796.0 | 6886.0 | 6868.0 | 7137.0 | 7235.0 | 7204.0 | 6712 | 6900 |
7534 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7.0 | 9.0 | 12.0 | 17.0 | 8.0 | 18.0 | 13.0 | 12.0 | 7 | 7 |
7535 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 206.0 | 212.0 | 226.0 | 244.0 | 251.0 | 257.0 | 243.0 | 271.0 | 275 | 277 |
7536 | DE | 79 | Germany | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7571.0 | 6878.0 | 7845.0 | 6940.0 | 7255.0 | 7592.0 | 7543.0 | 6062.0 | 6316 | 6598 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7674 | DE | 79 | Germany | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 20252.0 | 20878.0 | 21323.0 | 21661.0 | 21091.0 | 22077.0 | 21431.0 | 21171.0 | 21169 | 21401 |
7675 | DE | 79 | Germany | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 11.0 | 11.0 | 32.0 | 51.0 | 28.0 | 25.0 | 29.0 | 25.0 | 6 | 5 |
7676 | DE | 79 | Germany | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 1122.0 | 1177.0 | 1238.0 | 1287.0 | 1226.0 | 1187.0 | 1175.0 | 1188.0 | 1128 | 1039 |
7677 | DE | 79 | Germany | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 5.0 | 1 | 0 |
7678 | DE | 79 | Germany | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
147 rows × 63 columns
In [34]:
raw_data[raw_data['Area'].isin(['Germany'])]
Out[34]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7532 | DE | 79 | Germany | 2511 | Wheat and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 8006.0 | 11084.0 | 10644.0 | 8993.0 | 10669.0 | 10608.0 | 8189.0 | 9242.0 | 7868 | 7494 |
7533 | DE | 79 | Germany | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 6524.0 | 6931.0 | 6796.0 | 6886.0 | 6868.0 | 7137.0 | 7235.0 | 7204.0 | 6712 | 6900 |
7534 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7.0 | 9.0 | 12.0 | 17.0 | 8.0 | 18.0 | 13.0 | 12.0 | 7 | 7 |
7535 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 206.0 | 212.0 | 226.0 | 244.0 | 251.0 | 257.0 | 243.0 | 271.0 | 275 | 277 |
7536 | DE | 79 | Germany | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7571.0 | 6878.0 | 7845.0 | 6940.0 | 7255.0 | 7592.0 | 7543.0 | 6062.0 | 6316 | 6598 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7674 | DE | 79 | Germany | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 20252.0 | 20878.0 | 21323.0 | 21661.0 | 21091.0 | 22077.0 | 21431.0 | 21171.0 | 21169 | 21401 |
7675 | DE | 79 | Germany | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 11.0 | 11.0 | 32.0 | 51.0 | 28.0 | 25.0 | 29.0 | 25.0 | 6 | 5 |
7676 | DE | 79 | Germany | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 1122.0 | 1177.0 | 1238.0 | 1287.0 | 1226.0 | 1187.0 | 1175.0 | 1188.0 | 1128 | 1039 |
7677 | DE | 79 | Germany | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 5.0 | 1 | 0 |
7678 | DE | 79 | Germany | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
147 rows × 63 columns
- 저는 isin함수를 더 많이 쓰는데, isin([])는 2개이상으로 같은 값을 추출할 수 있어요
In [35]:
raw_data[raw_data['Area'].isin(['Germany','France'])]
Out[35]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6910 | FR | 68 | France | 2511 | Wheat and products | 5521 | Feed | 1000 tonnes | 46.23 | 2.21 | ... | 9867.0 | 11311.0 | 10091.0 | 8483.0 | 10283.0 | 8351.0 | 6262.0 | 7727.0 | 7179 | 7822 |
6911 | FR | 68 | France | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 46.23 | 2.21 | ... | 6057.0 | 6140.0 | 6402.0 | 6102.0 | 6677.0 | 6331.0 | 6986.0 | 6765.0 | 6984 | 6971 |
6912 | FR | 68 | France | 2805 | Rice (Milled Equivalent) | 5521 | Feed | 1000 tonnes | 46.23 | 2.21 | ... | 80.0 | 75.0 | 84.0 | 88.0 | 92.0 | 82.0 | 81.0 | 91.0 | 99 | 101 |
6913 | FR | 68 | France | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 46.23 | 2.21 | ... | 296.0 | 325.0 | 317.0 | 367.0 | 350.0 | 347.0 | 339.0 | 361.0 | 339 | 314 |
6914 | FR | 68 | France | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 46.23 | 2.21 | ... | 3537.0 | 3560.0 | 4014.0 | 3446.0 | 4275.0 | 4428.0 | 4332.0 | 3137.0 | 3163 | 2865 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7674 | DE | 79 | Germany | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 20252.0 | 20878.0 | 21323.0 | 21661.0 | 21091.0 | 22077.0 | 21431.0 | 21171.0 | 21169 | 21401 |
7675 | DE | 79 | Germany | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 11.0 | 11.0 | 32.0 | 51.0 | 28.0 | 25.0 | 29.0 | 25.0 | 6 | 5 |
7676 | DE | 79 | Germany | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 1122.0 | 1177.0 | 1238.0 | 1287.0 | 1226.0 | 1187.0 | 1175.0 | 1188.0 | 1128 | 1039 |
7677 | DE | 79 | Germany | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 5.0 | 1 | 0 |
7678 | DE | 79 | Germany | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
287 rows × 63 columns
In [36]:
raw_data.loc[(raw_data['Area'].str.endswith('many')) & (raw_data['Element']=='Feed')]
Out[36]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7532 | DE | 79 | Germany | 2511 | Wheat and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 8006.0 | 11084.0 | 10644.0 | 8993.0 | 10669.0 | 10608.0 | 8189.0 | 9242.0 | 7868 | 7494 |
7534 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7.0 | 9.0 | 12.0 | 17.0 | 8.0 | 18.0 | 13.0 | 12.0 | 7 | 7 |
7536 | DE | 79 | Germany | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7571.0 | 6878.0 | 7845.0 | 6940.0 | 7255.0 | 7592.0 | 7543.0 | 6062.0 | 6316 | 6598 |
7538 | DE | 79 | Germany | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 3356.0 | 3481.0 | 2947.0 | 3752.0 | 4199.0 | 3895.0 | 3572.0 | 4251.0 | 5434 | 6136 |
7540 | DE | 79 | Germany | 2515 | Rye and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 1400.0 | 1300.0 | 1287.0 | 1391.0 | 1858.0 | 2416.0 | 1625.0 | 1360.0 | 2150 | 3318 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7661 | DE | 79 | Germany | 2918 | Vegetables | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 89.0 | 83.0 | 84.0 | 84.0 | 89.0 | 94.0 | 89.0 | 89.0 | 91 | 90 |
7668 | DE | 79 | Germany | 2945 | Offals | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
7670 | DE | 79 | Germany | 2946 | Animal fats | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 16.0 | 18.0 | 20.0 | 15.0 | 16.0 | 14.0 | 18.0 | 18.0 | 20 | 17 |
7673 | DE | 79 | Germany | 2948 | Milk - Excluding Butter | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 1813.0 | 1701.0 | 1595.0 | 1633.0 | 2224.0 | 1503.0 | 1468.0 | 1771.0 | 2088 | 2064 |
7675 | DE | 79 | Germany | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 11.0 | 11.0 | 32.0 | 51.0 | 28.0 | 25.0 | 29.0 | 25.0 | 6 | 5 |
43 rows × 63 columns
- list특성으로 값을 추출할 수 있는데, & 기호(and 라는 뜻)를 이용해서 값을 추출 할 수 있습니다.
In [37]:
raw_data[(raw_data['Area'].str.endswith('many')) & (raw_data['Element']=='Feed')]
Out[37]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7532 | DE | 79 | Germany | 2511 | Wheat and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 8006.0 | 11084.0 | 10644.0 | 8993.0 | 10669.0 | 10608.0 | 8189.0 | 9242.0 | 7868 | 7494 |
7534 | DE | 79 | Germany | 2805 | Rice (Milled Equivalent) | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7.0 | 9.0 | 12.0 | 17.0 | 8.0 | 18.0 | 13.0 | 12.0 | 7 | 7 |
7536 | DE | 79 | Germany | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 7571.0 | 6878.0 | 7845.0 | 6940.0 | 7255.0 | 7592.0 | 7543.0 | 6062.0 | 6316 | 6598 |
7538 | DE | 79 | Germany | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 3356.0 | 3481.0 | 2947.0 | 3752.0 | 4199.0 | 3895.0 | 3572.0 | 4251.0 | 5434 | 6136 |
7540 | DE | 79 | Germany | 2515 | Rye and products | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 1400.0 | 1300.0 | 1287.0 | 1391.0 | 1858.0 | 2416.0 | 1625.0 | 1360.0 | 2150 | 3318 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7661 | DE | 79 | Germany | 2918 | Vegetables | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 89.0 | 83.0 | 84.0 | 84.0 | 89.0 | 94.0 | 89.0 | 89.0 | 91 | 90 |
7668 | DE | 79 | Germany | 2945 | Offals | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
7670 | DE | 79 | Germany | 2946 | Animal fats | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 16.0 | 18.0 | 20.0 | 15.0 | 16.0 | 14.0 | 18.0 | 18.0 | 20 | 17 |
7673 | DE | 79 | Germany | 2948 | Milk - Excluding Butter | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 1813.0 | 1701.0 | 1595.0 | 1633.0 | 2224.0 | 1503.0 | 1468.0 | 1771.0 | 2088 | 2064 |
7675 | DE | 79 | Germany | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | 51.17 | 10.45 | ... | 11.0 | 11.0 | 32.0 | 51.0 | 28.0 | 25.0 | 29.0 | 25.0 | 6 | 5 |
43 rows × 63 columns
In [38]:
raw_data[(raw_data['Y2004'] < 1000) & (raw_data['Y2004'] > 990)]
Out[38]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3754 | CL | 40 | Chile | 2531 | Potatoes and products | 5142 | Food | 1000 tonnes | -35.68 | -71.54 | ... | 998.0 | 980.0 | 1018.0 | 875.0 | 884.0 | 850.0 | 1003.0 | 1132.0 | 1096 | 1089 |
7630 | DE | 79 | Germany | 2744 | Eggs | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 999.0 | 974.0 | 1011.0 | 997.0 | 1011.0 | 1016.0 | 1032.0 | 1052.0 | 990 | 1010 |
7672 | DE | 79 | Germany | 2949 | Eggs | 5142 | Food | 1000 tonnes | 51.17 | 10.45 | ... | 999.0 | 974.0 | 1011.0 | 997.0 | 1011.0 | 1016.0 | 1032.0 | 1052.0 | 990 | 1010 |
9038 | IN | 100 | India | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 20.59 | 78.96 | ... | 995.0 | 954.0 | 944.0 | 691.0 | 740.0 | 1276.0 | 984.0 | 1217.0 | 1152 | 835 |
9104 | IN | 100 | India | 2641 | Pimento | 5142 | Food | 1000 tonnes | 20.59 | 78.96 | ... | 998.0 | 844.0 | 1012.0 | 1010.0 | 1019.0 | 941.0 | 893.0 | 954.0 | 871 | 1018 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13436 | MM | 28 | Myanmar | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | 21.91 | 95.96 | ... | 992.0 | 1069.0 | 1117.0 | 1253.0 | 1298.0 | 1472.0 | 1581.0 | 1660.0 | 1600 | 1677 |
13669 | NP | 149 | Nepal | 2919 | Fruits - Excluding Wine | 5142 | Food | 1000 tonnes | 28.39 | 84.12 | ... | 993.0 | 1050.0 | 1044.0 | 1098.0 | 1195.0 | 1178.0 | 1301.0 | 1428.0 | 1859 | 1700 |
15525 | PT | 174 | Portugal | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 39.40 | -8.22 | ... | 991.0 | 982.0 | 995.0 | 1002.0 | 976.0 | 970.0 | 978.0 | 980.0 | 994 | 1000 |
19714 | TM | 213 | Turkmenistan | 2905 | Cereals - Excluding Beer | 5142 | Food | 1000 tonnes | 38.97 | 59.56 | ... | 995.0 | 1009.0 | 1031.0 | 1019.0 | 1017.0 | 1023.0 | 1019.0 | 1039.0 | 1036 | 1059 |
20768 | UZ | 235 | Uzbekistan | 2919 | Fruits - Excluding Wine | 5142 | Food | 1000 tonnes | 41.38 | 64.59 | ... | 993.0 | 1079.0 | 1448.0 | 1538.0 | 1693.0 | 1875.0 | 2108.0 | 2159.0 | 2422 | 2638 |
14 rows × 63 columns
- loc함수의 가장 큰 장점을 이용하여 뽑을 수 있어요! 장점은 열의 명으로 열을 추출하기!
In [39]:
raw_data.loc[(raw_data['Y2004'] < 1000) & (raw_data['Y2004'] > 990), ['Area', 'Item', 'latitude']]
Out[39]:
Area | Item | latitude | |
---|---|---|---|
3754 | Chile | Potatoes and products | -35.68 |
7630 | Germany | Eggs | 51.17 |
7672 | Germany | Eggs | 51.17 |
9038 | India | Barley and products | 20.59 |
9104 | India | Pimento | 20.59 |
... | ... | ... | ... |
13436 | Myanmar | Milk - Excluding Butter | 21.91 |
13669 | Nepal | Fruits - Excluding Wine | 28.39 |
15525 | Portugal | Wheat and products | 39.40 |
19714 | Turkmenistan | Cereals - Excluding Beer | 38.97 |
20768 | Uzbekistan | Fruits - Excluding Wine | 41.38 |
14 rows × 3 columns
- drop함수를 이용해서 열 삭제하기. 꼭 axis=1를 써야 열이 삭제가 되어요
In [40]:
raw_data.drop('Area',axis=1)
Out[40]:
Area Abbreviation | Area Code | Item Code | Item | Element Code | Element | Unit | latitude | longitude | Y1961 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AF | 2 | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 1928.0 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
1 | AF | 2 | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 183.0 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AF | 2 | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | 76.0 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AF | 2 | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 237.0 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
4 | AF | 2 | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | 210.0 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
21472 | ZW | 181 | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 230.0 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
21473 | ZW | 181 | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | 27.0 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
21474 | ZW | 181 | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 6.0 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
21475 | ZW | 181 | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21476 | ZW | 181 | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21477 rows × 62 columns
- drop함수 내의 columns=를 지정해주면, axis=1을 안써도 됩니다.
In [41]:
raw_data.drop(columns="Area")
Out[41]:
Area Abbreviation | Area Code | Item Code | Item | Element Code | Element | Unit | latitude | longitude | Y1961 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AF | 2 | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 1928.0 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
1 | AF | 2 | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 183.0 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AF | 2 | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | 76.0 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AF | 2 | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | 237.0 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
4 | AF | 2 | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | 210.0 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
21472 | ZW | 181 | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 230.0 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
21473 | ZW | 181 | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | 27.0 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
21474 | ZW | 181 | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 6.0 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
21475 | ZW | 181 | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21476 | ZW | 181 | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21477 rows × 62 columns
- 여러 열을 삭제하기
In [42]:
raw_data.drop(['Y2011','Y2012','Y2013'], axis=1)
Out[42]:
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2001 | Y2002 | Y2003 | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2668.0 | 2776.0 | 3095.0 | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 |
1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 411.0 | 448.0 | 460.0 | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 |
2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 29.0 | 70.0 | 48.0 | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 |
3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 83.0 | 122.0 | 144.0 | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 |
4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 48.0 | 89.0 | 63.0 | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
21472 | ZW | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 439.0 | 360.0 | 386.0 | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 |
21473 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 1.0 | 0.0 | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 |
21474 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 16.0 | 14.0 | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 |
21475 | ZW | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
21476 | ZW | 181 | Zimbabwe | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
21477 rows × 60 columns
- unique()함수는 행의 값중에서 1개씩만 추출해주는 함수에요
In [43]:
raw_data['Item'].unique()
Out[43]:
array(['Wheat and products', 'Rice (Milled Equivalent)', 'Barley and products', 'Maize and products', 'Millet and products', 'Cereals, Other', 'Potatoes and products', 'Sugar cane', 'Sugar beet', 'Sugar (Raw Equivalent)', 'Sweeteners, Other', 'Honey', 'Pulses, Other and products', 'Nuts and products', 'Coconuts - Incl Copra', 'Sesame seed', 'Olives (including preserved)', 'Soyabean Oil', 'Groundnut Oil', 'Sunflowerseed Oil', 'Rape and Mustard Oil', 'Cottonseed Oil', 'Palm Oil', 'Sesameseed Oil', 'Olive Oil', 'Oilcrops Oil, Other', 'Tomatoes and products', 'Vegetables, Other', 'Oranges, Mandarines', 'Citrus, Other', 'Bananas', 'Apples and products', 'Pineapples and products', 'Dates', 'Grapes and products (excl wine)', 'Fruits, Other', 'Coffee and products', 'Cocoa Beans and products', 'Tea (including mate)', 'Pepper', 'Spices, Other', 'Wine', 'Beer', 'Beverages, Alcoholic', 'Bovine Meat', 'Mutton & Goat Meat', 'Poultry Meat', 'Meat, Other', 'Offals, Edible', 'Butter, Ghee', 'Cream', 'Fats, Animals, Raw', 'Eggs', 'Milk - Excluding Butter', 'Freshwater Fish', 'Infant food', 'Cereals - Excluding Beer', 'Starchy Roots', 'Sugar Crops', 'Sugar & Sweeteners', 'Pulses', 'Treenuts', 'Oilcrops', 'Vegetable Oils', 'Vegetables', 'Fruits - Excluding Wine', 'Stimulants', 'Spices', 'Alcoholic Beverages', 'Meat', 'Offals', 'Animal fats', 'Fish, Seafood', 'Miscellaneous', 'Rye and products', 'Oats', 'Sorghum and products', 'Cassava and products', 'Sweet potatoes', 'Roots, Other', 'Beans', 'Peas', 'Soyabeans', 'Groundnuts (Shelled Eq)', 'Rape and Mustardseed', 'Oilcrops, Other', 'Maize Germ Oil', 'Onions', 'Lemons, Limes and products', 'Grapefruit and products', 'Plantains', 'Pimento', 'Beverages, Fermented', 'Pigmeat', 'Fish, Body Oil', 'Fish, Liver Oil', 'Demersal Fish', 'Pelagic Fish', 'Marine Fish, Other', 'Crustaceans', 'Cephalopods', 'Molluscs, Other', 'Aquatic Animals, Others', 'Aquatic Plants', 'Aquatic Products, Other', 'Palm kernels', 'Coconut Oil', 'Cloves', 'Palmkernel Oil', 'Yams', 'Sunflower seed', 'Cottonseed', 'Sugar non-centrifugal', 'Ricebran Oil', 'Meat, Aquatic Mammals'], dtype=object)
- value_counts()는 유일한 값 중에서 몇개 있는지 값을 세어주는 함수에요. 유용합니다.
In [44]:
raw_data['Item'].value_counts()
Out[44]:
Milk - Excluding Butter 558 Eggs 360 Cereals - Excluding Beer 347 Fish, Seafood 337 Maize and products 333 ... Sugar non-centrifugal 32 Palm kernels 24 Cottonseed 21 Ricebran Oil 18 Meat, Aquatic Mammals 3 Name: Item, Length: 115, dtype: int64
- sort_values()함수로 오름차순으로 추출해줄 수 있어요
In [74]:
raw_data['Item'].value_counts().sort_values()
Out[74]:
Meat, Aquatic Mammals 3 Ricebran Oil 18 Cottonseed 21 Palm kernels 24 Sugar non-centrifugal 32 ... Maize and products 333 Fish, Seafood 337 Cereals - Excluding Beer 347 Eggs 360 Milk - Excluding Butter 558 Name: Item, Length: 115, dtype: int64
In [73]:
raw_data['Item'].sort_values()
Out[73]:
16297 Alcoholic Beverages 7390 Alcoholic Beverages 320 Alcoholic Beverages 14592 Alcoholic Beverages 16818 Alcoholic Beverages ... 5796 Yams 9825 Yams 20284 Yams 13703 Yams 15545 Yams Name: Item, Length: 21477, dtype: object
- 정말 유용한 groupby함수를 간단히 맛보기로 보자면,
- groupby(['Area'])는 Area값별로 각 열의 카운트(count())를 세어주는 거에요
- groupby는 단독으로 사용할수 없고, count(), sum(), max() 등과 같은 연산함수와 사용해야합니다.
In [52]:
raw_data.groupby(['Area']).count()
Out[52]:
Area Abbreviation | Area Code | Item Code | Item | Element Code | Element | Unit | latitude | longitude | Y1961 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Area | |||||||||||||||||||||
Afghanistan | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | ... | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 | 83 |
Albania | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 | ... | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 |
Algeria | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | ... | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 |
Angola | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | ... | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Antigua and Barbuda | 117 | 117 | 117 | 117 | 117 | 117 | 117 | 117 | 117 | 117 | ... | 117 | 117 | 117 | 117 | 117 | 117 | 117 | 117 | 117 | 117 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Venezuela (Bolivarian Republic of) | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | ... | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 | 130 |
Viet Nam | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | ... | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 | 109 |
Yemen | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 | ... | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 |
Zambia | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | ... | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
Zimbabwe | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | ... | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 |
174 rows × 62 columns
In [58]:
raw_data.groupby(['Area Code']).sum()
Out[58]:
Element Code | latitude | longitude | Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | Y1967 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Area Code | |||||||||||||||||||||
1 | 697151 | 5329.31 | 5990.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 5515.0 | 5992.0 | 6228.0 | 6753.0 | 6958.0 | 6824.0 | 5908.0 | 6423.0 | 6978 | 7175 |
2 | 430576 | 2817.02 | 5619.93 | 9481.0 | 9414.0 | 9194.0 | 10170.0 | 10473.0 | 10169.0 | 11289.0 | ... | 16542.0 | 17658.0 | 18317.0 | 19248.0 | 19381.0 | 20661.0 | 21030.0 | 21100.0 | 22706 | 23007 |
3 | 641941 | 5061.45 | 2480.91 | 1706.0 | 1749.0 | 1767.0 | 1889.0 | 1884.0 | 1995.0 | 2046.0 | ... | 6637.0 | 6719.0 | 6911.0 | 6744.0 | 7168.0 | 7316.0 | 7907.0 | 8114.0 | 8221 | 8271 |
4 | 645946 | 3475.72 | 205.84 | 7488.0 | 7235.0 | 6861.0 | 7255.0 | 7509.0 | 7536.0 | 7986.0 | ... | 48619.0 | 49562.0 | 51067.0 | 49933.0 | 50916.0 | 57505.0 | 60071.0 | 65852.0 | 69365 | 72161 |
7 | 566163 | -1220.80 | 1947.83 | 4834.0 | 4775.0 | 5240.0 | 5286.0 | 5527.0 | 5677.0 | 5833.0 | ... | 25541.0 | 26696.0 | 28247.0 | 29877.0 | 32053.0 | 36985.0 | 38400.0 | 40573.0 | 38064 | 48639 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | 712577 | 6868.00 | 607.92 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 35273.0 | 32353.0 | 32234.0 | 35016.0 | 34902.0 | 34964.0 | 32708.0 | 32196.0 | 31326 | 33336 |
256 | 665162 | 6327.14 | 778.51 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1308.0 | 1290.0 | 1295.0 | 1300.0 | 1377.0 | 1345.0 | 1375.0 | 1432.0 | 1501 | 1511 |
272 | 664025 | 5590.54 | 2668.27 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 25423.0 | 22473.0 | 26902.0 | 25401.0 | 25350.0 | 24799.0 | 16587 | 25399 |
273 | 614715 | 5039.78 | 2285.66 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1605.0 | 1659.0 | 1860.0 | 1891.0 | 1810.0 | 1887.0 | 1786 | 1848 |
276 | 538558 | 1337.44 | 3142.88 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 42543 | 43661 |
174 rows × 56 columns
- groupby함수를 사용해서, group화 하면서 내가 원하는 행만 추출할 수 있어요
In [61]:
raw_data.groupby(['Area Code'])['Y1961'].count()
Out[61]:
Area Code 1 0 2 83 3 123 4 124 7 109 ... 255 0 256 0 272 0 273 0 276 0 Name: Y1961, Length: 174, dtype: int64
- groupby함수를 사용해서, group화 하면서 내가 원하는 행만 추출할 수 있어요. 예쁘게 할려면 [[]] 2개의 괄호를 써야되는거 위에서도 보셧죠?
In [62]:
raw_data.groupby(['Area Code'])[['Y1961']].count()
Out[62]:
Y1961 | |
---|---|
Area Code | |
1 | 0 |
2 | 83 |
3 | 123 |
4 | 124 |
7 | 109 |
... | ... |
255 | 0 |
256 | 0 |
272 | 0 |
273 | 0 |
276 | 0 |
174 rows × 1 columns
In [64]:
raw_data.groupby(['Area Code','Y1961']).count()
Out[64]:
Area Abbreviation | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | Y1962 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Area Code | Y1961 | |||||||||||||||||||||
2 | 0.0 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | ... | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 |
1.0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
2.0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | ... | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | |
3.0 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | ... | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | |
4.0 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | ... | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
251 | 165.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
358.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
373.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
499.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
578.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
5681 rows × 61 columns
In [69]:
raw_data.groupby(['Item'])['Y2010'].sum()
Out[69]:
Item Alcoholic Beverages 247414.0 Animal fats 24267.0 Apples and products 62436.0 Aquatic Animals, Others 1340.0 Aquatic Plants 12767.0 ... Vegetables 950541.0 Vegetables, Other 745438.0 Wheat and products 549926.0 Wine 22297.0 Yams 35898.0 Name: Y2010, Length: 115, dtype: float64
In [72]:
raw_data['Item'].isin(raw_data['Item']).value_counts()
Out[72]:
True 21477 Name: Item, dtype: int64
In [ ]:
'빅데이터 잡아라 > ADP' 카테고리의 다른 글
[데이터에듀]ADP실기 데이터 분석 전문가 모의고사 1회 1번 파이썬 코드 (9) | 2021.06.03 |
---|---|
[데이터에듀]ADP실기 데이터 분석 전문가 모의고사 2회 1번 파이썬 코드 (0) | 2021.02.20 |
[데이터에듀]ADP실기 데이터 분석 전문가 모의고사 1회 2번 파이썬 코드 (8) | 2021.02.17 |
18회 ADP 데이터전문가 실기 후기(시험문제, 환경, 꿀팁 등) (2) | 2020.10.08 |
14회 ADP 데이터전문가 필기 합격 후기 (feat.벼락치기, 공부방법 등) (51) | 2020.04.13 |