Donghua's Blog - DBAGlobe: April 2018

Monday, April 30, 2018

Change elastic password using curl

Donghuas-MacBook-Air:bin donghua$ curl -XPUT -u elastic 'localhost:9200/_xpack/security/user/elastic/_password' -d '{
"password" : "elastic"
}'
Enter host password for user 'elastic':
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

Donghuas-MacBook-Air:bin donghua$ curl -XPUT -H 'Content-Type: application/json' -u elastic 'localhost:9200/_xpack/security/user/elastic/_password' -d '{
"password" : "elastic"
}'
Enter host password for user 'elastic':
{}

Donghuas-MacBook-Air:bin donghua$ curl -XGET -u elastic:elastic localhost:9200/?pretty
{
  "name" : "0rArjNg",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "dZYuMdDkQICV_oS2-nvmpQ",
  "version" : {
    "number" : "6.2.4",
    "build_hash" : "ccec39f",
    "build_date" : "2018-04-12T20:37:28.497551Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
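The first attempt fails with 406 because `curl -d` labels the body `application/x-www-form-urlencoded` unless a header says otherwise, and Elasticsearch 6.x checks the Content-Type strictly. Below is a minimal Python 3 sketch of the working call using only the standard library. It assumes the same `localhost:9200` endpoint and the 6.x `_xpack/security/user/<name>/_password` path used above; `build_password_request` is a hypothetical helper and `changeme` is a placeholder for the current password. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_password_request(host, user, current_password, new_password):
    """Build (but do not send) a PUT request that changes `user`'s password.

    Hypothetical helper mirroring the working curl call above: a JSON body,
    an explicit Content-Type header and HTTP basic auth as `user`.
    """
    url = "http://{}/_xpack/security/user/{}/_password".format(host, user)
    body = json.dumps({"password": new_password}).encode("utf-8")
    token = base64.b64encode(
        "{}:{}".format(user, current_password).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            # Without this header Elasticsearch 6.x answers 406, as seen above
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


# "changeme" is a placeholder for the current password
req = build_password_request("localhost:9200", "elastic", "changeme", "elastic")
print(req.get_method(), req.full_url)
# Against a live cluster it would be sent with: urllib.request.urlopen(req)
```

The helper only builds the request so it can be inspected safely; `urllib.request.urlopen(req)` is the step that would actually change the password.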

Wednesday, April 25, 2018

Use pandas to read from and write back into Hadoop (Impala) tables

[root@cdh-vm ~]# pip install impyla
Collecting impyla
Downloading https://files.pythonhosted.org/packages/6f/96/92f933cd216f9ff5d7f4ba7e0615a51ad4e3beb31a7de60f7df365378bb9/impyla-0.14.1-py2-none-any.whl (165kB)
100% |████████████████████████████████| 174kB 464kB/s
Collecting bitarray (from impyla)
Downloading https://files.pythonhosted.org/packages/0a/da/9f61d28a20c42b4963334efacfd257c85150ede96d0cd2509b37da69da47/bitarray-0.8.1.tar.gz (46kB)
100% |████████████████████████████████| 51kB 5.1MB/s
Collecting thrift<=0.9.3 (from impyla)
Downloading https://files.pythonhosted.org/packages/ae/58/35e3f0cd290039ff862c2c9d8ae8a76896665d70343d833bdc2f748b8e55/thrift-0.9.3.tar.gz
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from impyla) (1.11.0)
Installing collected packages: bitarray, thrift, impyla
Running setup.py install for bitarray ... done
Running setup.py install for thrift ... done
Successfully installed bitarray-0.8.1 impyla-0.14.1 thrift-0.9.3

[root@cdh-vm ~]# pip install sqlalchemy
Collecting sqlalchemy
Downloading https://files.pythonhosted.org/packages/c1/c8/392fcd2d01534bc871c65cb964e0b39d59feb777e51649e6eaf00f6377b5/SQLAlchemy-1.2.7.tar.gz (5.6MB)
100% |████████████████████████████████| 5.6MB 721kB/s
Installing collected packages: sqlalchemy
Running setup.py install for sqlalchemy

[cdh-vm.dbaglobe.com:21000] > create table quarters(salesman string,q1 int,q2 int,q3 int,q4 int) row format delimited fields terminated by ',' tblproperties('skip.header.line.count'='1');
Query: create table quarters(salesman string,q1 int,q2 int,q3 int,q4 int) row format delimited fields terminated by ','
Fetched 0 row(s) in 0.50s
[cdh-vm.dbaglobe.com:21000] > load data inpath '/data/quarters.csv' overwrite into table quarters;
Query: load data inpath '/data/quarters.csv' overwrite into table quarters
+----------------------------------------------------------+
| summary |
+----------------------------------------------------------+
| Loaded 1 file(s). Total files in destination location: 1 |
+----------------------------------------------------------+
Fetched 1 row(s) in 2.89s

[cdh-vm.dbaglobe.com:21000] > select * from quarters;
Query: select * from quarters
Query submitted at: 2018-04-25 21:22:18 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=77405085e748686d:b3aebdc200000000
+----------+--------+--------+--------+--------+
| salesman | q1     | q2     | q3     | q4     |
+----------+--------+--------+--------+--------+
| Boris    | 602908 | 233879 | 354479 | 32704  |
| Bob      | 43790  | 514863 | 297151 | 544493 |
| Tommy    | 392668 | 113579 | 430882 | 247231 |
| Travis   | 834663 | 266785 | 749238 | 570524 |
| Donald   | 580935 | 411379 | 110390 | 651572 |
| Ted      | 656644 | 70803  | 375948 | 321388 |
| Jeb      | 486141 | 600753 | 742716 | 404995 |
| Stacy    | 479662 | 742806 | 770712 | 2501   |
| Morgan   | 992673 | 879183 | 37945  | 293710 |
+----------+--------+--------+--------+--------+

[donghua@cdh-vm pandas]$ ipython
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: from impala.dbapi import connect

In [2]: conn = connect(host='cdh-vm',port=21050,database='test')

In [3]: cur = conn.cursor()

In [4]: cur.execute('show tables')

In [5]: cur.fetchall()
Out[5]:
[('byname_kudu',),
('getrelationinfobyname',),
('getrelationinfobyname_kudu',),
('quarters',),
('t1',),
('t_timestamp',),
('tsdemo',),
('tsstr',),
('vehicles',)]

In [6]: cur.execute('select * from quarters limit 5')

In [8]: cur.fetchall()
Out[8]:
[('Boris', 602908, 233879, 354479, 32704),
('Bob', 43790, 514863, 297151, 544493),
('Tommy', 392668, 113579, 430882, 247231),
('Travis', 834663, 266785, 749238, 570524),
('Donald', 580935, 411379, 110390, 651572)]

In [9]: from impala.util import as_pandas

In [10]: cur.execute('select * from quarters')

In [11]: df = as_pandas(cur)

In [12]: type(df)
Out[12]: pandas.core.frame.DataFrame

In [13]: df
Out[13]:
salesman q1 q2 q3 q4
0 Boris 602908 233879 354479 32704
1 Bob 43790 514863 297151 544493
2 Tommy 392668 113579 430882 247231
3 Travis 834663 266785 749238 570524
4 Donald 580935 411379 110390 651572
5 Ted 656644 70803 375948 321388
6 Jeb 486141 600753 742716 404995
7 Stacy 479662 742806 770712 2501
8 Morgan 992673 879183 37945 293710
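`as_pandas` turns the rows pending on a DB-API cursor into a DataFrame. A rough stand-in (not impyla's actual implementation) only needs the column names from `cursor.description` plus `fetchall()`. The sketch below exercises it against an in-memory SQLite table holding the first three rows of `quarters`, since the Impala host `cdh-vm` is not assumed to be reachable; `cursor_to_dataframe` is a hypothetical name.

```python
import sqlite3

import pandas as pd


def cursor_to_dataframe(cur):
    """Build a DataFrame from the pending result set of a DB-API 2.0 cursor.

    Per PEP 249, cursor.description holds one sequence per result column,
    whose first element is the column name.
    """
    columns = [col[0] for col in cur.description]
    return pd.DataFrame.from_records(cur.fetchall(), columns=columns)


# Stand-in data: the first three rows of the quarters table shown above
conn = sqlite3.connect(":memory:")
conn.execute("create table quarters(salesman text, q1 int, q2 int, q3 int, q4 int)")
conn.executemany(
    "insert into quarters values (?, ?, ?, ?, ?)",
    [
        ("Boris", 602908, 233879, 354479, 32704),
        ("Bob", 43790, 514863, 297151, 544493),
        ("Tommy", 392668, 113579, 430882, 247231),
    ],
)
cur = conn.cursor()
cur.execute("select * from quarters")
df = cursor_to_dataframe(cur)
print(df)
```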

In [16]: df2 = df.melt(id_vars='salesman')

In [17]: df2
Out[17]:
salesman variable value
0 Boris q1 602908
1 Bob q1 43790
2 Tommy q1 392668
3 Travis q1 834663
4 Donald q1 580935
5 Ted q1 656644
6 Jeb q1 486141
7 Stacy q1 479662
8 Morgan q1 992673
9 Boris q2 233879
10 Bob q2 514863
11 Tommy q2 113579
12 Travis q2 266785
13 Donald q2 411379
14 Ted q2 70803
15 Jeb q2 600753
16 Stacy q2 742806
17 Morgan q2 879183
18 Boris q3 354479
19 Bob q3 297151
20 Tommy q3 430882
21 Travis q3 749238
22 Donald q3 110390
23 Ted q3 375948
24 Jeb q3 742716
25 Stacy q3 770712
26 Morgan q3 37945
27 Boris q4 32704
28 Bob q4 544493
29 Tommy q4 247231
30 Travis q4 570524
31 Donald q4 651572
32 Ted q4 321388
33 Jeb q4 404995
34 Stacy q4 2501
35 Morgan q4 293710

In [11]: import sqlalchemy
In [12]: import impala.sqlalchemy as i
In [13]: engine=sqlalchemy.create_engine('impala://cdh-vm:21050/test')
In [26]: df2.to_sql(name='quarters_melt',con=engine,index=False,dtype={'salesman':i.STRING,'variable':i.STRING,'value':i.INT})

[cdh-vm.dbaglobe.com:21000] > desc quarters_melt;
Query: describe quarters_melt
+----------+--------+---------+
| name     | type   | comment |
+----------+--------+---------+
| salesman | string |         |
| variable | string |         |
| value    | int    |         |
+----------+--------+---------+
Fetched 3 row(s) in 0.03s
[cdh-vm.dbaglobe.com:21000] > show create table quarters_melt;
Query: show create table quarters_melt
+--------------------------------------------------------------------------------------+
| result |
+--------------------------------------------------------------------------------------+
| CREATE TABLE test.quarters_melt ( |
| salesman STRING, |
| variable STRING, |
| value INT |
| ) |
| STORED AS TEXTFILE |
| LOCATION 'hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/test.db/quarters_melt' |
| |
+--------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.00s
[cdh-vm.dbaglobe.com:21000] >

[cdh-vm.dbaglobe.com:21000] > select * from quarters_melt;
Query: select * from quarters_melt
Query submitted at: 2018-04-25 22:19:06 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=c14ae16dfcb301bb:c31d20a200000000
+----------+----------+--------+
| salesman | variable | value  |
+----------+----------+--------+
| Donald   | q1       | 580935 |
| Jeb      | q3       | 742716 |
| Ted      | q3       | 375948 |
| Bob      | q4       | 544493 |
| Donald   | q2       | 411379 |
| Morgan   | q2       | 879183 |
| Boris    | q4       | 32704  |
| Boris    | q2       | 233879 |
| Tommy    | q1       | 392668 |
| Jeb      | q4       | 404995 |
| Boris    | q3       | 354479 |
| Bob      | q3       | 297151 |
| Morgan   | q3       | 37945  |
| Travis   | q2       | 266785 |
| Travis   | q4       | 570524 |
| Ted      | q2       | 70803  |
| Bob      | q1       | 43790  |
| Tommy    | q3       | 430882 |
| Stacy    | q1       | 479662 |
| Bob      | q2       | 514863 |
| Stacy    | q4       | 2501   |
| Travis   | q3       | 749238 |
| Travis   | q1       | 834663 |
| Tommy    | q2       | 113579 |
| Jeb      | q2       | 600753 |
| Tommy    | q4       | 247231 |
| Stacy    | q2       | 742806 |
| Donald   | q4       | 651572 |
| Morgan   | q4       | 293710 |
| Stacy    | q3       | 770712 |
| Morgan   | q1       | 992673 |
| Jeb      | q1       | 486141 |
| Donald   | q3       | 110390 |
| Ted      | q1       | 656644 |
| Boris    | q1       | 602908 |
| Ted      | q4       | 321388 |
+----------+----------+--------+
Fetched 36 row(s) in 0.25s
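Impala returned the 36 melted rows in an arbitrary order: a SQL table has no inherent row order, so a query that needs a predictable layout should say `order by`. The same write-then-read round trip can be sketched locally with an in-memory SQLite database standing in for Impala (an assumption for illustration). pandas accepts a plain `sqlite3` connection for `to_sql` and `read_sql_query`, in which case `dtype` takes SQL type names as strings; the post instead used impyla's SQLAlchemy engine with the Impala types `i.STRING` and `i.INT`.

```python
import sqlite3

import pandas as pd

# Wide table: a subset of the quarters data shown above
wide = pd.DataFrame(
    {
        "salesman": ["Boris", "Bob"],
        "q1": [602908, 43790],
        "q2": [233879, 514863],
    }
)
long_df = wide.melt(id_vars="salesman")  # columns: salesman, variable, value

conn = sqlite3.connect(":memory:")
# With a plain sqlite3 connection, dtype maps column names to SQL type strings
long_df.to_sql(
    "quarters_melt",
    conn,
    index=False,
    dtype={"salesman": "TEXT", "variable": "TEXT", "value": "INTEGER"},
)

# Ask for an explicit order instead of relying on storage order
back = pd.read_sql_query(
    "select salesman, variable, value from quarters_melt order by variable, salesman",
    conn,
)
print(back)
```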

Sunday, April 22, 2018

Merging and Joining in Pandas

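import pandas as pd

week1 = pd.read_csv('Restaurant - Week 1 Sales.csv')
week2 = pd.read_csv('Restaurant - Week 2 Sales.csv')  # assumed file name, by analogy with week 1

len(week1),len(week2)

# Stack the two weeks vertically and renumber the index 0..n-1
pd.concat([week1,week2],ignore_index=True)

# Keep the original row labels and tag each block with an outer key (MultiIndex)
sales = pd.concat([week1,week2],keys=['Week1','Week2'])
sales.loc[('Week2',240),'Customer ID']   # .ix was removed in pandas 1.0; .loc does the label lookup

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
sales = pd.concat([week2,week1],ignore_index=True)
sales.info()

# Inner join on one key; overlapping column names get per-week suffixes
week1.merge(week2,how='inner', on='Customer ID',suffixes=['_wk1','_wk2'])

# Inner join on a composite key
week1.merge(week2,how='inner', on=['Customer ID','Food ID'],suffixes=['_wk1','_wk2'])

# Outer join; indicator=True adds '_merge' with 'left_only', 'right_only' or 'both'
merged = week1.merge(week2,how='outer', on='Customer ID',suffixes=['_wk1','_wk2'],indicator=True)
merged['_merge'].value_counts()
# Rows that appear in only one of the two weeks
mask = merged['_merge'].isin(['left_only','right_only'])
merged[mask]

# join() matches week1's 'OrderID' column against the index of week1_survey
# (week1_survey is not defined in this snippet and must be loaded separately)
week1.join(week1_survey,how='inner', on='OrderID')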
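The restaurant CSV files are not included in the post, so here is a self-contained sketch on tiny made-up frames (hypothetical data; only the column names `Customer ID` and `Food ID` come from the code above). It shows the two-level index that `keys=` produces and how `indicator=True` labels where each row of an outer merge came from.

```python
import pandas as pd

# Hypothetical two-week sales data
week1 = pd.DataFrame({"Customer ID": [1, 2, 3], "Food ID": [10, 11, 12]})
week2 = pd.DataFrame({"Customer ID": [2, 3, 4], "Food ID": [11, 15, 13]})

# keys= adds an outer index level naming the source frame
sales = pd.concat([week1, week2], keys=["Week1", "Week2"])
print(sales.loc[("Week2", 0), "Customer ID"])  # first row of week2 -> 2

# Outer merge on the customer; indicator=True adds a '_merge' column
merged = week1.merge(
    week2, how="outer", on="Customer ID", suffixes=("_wk1", "_wk2"), indicator=True
)
print(merged)

# Customers seen in only one of the two weeks
only_one_week = merged[merged["_merge"].isin(["left_only", "right_only"])]
print(sorted(only_one_week["Customer ID"].tolist()))  # -> [1, 4]
```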

Melt in Pandas

In [14]: sales=pd.read_csv('pandas/pandas/quarters.csv')

In [15]: sales
Out[15]:
Salesman Q1 Q2 Q3 Q4
0 Boris 602908 233879 354479 32704
1 Bob 43790 514863 297151 544493
2 Tommy 392668 113579 430882 247231
3 Travis 834663 266785 749238 570524
4 Donald 580935 411379 110390 651572
5 Ted 656644 70803 375948 321388
6 Jeb 486141 600753 742716 404995
7 Stacy 479662 742806 770712 2501
8 Morgan 992673 879183 37945 293710

In [17]: pd.melt(sales,id_vars='Salesman')
Out[17]:
Salesman variable value
0 Boris Q1 602908
1 Bob Q1 43790
2 Tommy Q1 392668
3 Travis Q1 834663
4 Donald Q1 580935
5 Ted Q1 656644
6 Jeb Q1 486141
7 Stacy Q1 479662
8 Morgan Q1 992673
9 Boris Q2 233879
10 Bob Q2 514863
11 Tommy Q2 113579
12 Travis Q2 266785
13 Donald Q2 411379
14 Ted Q2 70803
15 Jeb Q2 600753
16 Stacy Q2 742806
17 Morgan Q2 879183
18 Boris Q3 354479
19 Bob Q3 297151
20 Tommy Q3 430882
21 Travis Q3 749238
22 Donald Q3 110390
23 Ted Q3 375948
24 Jeb Q3 742716
25 Stacy Q3 770712
26 Morgan Q3 37945
27 Boris Q4 32704
28 Bob Q4 544493
29 Tommy Q4 247231
30 Travis Q4 570524
31 Donald Q4 651572
32 Ted Q4 321388
33 Jeb Q4 404995
34 Stacy Q4 2501
35 Morgan Q4 293710

In [21]: pd.melt(sales,id_vars='Salesman',var_name='Quarter',value_name='Revenue')
Out[21]:
Salesman Quarter Revenue
0 Boris Q1 602908
1 Bob Q1 43790
2 Tommy Q1 392668
3 Travis Q1 834663
4 Donald Q1 580935
5 Ted Q1 656644
6 Jeb Q1 486141
7 Stacy Q1 479662
8 Morgan Q1 992673
9 Boris Q2 233879
10 Bob Q2 514863
11 Tommy Q2 113579
12 Travis Q2 266785
13 Donald Q2 411379
14 Ted Q2 70803
15 Jeb Q2 600753
16 Stacy Q2 742806
17 Morgan Q2 879183
18 Boris Q3 354479
19 Bob Q3 297151
20 Tommy Q3 430882
21 Travis Q3 749238
22 Donald Q3 110390
23 Ted Q3 375948
24 Jeb Q3 742716
25 Stacy Q3 770712
26 Morgan Q3 37945
27 Boris Q4 32704
28 Bob Q4 544493
29 Tommy Q4 247231
30 Travis Q4 570524
31 Donald Q4 651572
32 Ted Q4 321388
33 Jeb Q4 404995
34 Stacy Q4 2501
35 Morgan Q4 293710
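`melt` goes from wide to long, and `pivot` on the melted frame goes back. A quick round-trip check on two salesmen and two quarters (a subset of `quarters.csv`, built inline so the sketch runs without the file):

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "Salesman": ["Boris", "Bob"],
        "Q1": [602908, 43790],
        "Q2": [233879, 514863],
    }
)

long_df = pd.melt(sales, id_vars="Salesman", var_name="Quarter", value_name="Revenue")

# Back to wide: one row per salesman, one column per quarter
wide = long_df.pivot(index="Salesman", columns="Quarter", values="Revenue")

# pivot may reorder the rows and names the columns axis 'Quarter'; undo both
restored = wide.reindex(sales["Salesman"]).reset_index()
restored.columns.name = None
pd.testing.assert_frame_equal(restored, sales)  # raises if the round trip lost anything
print(restored)
```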

In [24]: sales.set_index('Salesman').stack().to_frame()
Out[24]:
0
Salesman
Boris Q1 602908
Q2 233879
Q3 354479
Q4 32704
Bob Q1 43790
Q2 514863
Q3 297151
Q4 544493
Tommy Q1 392668
Q2 113579
Q3 430882
Q4 247231
Travis Q1 834663
Q2 266785
Q3 749238
Q4 570524
Donald Q1 580935
Q2 411379
Q3 110390
Q4 651572
Ted Q1 656644
Q2 70803
Q3 375948
Q4 321388
Jeb Q1 486141
Q2 600753
Q3 742716
Q4 404995
Stacy Q1 479662
Q2 742806
Q3 770712
Q4 2501
Morgan Q1 992673
Q2 879183
Q3 37945
Q4 293710
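`stack` yields the same (salesman, quarter, revenue) triples as `melt`, only ordered by salesman instead of by quarter and held in a MultiIndex rather than in columns. Naming the index levels, giving the values a name instead of the default `0`, resetting the index and sorting both results makes that concrete (inline subset of the data again):

```python
import pandas as pd

sales = pd.DataFrame(
    {"Salesman": ["Boris", "Bob"], "Q1": [602908, 43790], "Q2": [233879, 514863]}
)

# melt: quarter-major order, plain RangeIndex
melted = pd.melt(sales, id_vars="Salesman", var_name="Quarter", value_name="Revenue")

# stack: salesman-major order, (Salesman, column label) MultiIndex
stacked = sales.set_index("Salesman").stack()
stacked.index.names = ["Salesman", "Quarter"]
stacked = stacked.rename("Revenue").reset_index()

# Same rows once both are sorted the same way
key = ["Salesman", "Quarter"]
a = melted.sort_values(key).reset_index(drop=True)
b = stacked.sort_values(key).reset_index(drop=True)
pd.testing.assert_frame_equal(a, b)
print(b)
```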

Pivot_table in pandas

In [8]: foods = pd.read_csv('Pandas/pandas/foods.csv')

In [9]: foods.head(4)
Out[9]:
First Name Gender City Frequency Item Spend
0 Wanda Female Stamford Weekly Burger 15.66
1 Eric Male Stamford Daily Chalupa 10.56
2 Charles Male New York Never Sushi 42.14
3 Anna Female Philadelphia Once Ice Cream 11.01

In [10]: foods.pivot_table(values = 'Spend',index='Gender',aggfunc='mean')
Out[10]:
Spend
Gender
Female 50.709629
Male 49.397623

In [11]: foods.pivot_table(values = 'Spend',index=['Gender','Item'],aggfunc='sum')
Out[11]:
Spend
Gender Item
Female Burger 4094.30
Burrito 4257.82
Chalupa 4152.26
Donut 4743.00
Ice Cream 4032.87
Sushi 4683.08
Male Burger 3671.43
Burrito 4012.62
Chalupa 3492.26
Donut 4015.76
Ice Cream 4854.12
Sushi 4059.85

In [12]: foods.pivot_table(values = 'Spend',index=['Gender','Item'],columns='City', aggfunc='sum')
Out[12]:
City New York Philadelphia Stamford
Gender Item
Female Burger 1239.04 1639.24 1216.02
Burrito 978.95 1458.76 1820.11
Chalupa 876.58 1673.33 1602.35
Donut 1446.78 1639.26 1656.96
Ice Cream 1521.62 1479.22 1032.03
Sushi 1480.29 1742.88 1459.91
Male Burger 1294.09 938.18 1439.16
Burrito 1399.40 1312.93 1300.29
Chalupa 1227.77 1114.23 1150.26
Donut 1345.27 1249.36 1421.13
Ice Cream 1603.63 2191.27 1059.22
Sushi 1396.15 1395.88 1267.82

In [13]: pd.pivot_table(data=foods,values = 'Spend',index=['Gender','Item'],columns='City', aggfunc='sum')
Out[13]:
City New York Philadelphia Stamford
Gender Item
Female Burger 1239.04 1639.24 1216.02
Burrito 978.95 1458.76 1820.11
Chalupa 876.58 1673.33 1602.35
Donut 1446.78 1639.26 1656.96
Ice Cream 1521.62 1479.22 1032.03
Sushi 1480.29 1742.88 1459.91
Male Burger 1294.09 938.18 1439.16
Burrito 1399.40 1312.93 1300.29
Chalupa 1227.77 1114.23 1150.26
Donut 1345.27 1249.36 1421.13
Ice Cream 1603.63 2191.27 1059.22
Sushi 1396.15 1395.88 1267.82
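`pivot_table` with `aggfunc='sum'` is a group-by and a reshape in one call: group on the index keys plus the `columns` key, aggregate, then move the `columns` key across with `unstack`. A sketch on a few made-up rows that reuse the `foods.csv` column names (the file itself is not reproduced in the post):

```python
import pandas as pd

# Hypothetical rows with the foods.csv column names used above
foods = pd.DataFrame(
    {
        "Gender": ["Female", "Female", "Male", "Male", "Female"],
        "Item": ["Burger", "Sushi", "Burger", "Burger", "Burger"],
        "City": ["Stamford", "New York", "Stamford", "New York", "Stamford"],
        "Spend": [15.5, 42.0, 10.0, 8.0, 4.5],
    }
)

pt = foods.pivot_table(values="Spend", index=["Gender", "Item"], columns="City", aggfunc="sum")

# The same numbers via groupby: aggregate, then pivot the City level into columns
gb = foods.groupby(["Gender", "Item", "City"])["Spend"].sum().unstack("City")

pd.testing.assert_frame_equal(pt, gb)
print(pt)
```

Combinations with no rows, such as female sushi buyers in Stamford here, come out as `NaN`; `pivot_table` accepts `fill_value` to replace them.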

Pivot in Pandas

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: salesman = pd.read_csv('Pandas/pandas/salesmen.csv', parse_dates=['Date'])

In [4]: salesman.head(5)
Out[4]:
Date Salesman Revenue
0 2016-01-01 Bob 7172
1 2016-01-02 Bob 6362
2 2016-01-03 Bob 5982
3 2016-01-04 Bob 7917
4 2016-01-05 Bob 7837

In [7]: salesman['Salesman'].value_counts()
Out[7]:
Ronald 366
Bob 366
Dave 366
Oscar 366
Jeb 366
Name: Salesman, dtype: int64

In [6]: salesman.pivot(index='Date',columns='Salesman',values='Revenue').head(5)
Out[6]:
Salesman Bob Dave Jeb Oscar Ronald
Date
2016-01-01 7172 1864 4430 5250 2639
2016-01-02 6362 8278 8026 8661 4951
2016-01-03 5982 4226 5188 7075 2703
2016-01-04 7917 3868 3144 2524 4258
2016-01-05 7837 2287 938 2793 7771
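`pivot` only reshapes; it cannot aggregate. It works on `salesmen.csv` because each (Date, Salesman) pair occurs once, otherwise the call above would have failed. With a repeated pair, `pivot` raises `ValueError`, while `pivot_table` aggregates the duplicates (with `mean` by default). A sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows: Bob has two entries for 2016-01-01
rows = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-01"]),
        "Salesman": ["Bob", "Bob", "Dave"],
        "Revenue": [100, 300, 50],
    }
)

pivot_error = None
try:
    rows.pivot(index="Date", columns="Salesman", values="Revenue")
except ValueError as exc:
    # pandas refuses to pick one of the duplicate values
    pivot_error = exc
    print("pivot failed:", exc)

# pivot_table aggregates duplicates instead; aggfunc defaults to 'mean'
table = rows.pivot_table(index="Date", columns="Salesman", values="Revenue")
print(table)
```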

Saturday, April 21, 2018

stack/unstack in Pandas with example

In [1]: import pandas as pd

In [2]: import numpy as np

In [14]: df = pd.DataFrame({'Year':[2010,2010,2011,2011,2012,2012],'Class':['2A','2B','2C','2A','2B','2C']})

In [16]: df
Out[16]:
Class Year
0 2A 2010
1 2B 2010
2 2C 2011
3 2A 2011
4 2B 2012
5 2C 2012

In [19]: df['Score']=np.random.randint(low=40,high=100,size=6)

In [20]: df
Out[20]:
Class Year Score
0 2A 2010 73
1 2B 2010 87
2 2C 2011 97
3 2A 2011 41
4 2B 2012 86
5 2C 2012 81

In [23]: df.set_index(['Class','Year'],inplace=True)

In [24]: df
Out[24]:
Score
Class Year
2A 2010 73
2B 2010 87
2C 2011 97
2A 2011 41
2B 2012 86
2C 2012 81

In [25]: df.transpose()
Out[25]:
Class 2A 2B 2C 2A 2B 2C
Year 2010 2010 2011 2011 2012 2012
Score 73 87 97 41 86 81

In [26]: df
Out[26]:
Score
Class Year
2A 2010 73
2B 2010 87
2C 2011 97
2A 2011 41
2B 2012 86
2C 2012 81

In [27]: df.swaplevel()
Out[27]:
Score
Year Class
2010 2A 73
2B 87
2011 2C 97
2A 41
2012 2B 86
2C 81

In [28]: df.swaplevel().transpose()
Out[28]:
Year 2010 2011 2012
Class 2A 2B 2C 2A 2B 2C
Score 73 87 97 41 86 81

In [29]: df.swapaxes(0,1)
Out[29]:
Class 2A 2B 2C 2A 2B 2C
Year 2010 2010 2011 2011 2012 2012
Score 73 87 97 41 86 81
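`swaplevel` only swaps the order of the index levels; it does not re-sort the rows. The output above looks grouped by year only because the rows were already in year order. When they are not, follow it with `sort_index()`. Separately, `df.swapaxes(0, 1)` is the same as `df.T` here, and recent pandas releases deprecate `swapaxes` in favour of `transpose()`. A sketch with the same Class/Year layout but fixed, made-up scores and rows deliberately out of year order:

```python
import pandas as pd

# Same index layout as above; rows are not in year order
df = pd.DataFrame(
    {
        "Class": ["2A", "2B", "2C", "2A"],
        "Year": [2011, 2010, 2012, 2010],
        "Score": [41, 87, 81, 73],
    }
).set_index(["Class", "Year"])

swapped = df.swaplevel()               # levels become (Year, Class); row order unchanged
by_year = df.swaplevel().sort_index()  # rows now grouped by Year, then Class

print(list(swapped.index))
print(list(by_year.index))

# transpose() / .T is the non-deprecated equivalent of swapaxes(0, 1)
print(df.T)
```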

In [37]: df
Out[37]:
Score
Class Year
2A 2010 73
2B 2010 87
2C 2011 97
2A 2011 41
2B 2012 86
2C 2012 81

In [33]: def mark_grade(score):
    ...:     if score > 90:
    ...:         grade='A'
    ...:     elif score > 75:
    ...:         grade='B'
    ...:     elif score > 60:
    ...:         grade='C'
    ...:     else:
    ...:         grade='F'
    ...:     return grade
    ...:
    ...:

In [39]: df['Grade']=df['Score'].apply(mark_grade)

In [40]: df
Out[40]:
Score Grade
Class Year
2A 2010 73 C
2B 2010 87 B
2C 2011 97 A
2A 2011 41 F
2B 2012 86 B
2C 2012 81 B
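`apply` calls `mark_grade` once per row, which is fine for six rows. For larger frames the same thresholds can be written as bins for `pd.cut`. The bins below are right-closed, so 90 stays a B and 60 an F, matching the strict `>` comparisons in `mark_grade`. A sketch that checks both approaches agree on a range of scores, including the boundary values:

```python
import numpy as np
import pandas as pd


def mark_grade(score):
    # Same rules as the function above
    if score > 90:
        grade = 'A'
    elif score > 75:
        grade = 'B'
    elif score > 60:
        grade = 'C'
    else:
        grade = 'F'
    return grade


scores = pd.Series([40, 60, 61, 75, 76, 90, 91, 99])

# Intervals are (-inf, 60], (60, 75], (75, 90], (90, inf]
grades_cut = pd.cut(
    scores,
    bins=[-np.inf, 60, 75, 90, np.inf],
    labels=['F', 'C', 'B', 'A'],
).astype(str)

grades_apply = scores.apply(mark_grade)
print(pd.DataFrame({'score': scores, 'cut': grades_cut, 'apply': grades_apply}))
```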

In [41]: df.stack()
Out[41]:
Class Year
2A 2010 Score 73
Grade C
2B 2010 Score 87
Grade B
2C 2011 Score 97
Grade A
2A 2011 Score 41
Grade F
2B 2012 Score 86
Grade B
2C 2012 Score 81
Grade B
dtype: object

In [44]: df.stack().to_frame()
Out[44]:
0
Class Year
2A 2010 Score 73
Grade C
2B 2010 Score 87
Grade B
2C 2011 Score 97
Grade A
2A 2011 Score 41
Grade F
2B 2012 Score 86
Grade B
2C 2012 Score 81
Grade B

In [58]: s = df.stack().to_frame()

In [59]: s
Out[59]:
0
Class Year
2A 2010 Score 73
Grade C
2B 2010 Score 87
Grade B
2C 2011 Score 97
Grade A
2A 2011 Score 41
Grade F
2B 2012 Score 86
Grade B
2C 2012 Score 81
Grade B

In [67]: s.index.names=['Class', 'Year','Item']

In [68]: s
Out[68]:
0
Class Year Item
2A 2010 Score 73
Grade C
2B 2010 Score 87
Grade B
2C 2011 Score 97
Grade A
2A 2011 Score 41
Grade F
2B 2012 Score 86
Grade B
2C 2012 Score 81
Grade B

In [69]: s.columns=['Value']

In [70]: s
Out[70]:
Value
Class Year Item
2A 2010 Score 73
Grade C
2B 2010 Score 87
Grade B
2C 2011 Score 97
Grade A
2A 2011 Score 41
Grade F
2B 2012 Score 86
Grade B
2C 2012 Score 81
Grade B

In [71]: s.unstack()
Out[71]:
Value
Item Score Grade
Class Year
2A 2010 73 C
2011 41 F
2B 2010 87 B
2012 86 B
2C 2011 97 A
2012 81 B

In [72]: s.unstack().unstack()
Out[72]:
Value
Item Score Grade
Year 2010 2011 2012 2010 2011 2012
Class
2A 73 41 None C F None
2B 87 None 86 B None B
2C None 97 81 None A B
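Combinations that do not exist in the long data, such as class 2A in 2012, become empty cells after `unstack`. In this post the `Value` column mixes integers and letter grades (an object column), and the gaps printed as `None`. For a purely numeric column the gaps become `NaN`, which also forces integers to float; passing `fill_value` avoids both. A sketch on the scores from the run above alone:

```python
import pandas as pd

# Long-format scores indexed by (Class, Year); 2A has no 2012 row, 2C has no 2010 row
scores = pd.Series(
    [73, 41, 87, 86, 97, 81],
    index=pd.MultiIndex.from_tuples(
        [("2A", 2010), ("2A", 2011), ("2B", 2010), ("2B", 2012), ("2C", 2011), ("2C", 2012)],
        names=["Class", "Year"],
    ),
    name="Score",
)

with_gaps = scores.unstack("Year")             # missing cells -> NaN, dtype float64
filled = scores.unstack("Year", fill_value=0)  # missing cells -> 0, dtype stays int64

print(with_gaps.dtypes.unique(), filled.dtypes.unique())
print(filled)
```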

In [74]: s.to_csv('class_stack.csv')

In [75]: ! cat class_stack.csv
Class,Year,Item,Value
2A,2010,Score,73
2A,2010,Grade,C
2B,2010,Score,87
2B,2010,Grade,B
2C,2011,Score,97
2C,2011,Grade,A
2A,2011,Score,41
2A,2011,Grade,F
2B,2012,Score,86
2B,2012,Grade,B
2C,2012,Score,81
2C,2012,Grade,B
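Because `to_csv` writes every index level as its own column, the long file can be read straight back into the same three-level index with `index_col`. A sketch using an in-memory buffer in place of `class_stack.csv`, with the first four data lines of the file above. Since the `Value` column mixes numbers and letters, it comes back as strings:

```python
import io

import pandas as pd

csv_text = """Class,Year,Item,Value
2A,2010,Score,73
2A,2010,Grade,C
2B,2010,Score,87
2B,2010,Grade,B
"""

s = pd.read_csv(io.StringIO(csv_text), index_col=["Class", "Year", "Item"])
print(s.index.names)
print(s.unstack("Item"))
```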

In [83]: s.unstack(0)
Out[83]:
Value
Class 2A 2B 2C
Year Item
2010 Score 73 87 None
Grade C B None
2011 Score 41 None 97
Grade F None A
2012 Score None 86 81
Grade None B B

In [84]: s.unstack(1)
Out[84]:
Value
Year 2010 2011 2012
Class Item
2A Score 73 41 None
Grade C F None
2B Score 87 None 86
Grade B None B
2C Score None 97 81
Grade None A B

In [85]: s.unstack(-1)
Out[85]:
Value
Item Score Grade
Class Year
2A 2010 73 C
2011 41 F
2B 2010 87 B
2012 86 B
2C 2011 97 A
2012 81 B

In [86]: s.unstack('Item')
Out[86]:
Value
Item Score Grade
Class Year
2A 2010 73 C
2011 41 F
2B 2010 87 B
2012 86 B
2C 2011 97 A
2012 81 B

In [87]: s.unstack('Year')
Out[87]:
Value
Year 2010 2011 2012
Class Item
2A Score 73 41 None
Grade C F None
2B Score 87 None 86
Grade B None B
2C Score None 97 81
Grade None A B

In [98]: s.unstack(level=-2,fill_value='Missing')
Out[98]:
Value
Year 2010 2011 2012
Class Item
2A Score 73 41 Missing
Grade C F Missing
2B Score 87 Missing 86
Grade B Missing B
2C Score Missing 97 81
Grade Missing A B

In [100]: s.unstack(level=['Year','Class'],fill_value='Missing')
Out[100]:
Value
Year 2010 2011 2012
Class 2A 2B 2C 2A 2B 2C
Item
Score 73 87 97 41 86 81
Grade C B A F B B

In [101]: s.unstack(level=['Year','Class'],fill_value='Missing').to_csv('class2.csv')

In [102]: ! cat class2.csv
,Value,Value,Value,Value,Value,Value
Year,2010,2010,2011,2011,2012,2012
Class,2A,2B,2C,2A,2B,2C
Item,,,,,,
Score,73,87,97,41,86,81
Grade,C,B,A,F,B,B
