http://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Data Types

Source: https://developer.rhino3d.com/guides/rhinopython/python-datatypes/

Data Type Example Description
bool True, False Boolean
int 10 Signed integer (in Python 3 the same as long)
long 345L Long (Python less than 3 only) ! Only for Python less than 3.0. In Python >= 3.0 use the int data type.
float 34.5 Floating point real values (the decimal separator is .)
complex 3.14J Complex data type a+bi where $$i^2=-1$$
decimal Decimal(3.12) from decimal import Decimal - required
fraction Fraction(3,4) from fractions import Fraction - required; represents the fraction ¾
tuple (1,2,3) Tuple
list [1,2,3] List
dictionary {'john': 425, 'tom': 212} Dictionary
bytearray bytearray('Text','utf-8') Bytearray
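
The Decimal, Fraction, and complex rows can be tried out with the short sketch below. It assumes Python 3 and uses only the standard library; the variable names are illustrative. It also shows one thing the table does not: a Decimal built from a float literal inherits the float's binary rounding error, while a Decimal built from a string is exact.

```python
from decimal import Decimal
from fractions import Fraction

# A float literal carries its binary rounding error into the Decimal;
# a string literal is stored exactly as written.
from_float = Decimal(3.12)
from_string = Decimal("3.12")

three_quarters = Fraction(3, 4)
z = 3.14j  # complex number with real part 0

print(from_float == from_string)         # False
print(from_string + Decimal("0.01"))     # 3.13
print(three_quarters + Fraction(1, 4))   # 1
print(z.real, z.imag)                    # 0.0 3.14
print(type(10 ** 30).__name__)           # int (Python 3 integers have no separate long type)
```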

List Operations

Command Output
[1,2,3] [1,2,3]
l = [1,'2',3] l = [1,'2',3]
l*2 [1,'2',3,1,'2',3]
l[0] 1 (int)
len(l) 3 (int)
l = [1,0,4,2]
l.sort() ! This function returns nothing; it operates on the list itself. l = [0,1,2,4]
l = [1,0,-2,"-4","1","5"]
l.sort(key=int) l = ['-4', -2, 0, 1, '1', '5'] (key is the conversion function applied to each element before comparing)
l.reverse() ! This function returns nothing; it operates on the list itself. l = ['5', '1', 1, 0, -2, '-4']
l+[1,2] ['5', '1', 1, 0, -2, '-4', 1, 2]
[x**2 for x in range(6)] [0,1,4,9,16,25]
l.append(3) ! This function returns nothing; it operates on the list itself. l = ['5', '1', 1, 0, -2, '-4', 3]
l.remove('1') l = ['5', 1, 0, -2, '-4', 3]
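
A small sketch of the in-place behaviour noted in the table, assuming Python 3. It contrasts list.sort(), which modifies the list and returns None, with the built-in sorted(), which returns a new list. The variable names are illustrative.

```python
values = [1, 0, -2, "-4", "1", "5"]

# sorted() builds a new list and leaves the original untouched.
ordered = sorted(values, key=int)

# list.sort() sorts the list itself and returns None.
result = values.sort(key=int)

print(ordered)            # ['-4', -2, 0, 1, '1', '5']
print(result)             # None
print(values == ordered)  # True
```

Assigning the result of sort(), reverse(), or append() to a variable is a common mistake, because the variable then holds None.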

Slices

Example Output Description
R = range(2,20,3) range(2,20,3) Returns a range for a for loop, from 2 up to (not including) 20 by 3
L = list(range(2,20,3)) [2, 5, 8, 11, 14, 17] List from range, from 2 up to (not including) 20 by 3
R[:1] range(2, 5, 3) Slice of a range (contains only 2)
L[:1] [2] Slice of a list
V = list(range(0,5)) [0, 1, 2, 3, 4] Vector
V[0] 0 First element
V[-1] 4 Last element
V[-3] 2 Third element from the end
V[-3:-1] [2,3] From the third-last element up to the last one (without the last one)
'abc'[-3:-1] 'ab'
V[0:3:2] [0,2] From index 0 up to 3 (without 3) by 2
'abc'[0:3:2] 'ac'
V[::2] [0,2,4] All elements by 2
'abc'[::2] 'ac'
V[::-1] [4, 3, 2, 1, 0] Reverse
'abc'[::-1] 'cba' Reverse text
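
The same start:stop:step rule applies to lists, strings, and ranges. The sketch below, assuming Python 3, stores one slice in a slice object (the name every_second is illustrative) and reuses it on different sequence types.

```python
V = list(range(0, 5))
R = range(2, 20, 3)

# slice(start, stop, step) is the object behind the V[start:stop:step] syntax.
every_second = slice(None, None, 2)

print(V[every_second])      # [0, 2, 4]
print("abc"[every_second])  # ac
print(V[-3:-1])             # [2, 3]
print(list(R[:1]))          # [2]
print(V[::-1])              # [4, 3, 2, 1, 0]
```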

OS

import os
import glob
Example Output Description
os.getcwd() 'C:\\public\\PROJECTS\\examples' Get current directory
os.listdir('.') ["file1.py","image.png"] Get list of files in the current directory
os.listdir('c:\\') ['$Recycle.Bin','Windows',...] Get list of files from the given directory
glob.glob('./*.png') ["image.png"] Get all PNG images
for dir in os.walk('.') '.','.git',... Iterate over all directories (each item is a tuple of path, subdirectories, and files)
[d[0] for d in os.walk('.')] ['.','.git',...] List of all subfolders
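
A self-contained sketch of glob.glob and os.walk, assuming Python 3.5 or later for the recursive pattern. It builds a throwaway folder with tempfile so it does not depend on the files in your working directory; the file and folder names are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a small tree: two files at the top level and one in a subfolder.
    os.mkdir(os.path.join(root, "sub"))
    for name in ("image.png", "file1.py", os.path.join("sub", "plot.png")):
        open(os.path.join(root, name), "w").close()

    # '*.png' matches only the top level of the folder.
    top_level_pngs = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(root, "*.png"))
    )

    # '**' together with recursive=True also descends into subfolders.
    all_pngs = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(root, "**", "*.png"), recursive=True)
    )

    # os.walk yields (path, subdirectories, files) for every folder in the tree.
    subfolders = sorted(os.path.relpath(d, root) for d, _dirs, _files in os.walk(root))

print(top_level_pngs)  # ['image.png']
print(all_pngs)
print(subfolders)      # ['.', 'sub']
```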

Random

import random
Example Output Description
random.choice([1,2,3]) 2 Select one random value
random.sample([1,2,3],2) [1,3] Select 2 random values (without repetition)
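
Because the outputs above change on every run, a seeded generator is handy when results must be repeatable. A minimal sketch, assuming Python 3; the seed value 42 and the variable names are arbitrary.

```python
import random

# A dedicated generator with a fixed seed gives the same sequence on every run.
rng = random.Random(42)

one = rng.choice([1, 2, 3])     # a single element
two = rng.sample([1, 2, 3], 2)  # two distinct elements, order is random

print(one, two)
```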

Data.Frame

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

Create

From Dictionary:

df = pd.DataFrame(
    {"a" : [4 ,5, 6],
    "b" : [7, 8, 9],
    "c" : [10, 11, 12]},
index = [1, 2, 3])

From List

lst = [[1,2,3],[4,5,6]]
df = pd.DataFrame(lst,columns = ['A','B','C'])

Import data

Name Description Example
pd.read_csv Read csv file pd.read_csv('name.csv',header=0,index_col=0)

Select

More examples: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

dic = {"a" : [4 ,5, 6, 7],"b" : [8, 9, 10, 11],"c" : [12, 13, 14, 15]}
df = pd.DataFrame(dic,index = [0, 1, 2, 3])

Example Output Description
df[:2] Select the first two rows (0 and 1; :2 means without 2)
df[-1:] Last row
df[1:2] Select rows from 1 to 2 (without 2)
df["b"] Column b (output is pandas.Series)
df.b Select column b (output is pandas.Series)
df[["b","c"]] Columns 'b' and 'c'
df.loc[1:2,["b","c"]] Rows 1 and 2, columns 'b' and 'c'. ! Different behaviour than df[1:2]: df[1:2] excludes row 2, while .loc slices by label and includes row 2
df.loc[:,'a':'c'] All rows, columns from 'a' to 'c' (with 'c')
df.iloc[:,0:2] All rows, columns from position 0 to 2 (without 2)
df['a']>5 Boolean Series: True where column 'a' is greater than 5
(df['a']>5) & (df['b']>10) Combine two conditions
df[df['a']>5] Filter rows by condition
df.shape[0] 4 Number of rows
df[df['a']>5].shape[0] 2 Number of rows after filter
df.shape[1] 3 Number of columns
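
The difference between positional and label slicing is easy to get wrong, so here is a short sketch using the same data as above. It assumes pandas is installed; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6, 7], "b": [8, 9, 10, 11], "c": [12, 13, 14, 15]})

by_position = df[1:2]               # positional slice: the end (2) is excluded
by_label = df.loc[1:2, ["b", "c"]]  # label slice: the end label (2) is included
by_iloc = df.iloc[1:2, 0:2]         # positional on both axes: ends excluded

# Conditions are combined with & and each one needs its own parentheses.
filtered = df[(df["a"] > 5) & (df["b"] > 10)]

print(len(by_position))         # 1
print(len(by_label))            # 2
print(by_iloc.shape)            # (1, 2)
print(filtered.index.tolist())  # [3]
```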

Row Operations

Example Output Description
df.loc[4] = [-1,-2,-3] Add/Replace row
df = df.append([{ 'a': 1,'b':2,'c':3}] , ignore_index = True) Add rows and renumber the index. ! Slower than appending to a Python list. DataFrame.append was removed in pandas 2.0; use pd.concat there
df = df.append([{ 'a': 1,'b':2,'c':3}] ) Add rows, keeping the index of the added rows
df = df.append(df) Add DataFrame
df = df.append(df,ignore_index=True) Add DataFrame and renumber the index
df = pd.concat([df,df]) Concat DataFrames
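
DataFrame.append was removed in pandas 2.0, so the sketch below adds rows with pd.concat only and should run on both older and newer versions. It assumes pandas is installed; the sample values and variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5], "b": [7, 8], "c": [10, 11]})
new_rows = pd.DataFrame([{"a": 1, "b": 2, "c": 3}])

# ignore_index=True renumbers the result 0..n-1.
appended = pd.concat([df, new_rows], ignore_index=True)

# Without it, the original index labels are kept, so labels can repeat.
kept_index = pd.concat([df, new_rows])

# When adding many rows, collect them in a list and build the frame once.
rows = [{"a": i, "b": i * 2, "c": i * 3} for i in range(3)]
built_once = pd.DataFrame(rows)

print(appended.index.tolist())    # [0, 1, 2]
print(kept_index.index.tolist())  # [0, 1, 0]
print(built_once.shape)           # (3, 3)
```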

Column Operations

Example Output Description
df['d']=np.nan Fill column with NaN values
df['d'] = df['a']+df['b'] Fill column with the sum of two columns
df.insert(0,'a0',3) Insert column at the beginning (position 0)
df = df.assign(d=5) Add new column
df = df.assign(d = lambda x: x.a+x.b) Add new column from function

Numpy

array_1d = np.array([1, 2, 3]) #1 dimensional array
array_2d = np.array([[1,2,3],[4,5,6]]) #2 dimensional array

Example Output Description
np.array([1, 2, 3]) Create numpy array
np.array([[1,2,3],[4,5,6]]) Create a two dimensional array
array_1d.shape (3,) Shape of the 1d array
array_2d.shape (2, 3) Shape of the 2d array
np.arange(12) Array with elements from 0 to 11
np.max(df[["b","c"]]) Max from columns "b" and "c"
np.max(df["a"]) Max from column "a"
np.mean(array_1d) 2.0 Mean
np.mean(array_2d) 3.5 Mean from 2D array (for all elements)
np.median(array_1d) 2.0 Median
np.std(array_1d) 0.816496580927726 Standard deviation
np.var(array_1d) 0.6666666666666666 Variance
np.sum(array_1d) 6 Sum
np.cumsum(array_1d) array([1, 3, 6], dtype=int32) Running sum
np.sort([6,3,5]) array([3, 5, 6]) Sorted values
np.sort([[7,5,6],[4,1,3]]) Sorted 2d values (each row is sorted separately)
np.random.normal(1100,222,3) Random values from a normal distribution: loc=1100 (center), scale=222 (spread), size=3 (elements)
array_1d[None] array([[1, 2, 3]]) (shape: (1,3)) Add a leading dimension to the array (array of array)
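
Most of the functions above also accept an axis argument, which the table does not show. A short sketch, assuming NumPy is installed, using the same two arrays:

```python
import numpy as np

array_1d = np.array([1, 2, 3])
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Without axis the statistic covers all elements; with axis it runs along one dimension.
mean_all = np.mean(array_2d)               # 3.5
mean_columns = np.mean(array_2d, axis=0)   # one value per column
mean_rows = np.mean(array_2d, axis=1)      # one value per row

# np.sort on a 2D input sorts along the last axis, so each row separately.
sorted_rows = np.sort([[7, 5, 6], [4, 1, 3]])

print(mean_columns.tolist())     # [2.5, 3.5, 4.5]
print(mean_rows.tolist())        # [2.0, 5.0]
print(sorted_rows.tolist())      # [[5, 6, 7], [1, 3, 4]]
print(array_1d[None].shape)      # (1, 3)
print(np.std(array_1d, ddof=1))  # 1.0 (sample standard deviation)
```

np.std and np.var divide by n by default; ddof=1 switches to the sample estimate that divides by n-1.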

Sort

import pandas as pd

df = pd.DataFrame({"A":["a","b","c","a"], "B": [2,5,0,1]})
Example Output Description
df.sort_values("A") Sort values by the column A
df.sort_values(["A","B"]) Sort values by columns A and B
df.sort_values(["A","B"],ascending=[False,True]) Sort values by the column A descending and the column B ascending

Group calculations

import numpy as np
import pandas as pd

dic = {"a" : ['a' , 'a', 'a', 'b', 'c'], "b" : [1, 2, 2, 3, 2], "c" : [1,1, 1, 1, 15]}
df = pd.DataFrame(dic,index = [0, 1, 2, 3, 4])

df2 = df.groupby(['a']).agg(['sum','mean']).reset_index()

Example Output Description
df.sum() Sum of each column (over all rows). Other: max, min, count, mean
df.cumsum() Running sum. Other: cummax, cummin
df.sum(axis=1) Sum of each row (across columns)
df.groupby(['a']).sum() Sum by group
df.groupby(['a']).agg(['sum','mean']) Sum and mean of each column by group
df.groupby(['a']).agg( { 'b': ['max','min','sum'], 'c': ['sum'] }) Aggregation for each column
df.groupby(['a']).agg( { 'c': lambda x: np.max(x) }) Own function
df.groupby(['a']).agg( max=('b','max'), count_max=('b', lambda x: x[x==np.max(x)].count()) ) Count maximum values of column b (named aggregation; the older nested dictionary form { 'b': { 'max': 'max', ... } } is rejected by pandas 1.0 and later)
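
The "count maximum values" aggregation can be written with named aggregation, where each keyword becomes an output column. This is a sketch assuming pandas 0.25 or later; the output column names b_max and count_max are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "a", "b", "c"],
    "b": [1, 2, 2, 3, 2],
    "c": [1, 1, 1, 1, 15],
})

# keyword = (source column, aggregation); the keyword is the new column name.
summary = df.groupby("a").agg(
    b_max=("b", "max"),
    count_max=("b", lambda x: (x == x.max()).sum()),  # how many rows hold the group maximum
)

print(summary["b_max"].tolist())      # [2, 3, 2]
print(summary["count_max"].tolist())  # [2, 1, 1]
```

The result has flat column names, so no extra step is needed to remove a column hierarchy afterwards.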

Column/Index operations

Example Output Description
df2.b All b columns
df2.b['sum'] Column ('b', 'sum')
df2.loc[3,('b','sum')] 11 Row with label 3 from column ('b','sum')
df2.loc[3,[('b','sum'),('a','')]] Row with label 3, columns ('b','sum') and 'a'
df2.iloc[3,1] 11 Row at position 3 from column at position 1 ('b','sum')
df = df.groupby(['a']).sum().reset_index() Move index 'a' to a column and create a new index
df2.columns = df2.columns.map('_'.join).str.strip('_') Flatten the column hierarchy of the DataFrame (strip('_') removes the trailing underscore from 'a_')
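
A sketch of flattening the two-level columns of df2, assuming pandas is installed and the same data as in the Group calculations section. This variant joins only the non-empty levels, so the 'a' column keeps its plain name instead of picking up a trailing underscore.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "a", "b", "c"],
    "b": [1, 2, 2, 3, 2],
    "c": [1, 1, 1, 1, 15],
})
df2 = df.groupby(["a"]).agg(["sum", "mean"]).reset_index()

# Each column label is a tuple such as ('b', 'sum') or ('a', '').
original_labels = df2.columns.tolist()

# Join the non-empty parts of each tuple with an underscore.
df2.columns = ["_".join(part for part in label if part) for label in original_labels]

print(df2.columns.tolist())  # ['a', 'b_sum', 'b_mean', 'c_sum', 'c_mean']
print(df2["b_sum"].tolist()) # [5, 3, 2]
```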

Categories

import pandas as pd

df = pd.DataFrame({"A":["a","b","c","a"]})
Example Output Description
df["B"] = df["A"].astype("category") Convert to category
df['B'] = df['B'].cat.reorder_categories(['b','c','a'], ordered=True ) Reorder categories (returns a new Series, so assign the result)
df['B'] = df['B'].cat.rename_categories(['BB','CC','AA']) Rename categories
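
One practical effect of ordered categories is that sorting follows the category order instead of the alphabet. A minimal sketch, assuming pandas is installed; the variable sorted_letters is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df["B"] = df["A"].astype("category")

# reorder_categories returns a new Series; nothing changes until it is assigned back.
df["B"] = df["B"].cat.reorder_categories(["b", "c", "a"], ordered=True)

# Sorting by B now uses the order b < c < a.
sorted_letters = df.sort_values("B")["A"].tolist()

print(sorted_letters)                      # ['b', 'c', 'a', 'a']
print(df["B"].cat.categories.tolist())     # ['b', 'c', 'a']
print(df["B"].min())                       # b
```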

Ipython/Jupyter

Run jupyter

jupyter notebook

Setup password instead of token

If you don't want to open Jupyter through a URL that contains the token, you can set a password for the Jupyter notebook instead.

Set the password with:

jupyter notebook password

Then, instead of

http://localhost:8888/?token=c8de56fa4deed24899803e93c227592aef6538f93025fe01

you open the Jupyter webpage with the link:

http://localhost:8888

Run Jupyter over SSH

  1. Set up a password for the Jupyter notebook (optional)
jupyter notebook password
  2. Run the Jupyter notebook on the remote machine. --port is the port number; --ip=0.0.0.0 allows you to open the webpage from any source IP.
jupyter notebook --no-browser --port=8002 --ip=0.0.0.0
  3. On the local machine, run SSH tunneling
ssh -N -f -L localhost:8002:localhost:8002 user@ip_address

If you log in with a private key (like on AWS), add your private key (.pem file) to the ssh command:

ssh -i key.pem user@ip_address
ssh -i key.pem -N -f -L localhost:8002:localhost:8002 user@ip_address

In the tunneling command:

  • the first localhost:8002 is the local port,
  • the second localhost:8002 is the port the notebook runs on at the remote machine (it can be the same as the local port),
  • user - user name,
  • ip_address - IP address for the ssh connection.
  4. Go to the webpage with the selected port (add the token to the link if you did not set up a password):

http://localhost:8002

Hide warnings

import sys
import warnings

if not sys.warnoptions:
    warnings.simplefilter("ignore")

Show all columns

from IPython.display import display
pd.options.display.max_columns = None

Larger plot

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 5]

Reload module

Python 2

import numpy as np # linear algebra

reload(np)

Python 3.0 to 3.3 (the imp module is deprecated since Python 3.4 and was removed in Python 3.12)

from imp import reload
import numpy as np # linear algebra

reload(np)

Python >=3.4

from importlib import reload
import numpy as np # linear algebra

reload(np)
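
To see what reload actually does, the sketch below writes a tiny module to a temporary folder, imports it, edits the file, and reloads it. It assumes Python 3.4 or later; the module name demo_settings and its VALUE attribute are hypothetical and exist only for this example.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo_settings.py")
    with open(path, "w") as f:
        f.write("VALUE = 1\n")

    # Make the temporary folder importable.
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    import demo_settings

    before = demo_settings.VALUE

    # Edit the source on disk. The new text has a different length, so the
    # cached bytecode from the first import is not reused.
    with open(path, "w") as f:
        f.write("VALUE = 2  # edited\n")

    # Re-execute the module code in the existing module object.
    importlib.reload(demo_settings)
    after = demo_settings.VALUE

    sys.path.remove(folder)

print(before, after)  # 1 2
```

Names imported earlier with "from demo_settings import VALUE" would still point at the old value, because reload only updates the module object itself.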

Autoreload module in Jupyter/Ipython

Reload all modules (except those excluded by %aimport) every time before executing the Python code typed.

%load_ext autoreload
%autoreload 2

Reload all modules imported with %aimport every time before executing the Python code typed.

%autoreload 1

Disable automatic reloading.

%autoreload 0

Reload all modules (except those excluded by %aimport) automatically now.

%autoreload

Keyboard shortcuts

Source: https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/

Both modes:

Key Description
Shift+Enter run cell, select below
Ctrl+Enter run cell, keep at cell
Alt+Enter run cell, insert below

Command Mode (Esc to enable)

Key Description
↑ or k Go Up
↓ or j Go Down
a Add cell above
b Add cell below
d,d delete cell
x cut cell
c copy cell
v paste cell
z undo last cell deletion
0,0 restart kernel
y convert to code
m convert to markdown
r convert to raw
shift+m merge with the next cell

Edit Mode (Enter to enable):

Key Description
Esc Move to Command mode
ctrl+] indent selected cells
ctrl+[ dedent selected cells
ctrl+a select all
ctrl+z undo
ctrl+shift+z or ctrl+y redo
ctrl+backspace delete word before
ctrl+delete delete word after
ctrl+shift+- split cell on the cursor into two cells
ctrl+s save
ctrl+/ toggle comments

Magic and cell commands

More: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/

Command Info Description
%magic Info about magic functions
%lsmagic List all magic commands
%time Time for a single command
%%time Time for running a single cell
%timeit TimeIt for a single command
%%timeit Mean and standard deviation of running the cell several times
%%writefile pythoncode.py Write the Python code of the cell to the file
%run pythoncode.py Run Python code
?str.replace Describes the command
??func Display the source of the command
!command Run shell command (for example ls)
!ls *.csv List all *.csv files
!pip install numpy Install by pip
%%html Treat cell as HTML

Jupyter/Show Progress (tqdm)

source: https://github.com/tqdm/tqdm#installation

from tqdm import tqdm

a = 0
for i in tqdm(range(10000000)):
    a = a+1