Python Cheat Sheet | One Page Knowledge

Links

http://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Data Types

Source: https://developer.rhino3d.com/guides/rhinopython/python-datatypes/

Data Type	Example	Description
bool	`True`, `False`	Boolean
int	`10`	Signed integer (in Python 3 the same as long)
long ! `Only for Python less than 3.0. In Python >= 3.0 use the int data type.`	`345L`	Long (Python less than 3 only)
float	`34.5`	Floating point real values (written with a decimal point)
complex	`3.14J`	Complex data type `a+bi` where $$i^2=-1$$
decimal	`Decimal(3.12)`	`from decimal import Decimal` - required
fraction	`Fraction(3,4)`	`from fractions import Fraction` - required; ¾ as a fraction
tuple	`(1,2,3)`	Tuple
list	`[1,2,3]`	List
dictionary	`{'john': 425, 'tom': 212}`	Dictionary
bytearray	`bytearray('Text','utf-8')`	Bytearray
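
A minimal sketch of the `decimal` and `fraction` rows above. It assumes that exact values are wanted: a `Decimal` built from a float literal such as `3.12` inherits the float's binary rounding error, while a `Decimal` built from a string does not. The variable names are only illustrative.

```python
from decimal import Decimal
from fractions import Fraction

exact = Decimal("3.12")      # built from text: exactly 3.12
inexact = Decimal(3.12)      # built from a float: carries binary rounding error
print(exact == inexact)      # False

total = Fraction(3, 4) + Fraction(1, 4)   # exact rational arithmetic
print(total)                              # 1
```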

List Operations

Command	Output
`[1,2,3]`	`[1,2,3]`
`l = [1,'2',3]`	`[1,'2',3]`
`l *2`	`[1,'2',3,1,'2',3]`
`l[0]`	`1` (int)
`len(l)`	`3` (int)
`l = [1,0,4,2]`
`l.sort()` ! `This function returns nothing, it operates on the list itself.`	`l = [0,1,2,4]`
`l = [1,0,-2,"-4","1","5"]` `l.sort(key=int)`	`l = ['-4',-2,0,1,'1','5']` key is the conversion function
`l.reverse()` ! `This function returns nothing, it operates on the list itself.`	`l = ['5', '1', 1, 0, -2, '-4']`
`l+[1,2]`	`['5', '1', 1, 0, -2, '-4', 1, 2]`
`[x**2 for x in range(6)]`	`[0,1,4,9,16,25]`
`l.append(3)` ! `This function returns nothing, it operates on the list itself.`	`l = ['5', '1', 1, 0, -2, '-4', 3]`
`l.remove('1')`	`l = ['5', 1, 0, -2, '-4', 3]`
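
The in-place behaviour noted in the table can be sketched as follows. The built-in `sorted()` is shown as an alternative for when the original list has to stay unchanged. The names `values`, `ordered` and `result` are only illustrative.

```python
values = [1, 0, -2, "-4", "1", "5"]

ordered = sorted(values, key=int)   # returns a new list, original untouched
print(ordered)                      # ['-4', -2, 0, 1, '1', '5']
print(values)                       # [1, 0, -2, '-4', '1', '5']

result = values.sort(key=int)       # sorts the list itself
print(result)                       # None
print(values == ordered)            # True
```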

Slices

Example	Output	Description
`R = range(2,20,3)`	`range(2,20,3)`	Returns a range object (for use in a for loop), from 2 up to 20 (exclusive) by 3
`L = list(range(2,20,3))`	`[2, 5, 8, 11, 14, 17]`	List from range, from 2 up to 20 (exclusive) by 3
`R[:1]`	`range(2, 5, 3) #[2]`	slice of a range
`L[:1]`	`[2]`	slice of a list
`V = list(range(0,5))`	`[0, 1, 2, 3, 4]`	Vector
`V[0]`	`0`	first element index
`V[-1]`	`4`	last element index
`V[-3]`	`2`
`V[-3:-1]`	`[2,3]`	from last three to last one (without last one)
`'abc'[-3:-1]`	`'ab'`
`V[0:3:2]`	`[0,2]`	from index 0 to index 3 (exclusive) by 2
`'abc'[0:3:2]`	`'ac'`
`V[::2]`	`[0,2,4]`	all by 2
`'abc'[::2]`	`'ac'`
`V[::-1]`	`[4, 3, 2, 1, 0]`	reverse
`'abc'[::-1]`	`'cba'`	reverse text
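
A short sketch of the slicing rules from the table, reusing the names `R` and `V` from above. It shows that slicing a range gives another range, and that negative indices count from the end.

```python
R = range(2, 20, 3)

first = R[:1]                 # slicing a range gives another range, not a list
print(first, list(first))     # range(2, 5, 3) [2]

V = list(range(0, 5))
print(V[-3:-1] == V[2:4])     # True: -3 and -1 are positions 2 and 4 in a list of 5
print(V[slice(None, None, 2)] == V[::2])  # True: V[::2] is shorthand for a slice object
```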

OS

import os
import glob

Example	Output	Description
`os.getcwd()`	`C:\\public\\PROJECTS\\examples\\`	Get current directory
`os.listdir('.')`	`["file1.py","image.png"]`	Get list of files
`os.listdir('c:\\')`	`['$Recycle.Bin','Windows',...]`	Get list of files from the directory
`glob.glob('./*.png')`	`["image.png"]`	Get all images
`for dir in os.walk('.')`	`'.','.git',...`	Iterate through all directories; each item is a tuple `(dirpath, dirnames, filenames)`, the output shows only the directory paths
`[d[0] for d in os.walk('.')]`	`['.','.git',...]`	list of all subfolders
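
The `os.walk` rows can be sketched as below. To stay self-contained, the example builds a throwaway directory with `tempfile`; the folder name `sub` and the file name `image.png` are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: <root>/sub/image.png
    os.makedirs(os.path.join(root, "sub"))
    open(os.path.join(root, "sub", "image.png"), "w").close()

    found = []
    # Each step of os.walk yields (dirpath, dirnames, filenames)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))

print(found)
```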

Random

import random

Example	Output	Description
`random.choice([1,2,3])`	`2`	Select one random value
`random.sample([1,2,3],2)`	`[1,3]`	Select 2 random values
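
A minimal sketch of both calls. Seeding the generator is an addition, not part of the table; it makes repeated runs give the same picks, which helps when testing. The seed value 42 is arbitrary.

```python
import random

random.seed(42)                       # fixed seed: same picks on every run
pick = random.choice([1, 2, 3])       # one element
pair = random.sample([1, 2, 3], 2)    # two distinct elements, no repeats
print(pick, pair)
```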

Data.Frame

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

Create

From Dictionary:

df = pd.DataFrame(
    {"a" : [4 ,5, 6],
    "b" : [7, 8, 9],
    "c" : [10, 11, 12]},
index = [1, 2, 3])

From List

lst = [[1,2,3],[4,5,6]]
df = pd.DataFrame(lst,columns = ['A','B','C'])

Import data

Name	Description	Example
pd.read_csv	Read csv file	`pd.read_csv('name.csv',header=0,index_col=0)`

Select

More examples: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

dic = {"a" : [4 ,5, 6, 7],"b" : [8, 9, 10, 11],"c" : [12, 13, 14, 15]}
df = pd.DataFrame(dic,index = [0, 1, 2, 3])

Example	Output	Description
`df[:2]`	#	Select first two rows (0 and 1; :2 means without 2)
`df[-1:]`	#	Last row
`df[1:2]`	#	Select rows from 1 to 2 (without 2)
`df["b"]`	#	Column b (output is pandas.Series)
`df.b`	#	Select column b (output is pandas.Series)
`df[["b","c"]]`	#	Columns 'b' and 'c'
`df.loc[1:2,["b","c"]]`	# ! `Different behaviour than df[1:2]: df[1:2] excludes the end of the slice, .loc includes it`	rows 1 and 2, columns 'b' and 'c' (with 2)
`df.loc[:,'a':'c']`	#	all rows, columns from 'a' to 'c'
`df.iloc[:,0:2]`	#	all rows, columns from 0 to 2 (without 2)
`df['a']>5`	#	Boolean Series: whether each value of column 'a' is greater than 5
`(df['a']>5) & (df['b']>10)`	#	combine two conditions
`df[df['a']>5]`	#	Filter by condition
`df.shape[0]`	`4`	Number of rows
`df[df['a']>5].shape[0]`	`2`	Number of rows after filter
`df.shape[1]`	`3`	Number of columns
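
The difference between plain slicing and `.loc` noted in the table can be sketched as follows, using the same data as above. It assumes pandas is installed; the names `by_position`, `by_label` and `filtered` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6, 7], "b": [8, 9, 10, 11], "c": [12, 13, 14, 15]})

by_position = df[1:2]                  # positional slice: end excluded, one row
by_label = df.loc[1:2, ["b", "c"]]     # label slice: end included, two rows
filtered = df[(df["a"] > 5) & (df["b"] > 10)]   # both conditions must hold

print(by_position.shape)         # (1, 3)
print(by_label.shape)            # (2, 2)
print(filtered.index.tolist())   # [3]
```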

Row Operations

Example	Output	Description
`df.loc[4] = [-1,-2,-3]`	#	Add/Replace row
`df = df.append([{ 'a': 1,'b':2,'c':3}] , ignore_index = True)`	#	Add rows ! `Performance is slower than appending to a list. DataFrame.append was removed in pandas 2.0, use pd.concat there.`
`df = df.append([{ 'a': 1,'b':2,'c':3}] )`	#	Add rows
`df = df.append(df)`	#	Add DataFrame
`df = df.append(df,ignore_index=True)`	#	Add DataFrame
`df = pd.concat([df,df])`	#	Concat DataFrames
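
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the `append` rows above only run on older versions. A minimal sketch of the same row addition with `pd.concat`, assuming pandas is installed (the name `new_rows` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5], "b": [8, 9], "c": [12, 13]})

# Wrap the new rows in their own DataFrame, then concatenate
new_rows = pd.DataFrame([{"a": 1, "b": 2, "c": 3}])
df = pd.concat([df, new_rows], ignore_index=True)   # ignore_index renumbers rows 0..n-1

print(df.shape)              # (3, 3)
print(df.loc[2].tolist())    # [1, 2, 3]
```

When many rows are added in a loop, collecting them in a plain list and building the DataFrame once at the end avoids copying the data on every step.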

Column Operations

Example	Output	Description
`df['d']=np.nan`	#	Fill column with NaN values
`df['d'] = df['a']+df['b']`	#	Fill column with the sum of two columns
`df.insert(0,'a0',3)`	#	Insert column at the beginning (0 position)
`df = df.assign(d=5)`	#	Add new column
`df = df.assign(d = lambda x: x.a+x.b)`	#	Add new column from function

Numpy

array_1d = np.array([1, 2, 3]) #1 dimensional array
array_2d = np.array([[1,2,3],[4,5,6]]) #2 dimensional array

Example	Output	Description
`np.array([1, 2, 3])`	#	Create numpy array
`np.array([[1,2,3],[4,5,6]])`	#	Create a two dimensional array
`array_1d.shape`	`(3,)`	Shape of the 1d array
`array_2d.shape`	`(2, 3)`	Shape of the 2d array
`np.arange(12)`	#	array with elements from 0 to 11.
`np.max(df[["b","c"]])`	#	max from columns 'b' and 'c'
`np.max(df["a"])`	#	max from column 'a'
`np.mean(array_1d)`	`2.0`	mean
`np.mean(array_2d)`	`3.5`	mean from 2D array (for all elements)
`np.median(array_1d)`	`2.0`	median
`np.std(array_1d)`	`0.816496580927726`	standard deviation
`np.var(array_1d)`	`0.6666666666666666`	variance
`np.sum(array_1d)`	`6`	sum
`np.cumsum(array_1d)`	`array([1, 3, 6], dtype=int32)`	running sum
`np.sort([6,3,5])`	`array([3, 5, 6])`	sorted values
`np.sort([[7,5,6],[4,1,3]])`	#	sorted 2d values
`np.random.normal(1100,222,3)`	#	random with normal distribution, loc= 1100 (center) scale= 222 (spread) size= 3 (elements)
`array_1d[None]`	`array([[1, 2, 3]])` (shape: `(1,3)`)	Create second dimension of array (array of array)
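
The table shows `np.mean` over all elements. As a hedged addition, the sketch below uses the `axis` argument to reduce along one dimension only, and shows the column-vector counterpart of `array_1d[None]`. It assumes NumPy is installed.

```python
import numpy as np

array_1d = np.array([1, 2, 3])
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

col_means = np.mean(array_2d, axis=0)   # collapse rows: one mean per column
row_means = np.mean(array_2d, axis=1)   # collapse columns: one mean per row
print(col_means)                        # [2.5 3.5 4.5]
print(row_means)                        # [2. 5.]

row_vector = array_1d[None]             # shape (1, 3)
col_vector = array_1d[:, None]          # shape (3, 1)
print(row_vector.shape, col_vector.shape)
```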

Sort

import pandas as pd

df = pd.DataFrame({"A":["a","b","c","a"], "B": [2,5,0,1]})

Example	Output	Description
`df.sort_values("A")`	#	Sort values by the column `A`
`df.sort_values(["A","B"])`	#	Sort values by columns `A` and `B`
`df.sort_values(["A","B"],ascending=[False,True])`	#	Sort values by the column `A` `descending` and the column `B` `ascending`
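
A small sketch of the last row, using the DataFrame defined above. `sort_values` returns a new DataFrame and keeps the old index labels; `reset_index(drop=True)` is an optional extra step that renumbers them.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [2, 5, 0, 1]})

# A descending, then B ascending inside equal values of A
result = df.sort_values(["A", "B"], ascending=[False, True])
print(result.index.tolist())     # [2, 1, 3, 0]

renumbered = result.reset_index(drop=True)
print(renumbered["B"].tolist())  # [0, 5, 1, 2]
```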

Group calculations

import numpy as np
import pandas as pd

dic = {"a" : ['a' , 'a', 'a', 'b', 'c'], "b" : [1, 2, 2, 3, 2], "c" : [1,1, 1, 1, 15]}
df = pd.DataFrame(dic,index = [0, 1, 2, 3, 4])

df2 = df.groupby(['a']).agg(['sum','mean']).reset_index()

Example	Output	Description
`df.sum()`	#	Sum over rows (one total per column). Other: `max,min,count,mean`
`df.cumsum()`	#	Running sum. Other: `cummax,cummin`
`df.sum(axis=1)`	#	Sum over columns (one total per row)
`df.groupby(['a']).sum()`	#	Sum by group
`df.groupby(['a']).agg(['sum','mean'])`	#	Sum and Mean of column
`df.groupby(['a']).agg( { 'b': ['max','min','sum'], 'c': ['sum'] })`	#	Aggregation for each column
`df.groupby(['a']).agg( { 'c': lambda x: np.max(x) })`	#	Own function
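`df.groupby(['a']).agg( { 'b': { 'max': 'max', 'count_max': lambda x: x[x==np.max(x)].count()} })`	# ! `Nested dictionaries for renaming were deprecated in pandas 0.20 and raise an error since pandas 1.0.`	Count maximum values of column b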
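
On pandas 0.25 or newer, the renaming done by the nested dictionary in the last row can be written with named aggregation instead. This is a sketch of one possible replacement, not the original author's code; the output column names `b_max`, `b_count_max` and `c_sum` are chosen for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "a", "b", "c"],
    "b": [1, 2, 2, 3, 2],
    "c": [1, 1, 1, 1, 15],
})

# Each keyword is an output column: name=(input column, aggregation)
summary = df.groupby("a").agg(
    b_max=("b", "max"),
    b_count_max=("b", lambda x: (x == x.max()).sum()),  # how many values equal the group max
    c_sum=("c", "sum"),
).reset_index()

print(summary["b_max"].tolist())        # [2, 3, 2]
print(summary["b_count_max"].tolist())  # [2, 1, 1]
print(summary["c_sum"].tolist())        # [3, 1, 15]
```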

Column/Index operations

Example	Output	Description
`df2.b`	#	All `b` columns
`df2.b['sum']`	#	Column `('b', 'sum')`
`df2.loc[3,('b','sum')]`	`11`	Row with label 3 from column `('b','sum')`
`df2.loc[3,[('b','sum'),('a','')]]`	#	Row with label 3, columns `('b','sum')` and `'a'`
`df2.iloc[3,1]`	`11`	Row at position 3 from the column at position 1 `('b','sum')`
`df = df.groupby(['a']).sum().reset_index()`	#	Move index `'a'` to a column and create a new index.
`df2.columns = df2.columns.map('_'.join).str.strip('_')`	#	Flatten hierarchical columns of the DataFrame
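
Flattening the two-level columns can be sketched as below, using the `df2` built in the Group calculations section. The list comprehension is an equivalent way of writing the same step; stripping `'_'` removes the trailing underscore left by the empty second level of column `a`.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "a", "b", "c"],
    "b": [1, 2, 2, 3, 2],
    "c": [1, 1, 1, 1, 15],
})
df2 = df.groupby(["a"]).agg(["sum", "mean"]).reset_index()

# Columns are tuples such as ('a', '') and ('b', 'sum'); join each into one string
df2.columns = ["_".join(col).strip("_") for col in df2.columns]
print(df2.columns.tolist())   # ['a', 'b_sum', 'b_mean', 'c_sum', 'c_mean']
```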

Categories

Example	Output	Description
`df["B"] = df["A"].astype("category")`	#	Convert to category
`df['B'].cat.reorder_categories(['b','c','a'], ordered=True )`	#	Reorder categories
`df['B'] = df['B'].cat.rename_categories(['BB','CC','AA'])`	#	Rename categories
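
A minimal sketch of why the category order matters, assuming pandas is installed. `reorder_categories` returns a new Series, so the result is assigned back here; sorting then follows the category order instead of the alphabetical one.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df["B"] = df["A"].astype("category")

# Assign the result back, otherwise the column keeps its old order
df["B"] = df["B"].cat.reorder_categories(["b", "c", "a"], ordered=True)

in_category_order = df.sort_values("B")["B"].tolist()
print(in_category_order)   # ['b', 'c', 'a', 'a']
```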

Ipython/Jupyter

Run jupyter

jupyter notebook

Setup password instead of token

If you don't want to open Jupyter through a URL that contains the token, you can set a password for the notebook server instead.

Instead of

http://localhost:8888/?token=c8de56fa4deed24899803e93c227592aef6538f93025fe01

you open the Jupyter webpage with the link:

http://localhost:8888

To set the password, run:

jupyter notebook password

Run Jupyter over SSH

Set up a password for Jupyter notebook (optional):

jupyter notebook password

Run Jupyter notebook on the remote machine. --port is the port number; --ip=0.0.0.0 allows you to open the webpage from any source IP.

jupyter notebook --no-browser --port=8002 --ip=0.0.0.0

On the local machine, open the SSH tunnel:

ssh -N -f -L localhost:8002:localhost:8002 user@ip_address

If you log in with a private key (like on AWS), add the key (.pem file) to the ssh command:

ssh -i key.pem user@ip_address

ssh -i key.pem -N -f -L localhost:8002:localhost:8002 user@ip_address

In the tunnel command:

the first localhost:8002 is the local port,
the second localhost:8002 is the port the notebook runs on at the remote machine (can be the same as the local port),
user - user name,
ip_address - IP address for ssh.

Go to the webpage with the selected local port (add the token to the link if you didn't set up a password):

http://localhost:8002

Hide warnings

import sys
import warnings

if not sys.warnoptions:
    warnings.simplefilter("ignore")
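
The snippet above silences warnings for the whole session. As a narrower alternative, here is a sketch that hides them only inside one block with `warnings.catch_warnings`; the function `noisy` is a made-up stand-in for any call that emits a warning.

```python
import warnings

def noisy():
    # Hypothetical function that warns before returning its result
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42

# Filters changed inside the block are restored when it exits
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")
    value = noisy()

print(value)    # 42
print(caught)   # []  (the warning was ignored, so nothing was recorded)
```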

Show all columns

from IPython.display import display
pd.options.display.max_columns = None

Larger plot

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 5]

Reload module

Python 2

import numpy as np # linear algebra

reload(np)

Python 3

from imp import reload  # imp is deprecated since Python 3.4 and removed in Python 3.12
import numpy as np # linear algebra

reload(np)

Python >=3.4

from importlib import reload
import numpy as np # linear algebra

reload(np)
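
A minimal sketch of what `reload` gives back, using the standard-library `json` module so that it runs without extra packages. `importlib.reload` re-executes the module code and returns the same module object. Names imported earlier with `from module import name` still point at the old objects and have to be imported again.

```python
import importlib
import json

reloaded = importlib.reload(json)   # re-runs the module's code in place
print(reloaded is json)             # True: same module object, refreshed contents
```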

Autoreload module in Jupyter/Ipython

Reload all modules (except those excluded by %aimport) every time before executing the Python code typed.

%load_ext autoreload
%autoreload 2

Reload all modules imported with %aimport every time before executing the Python code typed.

%autoreload 1

Disable automatic reloading.

%autoreload 0

Reload all modules (except those excluded by %aimport) automatically now.

%autoreload

Keyboard shortcuts

Source: https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/

Both modes:

Key	Description
`Shift+Enter`	run cell, select below
`Ctrl+Enter`	run cell, keep at cell
`Alt+Enter`	run cell, insert below

Command Mode (Esc to enable)

Key	Description
`↑` or `k`	Go Up
`↓` or `j`	Go Down
`a`	Add cell above
`b`	Add cell below
`d,d`	delete cell
`x`	cut cell
`c`	copy cell
`v`	paste cell
`z`	undo last cell deletion
`0,0`	restart kernel
`y`	convert to code
`m`	convert to markdown
`r`	convert to raw
`shift+m`	merge with the next cell

Edit Mode (Enter to enable):

Key	Description
`Esc`	Move to Command mode
`ctrl+]`	indent selected cells
`ctrl+[`	dedent selected cells
`ctrl+a`	select all
`ctrl+z`	undo
`ctrl+shift+z` or `ctrl+y`	redo
`ctrl+backspace`	delete word before
`ctrl+delete`	delete word after
`ctrl+shift+-`	split cell on the cursor into two cells
`ctrl+s`	save
`ctrl+/`	toggle comments

Magic and cell commands

More: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/

Command	Info	Description
`%magic`	#	Info about magic function
`%lsmagic`	#	List all magic commands
`%time`	#	Time for a single command
`%%time`	#	Time for running a single cell
`%timeit`	#	TimeIt for a single command
`%%timeit`	#	Mean and standard deviation of running the cell several times.
`%%writefile pythoncode.py`	#	Write python code to the file
`%run pythoncode.py`	#	Run python code
`?str.replace`	#	Describes the command
`??func`	#	Display the source of the command
`!command`		Run shell command (for example ls)
`!ls *.csv`	#	list all `*.csv` files
`!pip install numpy`	#	install by pip
`%%html`	#	Treat cell as a HTML
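
Magic commands only work inside IPython/Jupyter. In a plain Python script, a rough counterpart of `%timeit` is the standard-library `timeit` module. This is a sketch: the function `build_squares` and the repeat counts are arbitrary choices for the illustration.

```python
import timeit

def build_squares():
    # Code under test: any callable without arguments works
    return [x ** 2 for x in range(1000)]

# 5 measurements, each timing 100 calls; values are seconds per 100 calls
runs = timeit.repeat(build_squares, number=100, repeat=5)
best = min(runs)
print(f"best of {len(runs)}: {best / 100:.6f} s per call")
```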

Jupyter/Show Progress (tqdm)

source: https://github.com/tqdm/tqdm#installation

from tqdm import tqdm

a = 0
for i in tqdm(range(10000000)):
    a = a+1
