- Published on
Python
Table of Contents
- History of Python
- Part 1: BASIC and ABC
- Part 2: Th Programming Language
- Perl Python Combination
- Why Python?
- Python Internals
- Typing
- Lists
- ASIDE: Underying list Allocation
- Python vs. Shell vs. Emacs Scripting
- Classes and OOP
- Dunders and "Operator Overloading"
- Namespaces in Classes
- Modules
- The import Statement
- Packages
- The PYTHONPATH Environment Variable
History of Python
- From
grep
tosed
toawk
, such commands attempt to generalize and expand what the previous does, rising in levels of complexity. - The Perl programming language was designed to be able do everything awk, etc. can do.
- Python was designed to do everything Perl, etc. can do. The rise of Python can be attributed to two parts:
Part 1: BASIC and ABC
BASIC: an instructional language
Instead of writing very low-level code, go up one level of abstraction. People were given a language that:
- Hash tables are implemented into the language, like
set
anddict
- All the basic algorithms already built-in. Just
sort
lol. - Enforce indentation.
- An IDE to run the programs.
- Hash tables are implemented into the language, like
High school students will then be programming in this new language called ABC. It didn't work because employers still wanted production ready languages.
Part 2: Th Programming Language
Perl = sh + awk + sed + ... (putting all the little languages together)
- Perl was designed as an antidote to the "Little Languages" philosophy, so it combined all the little languages into one scripting language.
- Perl became the scripting language of choice for about 10 years.
- It was designed to be like a real spoken language - there was always more than one way to do something.
Perl Python Combination
Python emerged as a combination of:
- ABC had the philosophy that there is one correct way to do anything.
- The capabilities of Perl.
- Python is a scripting language that tries to do everything. Theoretically, if you know the language very well, you do not have to touch the little languages of the shell
ASIDE: All 3 languages, BASIC, Perl, and Python can be either compiled or interpreted.
Why Python?
- CONS: Slow, memory hog.
- PROS: Easier to write (development cost vs. runtime cost). A lot of libraries are also written in C/C++ code for a performance bonus, made possible by native method interfaces.
- Scripting languages like Python demonstrate an alternate balance between development cost and runtime cost
- Oftentimes, human time is much more valuable than computer time.
- Prevalent in machine learning
- this feat is probably due to being at the right place at the right time. Python happened to be a reasonable scripting language for the job when the field was emerging in popularity.
There's always going to be a time where you're the blind person next to the elephant. The goal of software construction is to make you a better blind person. - Dr. Eggert
Python Internals
- Python is object-oriented.
- It started with functions but no classes.
- When classes were introduced, they implemented methods as functions that explicitly take the
self
first argument, which is actually how OOP is implemented behind the hood in languages like C++.
Anyway, every value is an object. Every object has:
- Identity - cannot be changed
- Type - cannot be changed
- Value - can be changed, but only if the object is mutable
Typing
- In old Python,
int
used to have a fixed size, so there was a distinction between integers and longs. - The main categories of types in modern Python:
Singletons: Types with only one instance throughout runtime
None
: Python's version of anull
valueTrue
andFalse
: Python's booleans, which (fun fact) actually subclass the numeric typeint
.
Numbers
int
: A number without a fractional part. In modern Python, these can be any size, so you don't need to worry about bounds/overflow like in most languages.float
: A number with a fractional part. There's also the specialfloat("inf")
constant that represents infinity (1/0
in C,Infinity
in JavaScript, etc.).complex
: A number with a real and imaginary part. You can initialize a literal with thea+bj
syntax e.g.5+4j
.
Collections
Sequential Containers: Collections where order matters, with elements being accessed by index
list
: Python's built-in vector type. Heterogenous, arbitrary-sized, ordered collections.tuple
: Same aslist
, but immutable, so it's fixed length. These are nice because they are more efficient and are in a sense "safer" to use.
Associative Containers: Collections where elements are accessed by key
set
: Python's built-in hash table. An unordered collection of unique, hashable values.dict
: Python's built-in hash map. Unordered collections of key-value pairs. Keys must be unique and hashable.
Callables: Objects that support being called with optional arguments
- Functions: objects you define with the
def
keyword or anonymous ones with thelambda
keyword- functions are objects that can be called but can also be stored into another object and get called from that object (called callback)
f = lambda x,y: x+y+1
g = f
, where g is now a callableg(3, 19)
if g is f -> True
- functions are objects that can be called but can also be stored into another object and get called from that object (called callback)
- Methods: function objects bound to an instance of a class. They take a mandatory
self
positional argument, the class instance the method acts on behalf of. - Callables can be formatted with arguments:
def printf(fmt, *args)
: formats into tupledef printx(fmt, **args)
: formats into dictionaryprintx("abc def", a = 17, b = 19, c = -7)
results inargs = {'a' : 17, 'b':19}
There's also the buffer type. I assume this is only available in Python 2. Buffers are like multiple strings. When you're done working with it, you can convert it to a string with str(x)
.
Lists
Common list
methods:
my_list.append(value)
: add any value as an element to the end of the list.my_list.extend(iterable)
: lay all the values out from the iterable as elements at the end of the list.my_list.insert(index, value)
: insert an element at a position in the list, pushing everything after it backwards.my_list.count(value)
: return the number of occurrences ofvalue
(using==
checking).my_list.remove(value)
: remove the first occurrence ofvalue
(using==
checking), raisingValueError
if not found.my_list.pop([index])
: defaults to last item, which you can use along with.append
to emulate a stack data structure. RaisesIndexError
if empty.my_list.clear():
remove all elements.
Common operations universal to sequential containers:
len(my_list)
: return the number of elements.my_list[i:j]
: return aslice
of the container, starting from indexi
and up until but excludingj
.- Mutable ones also support this syntax on the LHS, where it means reassigning a segment of the container, as well as
del my_list[i:j]
, which deletes that segment of the container.
- Mutable ones also support this syntax on the LHS, where it means reassigning a segment of the container, as well as
list
Allocation
ASIDE: Underying - Probably uses cache size to determine starting size.
- After that, reallocation uses geometric resizing (approximately nine-eights according to mCoding).
- The total cost of calling
list.append
N times is . Because the amortized cost of this operation is .
Visualization: imagine that the list length is doubled for every allocation, which isn't true, but this doesn't change the asymptotic time, so it simplifies the derivation:
[e e e e e e e e e e e e e e e e e e e e e e]
... |<3>|<--2 ops-->|<-------1 op-------->|
The total cost is O(1) overall.
Python vs. Shell vs. Emacs Scripting
- Lisp is an ASL for Emacs, like an extension language. It uses existing code, Emacs primitives.
- Shell uses existing programs.
- Python was designed to be a general-purpose programming language, so there are no "primitives" you bring together
- you just write in the language altogether to program from scratch
- it also converges to the same phenomenon where programmers glue together existing modules like PyTorch and SciPy
- you just write in the language altogether to program from scratch
- What makes a language a scripting language supports this pattern of software construction of building applications from existing code
The goal of a scripting language is you don't code from scratch. You glue together other people's code. You provide the cement, and the other people provide the bricks. - Dr. Eggert
Classes and OOP
Class hierarchies are directed acyclic graphs (DAG).
This is especially apparent because, unlike languages like Java, Python supports multiple inheritances.
Python's method resolution order (MRO) is depth-first, left-to-right. So for example, if you define a class that inherits like so:
class C(A, B):
def some_method(self, arg):
pass
- With the DAG model, this design makes it so that if
A
andB
disagree,A
will always take priority.
Historical ASIDE: The decision to explicitly include self
in all method definitions was to not abstract a fundamental mechanism of OOP
- every method is bound to the class and acts on the instance
- If you examine the machine code of similar OOP languages like C++, you'll see that there's a hidden first argument to every method, which is the pointer to the object that the method is acting on behalf of
Introspection ASIDE: You can use the built-in __mro__
attribute of class objects to programmatically access a class' method resolution order. For example:
# Print the names of the classes in class O's MRO
print([cls.__name__ for cls in O.__mro__])
Dunders and "Operator Overloading"
The old way to redefine the comparison operators:
def __cmp__(self, other):
# negative for <, 0 for equal, positive for >
return num
- This is still supported but it is now anachronistic approach because you can run into hardware problems
- A notable example is the case of floating point numbers, which have an additional state, NaN, beyond negative, zero, and positive
- we have the familiar
__lt__
,__gt__
, etc.
- we have the familiar
This is the Python 2 predecessor to the familiar __bool__
method:
def __nonzero__(self):
# Return whether the object is considered to be "not zero"
return b
Namespaces in Classes
- Namespaces are just dictionaries
- Classes have a special attribute
__dict__
, adict
that maps names to values- This gives rise to opportunities to write "clever" Python code, where you can programmatically alter the definition of an existing class:
c = C()
c.__dict__["m"] is c.m
class SimpleNamespace:
def __init__(self, **kwargs):
self.__dict__.update(kwargs)
def __repr__(self):
keys = sorted(self.__dict__)
items = ("{}={!r}".format(k, self.__dict__[k]) for k in keys)
return "{}({})".format(type(self).__name__, ", ".join(items))
- above is a cool example of changing some of the namespaces
- This is (probably) how metaclasses are implemented, which was not mentioned in lecture but is cool to know. They're basically classes that determine how other classes should be implemented.
Modules
import
Statement
The - Creates a namespace for the module.
- Executes the contents of the module in the context of that namespace. Eggert didn't mention this, but this step is actually only performed if the module hasn't already been imported. Modules are only run onces.
- this is so that different classes with the same name do not collide with each other
- Adds a name, the module name, to the current namespace.
- all of the names are visible, but you have to type in modulename.(method)
from modulename import *
never do this as this will wipe out other namespaces
Proof for #2:
# module.py
print("Hello world")
# runner.py
import module
import module
$ python3 runner.py
Hello world
Packages
- Packages organize source code into a familiar tree structure
- This allows importing to be parallel the file system.
- packages are about software developers (organized for convenience of developers)
- classes are organized for the behavior of your objects at runtime
- in python importing is a statement while in C++ it is a declaration
- if tool > 1200: import trig (works in python)
The special __init__.py
scripts turns a directory into a proper package, and it is automatically run upon import.
PYTHONPATH
Environment Variable
The Just as how PATH
instructs the shell program where to search for commands, PYTHONPATH
instructs the Python interpreter on where to search for code.
Determines the behavior of the import
statement. Python will search through the sequence of paths, delimited by colons (Unix) or semicolons (Windows), to search for names of packages or modules to import. The path to the standard library is included in PYTHONPATH
by default.
This variable is stored in and can be modified programmatically with sys.path
, which is a list[str]
containing the individual string paths.
Why all this complexity with packages vs classes?
Packages are oriented towards developers (like a compile-time notion). The tree is structured so that different developers can work on different parts of the code.
Classes are about runtime behavior (a runtime notion). You want inheritance to be independent of package hierarchy. Classes are only concerned with their own behavior, "what to do next", so it should be able to pull code from anywhere in the codebase. How developers organize that codebase is made possible with packages.
<script
type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"
></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({ tex2jax: {inlineMath: [['$', '$']]}, messageStyle: "none" });
</script>
- Authors
- Name
- Apurva Shah
- Website
- apurvashah.org