1. 参数
import sys
print sys.argv[0] ## python 脚本名
print sys.argv[1] ## 第一个参数
print sys.argv ###参数数组
print len(sys.argv) ##参数个数
文件操作
# 打开文件
with open(regular_device_file, 'r') as rdf:
lines = rdf.readlines()
for line in lines:
line = line.strip().split('|')
# 存储csv文件
import csv
with open('train_y.csv', 'wb') as train_y:
wr = csv.writer(train_y, quoting=csv.QUOTE_ALL)
for line in y:
wr.writerow(line)
动态for实现在各个
2. 字符串
2.1 找到字串位置
- 找到字串的第一个位置:
str_.find('it')
- 找到字串的最后一个位置:
str_.rfind('it')
找极值
float('inf')
float('-inf')
Python 逗号结尾,就是返回一个tuple的意思,跟return (line,)的效果一样的。
collections 容器需要了解一下。
>>> a = '1232'
>>> a[::-1]
'2321'
这里有所有 Python 函数的时间复杂度,用 set 比 list 在找寻元素是否存在快。
字典判断是否存在,不在赋值为 0。
for c in a['content']:
chars[c] = chars.get(c, 0) + 1
字典对应映射。
id2emotion = {i:j for i,j in enumerate(emotion)}
emotion2id = {j:i for i,j in id2emotion.items()}
slice
切分数据:
slice(start, end, step)
- start : Optional. An integer number specifying at which position to start the slicing. Default is 0
- end: An integer number specifying at which position to end the slicing
- step: Optional. An integer number specifying the step of the slicing. Default is 1
# example
a = ("a", "b", "c", "d", "e", "f", "g", "h")
x = slice(0, 8, 3)
print(a[x])
# ('a', 'd', 'g')
yield vs return
yield 会暂停并把值反馈给调用者,但是还能继续恢复到暂停的地方继续执行,这就允许 yield 一个值一个值的传回,不像 return 只能一下子传回一个 list,当传回的数不存整个序列到内存可以采用 yield 的方式。
# A Simple Python program to demonstrate working
# of yield
# A generator function that yields 1 for first time,
# 2 second time and 3 third time
def simpleGeneratorFun():
yield 1
yield 2
yield 3
# Driver code to check above generator function
for value in simpleGeneratorFun():
print(value)
# 1
# 2
# 3
yield 用在 Python 的 generators 中,如果函数里用了 yield,那么这个函数也就成了 generator 函数,即用 yield 反馈数不是用 return。
# A Python program to generate squares from 1
# to 100 using yield and therefore generator
# An infinite generator function that prints
# next square number. It starts with 1
def nextSquare():
i = 1;
# An Infinite loop to generate squares
while True:
yield i*i
i += 1 # Next execution resumes
# from this point
# Driver code to test above generator
# function
for num in nextSquare():
if num > 100:
break
print(num)
输出:
1
4
9
16
25
36
49
64
81
100
可更改(mutable)与不可更改(immutable)对象
在python中,string, tuple 和 number 是不可更改的对象,而list, dict等则是可以修改的对象。有什么区别呢?具体可以参考这里。
声明二维的数组
# Creates a list containing 5 lists, each of 8 items, all set to 0
r, c = 5, 8;
# size of Matrix is (5, 8)
Matrix = [[0 for x in range(c)] for y in range(r)]
函数作为参数
def divide_on_feature(X, feature_i, threshold):
split_func = None
# sample is a row of X, and feature_i is to locate the feature and choose the value comparing to threshold
if isinstance(threshold, int) or isinstance(threshold, float):
split_func = lambda sample: sample[feature_i] >= threshold
else:
split_func = lambda sample: sample[feature_i] == threshold
Xy_1 = np.array([sample for sample in X if split_func(sample)])
Xy_2 = np.array([sample for sample in X if not split_func(sample)])
return np.array([Xy_1, Xy_2])
Python language
range vs. xrange
Python 3中没有 xrange,但是 range 表现的功能跟 Python 2中的 xrange 一样,所以如果考虑通用性,可以用 range。值得注意的是,不管是 range 还是 xrange,结果已经是对应的 len 减去了 1 的。
返回值
range 返回的是一个列表的数值,xrange 返回的是一个生成器对象(generator object),其只能用作循环取数。
内存占用
range 的内存占用远大过 xrange,因为一个返回的是列表,一个只是对象。
操作使用情况
range 返回是列表,所有的对于列表的操作都可以用在上面;xrange 返回的是 xrange 对象,列表操作不能用在其上,例如划分 x[2:5]。
速度
因为 xrange 仅仅针对生成器对象所包含的 lazy evaluation 所必须的取值,所以 xrange 在实现上会比 range 快。
Python 中的 underscore(_)的用法
单个_
表示private。
is vs. == for string comparison
首先得清楚 Python 中对象包含三个基本要素,id(身份标识)、Python type()(数据类型)和 value(值)。虽然 is 和 == 都是对对象进行比较判断作用的,但是对对象比较判断的内容并不相同。
== 是 Python 标准操作符中的比较操作符,用来比较判断两个对象的 value(值)是否相等。例如下面两个字符串的比较:
>>> a = 'iplaypython.com'
>>> b = 'iplaypython.com'
>>> a == b
True
is 是同一性运算符,用来判断两个对象的唯一身份标识,也就是 id 是否相同。例如:
>>> x = y = [4,5,6]
>>> z = [4,5,6]
>>> x == y
True
>>> x == z
True
>>> x is y
True
>>> x is z
False
>>>
>>> print id(x)
3075326572
>>> print id(y)
3075326572
>>> print id(z)
3075328140
参数设置
需要调参的部分,或者需要统一管理参数的时候,可以将所有参数放到同一个文件中,通过导入的方式引入参数。
# config.py
PARAMS = {
'f_s' : 0.5 ,
'N' : 2 * 189 ,
'M' : 5 ,
'p' : 2 ,
'gamma' : 0.002 ,
'q' : 19
}
# main.py
f_s = PARAMS['f_s']
N = PARAMS['N']
M = PARAMS['M']
p = PARAMS['p']
gamma = PARAMS['gamma']
q = PARAMS['q']
多个比较运算符放一起(Multiple logical comparisons on a single line for an if statement)
In [1]: a = 6
In [2]: b = 2
In [3]: c =8
In [4]: b < a > c
Out[4]: False
In [5]: b < c > a
Out[5]: True
create dict with keys
uid_name = ['LBS', 'age', 'appIdAction', 'appIdInstall', 'carrier', 'consumptionAbility', 'ct','education', 'gender', 'house', 'interest1', 'interest2', 'interest3', 'interest4','interest5', 'kw1', 'kw2', 'kw3', 'marriageStatus', 'os', 'topic1', 'topic2', 'topic3']
feature_dict = dict.fromkeys(uid_name)
Ternary Operators
is_fat = True
state = "fat" if is_fat else "not fat"
Add key, value pair to dictionary
feature_dict[feature_key].update({-2: sum_not_none})
null_dict.update({feature_key: sum_none})
ge_thred.update({feature_key: sum_ge_thred})
Numpy 排序
index = np.argsort(-np.array(null_dict.values()))
print np.array(null_dict.keys())[index]
r prefix
When an r
or R
prefix is present, backslashes are still used to quote the following character, but all backslashes are left in the string. For example, the string literal r”\n” consists of two characters: a backslash and a lowercase `n’. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r”"” is a valid string literal consisting of two characters: a backslash and a double quote; r”" is not a value string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
不做转义,例如 \n 不具有回车的意义。
如何书写setup使得方便用
可以参考此项目,一般需要如下三个文件:
- requirements.txt
- setup.cfg
- setup.py
Python Programming
List
create a list of some empty dict
dict_list = list({} for i in xrange(col_size))
append
一次给 list 添加一个新的元素,如果一次想要添加多个可以用 extend:
>>> lst = [1, 2]
>>> lst.append(3)
>>> lst.append(4)
>>> lst
[1, 2, 3, 4]
>>> lst.extend([5, 6, 7])
>>> lst.extend((8, 9, 10))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Errors
ImportError: No module named datasets
加入到 Python 找寻的路径即可,当然,要记住在文件夹下建立一个 __init__.py
文件,然后在文件中加入包含 datasets 文件夹的目录,注意层级关系:
import sys
sys.path.append('/Users/Bin/Dropbox/Codes/SSD-Tensorflow')
Print trick
输出分割线
print (''.center(20, "*"))
输出某个文件夹下的所有文件
# list the files in the data directory
from subprocess import check_output
print(check_output(['ls', './data']).decode('utf8'))
__future__
的用法
字典(Dict)
Set 集合
无序,唯一性。Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but are slower than lists when it comes to iterating over their contents.
Python Test
Data Structure
Stack vs queue
利用 list 就能实现:
- Stack
- append 来 push 一个数
- pop 来 pop 一个数
- Queue
- 方案1:append 来 push 一个数,pop(0) 来 pop 一个数
- 方案2:list.insert(0, num) 来 push 一个数,pop 来 pop 一个数
Linked List
需要自己先定义一个链表类啦!
# Definition for singly-linked list.
class ListNode(object):
def __init__(self, x):
self.val = x
self.next = None
Tree
也是需要自己定义一个树的结构:
# Definition for a binary tree node.
class TreeNode(object):
def __init__(self, x):
self.val = x
self.left = None
self.right = None
字符串中填入变量
(filename + '-' + 'M({})'+'N({})'+'fs({})'+'p({})'+'q({})' + 'diff({})'+'alpha({})').format(M, N, f_s, p, q, diff, alpha)
还有用 % 的方式:
print('step %d, training accuracy %g' % (i, train_accuracy))
UTF-8 编码
# coding: utf8
运行 python 文件时附加参数
The code should look like this:
import sys
def hello(a, b):
print "hello and that's your sum:"
sum = a+b
print sum
if __name__== "__main__":
hello(int(sys.argv[1]), int(sys.argv[2]))
Then run the code with this command:
python hello.py 1 1
深拷贝与浅拷贝
- 直接复制:其实是对象的引用
- 浅拷贝:拷贝父对象,不会拷贝对象的内部子对象
- 深拷贝:copy 模块的 deepcopy 方法完全拷贝了父对象机器子对象
跨文件夹 Import 文件
参考后发现主要还是用 sys 的方式。
配置文件的设置
配置文件也是用 py 文件:
KEEP_PROB = 0.5
LEARNING_RATE = 1e-5
BATCH_SIZE =50
PARAMETER_FILE = "checkpoint/variable.ckpt"
MAX_ITER = 50000
调用的时候:
import config as cfg
cfg.KEEP_PROB