字符串相关处理

找到字串位置

  1. 找到字串的第一个位置:str_.find('it')
  2. 找到字串的最后一个位置:str_.rfind('it')

找极值

float('inf')
float('-inf')

Python 逗号结尾,就是返回一个tuple的意思,跟return (line,)的效果一样的。

collections 容器需要了解一下。

>>> a = '1232'
>>> a[::-1]
'2321'

  这里有所有 Python 函数的时间复杂度,用 set 比 list 在找寻元素是否存在快。

  字典判断是否存在,不在赋值为 0。

for c in a['content']:
    chars[c] = chars.get(c, 0) + 1

  字典对应映射。

id2emotion = {i:j for i,j in enumerate(emotion)}
emotion2id = {j:i for i,j in id2emotion.items()}

slice

切分数据:

slice(start, end, step)
  • start : Optional. An integer number specifying at which position to start the slicing. Default is 0
  • end: An integer number specifying at which position to end the slicing
  • step: Optional. An integer number specifying the step of the slicing. Default is 1
# example
a = ("a", "b", "c", "d", "e", "f", "g", "h")
x = slice(0, 8, 3)
print(a[x])
# ('a', 'd', 'g')

yield vs return

yield 会暂停并把值反馈给调用者,但是还能继续恢复到暂停的地方继续执行,这就允许 yield 一个值一个值的传回,不像 return 只能一下子传回一个 list,当传回的数不存整个序列到内存可以采用 yield 的方式。

# A Simple Python program to demonstrate working 
# of yield 
  
# A generator function that yields 1 for first time, 
# 2 second time and 3 third time 
def simpleGeneratorFun(): 
    yield 1
    yield 2
    yield 3
  
# Driver code to check above generator function 
for value in simpleGeneratorFun():  
    print(value) 
# 1
# 2
# 3

yield 用在 Python 的 generators 中,如果函数里用了 yield,那么这个函数也就成了 generator 函数,即用 yield 反馈数不是用 return。

# A Python program to generate squares from 1 
# to 100 using yield and therefore generator 
  
# An infinite generator function that prints 
# next square number. It starts with 1 
def nextSquare(): 
    i = 1; 
  
    # An Infinite loop to generate squares  
    while True: 
        yield i*i                 
        i += 1  # Next execution resumes  
                # from this point      
  
# Driver code to test above generator  
# function 
for num in nextSquare(): 
    if num > 100: 
         break    
    print(num) 

输出:

1
4
9
16
25
36
49
64
81
100

可更改(mutable)与不可更改(immutable)对象

在python中,string, tuple 和 number 是不可更改的对象,而list, dict等则是可以修改的对象。有什么区别呢?具体可以参考这里

声明二维的数组

# Creates a list containing 5 lists, each of 8 items, all set to 0
r, c = 5, 8;
# size of Matrix is (5, 8)
Matrix = [[0 for x in range(c)] for y in range(r)] 

函数作为参数

def divide_on_feature(X, feature_i, threshold):
    split_func = None

    # sample is a row of X, and feature_i is to locate the feature and choose the value comparing to threshold
    if isinstance(threshold, int) or isinstance(threshold, float):
        split_func = lambda sample: sample[feature_i] >= threshold
    else:
        split_func = lambda sample: sample[feature_i] == threshold

    Xy_1 = np.array([sample for sample in X if split_func(sample)])
    Xy_2 = np.array([sample for sample in X if not split_func(sample)])

    return np.array([Xy_1, Xy_2])

Python language

range vs. xrange

Python 3中没有 xrange,但是 range 表现的功能跟 Python 2中的 xrange 一样,所以如果考虑通用性,可以用 range。值得注意的是,不管是 range 还是 xrange,结果已经是对应的 len 减去了 1 的。

返回值

range 返回的是一个列表的数值,xrange 返回的是一个生成器对象(generator object),其只能用作循环取数。

内存占用

range 的内存占用远大过 xrange,因为一个返回的是列表,一个只是对象。

操作使用情况

range 返回是列表,所有的对于列表的操作都可以用在上面;xrange 返回的是 xrange 对象,列表操作不能用在其上,例如划分 x[2:5]。

速度

因为 xrange 仅仅针对生成器对象所包含的 lazy evaluation 所必须的取值,所以 xrange 在实现上会比 range 快。

Python 中的 underscore(_)的用法

单个_表示private。

is vs. == for string comparison

首先得清楚 Python 中对象包含三个基本要素,id(身份标识)、Python type()(数据类型)和 value(值)。虽然 is 和 == 都是对对象进行比较判断作用的,但是对对象比较判断的内容并不相同。

== 是 Python 标准操作符中的比较操作符,用来比较判断两个对象的 value(值)是否相等。例如下面两个字符串的比较:

>>> a = 'iplaypython.com'
>>> b = 'iplaypython.com'
>>> a == b
True

is 是同一性运算符,用来判断两个对象的唯一身份标识,也就是 id 是否相同。例如:

>>> x = y = [4,5,6]
>>> z = [4,5,6]
>>> x == y
True
>>> x == z
True
>>> x is y
True
>>> x is z
False
>>>
>>> print id(x)
3075326572
>>> print id(y)
3075326572
>>> print id(z)
3075328140

参数设置

需要调参的部分,或者需要统一管理参数的时候,可以将所有参数放到同一个文件中,通过导入的方式引入参数。

# config.py
PARAMS = {
	'f_s'   :  0.5 ,
	'N'     :  2 * 189 ,
	'M'     :  5 ,
	'p'     :  2 ,
	'gamma' :  0.002 ,
	'q'     :  19
}

# main.py
f_s = PARAMS['f_s']
N = PARAMS['N']
M = PARAMS['M']
p = PARAMS['p']
gamma = PARAMS['gamma']
q = PARAMS['q']

多个比较运算符放一起(Multiple logical comparisons on a single line for an if statement)

In [1]: a = 6

In [2]: b = 2

In [3]: c =8

In [4]: b < a > c
Out[4]: False

In [5]: b < c > a
Out[5]: True

create dict with keys

uid_name = ['LBS', 'age', 'appIdAction', 'appIdInstall', 'carrier', 'consumptionAbility', 'ct','education', 'gender', 'house', 'interest1', 'interest2', 'interest3', 'interest4','interest5', 'kw1', 'kw2', 'kw3', 'marriageStatus', 'os', 'topic1', 'topic2', 'topic3']
                 
feature_dict = dict.fromkeys(uid_name)

Ternary Operators

is_fat = True
state = "fat" if is_fat else "not fat"

Add key, value pair to dictionary

feature_dict[feature_key].update({-2: sum_not_none})
null_dict.update({feature_key: sum_none})
ge_thred.update({feature_key: sum_ge_thred})

Numpy 排序

index = np.argsort(-np.array(null_dict.values()))
print np.array(null_dict.keys())[index]

r prefix

When an r or R prefix is present, backslashes are still used to quote the following character, but all backslashes are left in the string. For example, the string literal r”\n” consists of two characters: a backslash and a lowercase `n’. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r”"” is a valid string literal consisting of two characters: a backslash and a double quote; r”" is not a value string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

不做转义,例如 \n 不具有回车的意义。


如何书写setup使得方便用

可以参考此项目,一般需要如下三个文件:

  • requirements.txt
  • setup.cfg
  • setup.py

Python Programming

List

create a list of some empty dict

dict_list = list({} for i in xrange(col_size))

append

一次给 list 添加一个新的元素,如果一次想要添加多个可以用 extend:

>>> lst = [1, 2]
>>> lst.append(3)
>>> lst.append(4)
>>> lst
[1, 2, 3, 4]

>>> lst.extend([5, 6, 7])
>>> lst.extend((8, 9, 10))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Errors

ImportError: No module named datasets

加入到 Python 找寻的路径即可,当然,要记住在文件夹下建立一个 __init__.py文件,然后在文件中加入包含 datasets 文件夹的目录,注意层级关系:

import sys
sys.path.append('/Users/Bin/Dropbox/Codes/SSD-Tensorflow')

输出分割线

print (''.center(20, "*"))

输出某个文件夹下的所有文件

# list the files in the data directory
from subprocess import check_output
print(check_output(['ls', './data']).decode('utf8'))

__future__的用法

字典(Dict)

Set 集合

  无序,唯一性。Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but are slower than lists when it comes to iterating over their contents.

Python Test

Data Structure

Stack vs queue

利用 list 就能实现:

  • Stack
    • append 来 push 一个数
    • pop 来 pop 一个数
  • Queue
    • 方案1:append 来 push 一个数,pop(0) 来 pop 一个数
    • 方案2:list.insert(0, num) 来 push 一个数,pop 来 pop 一个数

Linked List

需要自己先定义一个链表类啦!

# Definition for singly-linked list.
 class ListNode(object):
     def __init__(self, x):
         self.val = x
         self.next = None

Tree

也是需要自己定义一个树的结构:

# Definition for a binary tree node.
class TreeNode(object):
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None

字符串中填入变量

(filename + '-' + 'M({})'+'N({})'+'fs({})'+'p({})'+'q({})' + 'diff({})'+'alpha({})').format(M, N, f_s, p, q, diff, alpha)

还有用 % 的方式:

print('step %d, training accuracy %g' % (i, train_accuracy))

UTF-8 编码

# coding: utf8

运行 python 文件时附加参数

The code should look like this:

import sys
def hello(a, b):
    print "hello and that's your sum:"
    sum = a+b
    print sum

if __name__== "__main__":
    hello(int(sys.argv[1]), int(sys.argv[2]))

Then run the code with this command:

python hello.py 1 1

深拷贝与浅拷贝

  • 直接复制:其实是对象的引用
  • 浅拷贝:拷贝父对象,不会拷贝对象的内部子对象
  • 深拷贝:copy 模块的 deepcopy 方法完全拷贝了父对象机器子对象

跨文件夹 Import 文件

  参考后发现主要还是用 sys 的方式。

配置文件的设置

  配置文件也是用 py 文件:

KEEP_PROB = 0.5
LEARNING_RATE = 1e-5
BATCH_SIZE =50
PARAMETER_FILE = "checkpoint/variable.ckpt"
MAX_ITER = 50000

  调用的时候:

import config as cfg
cfg.KEEP_PROB

References

  1. Understanding the underscore( _ ) of Python
  2. Writing tests
  3. Python 直接赋值、浅拷贝和深度拷贝解析