How to Tune Algorithm Parameters with Scikit-Learn

by Jason Brownlee on July 16, 2014 in Python Machine Learning

Machine learning models are parameterized so that their behavior can be tuned for a given problem. Models can have many parameters, and finding the best combination of parameters can be treated as a search problem. In this post you will discover how to tune the parameters of machine learning algorithms in Python using the scikit-learn library.

Machine Learning Algorithm Parameters

Algorithm tuning is a final step in the process of applied machine learning before presenting results. It is sometimes called hyperparameter optimization, where the algorithm parameters are referred to …
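To make the "tuning as a search problem" idea concrete, here is a minimal sketch of an exhaustive grid search using only the standard library. The scoring function `toy_score` and the parameter names `alpha` and `degree` are hypothetical stand-ins for "fit a model and evaluate it"; they are not taken from the post. In scikit-learn, `GridSearchCV` automates this kind of loop and adds cross-validation.

```python
from itertools import product


def grid_search(score, param_grid):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the full Cartesian grid of candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical score: higher is better, peaks at alpha=0.1, degree=2.
def toy_score(alpha, degree):
    return -((alpha - 0.1) ** 2) - (degree - 2) ** 2


param_grid = {"alpha": [1.0, 0.1, 0.01, 0.001], "degree": [1, 2, 3]}
best_params, best_score = grid_search(toy_score, param_grid)
print(best_params, best_score)
```

The cost grows with the product of the list lengths, which is why large grids are often replaced by random or adaptive search.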

SQL syntax that you should pay attention to

This article covers some pitfalls and important commands of general SQL, noted here for future reference. Most of the content comes from the SQL tutorial.

1. SQL Injection

SQL injection is a technique where malicious users can inject SQL commands into an SQL statement via web page input. Injected SQL commands can alter the SQL statement and compromise the security of a web application.

SQL Injection Based on 1=1 is Always True

Look at the example above, one more time. Let’s say that the original purpose of the code was to create an SQL statement to …
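The "1=1 is always true" case can be reproduced with a small sketch using Python's built-in `sqlite3` module. The `users` table and its rows are made up for illustration and are not from the original tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

user_input = "1 OR 1=1"  # what a malicious user might type into a form

# Unsafe: the input is pasted into the SQL text, so "OR 1=1" becomes part
# of the WHERE clause and every row matches.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE id = " + user_input
).fetchall()

# Safer: a placeholder binds the whole input as one value, so it is
# compared against id as data and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE id = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

The concatenated query returns every user, while the parameterized query returns nothing because no id equals the literal string.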

Difference between process and thread

Although multiprocessing and multithreading are both accessible ways to do parallel computing and tackle big data problems, there are some differences between the two, and I will list some references and thoughts on this topic.

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads …
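A small sketch can show the distinction in Python: threads run inside one process and share its memory and process identifier, while a child process gets its own. The `worker` function and the `shared` list are illustrative names only, and the child process here is simply a second Python interpreter started with `subprocess`.

```python
import os
import subprocess
import sys
import threading

shared = []              # lives in this process's address space
lock = threading.Lock()


def worker(label):
    # Every thread sees the same `shared` list and the same process ID.
    with lock:
        shared.append((label, os.getpid()))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The child runs in a separate interpreter with its own PID and memory.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True, text=True, check=True,
)
child_pid = int(child.stdout.strip())

thread_pids = {pid for _, pid in shared}
print(thread_pids == {os.getpid()}, child_pid != os.getpid())
```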

Fast way to do multi url parsing

Opening URLs sequentially, especially hundreds of them, is very slow, so this is a perfect case for parallel computing. Two components largely determine the speed of this task: opening the URL and reading the content from the website. So I will briefly describe a fast way to parse multiple URLs.

First, we should try the multithreading/multiprocessing packages. Currently, the three popular ones are multiprocessing, concurrent.futures, and threading. These packages can help us open multiple URLs at the same time, which increases the speed. More importantly, after using multithreaded processing, and if you try to open hundreds of …
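One way to open many URLs at once is a thread pool from `concurrent.futures`. The sketch below is an assumption about how this could look, not the post's own code; `read_url` and `fetch_all` are hypothetical helper names, and the `fetch` argument is injectable so the function can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def read_url(url, timeout=10):
    """Open one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=read_url, max_workers=8):
    """Fetch many URLs concurrently; returns {url: body or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one URL fails
                results[url] = exc
    return results
```

Calling `fetch_all(list_of_urls)` returns a dictionary keyed by URL. Threads suit this task because most of the time is spent waiting on the network rather than computing.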

My daily working log

I will list some packages and knowledge I picked up during my internship, just for reference and casual discussion.

Python Time Scheduler
Scheduler: perfectly solves the problem of automatic email sending

Auto Email Sending
Google Cloud

DAG Structure
Airflow

Word Comparison
FuzzyWuzzy
Difflib: provides three types of comparison methods, which differ in speed.

Graphical Viewer
snakeviz: I like this; it is based on cProfile and very powerful in Jupyter/IPython.

Multithreading and Parallel Computing
threading
concurrent.futures
multiprocessing
multiprocessing.Pool cannot return values during the process
multiprocessing.Process can use Pipe() or Queue() to make processes communicate with each other
multiprocessing.pool.ThreadPool can …
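The three difflib comparison methods mentioned above are presumably the three ratio methods of `difflib.SequenceMatcher`; that reading is an assumption, since the log does not name them. A minimal sketch comparing them (the helper `similarity_report` and the sample strings are made up for illustration):

```python
from difflib import SequenceMatcher


def similarity_report(a, b):
    """Return the three SequenceMatcher similarity estimates for a and b."""
    matcher = SequenceMatcher(None, a, b)
    return {
        # Exact similarity in [0, 1]; the slowest of the three.
        "ratio": matcher.ratio(),
        # Upper bound on ratio(), computed relatively quickly.
        "quick_ratio": matcher.quick_ratio(),
        # Looser upper bound on ratio(), computed very quickly.
        "real_quick_ratio": matcher.real_quick_ratio(),
    }


report = similarity_report("machine learning", "machine learnign")
print(report)
```

A common pattern is to call the cheap upper bounds first and compute `ratio()` only for pairs that pass a threshold.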

A Complete Tutorial on Ridge and Lasso Regression in Python

AARSHAY JAIN, JANUARY 28, 2016

Introduction

When we talk about regression, we often end up discussing linear and logistic regression. But that’s not the end. Do you know there are 7 types of regression? Linear and logistic regression are just the most loved members of the regression family.

Last week, I saw a recorded talk at NYC Data Science Academy from Owen Zhang, current Kaggle rank 3 and Chief Product Officer at DataRobot. He said, ‘if you are using regression without regularization, you have to be very special!’. I hope you get what a person of his stature referred to. …

Morris Traversal

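Traversing a binary tree with Morris Traversal (non-recursive, no stack, O(1) space)

This article addresses one problem: how to implement preorder, inorder, and postorder traversal of a binary tree under two requirements:

1. O(1) space complexity, that is, only constant extra space may be used;
2. The shape of the binary tree must not be destroyed (it may be changed temporarily during the process).

There are two common ways to implement preorder, inorder, and postorder traversal of a binary tree: recursion, and an iterative version that uses a stack. Both have O(n) space complexity (either the call stack consumed by the recursion or a user-defined stack), so they do not meet the requirements. (Inorder implementations using these two methods can be found here.)

Morris Traversal satisfies both requirements. Unlike the two methods above, it needs only O(1) space, and it still completes in O(n) time.

The main difficulty in traversing with O(1) space is how to get back to the parent node after reaching a child node (assuming nodes have no pointer p to their parent), since a stack cannot be used as auxiliary space. To solve this, the Morris method borrows the concept of a threaded binary tree. It does not need to allocate extra pointers in every node for its predecessor and successor; it only uses the null left and right pointers in leaf nodes to point to the predecessor or successor under a given traversal order.

Morris gave only the inorder method. Preorder can be obtained with a small modification of inorder, while postorder takes somewhat more thought. So we start with inorder.

First, define the binary tree node structure used in this article, consisting of val, left, and right:

    struct TreeNode {
        int val;
        TreeNode *left;
        TreeNode *right;
        TreeNode(int x) : val(x), left(NULL), right(NULL) {}
    };

I. Inorder traversal

Steps:

1. If the current node's left child is null, output the current node and make its right child the current node.
2. If the current node's left child is not null, find the current node's inorder predecessor in its left subtree.
   a) If the predecessor's right child is null, set its right child to the current node. Update the current node to its left child.
   b) If the predecessor's right child is the current node, reset its right child to null (restoring the shape of the tree). Output the current node. Update the current node to its right child.
3. Repeat steps 1 and 2 until the current node is null.

Illustration:

The figure below shows the result of each iteration (left to right, top to bottom); cur denotes the current node, and dark nodes have already been output.

Code:

    void inorderMorrisTraversal(TreeNode *root) {
        TreeNode *cur = root, *prev = NULL;
        while (cur != NULL)
        {
            if (cur->left == NULL) // 1.
            {
                printf("%d ", cur->val);
                cur = cur->right;
            }
            else
            {
                // find predecessor
                …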
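The inorder steps above can be sketched in Python as follows. This is an independent illustration of steps 1 to 3, not a completion of the truncated C++ listing; the function name `morris_inorder` and the sample tree are made up for the example, and the values are collected into a list instead of printed so the result is easy to check.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def morris_inorder(root):
    """Inorder traversal using temporary threads instead of a stack."""
    out = []  # output only; the traversal itself uses O(1) extra space
    cur = root
    while cur is not None:
        if cur.left is None:
            # Step 1: no left child, so output and move right.
            out.append(cur.val)
            cur = cur.right
        else:
            # Step 2: find the inorder predecessor in the left subtree.
            pred = cur.left
            while pred.right is not None and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                # 2a: create a thread back to cur, then go left.
                pred.right = cur
                cur = cur.left
            else:
                # 2b: thread already exists; remove it, output, go right.
                pred.right = None
                out.append(cur.val)
                cur = cur.right
    return out


root = TreeNode(4,
                TreeNode(2, TreeNode(1), TreeNode(3)),
                TreeNode(6, TreeNode(5), TreeNode(7)))
print(morris_inorder(root))
```

Because step 2b removes every thread it created, the tree has its original shape again once the loop ends, so the function can be called repeatedly on the same tree.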