Bootstrap Colorpicker is a fancy and customizable colorpicker plugin for Bootstrap.
via jQuery-Plugins.net RSS Feed
"Mr Branding" is a blog based on RSS for everything related to website branding and website design, it collects its posts from many sites in order to facilitate the updating to the latest technology.
To suggest any source, please contact me: Taha.baba@consultant.com
The web industry moves at a blistering pace and it can often feel like it's difficult to keep up. This is especially true of JavaScript land where frameworks are going in and out of fashion all the time, each with their own way of accomplishing the same basic tasks. So how do you keep your skill set relevant? Over on the main site, we published a great article for those of you that do (or are looking to do) side projects. It's full of resources to inspire you and plenty of tips for keeping learning fun.
But, you know, side projects aren't for everybody, right? So today I'd like to add a further tip for the list — start answering programming questions. If that makes you think of Stack Overflow, you're forgiven :) Stack Overflow is indeed a great place to ask and answer programming questions, but its flaws are well documented and it's not for everybody. Instead, I'd like to suggest an alternative — SitePoint forums.
Continue reading "Editorial: How Do You Keep Your Skill Set Relevant?"
PHP has an SSH2 library which provides access to resources (shell, remote exec, tunneling, file transfer) on a remote machine using a secure cryptographic transport. In practice, it is tedious and frustrating for a developer to work with because of its overwhelming configuration options and a complex, sparsely documented API.
The phpseclib (PHP Secure Communications Library) package has a developer friendly API. It uses some optional PHP extensions if they're available and falls back on an internal PHP implementation otherwise. To use this package, you don't need any non-default PHP extensions installed.
composer require phpseclib/phpseclib
This will install the most recent stable version of the library via Composer.
Before diving in blindly, I'd like to list some use-cases appropriate for using this library:
Using phpseclib, you can connect to your remote server with any of the following authentication methods:
Continue reading "Phpseclib: Securely Communicating with Remote Servers via PHP"
All the different programming languages out there seem to be a better fit for machine learning tasks than Ruby, right? Python has scikit-learn, Java has Weka, and there’s Shogun for machine learning in C++, just to name a few. On the other hand, Ruby has an excellent reputation for fast prototyping. So, why shouldn’t you […]
Continue reading "Creating Machine Learning Systems with JRuby"
g9 is a JavaScript library for creating automatically interactive graphics. With g9, interactive visualization is as easy as visualization that isn't. Just write a function which draws shapes based on data, and g9 will automatically figure out how to manipulate that data when you drag the shapes around.
You might be wondering about the term Zipf distribution. To understand what we mean by this term, we need to define Zipf's law first. Don't worry, I'll keep everything simple.
Zipf's law states that, given some corpus (a large and structured set of texts) of natural language utterances, the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, four times as often as the fourth most frequent word, and so forth. In other words, the frequency of a word is roughly inversely proportional to its rank.
Let's look at an example. In the Brown Corpus of American English, the most frequent word is "the" (69,971 occurrences). The second most frequent word, "of", occurs 36,411 times.
The word "the" accounts for around 7% of the Brown Corpus (69,971 of slightly over 1 million words). The word "of" accounts for around 3.6% of the corpus, or roughly half the share of "the". Zipf's law therefore holds for these two words.
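The arithmetic above can be checked with a short sketch. It uses only the two Brown Corpus counts quoted in the text; the helper name zipf_prediction is hypothetical, and the 1/rank formula is the idealized form of Zipf's law rather than an exact fit.

```python
# Counts quoted above for the Brown Corpus.
THE_COUNT = 69971  # rank 1: "the"
OF_COUNT = 36411   # rank 2: "of"


def zipf_prediction(top_count, rank):
    """Idealized Zipf estimate: the count at a rank is top_count / rank."""
    return top_count / rank


predicted_of = zipf_prediction(THE_COUNT, 2)
observed_ratio = THE_COUNT / OF_COUNT

print(f"Zipf estimate for rank 2: {predicted_of:.1f}")
print(f"Observed count for rank 2: {OF_COUNT}")
print(f"Observed ratio rank 1 / rank 2: {observed_ratio:.2f}")
```

The observed ratio comes out slightly below 2, which is what "approximately twice as often" means in practice.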
Thus, Zipf's law is trying to tell us that a small number of items usually account for the bulk of activities we observe. For instance, a small number of diseases (cancer, cardiovascular diseases) account for the bulk of deaths. This also applies to words that account for the bulk of all word occurrences in literature, and many other examples in our lives.
Before moving forward, let me refer you to the data we will be experimenting with in this tutorial. Our data this time comes from the National Library of Medicine. We will be downloading what's called a MeSH (Medical Subject Headings) ASCII file, in particular d2016.bin (28 MB).
I will not go into detail in describing this file since it is beyond the scope of this tutorial, and we just need it to experiment with our code.
After you have downloaded the data described in the section above, let's start building the Python script that will find the Zipf distribution of the data in d2016.bin.
The first step is to open the file:
open_file = open('d2016.bin', 'r')
In order to carry out the necessary operations on the bin file, we need to load the file into a string variable. This can be achieved using the read() function, as follows:
file_to_string = open_file.read()
Since we will be looking for some pattern (i.e. words), regular expressions come into play. We will thus be making use of Python's re module.
At this point we have already read the bin file and loaded its content into a string variable. Finding the Zipf distribution means finding the frequency of occurrence of words in the bin file. The regular expression will thus be used to locate the words in the file.
The method we will be using to make such a match is findall(). As mentioned in the re module documentation, findall() will:
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
What we want to do is write a regular expression that will locate all the individual words in the text string variable. The regular expression that can perform this task is:
\b[A-Za-z][a-z]{2,9}\b
where \b is an anchor for word boundaries. In Python, this can be represented as follows:
words = re.findall(r'(\b[A-Za-z][a-z]{2,9}\b)', file_to_string)
This regular expression finds all the words that start with a letter (upper-case or lower-case) followed by a sequence of at least 2 and no more than 9 lower-case letters. In other words, the words included in the output will be from 3 to 10 characters long.
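To see what the pattern keeps and what it drops, here is a small sketch run on a made-up sentence instead of the MeSH file. The sample text and the name WORD_PATTERN are illustrative assumptions; the pattern itself is the one described above, without the optional capturing group.

```python
import re

# Same pattern as above: one letter, then 2 to 9 lower-case letters.
WORD_PATTERN = re.compile(r'\b[A-Za-z][a-z]{2,9}\b')

# Hypothetical sample text, chosen to hit the edge cases.
sample = "The MeSH file is a vocabulary of cardiovascular terms"

words = WORD_PATTERN.findall(sample)
print(words)
```

Short words such as "is" and "of" fall below the 3-character minimum, "cardiovascular" exceeds the 10-character maximum, and "MeSH" is skipped because upper-case letters are only allowed in the first position.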
We can now run a loop which aims at calculating the frequency of occurrence of each word:
for word in words:
    count = frequency.get(word, 0)
    frequency[word] = count + 1
Here, frequency is a dictionary that maps each word to its count. If the word is not yet in the dictionary, get() returns the default value 0 instead of raising a KeyError. Otherwise, it returns the current count. Either way, the count is then incremented by 1, so it represents the number of times the word has occurred in the list so far.
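As a side note, the standard library's collections.Counter performs the same bookkeeping. The sketch below compares it with the get() loop on a tiny made-up word list; the sample words are an assumption for illustration, and Counter is an alternative, not what the tutorial's script uses.

```python
from collections import Counter

# Hypothetical word list standing in for the output of findall().
words = ["the", "and", "the", "was", "the", "and"]

# Manual counting with dict.get(), as in the loop above.
frequency = {}
for word in words:
    frequency[word] = frequency.get(word, 0) + 1

# Counter builds the same word-to-count mapping in one call.
counter = Counter(words)

print(frequency)
print(dict(counter) == frequency)
```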
Finally, we will print the key-value pair of the dictionary, showing the word (key) and the number of times it appeared in the list (value):
for key, value in reversed(sorted(frequency.items(), key=itemgetter(1))):
    print(key, value)
The sorted(frequency.items(), key=itemgetter(1)) part sorts the output by value in ascending order, that is, it lists the words from the least frequent to the most frequent. In order to list the most frequent words first, we wrap the result in the reversed() function.
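An equivalent way to get descending order is to pass reverse=True to sorted(). The sketch below shows both approaches on a small made-up dictionary (the sample counts are assumptions); the only difference is the order of words that share the same count.

```python
from operator import itemgetter

# Hypothetical counts; "and" and "see" are tied on purpose.
frequency = {"the": 3, "and": 2, "was": 1, "see": 2}

# Approach used in the tutorial: sort ascending, then reverse.
ascending = sorted(frequency.items(), key=itemgetter(1))
descending_reversed = list(reversed(ascending))

# Alternative: let sorted() produce descending order directly.
descending_flag = sorted(frequency.items(), key=itemgetter(1), reverse=True)

print(descending_reversed)
print(descending_flag)
```

Because sorted() is stable, reverse=True keeps tied words in their original order, whereas reversed() flips them. For a frequency listing, either choice is usually fine.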
After going through the different building blocks of the program, let's see how it all looks together:
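import re
from operator import itemgetter

frequency = {}

# Read the whole MeSH file into a single string
open_file = open('d2016.bin', 'r')
file_to_string = open_file.read()
open_file.close()

# Find all words that are 3 to 10 letters long
words = re.findall(r'(\b[A-Za-z][a-z]{2,9}\b)', file_to_string)

# Count how often each word occurs
for word in words:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

# Print the words from most frequent to least frequent
for key, value in reversed(sorted(frequency.items(), key=itemgetter(1))):
    print(key, value)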
I will show here the first ten words and their frequencies returned by the program:
the 42602
abcdef 31913
and 30699
abbcdef 27016
was 17430
see 16189
with 14380
under 13127
for 9767
abcdefv 8694
From this Zipf distribution, we can see Zipf's law at work: a small number of high-frequency words, such as the, and, was, and for, account for the bulk of word occurrences. The same applies to abcdef, abbcdef, and abcdefv, which are highly frequent letter sequences that have some meaning particular to this file.
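For a rough sense of how closely these ten counts follow the ideal curve, the sketch below puts each observed count next to the idealized Zipf estimate (the top count divided by the rank). The counts are copied from the output above; the helper name compare_to_zipf is hypothetical, and the 1/rank estimate is a simplification, so an exact match should not be expected.

```python
# Top ten words and counts, copied from the program output above.
top_words = [
    ("the", 42602), ("abcdef", 31913), ("and", 30699), ("abbcdef", 27016),
    ("was", 17430), ("see", 16189), ("with", 14380), ("under", 13127),
    ("for", 9767), ("abcdefv", 8694),
]


def compare_to_zipf(ranked_counts):
    """Return (rank, word, observed, estimate) using top_count / rank."""
    top_count = ranked_counts[0][1]
    rows = []
    for rank, (word, count) in enumerate(ranked_counts, start=1):
        rows.append((rank, word, count, top_count / rank))
    return rows


rows = compare_to_zipf(top_words)
for rank, word, observed, estimate in rows:
    print(f"{rank:>2} {word:<8} observed={observed:>6} estimate={estimate:>8.0f}")
```

For this file the observed counts decline more slowly than the idealized estimate, which is a reminder that Zipf's law describes an approximate trend rather than an exact formula.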
In this tutorial, we have seen how Python makes it easy to work with statistical concepts such as Zipf's law. Python comes in particularly handy when working with large text files, where finding the Zipf distribution manually would require a lot of time and effort. As we saw, we were able to quickly load, parse, and find the Zipf distribution of a 28 MB file. Sorting the output was just as simple, thanks to Python's dictionaries and the built-in sorted() function.