Mr Branding

Monday, October 3, 2016

Bootstrap Colorpicker – Customizable Colorpicker for Bootstrap

Bootstrap Colorpicker is a fancy and customizable colorpicker plugin for Bootstrap.

by via jQuery-Plugins.net RSS Feed

Editorial: How Do You Keep Your Skill Set Relevant?

The web industry moves at a blistering pace, and it can often feel difficult to keep up. This is especially true in JavaScript land, where frameworks go in and out of fashion all the time, each with its own way of accomplishing the same basic tasks. So how do you keep your skill set relevant? Over on the main site, we published a great article for those of you who do (or are looking to do) side projects. It's full of resources to inspire you and plenty of tips for keeping learning fun.

But, you know, side projects aren't for everybody, right? So today I'd like to add a further tip to the list: start answering programming questions. If that makes you think of Stack Overflow, you're forgiven :) Stack Overflow is indeed a great place to ask and answer programming questions, but its flaws are well documented and it's not for everybody. Instead, I'd like to suggest an alternative: the SitePoint forums.

by James Hibbard via SitePoint

Phpseclib: Securely Communicating with Remote Servers via PHP

PHP's SSH2 extension (bindings to the libssh2 library) provides access to resources (shell, remote exec, tunneling, file transfer) on a remote machine over a secure cryptographic transport. In practice, however, implementing it is a tedious and often frustrating task for a developer, because of its overwhelming configuration options and a complex API with little documentation.

Figure: Connection between client and server

The phpseclib (PHP Secure Communications Library) package has a developer-friendly API. It uses some optional PHP extensions if they're available and falls back on an internal PHP implementation otherwise, so you don't need any non-default PHP extensions installed to use it.

Installation

composer require phpseclib/phpseclib

This will install the most recent stable version of the library via Composer.

Use Cases

Before diving in blindly, I'd like to list some use cases where this library is a good fit:

Executing deployment scripts on a remote server
Downloading and uploading files via SFTP
Generating SSH keys dynamically in an application
Displaying live output for remote commands executed on a server
Testing an SSH or SFTP connection

Connecting to the Remote Server

Using phpseclib, you can connect to your remote server with any of the following authentication methods:

RSA key
Password-protected RSA key
Username and password (not recommended)

by Viraj Khatavkar via SitePoint

Animate Your React Native App

Creating Machine Learning Systems with JRuby

All the different programming languages out there seem to be a better fit for machine learning tasks than Ruby, right? Python has scikit-learn, Java has Weka, and there’s Shogun for machine learning in C++, just to name a few. On the other hand, Ruby has an excellent reputation for fast prototyping. So, why shouldn’t you […]

by Paul Götze via SitePoint

g9.js – Automatically Interactive Graphics for the Web

g9 is a JavaScript library for creating automatically interactive graphics. With g9, interactive visualization is as easy as visualization that isn't. Just write a function that draws shapes based on data, and g9 will automatically figure out how to manipulate that data when you drag the shapes around.

by via jQuery-Plugins.net RSS Feed

How to Use Python to Find the Zipf Distribution of a Text File

You might be wondering about the term Zipf distribution. To understand what we mean by this term, we need to define Zipf's law first. Don't worry, I'll keep everything simple.

Zipf's Law

Zipf's law states that, given some corpus (a large and structured set of texts) of natural language utterances, the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, four times as often as the fourth most frequent word, and so forth. In other words, a word's frequency is roughly inversely proportional to its rank.
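To make the pattern concrete, here is a minimal Python sketch of the idealized law. It assumes frequency is exactly inversely proportional to rank, which real corpora only approximate. The function name zipf_expected is hypothetical and used only for illustration.

```python
def zipf_expected(top_count, n):
    """Expected counts for ranks 1..n under an ideal Zipf law:
    the word at rank r occurs top_count / r times (an idealization)."""
    return [top_count / rank for rank in range(1, n + 1)]


# If the most frequent word occurs 1,200 times,
# ranks 2, 3 and 4 are expected to occur 600, 400 and 300 times.
print(zipf_expected(1200, 4))  # [1200.0, 600.0, 400.0, 300.0]
```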

Let's look at an example. In the Brown Corpus of American English, the most frequent word is "the" (69,971 occurrences). The second most frequent word, "of", occurs 36,411 times.

The word "the" accounts for around 7% of the words in the Brown Corpus (69,971 of slightly over 1 million words). The word "of" accounts for around 3.6% of the corpus, roughly half the share of "the". This is close to what Zipf's law predicts for the first two ranks.
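The arithmetic behind this comparison can be reproduced in a few lines of Python. The two counts are the ones quoted above; the corpus size of 1,000,000 words is an assumption standing in for "slightly over 1 million", so the percentages are approximate.

```python
the_count = 69_971       # occurrences of "the" in the Brown Corpus
of_count = 36_411        # occurrences of "of"
corpus_size = 1_000_000  # assumed round figure for "slightly over 1 million"

the_share = the_count / corpus_size * 100  # percentage of all words
of_share = of_count / corpus_size * 100
ratio = the_count / of_count  # an ideal Zipf law predicts 2 for rank 1 vs rank 2

print(f"the: {the_share:.1f}%  of: {of_share:.1f}%  ratio: {ratio:.2f}")
# the: 7.0%  of: 3.6%  ratio: 1.92
```

A ratio of about 1.92 is close to the factor of 2 the law predicts.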

In short, Zipf's law tells us that a small number of items usually account for the bulk of what we observe. For instance, a small number of diseases (cancer, cardiovascular diseases) account for the bulk of deaths. The same holds for words, where a few words account for the bulk of all word occurrences in literature, and for many other examples in our lives.
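To see how strongly an ideal Zipf distribution concentrates occurrences in its top ranks, this sketch computes the share of all occurrences taken by the k most frequent of n distinct words, assuming frequency is exactly proportional to 1/rank. The function name top_k_share and the vocabulary size of 10,000 are hypothetical choices for illustration.

```python
def top_k_share(k, n):
    """Fraction of all occurrences accounted for by the k most frequent
    of n distinct words, if the word at rank r occurs in proportion to 1/r."""
    weights = [1 / rank for rank in range(1, n + 1)]
    return sum(weights[:k]) / sum(weights)


# With a hypothetical 10,000-word vocabulary, the top 10 words (0.1% of
# the distinct words) account for about 30% of all occurrences.
print(f"{top_k_share(10, 10_000):.1%}")  # 29.9%
```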

Data Preparation

Before moving forward, let me introduce the data we will experiment with in this tutorial. This time, our data comes from the National Library of Medicine: we will be downloading what's called a MeSH (Medical Subject Headings) ASCII file, in particular d2016.bin (28 MB).

I will not go into detail in describing this file since it is beyond the scope of this tutorial, and we just need it to experiment with our code.

Building the Program

Once you have downloaded the data described in the section above, let's start building the Python script that will find the Zipf distribution of the data in d2016.bin.

The first step is to open the file:

open_file = open('d2016.bin', 'r')

In order to carry out the necessary operations on the bin file, we need to load its contents into a string variable. This can be done simply with the file object's read() method, as follows:

file_to_string = open_file.read()
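A common alternative is to open the file in a with statement, so it is closed automatically once it has been read. This is only a sketch: the helper name read_text_file is hypothetical, and the encoding='utf-8' and errors='replace' arguments assume the file is plain ASCII or UTF-8 text and that any stray undecodable bytes may safely be replaced.

```python
def read_text_file(path):
    """Return the whole contents of a text file as one string.

    The with statement closes the file even if read() raises an error.
    Assumption: the file is ASCII/UTF-8; errors='replace' turns any
    undecodable bytes into U+FFFD instead of raising UnicodeDecodeError.
    """
    with open(path, 'r', encoding='utf-8', errors='replace') as open_file:
        return open_file.read()


# Usage, assuming d2016.bin is in the current directory:
# file_to_string = read_text_file('d2016.bin')
```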

Since we will be looking for a pattern (i.e., words), regular expressions come into play, so we will be making use of Python's re module.

At this point we have read the bin file and loaded its content into a string variable. Finding the Zipf distribution means finding how often each word occurs in the bin file, so the regular expression will be used to locate the words in the file.

To make the match, we will use the re.findall() function. As the re module documentation explains, findall() will:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

What we want to do is write a regular expression that will locate all the individual words in the text string variable. The regular expression that can perform this task is:

\b[A-Za-z][a-z]{2,9}\b

where \b is an anchor for word boundaries. In Python, this can be represented as follows:

words = re.findall(r'(\b[A-Za-z][a-z]{2,9}\b)', file_to_string)

This regular expression finds all the words that start with a letter (upper-case or lower-case) followed by at least 2 and at most 9 lower-case letters. In other words, the words included in the output will be 3 to 10 characters long. Because of the \b anchors, longer words, and words with upper-case letters or digits after the first character (such as all-caps abbreviations), are skipped entirely rather than truncated.
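To check the pattern's behavior, here is a small sketch that runs it on a made-up sample sentence (invented for illustration, not taken from d2016.bin) and prints which tokens are kept.

```python
import re

# Same pattern as above: one letter of either case, then 2-9 lower-case letters,
# with word boundaries on both sides. The single group makes findall return strings.
pattern = r'(\b[A-Za-z][a-z]{2,9}\b)'

sample = "The MeSH heading Calcimycin is an ionophore; see A23187 and abcdefghijk."
print(re.findall(pattern, sample))
# ['The', 'heading', 'Calcimycin', 'ionophore', 'see', 'and']
```

"MeSH", "A23187" and the 11-letter "abcdefghijk" are dropped whole, and "is" and "an" are too short. Note also that matching is case-sensitive, so "The" and "the" will be counted as different words.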

We can now run a loop that counts how often each word occurs:

frequency = {}  # maps each word to its number of occurrences

for word in words:
    count = frequency.get(word, 0)  # 0 if the word hasn't been seen yet
    frequency[word] = count + 1

Here, frequency.get(word, 0) returns the word's current count, or the default value 0 if the word is not yet a key in the dictionary (instead of raising a KeyError, as frequency[word] would). Either way, the count is then incremented by 1 and stored back, so each value ends up holding the number of times the word occurs in the list.
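The same tally can also be written with collections.Counter from the standard library, a dict subclass built for counting. This is a swapped-in alternative rather than the approach this tutorial uses; the sketch assumes words is a list of strings like the one re.findall() returns, and the sample list is hypothetical.

```python
from collections import Counter

words = ['see', 'the', 'and', 'the', 'see', 'the']  # hypothetical sample

# Loop version, as in the tutorial
frequency = {}
for word in words:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

# Counter version: counts every element of the iterable in one call
counter_frequency = Counter(words)

print(frequency)                       # {'see': 2, 'the': 3, 'and': 1}
print(counter_frequency == frequency)  # True
```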

Finally, we will print the key-value pairs of the dictionary, showing each word (key) and the number of times it appeared in the list (value):

from operator import itemgetter

for key, value in reversed(sorted(frequency.items(), key=itemgetter(1))):
    print(key, value)

The expression sorted(frequency.items(), key=itemgetter(1)) sorts the (word, count) pairs by count in ascending order, that is, from the least frequent word to the most frequent word. To list the most frequent words first, we wrap it in the built-in reversed() function.
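Another way to get descending order is to pass reverse=True to sorted(). The sketch below uses a small hypothetical dictionary to show that both approaches put the counts in the same order. They differ only for ties: reversed() also flips the original order of equal-count words, while sorted() with reverse=True keeps ties in their original relative order.

```python
from operator import itemgetter

frequency = {'was': 2, 'the': 5, 'see': 2, 'and': 3}  # hypothetical counts

# Ascending sort by count, then reversed: most frequent first
via_reversed = list(reversed(sorted(frequency.items(), key=itemgetter(1))))

# Descending sort directly; equal counts keep their original relative order
via_reverse_flag = sorted(frequency.items(), key=itemgetter(1), reverse=True)

print(via_reversed)      # [('the', 5), ('and', 3), ('see', 2), ('was', 2)]
print(via_reverse_flag)  # [('the', 5), ('and', 3), ('was', 2), ('see', 2)]
```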

Putting It All Together

After going through the different building blocks of the program, let's see how it all looks together:

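import re
from operator import itemgetter

frequency = {}

# Read the whole file into one string
with open('d2016.bin', 'r') as open_file:
    file_to_string = open_file.read()

# Words of 3 to 10 letters: one letter of either case, then 2-9 lower-case letters
words = re.findall(r'(\b[A-Za-z][a-z]{2,9}\b)', file_to_string)

# Count the occurrences of each word
for word in words:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

# Print the words from most to least frequent
for key, value in reversed(sorted(frequency.items(), key=itemgetter(1))):
    print(key, value)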

Here are the first ten words and their frequencies as returned by the program:

the 42602
abcdef 31913
and 30699
abbcdef 27016
was 17430
see 16189
with 14380
under 13127
for 9767
abcdefv 8694

From this Zipf distribution, we can see Zipf's law at work: a few high-frequency words, such as "the", "and", "was", and "for" above, account for the bulk of the words. This also applies to the sequences "abcdef", "abbcdef", and "abcdefv", which are highly frequent letter sequences with some meaning particular to this file.
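One quick way to compare the output with Zipf's law is to multiply each word's rank by its frequency, since under an ideal Zipf distribution that product would stay constant. Here is a sketch that does this for the ten counts printed above.

```python
# Top ten (word, count) pairs as printed by the program above
top_ten = [
    ('the', 42602), ('abcdef', 31913), ('and', 30699), ('abbcdef', 27016),
    ('was', 17430), ('see', 16189), ('with', 14380), ('under', 13127),
    ('for', 9767), ('abcdefv', 8694),
]

# Under an ideal Zipf law, rank * count would be the same for every rank
for rank, (word, count) in enumerate(top_ten, start=1):
    print(f"{rank:>2} {word:<8} {rank * count}")
```

For these ten words the product ranges from 42,602 at rank 1 to 108,064 at rank 4, so the fit is only approximate. With just ten ranks, this is a rough check rather than a proof.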

Conclusion

In this tutorial, we have seen how Python makes it easy to work with statistical concepts such as Zipf's law. Python comes in particularly handy when working with large text files, where finding the Zipf distribution manually would take a lot of time and effort. As we saw, we were able to quickly load, parse, and find the Zipf distribution of a 28 MB file. Sorting the output was simple too, thanks to Python's dictionaries and the built-in sorted() function.

by Abder-Rahman Ali via Envato Tuts+ Code