Friday, May 26, 2017

How to Build a Twitter Follower-Farmer Detection App with RestDB


This article was sponsored by RestDB. Thank you for supporting the partners who make SitePoint possible.

Are you active on Twitter? If so, do you often wonder why some accounts seem to follow you only to unfollow you moments (or days) later? It's probably not something you said - they're just follower farming.

Follower farming is a known social media hack that takes advantage of people who "#followback" as soon as someone follows them. Big brands, celebs, and wannabe celebs exploit this, as it keeps their follower count high but their following count low, in turn making them look popular.


In this post, we'll build an app which lets you log in via Twitter, grabs your followers, and compares the last fetched follower list with a refreshed list in order to identify the new unfollowers and calculate the duration of their follow, potentially auto-identifying the farmers.
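The comparison step at the heart of this idea can be sketched in plain PHP. This is only an illustration under assumptions: the function name and the sample IDs are made up, and the real app would load the previous list from storage rather than hard-code it.

```php
<?php

/**
 * Compare two snapshots of follower IDs.
 *
 * @param int[] $previous IDs from the last fetch
 * @param int[] $current  IDs from the refreshed fetch
 * @return array with the keys "unfollowed" and "gained"
 */
function compareFollowerSnapshots(array $previous, array $current): array
{
    return [
        // In the old list but missing from the new one: they unfollowed.
        'unfollowed' => array_values(array_diff($previous, $current)),
        // In the new list but not in the old one: new followers.
        'gained' => array_values(array_diff($current, $previous)),
    ];
}

// Hypothetical sample data.
$changes = compareFollowerSnapshots([11, 22, 33, 44], [22, 44, 55]);

print_r($changes);
```

Storing a "first seen" timestamp next to each ID would then let us work out how long an unfollower stuck around.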

Bootstrapping

As usual, we'll be using Homestead Improved for a high quality local environment setup. Feel free to use your own setup instead if you've got one you feel comfortable in.

git clone http://ift.tt/1Lhem4x hi_followfarmers
cd hi_followfarmers
bin/folderfix.sh
vagrant up; vagrant ssh

Once the VM has been provisioned and we find ourselves inside it, let's bootstrap a Laravel app.

composer create-project --prefer-dist laravel/laravel Code/Project
cd Code/Project

Logging in with Twitter

To make logging in with Twitter possible, we'll use the Socialite package.

composer require laravel/socialite

As per instructions, we should also register the service provider and the facade alias in config/app.php:

'providers' => [
    // Other service providers...

    Laravel\Socialite\SocialiteServiceProvider::class,
],

'aliases' => [
    // Other aliases...

    'Socialite' => Laravel\Socialite\Facades\Socialite::class,
],

Finally, we need to register a new Twitter app at http://apps.twitter.com/app/new...

Registering a new Twitter app

... and add the secret credentials into config/services.php:

    'twitter' => [
        'client_id' => env('TWITTER_CLIENT_ID'),
        'client_secret' => env('TWITTER_CLIENT_SECRET'),
        'redirect' => env('TWITTER_CALLBACK_URL'),
    ],

Naturally, we need to add these environment variables into the .env file in the root of the project:

TWITTER_CLIENT_ID=keykeykeykeykeykeykeykeykey
TWITTER_CLIENT_SECRET=secretsecretsecret
TWITTER_CALLBACK_URL=http://ift.tt/2s50Eyx

We need to add some Login routes into routes/web.php next:

Route::get('auth/twitter', 'Auth\LoginController@redirectToProvider');
Route::get('auth/twitter/callback', 'Auth\LoginController@handleProviderCallback');

Finally, let's add the methods these routes refer to into the LoginController class inside app/Http/Controllers/Auth:

    /**
     * Redirect the user to the Twitter authentication page.
     *
     * @return Response
     */
    public function redirectToProvider()
    {
        return Socialite::driver('twitter')->redirect();
    }

    /**
     * Obtain the user information from Twitter.
     *
     * @return Response
     */
    public function handleProviderCallback()
    {
        $user = Socialite::driver('twitter')->user();

        dd($user);
    }

The dd($user); is there to easily test if the authentication went well, and sure enough, if you visit /auth/twitter, you should be able to authorize the app and see the basic information about your account on screen:

Basic Twitter User Information

Follower Lists

There are many ways of getting an account's follower list, and none of them pleasant.

Twitter Still Hates Developers

Ever since Twitter's Great War on Developers (spoiler: very little has changed since that article came out), it's been an outright nightmare to fetch full lists of people's followers. In fact, the API rate limits are so low that people have resorted to third party data aggregators for actually buying that data, or even scraping the followers page. We'll go the "white hat" route and suffer through their API, but if you have other means of getting followers, feel free to use that instead of the method outlined below.

The Twitter API offers the /followers/list endpoint, but as that one only returns 20 followers per call at most, and only allows 15 requests per 15 minutes, we would be able to, at most, extract 1200 followers per hour - unacceptable. Instead, we'll use the followers/ids endpoint to fetch 5000 IDs at a time. This is subject to the same limit of 15 calls per 15 minutes, but gives us much more breathing room.
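To make the difference between the two endpoints concrete, here is the arithmetic as a small sketch. The function name is made up for illustration; the page sizes and limits are the ones quoted above.

```php
<?php

/**
 * How many followers can be fetched per hour under a windowed rate limit.
 */
function followersPerHour(int $perCall, int $callsPerWindow, int $windowMinutes = 15): int
{
    $windowsPerHour = intdiv(60, $windowMinutes);

    return $perCall * $callsPerWindow * $windowsPerHour;
}

echo followersPerHour(20, 15), PHP_EOL;   // followers/list: 1200
echo followersPerHour(5000, 15), PHP_EOL; // followers/ids: 300000
```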

It's important to keep in mind that ID != Twitter handle. IDs are numeric values representing a unique account across time, even across different handles. So for each unfollower's ID, we'll have to make an additional API call to find out who they were (the Users Lookup Bulk API will come in handy).
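As a rough sketch of how the bulk lookup could be prepared, the snippet below splits a list of IDs into comma-separated batches. It assumes the lookup endpoint takes up to 100 IDs per request as a comma-separated user_id parameter, so check that against the current API docs. The function name is hypothetical.

```php
<?php

/**
 * Split follower IDs into comma-separated batches for a bulk lookup.
 *
 * @param int[] $ids
 * @param int   $batchSize assumed maximum number of IDs per lookup request
 * @return string[] one comma-separated "user_id" value per request
 */
function buildLookupBatches(array $ids, int $batchSize = 100): array
{
    $batches = [];

    foreach (array_chunk($ids, $batchSize) as $chunk) {
        $batches[] = implode(',', $chunk);
    }

    return $batches;
}

// 250 made-up IDs need three requests: 100 + 100 + 50.
$batches = buildLookupBatches(range(1, 250));
echo count($batches), PHP_EOL;
```

Each string in $batches would then be sent as the user_id query parameter of one lookup request.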

Basic API Communication

Socialite is only useful for logging in. Actually communicating with the API is less straightforward. Given that Laravel comes with Guzzle pre-installed, installing Guzzle's OAuth Subscriber (which lets us use Guzzle with the OAuth 1 protocol) is the simplest solution:

composer require guzzlehttp/oauth-subscriber

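Once that's in there, we can update our LoginController::handleProviderCallback method to test things out (remember to import GuzzleHttp\Client, GuzzleHttp\HandlerStack, and GuzzleHttp\Subscriber\Oauth\Oauth1 at the top of the file):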

    public function handleProviderCallback()
    {
        $user = Socialite::driver('twitter')->user();

        $stack = HandlerStack::create();

        $middleware = new Oauth1([
            'consumer_key' => getenv('TWITTER_CLIENT_ID'),
            'consumer_secret' => getenv('TWITTER_CLIENT_SECRET'),
            'token' => $user->token,
            'token_secret' => $user->tokenSecret
        ]);

        $stack->push($middleware);

        $client = new Client([
            'base_uri' => 'https://api.twitter.com/1.1/',
            'handler' => $stack,
            'auth' => 'oauth'
        ]);

        $response = $client->get('followers/ids.json', [
            'query' => [
                'cursor' => '-1',
                'screen_name' => $user->nickname,
                'count' => 5000
            ]
        ]);

        dd($response->getBody()->getContents());
    }

In the above code, we first create a middleware stack which will chew through the request, pull it through all the middlewares, and output the final version. We can push other middlewares into this stack, but for now, we only need the Oauth1 one.

Next, we create the Oauth1 middleware and pass in the required parameters. The first two we've already got - they're the keys we defined in .env previously. The last two we got from the authenticated Twitter user instance.

We then push the middleware into the stack, and attach the stack onto the Guzzle client. In layman's terms, this means "when this client does requests, pull the requests through all the middlewares in the stack before sending them to their final destination". We also tell the client to always authenticate with oauth.
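If the middleware idea feels abstract, this toy model in plain PHP may help. It is not Guzzle's actual implementation: requests are simple arrays here, and the signing step is a placeholder. It only shows the shape of a middleware, which is a function that takes the next handler and returns a new handler wrapping it.

```php
<?php

// The innermost handler: stands in for "actually send the request".
$send = function (array $request): array {
    return ['status' => 200, 'sent' => $request];
};

// A middleware takes the next handler and returns a new handler wrapping it.
$signRequest = function (callable $next): callable {
    return function (array $request) use ($next): array {
        // Placeholder for real OAuth 1 signing.
        $request['headers']['Authorization'] = 'OAuth (signature goes here)';

        return $next($request);
    };
};

// "Pushing" the middleware onto the stack means wrapping the handler with it.
$handler = $signRequest($send);

$response = $handler([
    'method' => 'GET',
    'uri' => 'followers/ids.json',
    'headers' => [],
]);

echo $response['sent']['headers']['Authorization'], PHP_EOL;
```

By the time the request reaches the innermost handler, it already carries the header the middleware added.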

Finally, we make the GET call to the API endpoint with the required query params: the cursor to start from (-1 is the first page), the user for whom to pull followers, and how many followers to pull. In the end, we dump the output onto the screen and die to see if we're getting what we need. Sure enough, here are 5000 of the most recent followers for my account:

Screenshot of 5000 Twitter user IDs

Now that we know our API calls are passing and we can talk to Twitter, it's time for some loops to get the full list for the current user.

The PHP side - Getting all Followers

The API allows 15 calls per 15 minutes at 5000 IDs each, or 75k IDs per window, so let's limit the account size to 70k followers for now for simplicity.

        $user = Socialite::driver('twitter')->user();

        if ($user->user['followers_count'] > 70000) {
            return view(
                'home.index',
                ['message' => 'Sorry, we currently only support accounts with up to 70k followers']
            );
        }

Note: home.index is an arbitrary view file I made just for this example, containing a single directive: {{ $message }}.

Then, let's iterate through the next_cursor_str value returned by the API, and paginate through the remaining IDs.
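A minimal sketch of that cursor loop, decoupled from the HTTP client so it can be run on its own. The collectAllIds function and the fake two-page "API" are made up for illustration. The only things taken from the real response are the ids and next_cursor_str fields and the convention that a cursor of "0" marks the last page.

```php
<?php

/**
 * Collect every ID from a cursor-paginated source.
 *
 * @param callable $fetchPage receives a cursor string, returns an object with
 *                            "ids" (array) and "next_cursor_str" (string)
 * @return array
 */
function collectAllIds(callable $fetchPage): array
{
    $ids = [];
    $cursor = '-1'; // -1 asks for the first page

    do {
        $page = $fetchPage($cursor);
        $ids = array_merge($ids, $page->ids);
        $cursor = $page->next_cursor_str;
    } while ($cursor !== '0'); // "0" means there are no more pages

    return $ids;
}

// Fake two-page API used only to demonstrate the loop.
$fakePages = [
    '-1' => (object) ['ids' => [1, 2, 3], 'next_cursor_str' => '99'],
    '99' => (object) ['ids' => [4, 5], 'next_cursor_str' => '0'],
];

$allIds = collectAllIds(function (string $cursor) use ($fakePages) {
    return $fakePages[$cursor];
});

echo implode(', ', $allIds), PHP_EOL;
```

In the app, the callable would wrap the Guzzle call and json_decode the response body.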

Wow, much numbers, very follow, wow.


With some luck, this should execute very quickly - depending on Twitter's API responsiveness.

Everyone with up to 70k followers can now get a full list of followers generated upon authorization.

If we needed to support bigger accounts, it would be relatively simple to make it repeat the process every 15 minutes (after the API limit resets) for every 75k followers, and stitch the results together. Of course, someone is almost guaranteed to follow/unfollow in that window given the number of followers, so it would be very hard to stay accurate. In those cases, it's easier to focus on the last 75k followers and only analyze those (the API auto-orders by last-followed), or to find another method of reliably fetching followers, bypassing the API.
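For a feel of how long a bigger account would take, here is a rough estimate in code. The function name is hypothetical, and the result is only an approximation, since a page can come back with fewer IDs than requested.

```php
<?php

/**
 * Estimate how many 15-minute rate-limit windows a full fetch needs.
 */
function rateLimitWindowsNeeded(int $followers, int $idsPerCall = 5000, int $callsPerWindow = 15): int
{
    $calls = (int) ceil($followers / $idsPerCall);

    return (int) ceil($calls / $callsPerWindow);
}

$windows = rateLimitWindowsNeeded(200000);

// Every window after the first means waiting for the limit to reset.
echo $windows, ' windows, about ', ($windows - 1) * 15, ' minutes of waiting', PHP_EOL;
```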

Cleaning up

It's a bit awkward to have this logic in the LoginController, so let's move this into a separate service. I created app/Services/Followers/Followers.php for this example, with the following contents:

<?php


namespace App\Services\Followers;

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Subscriber\Oauth\Oauth1;

class Followers
{

    /** @var string */
    protected $token;

    /** @var string */
    protected $tokenSecret;

    /** @var string */
    protected $nickname;

    /** @var Client */
    protected $client;

    public function __construct(string $token, string $tokenSecret, string $nickname)
    {
        $this->token = $token;
        $this->tokenSecret = $tokenSecret;
        $this->nickname = $nickname;

        $stack = HandlerStack::create();

        $middleware = new Oauth1(
            [
                'consumer_key' => getenv('TWITTER_CLIENT_ID'),
                'consumer_secret' => getenv('TWITTER_CLIENT_SECRET'),
                'token' => $this->token,
                'token_secret' => $this->tokenSecret,
            ]
        );

        $stack->push($middleware);

        $this->client = new Client(
            [
                'base_uri' => 'https://api.twitter.com/1.1/',
                'handler' => $stack,
                'auth' => 'oauth',
            ]
        );
    }

    public function getClient()
    {
        return $this->client;
    }

    /**
     * Returns an array of follower IDs for a given optional nickname.
     *
     * If no custom nickname is provided, the one used during the construction
     * of this service is used, usually defaulting to the same user authing
     * the application.
     *
     * @param string|null $nickname
     * @return array
     */
    public function getFollowerIds(?string $nickname = null)
    {
        $nickname = $nickname ?? $this->nickname;

        $ids = [];
        $cursor = '-1'; // -1 requests the first page

        do {
            $response = $this->client->get(
                'followers/ids.json', [
                    'query' => [
                        'cursor' => $cursor,
                        'screen_name' => $nickname,
                        'count' => 5000,
                    ],
                ]
            );

            $data = json_decode($response->getBody()->getContents());
            $ids = array_merge($ids, $data->ids);
            $cursor = $data->next_cursor_str;
        } while ($cursor !== '0'); // "0" marks the last page

        return $ids;
    }

}

We can then clean up the LoginController's handleProviderCallback method:

    public function handleProviderCallback()
    {
        $user = Socialite::driver('twitter')->user();

        if ($user->user['followers_count'] > 70000) {
            return view(
                'home.index',
                ['message' => 'Sorry, we currently only support accounts with up to 70k followers']
            );
        }

        $flwrs = new Followers(
            $user->token, $user->tokenSecret, $user->nickname
        );
        dd($flwrs->getFollowerIds());
    }

It's still the wrong place to be doing this, so let's further improve things. To keep a user logged in, let's save the token, secret, and nickname into the session.

    /**
     * Get and store token data for authorized user.
     *
     * @param Request $request
     * @return Response
     */
    public function handleProviderCallback(Request $request)
    {
        $user = Socialite::driver('twitter')->user();

        if ($user->user['followers_count'] > 70000) {
            return view(
                'home.index',
                ['message' => 'Sorry, we currently only support accounts with up to 70k followers']
            );
        }

        $request->session()->put('twitter_token', $user->token);
        $request->session()->put('twitter_secret', $user->tokenSecret);
        $request->session()->put('twitter_nickname', $user->nickname);
        $request->session()->put('twitter_id', $user->id);

        return redirect('/');
    }

We save all the information into the session, making the user effectively logged in to our application, and then we redirect to the home page.

Let's create a new controller now, and give it a simple method to use:

php artisan make:controller HomeController

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;

class HomeController extends Controller
{
    public function index(Request $request)
    {
        $nick = $request->session()->get('twitter_nickname');
        if (!$nick) {
            return view('home.loggedout');
        }

        return view('home.index', $request->session()->all());
    }
}

Simple, right? The views are simple, too:

{{-- resources/views/home/index.blade.php --}}
<h1>FollowerFarmers</h1>

<h2>Hello, {{ $twitter_nickname }}! Not you? <a href="/logout">Log out!</a></h2>

<p>I bet you'd like to see your follower stats, wouldn't you?</p>

{{-- resources/views/home/loggedout.blade.php --}}
<h1>FollowerFarmers</h1>

<h2>Hello, stranger!</h2>

<p>You're currently logged out. How about you <a href="/auth/twitter">log in with Twitter</a> to get started?</p>

We'll need to add some routes to routes/web.php, too:

Route::get('/', 'HomeController@index');
Route::get('/logout', 'Auth\LoginController@logout');

With this, we can check if we're logged in, and we can easily log out.

Note that for security, the logout route should only accept POST requests with CSRF tokens - for simplicity during development, we're taking the GET approach and revamping it later.

Admittedly, it's not the prettiest thing to look at, but we're building a demo here - the real thing can get visually polished once the logic is done.

Registering a Service Provider

It's common practice to register a service provider for easier access later on, so let's do that. Our service can't be instantiated without the token and secret (i.e. before the user logs in with Twitter) so we'll need to make it deferred - in other words, it'll only get created when needed, and we'll make sure we don't need it until we have those values.

php artisan make:provider FollowerServiceProvider

<?php

namespace App\Providers;

use App\Services\Followers\Followers;
use Illuminate\Support\ServiceProvider;

class FollowerServiceProvider extends ServiceProvider
{

    protected $defer = true;

    public function register()
    {
        $this->app->singleton(
            Followers::class,
            function ($app) {
                return new Followers(
                    session('twitter_token'),
                    session('twitter_secret'),
                    session('twitter_nickname')
                );
            }
        );
    }

    public function provides()
    {
        return [Followers::class];
    }
}
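To see why a lazily built singleton suits us here, consider this toy container. It is a simplified stand-in and not Laravel's container, and it doesn't model how Laravel defers loading the provider itself. ToyContainer and the 'followers' key are made up for illustration. The point is only that the factory closure doesn't run until the service is first resolved, and runs just once.

```php
<?php

class ToyContainer
{
    private $factories = [];
    private $instances = [];

    // Remember how to build the service, but don't build it yet.
    public function singleton(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;
    }

    // Build on first use, then keep returning the same instance.
    public function resolve(string $name)
    {
        if (!isset($this->instances[$name])) {
            $this->instances[$name] = ($this->factories[$name])($this);
        }

        return $this->instances[$name];
    }
}

$built = 0;
$container = new ToyContainer();

$container->singleton('followers', function () use (&$built) {
    $built++; // count how many times the factory runs

    return new stdClass();
});

$builtBeforeResolve = $built; // still 0: nothing has asked for the service

$first = $container->resolve('followers');
$second = $container->resolve('followers');

var_dump($builtBeforeResolve, $built, $first === $second);
```

This is why reading the token and secret from the session inside the closure is safe: it only happens once something asks for the service, which we make sure is after login.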

If we put a simple count echo into our logged in view:

{{ count($ids) }}

... and modify the HomeController to resolve the service registered by this provider (importing App\Services\Followers\Followers at the top of the file):

...

        return view(
            'home.index', array_merge(
                $request->session()->all(),
                ['ids'=> resolve(Followers::class)->getFollowerIds()]
            )
        );

... and then we test, sure enough, it works.

Basic views

Database

Now that we have a neat service to extract follower lists with, we should probably save them somewhere. We could save this into a local MySQL database, or even a flat file, but for performance and portability, I went with something different this time: RestDB.

RestDB is a plug-and-play hosted database service that's easy to configure and use, freeing up your choices of hosting platform. By not needing a database that writes to a local filesystem, you can easily push an app like the one we're building to Google Cloud Engine or Heroku. With the help of its templates, you can instantly set up a blog, a landing page, a web form, a log analyzer, even a mailing system. The service even supports Markdown for inline field editing, letting you practically have a Markdown-based blog right there on their service.

RestDB has a free tier, and the first month is virtually limitless so you can thoroughly test it. The database I'm developing this on is on a Basic plan (courtesy of the RestDB team).

Setting up RestDB

Unlike with other database services, with RestDB it's important to consider record number limits. The Basic plan offers 10000 records, which would be quickly exhausted if we saved each follower of each logged-in user as a separate entry, or even a list of followers for each user as a separate entry per 15-minute timeframe. That's why I chose the following plan:

Continue reading: How to Build a Twitter Follower-Farmer Detection App with RestDB


by Bruno Skvorc via SitePoint
