Friday, July 3, 2015

Turning a Crawled Website into a Search Engine with PHP

In the previous part of this tutorial, we used Diffbot to set up a crawljob which would eventually harvest SitePoint’s content into a data collection, fully searchable by Diffbot’s Search API. We also demonstrated those searching capabilities by applying some common filters and listing the results.

Diffbot Logo

In this part, we’ll build a GUI simple enough for the average Joe to use it, in order to have a relatively pretty, functional, and lightweight but detailed SitePoint search engine. What’s more, we won’t be using a framework, but a mere total of three libraries to build the entire application.

You can see the demo application here.

This tutorial is completely standalone, and as such if you choose to follow along, you can start with a fresh Homestead Improved instance. Note that in order to actually fully use what we build, you need a Diffbot account with Crawljob and Search API functionality.

Bootstrapping

Moving on, I’ll assume you’re using a Vagrant machine. If not, find out why you should, then come back.

On a fresh Homestead Improved VM, the bootstrapping procedure is as follows:

composer global require beelab/bowerphp:dev-master
mkdir sp_search
cd sp_search
mkdir public cache template template/twig app
composer require swader/diffbot-php-client
composer require twig/twig
composer require symfony/var-dumper --dev

In order, this:

  • installs BowerPHP globally, so we can use it on the entire VM.
  • creates the project’s root folder and several subfolders.
  • installs the Diffbot PHP client, which we’ll use to make all calls to the API and to iterate through the results.
  • installs the Twig templating engine, so we’re not echoing out HTML in PHP like peasants :)
  • installs VarDumper in dev mode, so we can easily debug while developing.

Continue reading %Turning a Crawled Website into a Search Engine with PHP%


by Bruno Skvorc via SitePoint

No comments:

Post a Comment