How to build a Joomla! plugin to pull Open Graph data from an external page

09.19.2014
Web design - Dev with Mtycks

Recently we had a client who wanted to share, on their own site, articles they had found around the 'Net. Normally we'd just set up some Weblinks categories and send them on their way, but they needed more.

For starters, they wanted to include an image from the article as well as a description. We suggested using the Open Graph metadata that most sites use so their content shows up nicely on Facebook.

How do we pull in the Open Graph metadata?

With a hat tip to Tom at StackOverflow.com, we can use this little bit of code, which queries the page with DOMXPath:

<?php
    // Fetch the page; file_get_contents() returns false on failure
    $data = file_get_contents('http://zunostudios.com/blog');

    $og = array();

    if ($data !== false && $data !== '') {
        $dom = new DOMDocument;
        // Suppress the warnings that malformed HTML would otherwise raise
        @$dom->loadHTML($data);

        $xpath = new DOMXPath($dom);
        // Query meta tags whose property starts with the og: prefix
        $metas = $xpath->query('//*/meta[starts-with(@property, \'og:\')]');

        foreach ($metas as $meta) {
            // Get the property name without the og: prefix
            $property = str_replace('og:', '', $meta->getAttribute('property'));
            // Get the content
            $content = $meta->getAttribute('content');

            // Store each property in the $og array
            $og[$property] = $content;
        }
    }
?>
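
For reuse, the snippet above could be wrapped in a function that takes the HTML as a string, which also makes it easy to try without a network request. This is only a sketch under a few assumptions: extractOpenGraph() is a hypothetical name (it is not part of Joomla! or PHP), it uses libxml_use_internal_errors() instead of the @ operator, and it strips only the leading og: with substr() rather than every occurrence as str_replace() does.

```php
<?php
/**
 * Extract Open Graph properties from an HTML string.
 * extractOpenGraph() is a hypothetical helper name used for illustration.
 */
function extractOpenGraph($html)
{
    $og = array();

    // Nothing to parse, e.g. a failed file_get_contents() returned false
    if (!is_string($html) || trim($html) === '') {
        return $og;
    }

    // Collect libxml parse warnings internally instead of using the @ operator
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $metas = $xpath->query("//meta[starts-with(@property, 'og:')]");

    foreach ($metas as $meta) {
        // Drop the leading "og:" so "og:image" becomes "image"
        $property = substr($meta->getAttribute('property'), 3);
        $og[$property] = $meta->getAttribute('content');
    }

    return $og;
}

// Usage with an inline HTML string (sample markup, not a real page)
$html = '<html><head>'
      . '<meta name="author" content="Someone" />'
      . '<meta property="og:image" content="http://example.com/a.jpg" />'
      . '<meta property="og:description" content="A short summary" />'
      . '</head><body></body></html>';

$og = extractOpenGraph($html);
?>
```

With the sample markup above, $og should hold only the image and description keys, since the author tag has no og: property.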

You can use that wherever you'd like, but here's how we implemented it in a content plugin.

The idea here is that a user will simply add a URL to the Link A field under the Images and links tab of a Joomla! Article and the Article will get auto-populated with an image and a description.

Here's the plugin's onContentBeforeSave method. Note the comments that break down what's happening:

<?php
/**
 *
 * Fills in an article's intro image and intro text from the Open Graph
 * metadata of the page in its Link A field.
 * Method is called right before the content is saved, so changes made
 * to the article here are saved along with it.
 *
 * @param   string   $context  The context of the content passed to the plugin (added in 1.6)
 * @param   object   $article  A JTableContent object
 * @param   boolean  $isNew    If the content is just about to be created
 *
 * @return  boolean   true once the article has been processed. The plugin never
 *                    returns false, so it never aborts the save.
 *
 * @since   1.6
 */
public function onContentBeforeSave($context, $article, $isNew)
{
	
	//Turn on all error reporting so we can see what's wrong
	//(handy while debugging; remove this on a live site)
	error_reporting(E_ALL);
	
	//Grab all the categories we've selected in our plugin
	$cats = $this->params->def('cats', array());
	
	//If the current article is in one of those categories
	//Let's grab the Open Graph metadata
	if( in_array($article->catid, $cats) ){
		
		//JSON Decode the images and urls of the article
		$images = json_decode($article->images);
		$urls = json_decode($article->urls);
		
		//If Link A is set and either the intro image
		//or the introtext is empty, let's get the tags
		if( !empty($urls->urla) && (empty($images->image_intro) || empty($article->introtext)) ){
			
			//Get Open Graph Meta Data
			$data = file_get_contents($urls->urla);
			
			//If the page couldn't be fetched, save the article as-is
			if( empty($data) ){
				return true;
			}
			
			$dom = new DOMDocument;
			@$dom->loadHTML($data);
			
		    $xpath = new DOMXPath($dom);
		    # query metatags with og prefix
		    $metas = $xpath->query('//*/meta[starts-with(@property, \'og:\')]');
		
		    $og = array();
			
			//Loop through all of the tags to add to $og variable
		    foreach($metas as $meta){
		    
		        # get property name without og: prefix
		        $property = str_replace('og:', '', $meta->getAttribute('property'));
		        
		        # get content
		        $content = $meta->getAttribute('content');
		        $og[$property] = $content;
		    
		    }
		    
		    //If the image_intro is empty, and we have an Open Graph image
		    //Let's set the article's image_intro
		    if( empty($images->image_intro) && isset($og['image']) ){
			    //The article may have had no images data at all
			    if( !is_object($images) ){
				    $images = new stdClass;
			    }
			    $images->image_intro = $og['image'];
			    $article->images = json_encode($images);
		    }
		    
		    //If the introtext is empty and we have an Open Graph description
		    //Let's set the article's introtext—wrapping it with paragraph tags
		    if( empty($article->introtext) && isset($og['description']) ){
			    $article->introtext = '<p>'.$og['description'].'</p>';   
		    }
			
		}
					
	}
	
	//Always return true so the save is never blocked
	return true;

}
?>
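
Because Link A is typed in by a user and the description comes from someone else's page, a more defensive version might check the URL and escape the description before using them. The helpers below are a rough sketch, not part of the plugin shown above: isFetchableUrl() and buildIntroText() are hypothetical names, and limiting requests to http and https is our assumption about what should be allowed.

```php
<?php
/**
 * Return true only for well-formed http(s) URLs.
 * isFetchableUrl() is a hypothetical helper name used for illustration.
 */
function isFetchableUrl($url)
{
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Assumption: only web pages should be fetched, never file:// or ftp://
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));

    return in_array($scheme, array('http', 'https'), true);
}

/**
 * Wrap a remote description in paragraph tags, escaping any markup in it.
 * buildIntroText() is a hypothetical helper name used for illustration.
 */
function buildIntroText($description)
{
    return '<p>' . htmlspecialchars(trim($description), ENT_QUOTES, 'UTF-8') . '</p>';
}

// Usage: guard the fetch, then build the intro text
$ok    = isFetchableUrl('http://zunostudios.com/blog');
$intro = buildIntroText('<b>Hi</b> & bye');
?>
```

In the plugin, isFetchableUrl($urls->urla) could guard the file_get_contents() call, and buildIntroText($og['description']) could replace the manual paragraph wrapping.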

The $cats variable above holds the Content component categories that we selected in the plugin's settings. Here's the form field that's in the plugin's XML file:

<field name="cats"
	type="category"
	extension="com_content"
	label="Categories"
	description="Categories to include plugin"
	multiple="true" />