Like most other users, I frequently find myself posting links on twitter using one of the many URL shortening services. I’ll say at this point that I absolutely loathe these services. The recent demise of Tr.im and the debacle which followed reinforce the view that replicating something which already exists is bound to lead to issues (the thinking behind the popular programming acronym DRY, “Don’t Repeat Yourself”).
It would be a serious step forwards (IMHO) if twitter allowed you to turn a section of a tweet into a link, in much the same way as one does in HTML, so we could avoid all this. People might complain it would make things too “complicated”, though. Anyway, I’ve digressed; rant over, back on track.
As the subject of this post suggests, this is just a little code snippet I put together today to help me hold onto my links: it posts them to delicious when I tweet them, which makes them a little more searchable and stores the real link rather than a shortened version. It almost certainly exists elsewhere, but this is my take on it, and it doesn’t rely on me using a particular browser with a plug-in installed on every machine I use, which the otherwise excellent tweecious would.
Requirements
- PHP5 webserver
- SimplePie RSS parser (I’m using version 1.1.3; nothing has changed since which would break this, AFAIK)
- crontab or similar to make it automatic. If you don’t have cron on your server you can always just load the page up in a browser window, or use webcron.
What it does
- Any link you post on twitter is resolved to its original address, the title is grabbed from the original site, and the link is posted to your delicious account with your tweet in the description and any tags you want.
- Links will only be posted once, so if a link is automatically posted and you then delete it from delicious then it’s not going to reappear the next time the script is run.
- You can optionally blacklist domains or sections of domains which you don’t want to be saved. For instance, I tweeted a link to a programme on BBC iPlayer; there’s no point me saving it as iPlayer only stores stuff for a couple of weeks.
- You can also prevent a tweet which has a specific hashtag from being posted. I use the tag “#ns”, which tells the script to ignore that tweet.
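As a rough illustration of the two skip rules above (the opt-out hashtag and the address blacklist), here is a minimal standalone sketch. The function name `should_skip_link` and its parameters are hypothetical and not part of twittodel.php, and unlike the script it matches case-insensitively, which is an assumption about what is wanted.

```php
<?php
// Hypothetical helper, not part of twittodel.php: decides whether a tweet
// and its resolved URL should be skipped before posting to delicious.
function should_skip_link($tweet, $resolvedUrl, $omitTag, array $omitFragments)
{
    // Skip tweets carrying the opt-out hashtag (case-insensitive substring match).
    if ($omitTag !== '' && stripos($tweet, $omitTag) !== false) {
        return true;
    }
    // Skip URLs containing any blacklisted fragment.
    foreach ($omitFragments as $fragment) {
        $fragment = trim($fragment);
        if ($fragment !== '' && stripos($resolvedUrl, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// Usage: the blacklist is a comma-separated string, as in ADDRESS_OMIT.
$blacklist = explode(',', 'bbc.co.uk/iplayer, bbc.co.uk/programmes');
var_dump(should_skip_link('Great read http://example.com/a', 'http://example.com/a', '#ns', $blacklist)); // bool(false)
```

Because this is a plain substring match, a tag such as “#ns” would also match a longer tag like “#nsfw”; a word-boundary check would be needed to avoid that.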
To use it, just upload twittodel.php from the zip file below (or copy and paste the source below) to your webserver, add simplepie.inc and any other bits it needs, and make a cache folder matching whatever you’ve set the CACHE_DIR option to (by default it wants a folder called ‘cache’).
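If the script seems to do nothing on its first run, the cache folder is a likely culprit. The following is a small pre-flight sketch, assuming a hypothetical helper called `cache_dir_ready` (it is not in twittodel.php) and the default ‘cache/’ folder name.

```php
<?php
// Hypothetical pre-flight check, not part of twittodel.php.
// Returns true when $dir exists (creating it if needed) and is writable.
function cache_dir_ready($dir)
{
    if (!is_dir($dir) && !@mkdir($dir, 0755, true)) {
        return false; // Could not create the folder
    }
    return is_writable($dir);
}

// Usage: check the folder the script expects before running it from cron.
if (!cache_dir_ready('cache/')) {
    echo "Cache directory is missing or not writable\n";
}
```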
Download: twittodel.zip
As I say, it was put together fairly quickly so it’s not guaranteed to be perfect. Comments, improvements and criticism are welcome. It might see the addition of Zemanta-style automatic tagging like tweecious; I’ll update this post if it does.
<?php
// Author: Duncan Barnes (www.barnesdmd.co.uk)
// Updated: 16/08/09
// License: Creative Commons Attribution-Non-Commercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/2.0/)
//-----------------------------------------------------------------------------------------
//You'll need to enter your delicious credentials and a few other details here to make this work
//I've left in a few details as examples
//Delicious username
define('DELICIOUS_USERNAME','yourusernamehere');
//Delicious password
define('DELICIOUS_PASSWORD','yourpasswordhere');
//Tag/s to add to the delicious entry
define('DELICIOUS_TAGS','from_twitter');
//Your twitter account timeline rss feed
define('TWITTER_RSS','http://twitter.com/statuses/user_timeline/6752222.rss');
//You can put a hash tag here which you might want to use to prevent tweets being posted to delicious
define('TWITTER_OMIT','#ns');
//You can put bits of addresses in here which you don't want to be posted, separate with a comma, e.g 'bbc.co.uk/iplayer', no urls which have this in them would be posted
define('ADDRESS_OMIT','bbc.co.uk/iplayer,bbc.co.uk/programmes');
//Directory where this script can store cached data
define('CACHE_DIR','cache/');
//Max number of tweets to process each time the script is run, adjust based on how often you tweet and how often you run the cron controlling this script
define('MAX_TWEETS',6);
//-----------------------------------------------------------------------------------------
//Bit of quick error checking
if (!class_exists('SimplePie')){if(file_exists('simplepie.inc')){require_once 'simplepie.inc';}else{echo 'Simplepie not found, check simplepie.inc is in the same directory as this script!';exit();}}
$feed = new SimplePie(TWITTER_RSS,CACHE_DIR,3600) or die('Could not declare Simplepie, you might want to check your TWITTER_RSS and CACHE_DIR settings and also that you have any additional files which SimplePie wants.');
$curl_handle = curl_init() or die('Could not initiate curl, please check its enabled on your server!');
//Fetching our cache of previous things we've saved to delicious, means:
// a)we're not repeating a push to delicious
// b) If you delete an auto created entry from delicious it won't be recreated the next time this is run!
// The script will still continue if there's a problem here, we're not going to worry about it too much as this might be the first run
if(file_exists(CACHE_DIR.'delicious.data')){$deliciouscache = unserialize(file_get_contents(CACHE_DIR.'delicious.data'));}else{$deliciouscache = array();}
//For ease of use, the ADDRESS_OMIT option is a constant, however we want its contents as an array for ease of processing so we'll convert it now
$address_omit = explode(',',ADDRESS_OMIT);
@array_walk($address_omit, 'trim_value');
//Let's do it
$i=0;
foreach ($feed->get_items() as $item){
if($i==MAX_TWEETS){break;} //Stopping if we've reached the limit
$tweet = $item->get_title();
//Checking the tweet isn't already in our cache and doesn't contain our omit string, we use an md5 of the tweet as it saves us doing any further lookups
//strpos with a strict !== false check is used so an empty TWITTER_OMIT doesn't cause every tweet to be skipped
if(in_array(md5($tweet),$deliciouscache) || (TWITTER_OMIT !== '' && strpos($tweet, TWITTER_OMIT) !== false)){$i++;continue;}
if(!$page = getUrl($tweet)){$i++;continue;} //We've failed to get the target url, on to the next...
if(!domain_check($page['url'])){$i++;continue;} //Checking the resolved url isn't in our list of urls we don't want to know about (ADDRESS_OMIT option)
// At this point we should have an array ($page) containing the title and the url provided by the getUrl function
// so let's go ahead and try to add it to the Delicious account
$url = urlencode($page['url']);
$title = urlencode($page['title']);
$tweetenc = urlencode($tweet);
$tag = urlencode(DELICIOUS_TAGS);
curl_setopt($curl_handle, CURLOPT_URL, 'https://'.DELICIOUS_USERNAME.':'.DELICIOUS_PASSWORD."@api.del.icio.us/v1/posts/add?url=$url&description=$title&extended=$tweetenc&tags=$tag");
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 0);
if($result = curl_exec($curl_handle)){
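array_unshift($deliciouscache, md5($tweet));
//Capping the cache so the file can't grow without bound. The cap is kept well above MAX_TWEETS:
//with a cap of exactly MAX_TWEETS, adding a new tweet can evict one that is still in the feed window, which would then be posted again
if(count($deliciouscache) > MAX_TWEETS * 10){
array_pop($deliciouscache);
}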
}
$i++;
}
//Writing our cache back to file ready for the next run
file_put_contents(CACHE_DIR.'delicious.data', serialize($deliciouscache)); //Opens, writes and closes the file in one call
curl_close($curl_handle);
//Done
//Helper functions
function domain_check($url){global $address_omit;if(is_array($address_omit)){foreach($address_omit as $address){if(!empty($address) && strstr($url, $address)){return false;}}}return true;}
function trim_value(&$value){$value = trim($value);} //Used to trim the $address_omit array
function getUrl($tweet){
global $curl_handle;
if(!$tweet){return false;}
if(!preg_match('/https?:\/\/\S+/i', $tweet, $result)){return false;} //No link in this tweet (matches both http and https)
curl_setopt($curl_handle, CURLOPT_URL, $result[0]);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1); // follow redirects (in the case of shortened urls)
if(!$buffer = curl_exec($curl_handle)){return false;}
$info = curl_getinfo($curl_handle);
if($info['http_code'] !== 200){return false;} //If the page is gone then cancel
// match the title of the page at the url
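// non-greedy and case-insensitive so we stop at the first closing title tag
if(preg_match('/<title[^>]*>(.*?)<\/title>/is', $buffer, $match)){
$title = trim($match[1]);
}else{
$title = $info['url']; //No title found, falling back to the url so the description isn't empty
}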
return array('title'=>$title,'url'=>$info['url']);
}
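The “post once” behaviour rests on a small list of md5 hashes of tweets already sent to delicious. Here is a standalone sketch of that idea, assuming hypothetical helper names (`seen_before`, `remember_tweet`) that are not in the script above. It also assumes tweets are fed in oldest first, so that the entries trimmed from the end of the list really are the oldest ones.

```php
<?php
// Hypothetical helpers, not part of the script above: a bounded list of
// md5 hashes of tweets that have already been sent to delicious.
function seen_before(array $cache, $tweet)
{
    return in_array(md5($tweet), $cache, true);
}

// Adds a tweet to the front of the cache and trims entries from the end
// so the list never holds more than $limit items.
function remember_tweet(array $cache, $tweet, $limit)
{
    array_unshift($cache, md5($tweet));
    return array_slice($cache, 0, $limit);
}

// Usage: process tweets oldest first with a deliberately tiny limit of 2.
$cache = array();
foreach (array('oldest tweet', 'newer tweet', 'newest tweet') as $tweet) {
    if (!seen_before($cache, $tweet)) {
        $cache = remember_tweet($cache, $tweet, 2);
    }
}
var_dump(seen_before($cache, 'newest tweet')); // bool(true)
var_dump(seen_before($cache, 'oldest tweet')); // bool(false)
```

The limit needs to be comfortably larger than the number of tweets examined per run, otherwise a tweet can drop out of the list while it is still in the feed and be posted a second time.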






