{"id":680,"date":"2013-02-14T21:36:50","date_gmt":"2013-02-15T02:36:50","guid":{"rendered":"http:\/\/awgentry.com\/?p=680"},"modified":"2013-04-29T22:52:50","modified_gmt":"2013-04-30T02:52:50","slug":"how-to-retrieve-webpage-meta-tags-with-phps-get_meta_tags","status":"publish","type":"post","link":"http:\/\/awgentry.com\/weblog\/how-to-retrieve-webpage-meta-tags-with-phps-get_meta_tags\/","title":{"rendered":"How To Retrieve Webpage Meta Tags With PHP&#8217;s get_meta_tags"},"content":{"rendered":"<p>I know that the burning question on all you Web geeks&#8217; minds, the one that keeps you up at night, is <strong>&#8220;How can I <a href=\"http:\/\/www.clker.com\/cliparts\/e\/I\/k\/L\/P\/M\/cartoon-duck-walking-th.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" src=\"http:\/\/www.clker.com\/cliparts\/e\/I\/k\/L\/P\/M\/cartoon-duck-walking-th.png\" alt=\"\" width=\"63\" height=\"99\" \/><\/a>grab the meta content from a website using PHP?&#8221;<\/strong> Either that, or &#8220;<a href=\"http:\/\/www.snopes.com\/critters\/wild\/duckecho.asp\">Does a duck&#8217;s quack echo?<\/a>&#8221; For purposes of this discussion, though, I&#8217;m gonna go with the former.<\/p>\n<p><a href=\"http:\/\/www.clker.com\/cliparts\/9\/1\/b\/f\/11971486451344652162drunken_duck_spider_1.svg.thumb.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft\" style=\"margin: 5px;\" src=\"http:\/\/www.clker.com\/cliparts\/9\/1\/b\/f\/11971486451344652162drunken_duck_spider_1.svg.thumb.png\" alt=\"\" width=\"64\" height=\"99\" \/><\/a>As any Web guy or gal worth their salt knows, <strong>meta tags (along with the &lt;title&gt; tag) define each webpage&#8217;s title, description, and much more.<\/strong> Search engines use them as a webpage summary in their listings when they spider your site, and the tags help to categorize your content. (Did I lose you? 
Go back to <a href=\"http:\/\/www.tizag.com\/htmlT\/meta.php\">Meta Tags 101<\/a> and <a href=\"http:\/\/searchenginewatch.com\/2167931\">102<\/a> before continuing.) Why would you want to retrieve them? Maybe you want to <strong>create your own website directory<\/strong>; this would be a quick way to pop out a nice website list without a ton of cut-and-paste. It could also help you <strong>spy on your competitors<\/strong>, to see what keywords you might want to focus on to try to beat them in the rankings game.<\/p>\n<p>This example focuses on the <em>title<\/em> (using the &lt;title&gt; tag), <em>description<\/em>, and <em>keywords<\/em> tags, but could be modified to grab other meta tags, such as robots or viewport settings. Here&#8217;s the end result:<\/p>\n<blockquote><p><strong>Title:<\/strong> <em>Yahoo!<\/em><br \/>\n<strong>URL:<\/strong> <a href=\"http:\/\/www.yahoo.com\/\">http:\/\/www.yahoo.com\/<\/a><br \/>\n<strong>Description:<\/strong> Welcome to Yahoo!, the world&#8217;s most visited home page. Quickly find what you&#8217;re searching for, get in touch with friends and stay in-the-know with the latest news and information.<br \/>\n<strong>Keywords:<\/strong> yahoo, yahoo home page, yahoo homepage, yahoo search, yahoo mail, yahoo messenger, yahoo games, news, finance, sport, entertainment<\/p><\/blockquote>\n<p>And now, the moment you&#8217;ve all been waiting for: the code for this little gem. (Thanks to <a href=\"http:\/\/webhole.net\/2010\/02\/21\/how-to-extract-meta-data-from-a-page\/\">WebHole.net<\/a> for the original code.) 
Here goes:<\/p>\n<blockquote><p>&lt;?php<\/p>\n<p>\/*Modified from original source:<br \/>\nhttp:\/\/webhole.net\/2010\/02\/21\/how-to-extract-meta-data-from-a-page\/<br \/>\n*\/<\/p>\n<p>function getMetaData($url){<br \/>\n\/\/ get meta tags (get_meta_tags returns false on failure)<br \/>\n$meta=get_meta_tags($url);<br \/>\nif($meta===false){ $meta=array(); }<br \/>\n\/\/ store page<br \/>\n$page=file_get_contents($url);<br \/>\n\/\/ extract the title CONTENT; the pattern tolerates attributes and mixed case<br \/>\nif($page!==false &amp;&amp; preg_match('#&lt;title[^&gt;]*&gt;(.*?)&lt;\/title&gt;#is',$page,$match)){<br \/>\n$meta['title']=trim($match[1]);<br \/>\n}<br \/>\n\/\/ make sure every key echoed below exists<br \/>\n$meta+=array('title'=&gt;'','description'=&gt;'','keywords'=&gt;'');<br \/>\n\/\/ return array of data<br \/>\nreturn $meta;<br \/>\n}<\/p>\n<p>$website = &quot;http:\/\/www.yahoo.com\/&quot;; \/\/ change this to the URL you want to scrape<\/p>\n<p>$tags=getMetaData($website);<\/p>\n<p>echo '&lt;b&gt;Title:&lt;\/b&gt; &lt;i&gt;'.$tags['title'] . '&lt;\/i&gt;';<br \/>\necho '&lt;br \/&gt;';<br \/>\necho '&lt;b&gt;URL:&lt;\/b&gt; &lt;a href=&quot;' . $website . '&quot;&gt;' . $website . 
'&lt;\/a&gt;&lt;br \/&gt;';<br \/>\necho '&lt;b&gt;Description:&lt;\/b&gt; '.$tags['description'];<br \/>\necho '&lt;br \/&gt;';<br \/>\necho '&lt;b&gt;Keywords:&lt;\/b&gt; '.$tags['keywords'];<\/p>\n<p>?&gt;<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>I know that the burning question on all you Web geeks&#8217; minds, the one that keeps you up at night, is &#8220;How can I grab the meta content from a website using PHP?&#8221; Either that, or &#8220;Does a duck&#8217;s quack echo?&#8221; For purposes of this discussion, though, I&#8217;m gonna&#8217; go with the former. As any&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,133],"tags":[],"class_list":["post-680","post","type-post","status-publish","format-standard","hentry","category-geek","category-programming-web"],"_links":{"self":[{"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/posts\/680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/comments?post=680"}],"version-history":[{"count":1,"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/posts\/680\/revisions"}],"predecessor-version":[{"id":991,"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/posts\/680\/revisions\/991"}],"wp:attachment":[{"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/media
?parent=680"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/categories?post=680"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/awgentry.com\/weblog\/wp-json\/wp\/v2\/tags?post=680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}