The
Microlink API is used for extracting information from
any link.
Just enter a URL and you will receive data.
It was designed to get generic information present in the target website, based on metadata normalization using
.
Although this is expected, many use cases are left out of the scope if we need to get specific data information.
Today we're happy to introduce a new core functionality called Custom Rules π.
Custom Rules provide you an interface to interact with the API, specifying new data fields that can be extracted from an specific URL.
Imagine you want ot interact with an Instagram profile url, like
's profile.
A website is just an interface for a database, let's convert the web into real data π€.
By using
Microlink API we can obtain well structured and normalized data from any Instagram URL:
curl https://api.microlink.io/?url=https://instagram.com/elonmusk
The API response will look like the following:
{
"status": "success",
"data": {
"lang": "en",
"author": null,
"title": "Elon Musk (@elonmusk) β’ Instagram photos and videos",
"publisher": "Instagram",
"image": {
"width": 150,
"height": 150,
"type": "jpg",
"url": "https://scontent-iad3-1.cdninstagram.com/vp/b3d0c296df87fe4b1de4b01639d001ae/5BB89A41/t51.2885-19/s150x150/28429097_208691389878371_4706100807026606080_n.jpg"
},
"description": "7.7m Followers, 39 Following, 210 Posts - See Instagram photos and videos from Elon Musk (@elonmusk)",
"video": null,
"date": null,
"logo": {
"width": 192,
"height": 192,
"type": "png",
"url": "https://instagram.com/static/images/ico/favicon-192.png/68d99ba29cc8.png"
},
"url": "https://instagram.com/elonmusk/"
}
}
Although this is enough to have a global vision of what's behind a link (or to build a previsualization using our
SDK), you may be interested in specific information that we don't expose because it'sn't generic.
Let's define a rule for extracting the avatar profile.
A rule is a way to interact with the API. You've to declare the type of data you want to extract through properties. These properties are:
It defines the HTML element you want to get from the HTML of the targeted URL.
A simple way to get the selector could be copy it directly from DevTools.
The way to specify selectors is jQuery-like, so you can specify the selector using:
- An HTML tag, (e.g.,
img
). - An CSS class or pseudo class, id or data-attribute, (e.g.,
.avatar
). - A combination of both, (e.g.,
first:img
).
It defines which property from the matched selector should be picked.
That means, for example, if you want to extract an img
, probably you are interested in src
property.
It defines a check validator to be run against the extracted value defined by selector
and attr
.
It's possible to validate all the
basic properties that can be extracted using the API:
author
date
description
image
description
video
lang
logo
publisher
title
url
Each validator type
will be applied to a set of mutations from the original extracted value.
For example, if you define the type
as image
, then you'll be sure that the value extracted will be an image-compatible url, and your browser will be able to render it.
But it'll be different if you declare the type
as author
, because the value will be capitalized.
Now that we know how to define rules, let's see how to add them into the
request.
They need to be declared as query parameters using dot notation:
{
"data.avatar.selector": "img:first",
"data.avatar.attr": "src",
"data.avatar.type": "image"
}
Defining a new custom rule for 'avatar' field.
Here we are defining our custom rule for a new data field called avatar.
curl https://api.microlink.io/?url=https%3A%2F%2Fwww.instagram.com%2Felonmusk&data.avatar.selector=img%3Afirst&data.avatar.type=image&data.avatar.attr=src&prerender&video=false
Encoding the custom rule as query paramter in the API request.
After that, the API will return the new data field avatar
as part of the response payload π
{
"status": "success",
"data": {
"lang": "en",
"author": null,
"title": "Elon Musk (@elonmusk) β’ Instagram photos and videos",
"publisher": "Instagram",
"image": {
"width": 150,
"height": 150,
"type": "jpg",
"url": "https://scontent-iad3-1.cdninstagram.com/vp/b3d0c296df87fe4b1de4b01639d001ae/5BB89A41/t51.2885-19/s150x150/28429097_208691389878371_4706100807026606080_n.jpg"
},
"description": "7.7m Followers, 39 Following, 210 Posts - See Instagram photos and videos from Elon Musk (@elonmusk)",
"video": null,
"date": null,
"logo": {
"width": 192,
"height": 192,
"type": "png",
"url": "https://instagram.com/static/images/ico/favicon-192.png/68d99ba29cc8.png"
},
"url": "https://instagram.com/elonmusk/",
"avatar": {
"width": 150,
"height": 150,
"type": "jpg",
"url": "https://scontent-iad3-1.cdninstagram.com/vp/b3d0c296df87fe4b1de4b01639d001ae/5BB89A41/t51.2885-19/s150x150/28429097_208691389878371_4706100807026606080_n.jpg"
}
}
}
The payload now have a new 'avatar' field.
In this case, we've defined the type
as image
. The API can handle the property value and then provide us extra information. Like, for instance, the image dimensions.
Some scenarios need to contemplate that HTML markup can change.
This is specially remarkable in the way to define your custom rules selector
:
- A very specific selector (e.g.,
.avatar
) has better accuracy, but you don't have the guarantee that it's always present. - A more generic selector (e.g.,
img
) is easier to be found in the HTML markup, but it doesn't always have the expected value.
Ideally, a good solution needs to contemplate both approaches: first, resolve with an specific selector, and second, fallback into one more generic if it can't resolve the first selector.
This could be done with custom rules in the same API request π.
You just need to declare the conditions as part of the same rule:
{
"data.avatar.0.selector": ".avatar",
"data.avatar.0.attr": "src",
"data.avatar.0.type": "image",
"data.avatar.1.selector": "img:first",
"data.avatar.1.attr": "src",
"data.avatar.1.type": "image"
}
Adding more than one rule per data field.
Note that order is important: The data value extracted will be first value resolved successfully.
What happens if you declare a selector
that matches with more than one result?
{
"data.photos.selector": "article img",
"data.photos.attr": "src",
"data.photos.type": "image"
}
Declaring a custom rule for detecting all images.
curl https://api.microlink.io/?url=https%3A%2F%2Fwww.instagram.com%2Felonmusk&data.avatar.selector=img&data.avatar.type=image&data.avatar.attr=src&prerender&video=false
Can the API extract them? The answer is yes!
{
"status": "success",
"data": {
"lang": "en",
"author": null,
"title": "Elon Musk (@elonmusk) β’ Instagram photos and videos",
"publisher": "Instagram",
"image": {
"width": 150,
"height": 150,
"type": "jpg",
"url": "https://scontent-iad3-1.cdninstagram.com/vp/b3d0c296df87fe4b1de4b01639d001ae/5BB89A41/t51.2885-19/s150x150/28429097_208691389878371_4706100807026606080_n.jpg"
},
"description": "7.7m Followers, 39 Following, 210 Posts - See Instagram photos and videos from Elon Musk (@elonmusk)",
"video": null,
"date": null,
"logo": {
"width": 192,
"height": 192,
"type": "png",
"url": "https://instagram.com/static/images/ico/favicon-192.png/68d99ba29cc8.png"
},
"url": "https://instagram.com/elonmusk/",
"avatar": [
"https://scontent-iad3-1.cdninstagram.com/vp/1ffb38c951c16879d354091a0e80c836/5BA4CE48/t51.2885-15/s640x640/sh0.08/e35/c0.134.1080.1080/32039832_1818999621729707_2373182444238012416_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/9caae3887f4b707122a909ba18be9a17/5B167C40/t51.2885-15/s640x640/e15/31386504_411011476032232_463607480123916288_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/4fca495d133a478de0c63069761ff061/5BB36DC1/t51.2885-15/s640x640/sh0.08/e35/c0.135.1080.1080/31310672_249632775610280_7873472706304278528_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/e29f9a4d023d86b8ababa9d9991ae311/5BC2252B/t51.2885-15/s640x640/sh0.08/e35/c180.0.720.720/31463407_209037936363460_7225796096243531776_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/76f95b5147452dd937441ca05ffb797c/5BA75F40/t51.2885-15/e35/c167.0.620.620/31070327_164427757566288_2666001116772171776_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/0f30cdcd2fa57864966c36f6dd6b1755/5BA26788/t51.2885-15/s640x640/sh0.08/e35/30086931_229916390892091_3747042018648391680_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/3ea8b95d0d5129cb88bedd5baf5e321e/5B9EAE06/t51.2885-15/s640x640/sh0.08/e35/c0.0.1079.1079/30085730_1657613874332856_5430454433135722496_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/33ccb38ad541fbdcd9fb6322f5767b5e/5BAB1CCD/t51.2885-15/e35/c75.0.358.358/29738552_2099263200285553_2919404320380157952_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/8af59e7ec6c4fd34955e127ff79693a4/5BC49E68/t51.2885-15/s640x640/sh0.08/e35/c0.134.1080.1080/29718069_662668550574944_3003405522683559936_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/d75424916f00cf8a6357f79c54f70812/5BC09704/t51.2885-15/s640x640/sh0.08/e35/c0.125.1080.1080/29738021_445961452525265_1824961269409513472_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/981aea1c5f2366827ad2875b995b2808/5B167EDE/t51.2885-15/e15/c236.0.607.607/29418227_611168632571297_6056208306052005888_n.jpg",
"https://scontent-iad3-1.cdninstagram.com/vp/0659175d64fc417dfe5a3a5e5428eb59/5BAE44B0/t51.2885-15/s640x640/sh0.08/e35/29739298_2051786191528079_7343938294230548480_n.jpg"
]
}
}
The new 'photos' field is a collection.
The only difference is that this time the result is a collection.
When you see a null
in the API response, it means that it couldn't resolve the value properly.
You can define custom rules as fallback rules for an existing data field.
For example, we are seeing that the API is not resolving the author
field for Instagram profile urls. Let's add it!
{
"data.author.selector": "section h1:last",
"data.author.attr": "text",
"data.author.type": "author"
}
Declaring a custom rule for fallback a basic rule.
curl https://api.microlink.io/?url=https%3A%2F%2Fwww.instagram.com%2Felonmusk&prerender&video=false&data.author.selector=section%20h1%3Alast&data.author.type=author&data.author.attr=text
{
"status": "success",
"data": {
"lang": "en",
"author": "Elon Musk",
"title": "Elon Musk (@elonmusk) β’ Instagram photos and videos",
"publisher": "Instagram",
"image": {
"width": 150,
"height": 150,
"type": "jpg",
"url": "https://scontent-iad3-1.cdninstagram.com/vp/b3d0c296df87fe4b1de4b01639d001ae/5BB89A41/t51.2885-19/s150x150/28429097_208691389878371_4706100807026606080_n.jpg"
},
"description": "7.7m Followers, 39 Following, 210 Posts - See Instagram photos and videos from Elon Musk (@elonmusk)",
"video": null,
"date": null,
"logo": {
"width": 192,
"height": 192,
"type": "png",
"url": "https://instagram.com/static/images/ico/favicon-192.png/68d99ba29cc8.png"
},
"url": "https://instagram.com/elonmusk/"
}
}
Now the value is resolved properly π.
One thing that makes
Microlink API powerful is that you can combine every
API Parameter to work together.
{
"data.photos.selector": "img:first",
"data.photos.attr": "src",
"data.photos.type": "image",
"filter": "avatar",
"palette": true
}
curl https://api.microlink.io/?url=https%3A%2F%2Fwww.instagram.com%2Felonmusk&data.avatar.selector=img%3Afirst&data.avatar.type=image&data.avatar.attr=src&prerender&video=false&palette&filter=avatar
{
"status": "success",
"data": {
"avatar": {
"width": 150,
"height": 150,
"type": "jpg",
"url": "https://scontent-iad3-1.cdninstagram.com/vp/b3d0c296df87fe4b1de4b01639d001ae/5BB89A41/t51.2885-19/s150x150/28429097_208691389878371_4706100807026606080_n.jpg",
"palette": [
"#514030",
"#8a7f6c",
"#cac0ac",
"#f4e4d4",
"#4c3c24",
"#ad8851"
],
"background_color": "#F4E4D4",
"color": "#755C37",
"alternative_color": "#4C3C24"
}
}
}
Detecting predominant color for an image extracted using a custom rule and filtering it π€―.
This is specially useful when you want to optimize your API calls response time.