API Request

Parameters

To scrape a web page using Shifter's Web Scraping API, simply use the API's base endpoint and append the URL you would like to scrape, as well as your API access key, as GET parameters.

There is also a series of optional parameters you can choose from. On this page, you will find an example request used to scrape the URL http://httpbin.org/get.
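As a minimal sketch of how such a request URL is assembled (the access key is a placeholder; the endpoint is the one used in the examples later on this page), the two required GET parameters can be URL-encoded and appended to the base endpoint:

```python
from urllib.parse import urlencode

# Base endpoint of the Web Scraping API (as used in the examples below)
BASE_URL = "https://scrape.shifter.io/v1"

# "api_key" is a placeholder; use the access key from your account dashboard
params = {
    "api_key": "api_key",
    "url": "http://httpbin.org/get",
}

# urlencode percent-encodes the target URL so its scheme and slashes
# do not collide with the endpoint's own query string
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)
# -> https://scrape.shifter.io/v1?api_key=api_key&url=http%3A%2F%2Fhttpbin.org%2Fget
```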

api_key (Required). Specify your unique API access key to authenticate with the API. Your API access key can be found in your account dashboard.

url (Required). Specify the URL of the web page you would like to scrape.

render_js (Optional). Set to 0 (off, default) or 1 (on), depending on whether or not to render JavaScript on the target web page. JavaScript rendering is done using a browser. When render_js is enabled, we charge 5 API calls for a datacenter request and 25 API calls for a residential request.

proxy_type (Optional). Set to datacenter (default) or residential, depending on which proxy type you want to use for your scraping request. Please note that a single residential proxy API request is counted as 10 API calls when render_js is off and 25 API calls when render_js is on.

country (Optional). Specify the 2-letter code of the country you would like to use as a proxy geolocation for your scraping API request. Supported countries differ by proxy type; please refer to the Proxy Locations section for details.

keep_headers (Optional). Specify whether or not to keep the original request headers in order to pass through custom headers. To use only the headers you specify, set keep_headers=0.

session (Optional). Set this parameter if you would like to use the same proxy address for your requests.

timeout (Optional). Specify the maximum timeout in milliseconds for your scraping API request. To force a timeout, specify a number such as 1000. This will abort the request after 1000 ms and return whatever HTML response was obtained up to that point. The maximum value for this parameter is 60000.

device (Optional). Set to desktop (default), mobile, or tablet, depending on the device type you want to use for your scraping request.

wait_until (Optional, for advanced users). Specify the option you would like to use as a condition for your scraping API request. Can only be used when render_js=1 is activated.

wait_for (Optional, for advanced users). Some websites use JavaScript frameworks that may require a few extra seconds to load their content. This parameter specifies the time in milliseconds to wait for the website to load. Recommended values are in the 5000-10000 range. Can only be used when render_js=1 is activated.

wait_for_css (Optional, for advanced users). Specify a CSS selector, and the API will wait up to 10 seconds (the default value of the timeout parameter) for the selector to appear. Can only be used when render_js=1 is activated.

screenshot (Optional, for advanced users). Get the scraped website as a screenshot. Can only be used when render_js=1 is activated.

screenshot_options (Optional, for advanced users). Control the size of the screenshot.

extract_rules (Optional, for advanced users). Get the scraped website based on various extraction rules.

disable_stealth (Optional, for advanced users). Disable the stealth plugin, which is enabled by our scraping API by default. Set to 1 (disable the stealth plugin) or 0 (keep it enabled). Can only be used when render_js=1 is activated.

auto_parser (Optional, for advanced users). Get the scraped website in JSON format. Can be set to 0 (default) or 1.

js_instructions (Optional, for advanced users). Perform JavaScript instructions before obtaining the scraped website.
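Optional parameters are appended to the query string in the same way as the required ones. As a sketch (the access key is a placeholder and the parameter values are illustrative, chosen per the table above), a JavaScript-rendered residential request geolocated in the US could be built like this:

```python
from urllib.parse import urlencode

BASE_URL = "https://scrape.shifter.io/v1"

params = {
    "api_key": "api_key",            # placeholder access key
    "url": "http://httpbin.org/get",
    "render_js": 1,                  # render the page with a browser
    "proxy_type": "residential",     # with render_js on: 25 API calls
    "country": "us",                 # 2-letter proxy geolocation code
    "wait_for": 5000,                # extra 5 s for JS-heavy pages
    "timeout": 60000,                # abort after 60 s (the maximum)
}

print(f"{BASE_URL}?{urlencode(params)}")
```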

API Request Example

GET https://scrape.shifter.io/v1?api_key=api_key&url=https://httpbin.org/get

⇡ Input

cURL

curl --request GET --url "https://scrape.shifter.io/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget"

Node.js

const http = require("https");

const options = {
  "method": "GET",
  "hostname": "scrape.shifter.io",
  "port": null,
  "path": "/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget",
  "headers": {}
};

const req = http.request(options, function (res) {
  const chunks = [];

  res.on("data", function (chunk) {
    chunks.push(chunk);
  });

  res.on("end", function () {
    const body = Buffer.concat(chunks);
    console.log(body.toString());
  });
});

req.end();

Python

import http.client

conn = http.client.HTTPSConnection("scrape.shifter.io")

conn.request("GET", "/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget")

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

PHP

<?php

$curl = curl_init();

curl_setopt_array($curl, [
  CURLOPT_URL => "https://scrape.shifter.io/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 30,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "GET",
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
  echo "cURL Error #:" . $err;
} else {
  echo $response;
}

Go

package main

import (
	"fmt"
	"net/http"
	"io/ioutil"
)

func main() {

	url := "https://scrape.shifter.io/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget"

	req, _ := http.NewRequest("GET", url, nil)

	res, _ := http.DefaultClient.Do(req)

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(res)
	fmt.Println(string(body))

}

Java

HttpResponse<String> response = Unirest.get("https://scrape.shifter.io/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget")
  .asString();

C#

var client = new RestClient("https://scrape.shifter.io/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget");
var request = new RestRequest(Method.GET);
IRestResponse response = client.Execute(request);

Ruby

require 'uri'
require 'net/http'
require 'openssl'

url = URI("https://scrape.shifter.io/v1?api_key=api_key&url=https%3A%2F%2Fhttpbin.org%2Fget")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE # skips TLS certificate verification; avoid in production

request = Net::HTTP::Get.new(url)

response = http.request(request)
puts response.read_body

⇣ Output

{
    "args": {},
    "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Host": "httpbin.org",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.0 Safari/537.36",
        "X-Amzn-Trace-Id": "Root=1-6267b3eb-23a693e76b82605d44b1d103"
    },
    "origin": "104.144.25.118",
    "url": "https://httpbin.org/get"
}
