Bad request to Splash & HTTP status code is not handled or not allowed #168

Hi kmike, I'm using scrapy-splash and have run into an issue: the first time I run `scrapy crawl toutiao`, it works correctly, but when I run it a second time, it fails with an error.

I traced the issue to the headers I add: without them, every run works correctly, but with them, the second run fails.

The Lua script and the project code are below. I'd appreciate your help, thanks.

Code:
```
import scrapy
import json
from scrapy_splash import SplashRequest
from scrapy.http.headers import Headers

script = """ 
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go{
                    splash.args.url,
                    headers=splash.args.headers,
                    http_method=splash.args.http_method,
                    body=splash.args.body,
                  })

  assert(splash:wait(0.5))

  local entries = splash:history()
  local last_response = entries[#entries].response

  return {
    headers = last_response.headers,
    cookies = splash:get_cookies(),
    html = splash:html(),
    url = splash:url(),
    http_status = last_response.status,
  }
end
"""

HEADERS = Headers({
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'compress',
    'Accept-Language': 'en-US',
    'Connection': 'keep-alive',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Host':'m.toutiao.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Mobile Safari/537.36'
})

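class MySpider(scrapy.Spider):
    name = "toutiao"

    def __init__(self, *args, **kwargs):
        # Accept and forward spider arguments (e.g. `scrapy crawl toutiao -a key=value`)
        # so that Scrapy's base Spider setup still runs.
        super().__init__(*args, **kwargs)
        self.start_url = "https://m.toutiao.com"

    def start_requests(self):
        # Render the page through Splash's /execute endpoint using the Lua script above.
        # cache_args=['lua_source'] sends the script to Splash once and reuses the cached copy.
        yield SplashRequest(url=self.start_url,
                            callback=self.parse_result,
                            endpoint='execute',
                            cache_args=['lua_source'],
                            args={'lua_source': script, 'http_method': 'GET'},
                            headers=HEADERS)

    def parse_result(self, response):
        print("ok")
        print(response.headers)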
```
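Splash reports Lua errors by their line number inside `lua_source` (the second-run log below points at line 14). The sketch below uses a hypothetical helper, `number_lines`, which is not part of scrapy-splash, to print the script from this report with line numbers. It assumes Splash counts lines from the very start of the string, so the line right after the opening `"""` counts as line 1.

```python
# Hypothetical debugging helper (not part of scrapy-splash): number the lines of a
# Lua script so that Splash error locations such as [string "..."]:14 can be matched.
# Assumption: Splash counts lines from the start of lua_source, like a plain Lua chunk.

script = """
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go{
                    splash.args.url,
                    headers=splash.args.headers,
                    http_method=splash.args.http_method,
                    body=splash.args.body,
                  })

  assert(splash:wait(0.5))

  local entries = splash:history()
  local last_response = entries[#entries].response

  return {
    headers = last_response.headers,
    cookies = splash:get_cookies(),
    html = splash:html(),
    url = splash:url(),
    http_status = last_response.status,
  }
end
"""


def number_lines(source: str) -> str:
    """Return `source` with 1-based line numbers, splitting on newlines as Lua does."""
    return "\n".join(
        f"{number:3d}  {line}"
        for number, line in enumerate(source.split("\n"), start=1)
    )


if __name__ == "__main__":
    print(number_lines(script))
```

Under that assumption, line 14 is `local last_response = entries[#entries].response`. The error `attempt to index field '?' (a nil value)` on that line would be consistent with `splash:history()` returning an empty table on the second run, which makes `entries[#entries]` evaluate to `nil`.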

The first run works correctly:
```
ok
{b'Vary': [b'Accept-Encoding, Accept-Encoding, Accept-Encoding'], b'Timing-Allow-Origin': [b'*'], b'Set-Cookie': [b'tt_webid=653006869922952004; Max-Age=7776000'], b'Transfer-Encoding': [b'chunked'], b'Content-Type': [b'text/html; charset=utf-8'], b'Connection': [b'keep-alive'], b'X-Tt-Timestamp': [b'152040098.652'], b'X-Ss-Set-Cookie': [b'tt_webid=653006899221952004; Max-Age=7776000'], b'Server': [b'Tengine'], b'Via': [b'cache1.cn406[13,0]'], b'Content-Encoding': [b'gzip'], b'Eagleid': [b'dcb54e411524000986256455e'], b'Date': [b'Wed, 07 Mar 2018 05:21:38 GMT']}
```

The second run fails:
```
2018-03-07 13:18:54 [scrapy.core.engine] INFO: Spider opened
2018-03-07 13:18:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-03-07 13:18:54 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-03-07 13:18:55 [scrapy_splash.middleware] WARNING: Bad request to Splash: {'info': {'message': 'Lua error: [string "..."]:14: attempt to index field \'?\' (a nil value)', 'type': 'LUA_ERROR', 'source': '[string "..."]', 'error': "attempt to index field '?' (a nil value)", 'line_number': 14}, 'description': 'Error happened while executing Lua script', 'error': 400, 'type': 'ScriptError'}
2018-03-07 13:18:55 [scrapy.core.engine] DEBUG: Crawled (400) <GET https://m.toutiao.com via http://172.17.0.2:8050/execute> (referer: None)
2018-03-07 13:18:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://m.toutiao.com>: HTTP status code is not handled or not allowed
2018-03-07 13:18:55 [scrapy.core.engine] INFO: Closing spider (finished)
```
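The `Bad request to Splash` warning above embeds the JSON error body returned by Splash. As a sketch, a hypothetical helper (`summarize_splash_error`, not part of scrapy-splash or Splash) can reduce that body to a single line. The dictionary below is copied from the warning in the second-run log, and the field names it reads (`type`, `error`, `info.type`, `info.line_number`, `info.error`) are taken only from that log.

```python
# Hypothetical helper (not part of scrapy-splash or Splash): turn the error body
# that Splash returned with HTTP 400 into a one-line summary. Field names are
# assumed from the logged warning only.


def summarize_splash_error(body: dict) -> str:
    """Summarize a Splash error body like the one in the 'Bad request to Splash' warning."""
    info = body.get("info") or {}
    summary = f"{body.get('type')} (HTTP {body.get('error')})"
    if info:
        # Lua script errors carry a nested "info" dict with the failing line.
        summary += (
            f": {info.get('type')} at line {info.get('line_number')}: "
            f"{info.get('error')}"
        )
    return summary


# Error body copied from the warning logged on the second run.
error_body = {
    "info": {
        "message": "Lua error: [string \"...\"]:14: attempt to index field '?' (a nil value)",
        "type": "LUA_ERROR",
        "source": "[string \"...\"]",
        "error": "attempt to index field '?' (a nil value)",
        "line_number": 14,
    },
    "description": "Error happened while executing Lua script",
    "error": 400,
    "type": "ScriptError",
}

print(summarize_splash_error(error_body))
# → ScriptError (HTTP 400): LUA_ERROR at line 14: attempt to index field '?' (a nil value)
```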
