Below is the error message that was raised:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
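In short: under the spawn start method (the default on Windows), every child process imports the main module afresh. Any code at module level that itself creates processes then runs again inside each child, before the child has finished bootstrapping, which raises this RuntimeError. The message is pointing at the standard idiom: put all process-creating code behind an if __name__ == '__main__': guard. A minimal sketch of that idiom (the worker function, pool size, and input range here are illustrative, not from the original post):

import multiprocessing as mp

def work(x):
    # Illustrative worker; any picklable top-level function works.
    return x * x

# This guard is the "proper idiom in the main module" the error refers to:
# under spawn, each child re-imports this file, so process creation must
# not execute at import time.
if __name__ == '__main__':
    mp.freeze_support()  # only needed when freezing into an executable
    with mp.Pool(4) as pool:
        print(pool.map(work, range(8)))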
Below is the original code that triggered the error. Note that the Pool is created, and the crawl loop runs, at module level:
import multiprocessing as mp
import time
from urllib.request import urlopen, urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

# crawl: fetch a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()

# parse: extract the title, internal links, and canonical url from a page
def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('a', {"href": re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url, url['href']) for url in urls])
    url = soup.find('meta', {'property': "og:url"})['content']
    return title, page_urls, url

unseen = set([base_url])
seen = set()
restricted_crawl = True

pool = mp.Pool(4)
count, t1 = 1, time.time()
while len(unseen) != 0:                 # still get some url to visit
    if restricted_crawl and len(seen) > 20:
        break
    print('\nDistributed Crawling...')
    crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
    htmls = [j.get() for j in crawl_jobs]        # request connection

    print('\nDistributed Parsing...')
    parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
    results = [j.get() for j in parse_jobs]      # parse html

    print('\nAnalysing...')
    seen.update(unseen)    # seen the crawled
    unseen.clear()         # nothing unseen

    for title, page_urls, url in results:
        print(count, title, url)
        count += 1
        unseen.update(page_urls - seen)          # get new url to crawl

print('Total time: %.1f s' % (time.time()-t1))  # 16 s !!!
Here is the corrected code: the functions stay at top level, and everything that creates or uses the Pool is moved under the if __name__ == '__main__': guard that the error message asks for:
import multiprocessing as mp
import time
from urllib.request import urlopen, urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

# crawl: fetch a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()

# parse: extract the title, internal links, and canonical url from a page
def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('a', {"href": re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url, url['href']) for url in urls])
    url = soup.find('meta', {'property': "og:url"})['content']
    return title, page_urls, url

# The fix: process creation now happens only when this file is run as
# the main module, so child processes can safely re-import it.
if __name__ == '__main__':
    unseen = set([base_url])
    seen = set()
    restricted_crawl = True

    pool = mp.Pool(4)
    count, t1 = 1, time.time()
    while len(unseen) != 0:                 # still get some url to visit
        if restricted_crawl and len(seen) > 20:
            break
        print('\nDistributed Crawling...')
        crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
        htmls = [j.get() for j in crawl_jobs]        # request connection

        print('\nDistributed Parsing...')
        parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
        results = [j.get() for j in parse_jobs]      # parse html

        print('\nAnalysing...')
        seen.update(unseen)    # seen the crawled
        unseen.clear()         # nothing unseen

        for title, page_urls, url in results:
            print(count, title, url)
            count += 1
            unseen.update(page_urls - seen)          # get new url to crawl

    print('Total time: %.1f s' % (time.time()-t1))
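Why this works: the error mentions fork because under the fork start method (typically the default on Linux) children inherit the parent's memory and the module is not re-imported, so even the unguarded version happens to run; under spawn (the default on Windows, and on macOS since Python 3.8) the re-import makes the guard mandatory. A small sketch, if you want to confirm which start method your platform uses:

import multiprocessing as mp

if __name__ == '__main__':
    # Prints 'spawn' on Windows and recent macOS, typically 'fork' on Linux.
    print(mp.get_start_method())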
Once that change is in place, re-run the script and the error goes away.
The above is based on my own experience; I hope it can serve as a reference, and I hope you will continue to support 服务器之家.
Original article: https://blog.csdn.net/weixin_42099082/article/details/89365643