Hansard Speeches 1979-2020 Version 3.0.1

Download from Zenodo

A dataset containing every speech in the House of Commons from May 1979-July 2020.

Version 3.0.1

Version 3 is a major update of this dataset, with the following changes:

Coverage up to the end of August 2020.
More information on government, opposition and parliamentary posts.
Information on MPs’ constituencies.
Inclusion of topical debate headings from the Hansard record.
Removal of easily accessible information, such as birthdays, from the flat file.
Fixes to various encoding issues.

File formats

The data is available as a tibble-format .rds file for the R programming language, and as CSV and JSON files.

The CSV file does not contain the “government_posts”, “opposition_posts” and “parliamentary_posts” variables; those are available in the JSON files of the same names. These variables are list-columns, because some MPs hold multiple posts at once, and that structure is not compatible with a flat CSV file. For example:

[
  {
    "mnis_id": "177",
    "date": "2008-10-06",
    "government_posts": [
      {
        "gov_post_name": "Minister of State (Department for Business, Enterprise and Regulatory Reform) (Trade, Investment and Consumer Affairs) (also Department for International Development)"
      },
      {
        "gov_post_name": "Minister of State (Department for International Development)"
      }
    ]
  }
]
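The nested structure above can be flattened into one row per post when a flat layout is needed. The following is a minimal sketch using only the Python standard library; the function name `flatten_posts` and its parameters are illustrative assumptions, not part of the dataset or its tooling, and the record is copied from the example above.

```python
import json

# Example record copied from the JSON structure shown above.
raw = """
[
  {
    "mnis_id": "177",
    "date": "2008-10-06",
    "government_posts": [
      {"gov_post_name": "Minister of State (Department for Business, Enterprise and Regulatory Reform) (Trade, Investment and Consumer Affairs) (also Department for International Development)"},
      {"gov_post_name": "Minister of State (Department for International Development)"}
    ]
  }
]
"""


def flatten_posts(records, list_key="government_posts", name_key="gov_post_name"):
    """Return one (mnis_id, date, post_name) tuple per post held.

    flatten_posts is a hypothetical helper; the key names default to
    those visible in the example record.
    """
    rows = []
    for record in records:
        # A missing or null list is treated as "no posts held".
        for post in record.get(list_key) or []:
            rows.append((record["mnis_id"], record["date"], post[name_key]))
    return rows


rows = flatten_posts(json.loads(raw))
for row in rows:
    print(row)
```

Each tuple keeps “mnis_id” and “date”, so the flattened rows could in principle be matched back to speeches in the CSV file on those two variables, assuming they are formatted identically in both files.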

Dataset Variables

Variable	Description
“id”	The ID as assigned by mySociety
“speech”	The text of the speech
“display_as”	The standardised name of the MP
“party”	The party the MP is a member of at time of speech
“constituency”	Constituency represented by MP at time of speech
“mnis_id”	The MP’s Members’ Names Information Service (MNIS) number
“date”	Date of speech
“time”	Time of speech
“colnum”	Column number in Hansard record
“speech_class”	Type of speech
“major_heading”	Major debate heading
“minor_heading”	Minor debate heading
“oral_heading”	Oral debate heading
“year”	Year of speech
“hansard_membership_id”	ID used by mySociety
“speakerid”	ID used by mySociety
“person_id”	ID used by mySociety
“speakername”	MP name as appeared in Hansard record for speech
“url”	Link to speech
“government_posts”	Government posts held by MP (list-column)
“opposition_posts”	Opposition posts held by MP (list-column)
“parliamentary_posts”	Parliamentary posts held by MP (list-column)
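
As a rough illustration of working with the flat file, the sketch below counts speeches per year and party using the “year” and “party” variables. It assumes the CSV header uses the variable names listed above; the rows are invented placeholders rather than real data, and `count_speeches` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the CSV file: the header uses variable names
# from the table above, but the rows are invented placeholders, not real data.
sample_csv = io.StringIO(
    "id,speech,party,year\n"
    "example-1,Placeholder text,Party A,1979\n"
    "example-2,Placeholder text,Party B,1979\n"
    "example-3,Placeholder text,Party A,1980\n"
)


def count_speeches(handle):
    """Count rows per (year, party) pair from an open CSV handle."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[(row["year"], row["party"])] += 1
    return counts


counts = count_speeches(sample_csv)
for (year, party), n in sorted(counts.items()):
    print(year, party, n)
```

With the real file, the same function should accept a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.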

Notes

The data used to create this dataset was taken from the parlparse project by mySociety.

This dataset is licensed under a Creative Commons Attribution 4.0 International License.

Please contact me if you find any errors in the dataset. The integrity of the public Hansard record is questionable at times, and while I have improved it, the data is presented ‘as is’.

The code used to create this dataset is on GitHub and is available under a GPL-3.0 License.

Previous documentation is available in the archive.