{
 "cells": [
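  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Original book (in English)](https://www.oreilly.com/library/view/feature-engineering-for/9781491953235/)\n",
    "\n",
    "\n",
    "**Code revision and cleanup**: [黄海广](https://github.com/fengdu78). The original text was converted to Jupyter notebook format, and some code was added or modified. All tests pass, and all datasets are available for download from [Baidu Cloud](data/README.md).\n",
    "\n",
    "**Note**: Please be patient. The datasets have been downloaded, but the code has not been tested yet, so the translation is posted first."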
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 9. Back to the Features: Putting It All Together"
   ]
  },
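  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you first see the path from data to results in Figure 1-1, it can easily seem overwhelming. Throughout this book, our focus has been on the basic principles of feature engineering, illustrated with toy models and clean, simple datasets. These examples were deliberately designed to be illustrative and instructive.\n",
    "\n",
    "Typical machine learning examples show the most ideal case and the best results, which hides the hard work along the path described in this book. Now that the foundation is laid, we leave the simple world of simulated data and dive into feature engineering on a real, structured dataset. At each step, we examine how to generate features from raw data, how to transform them, and what trade-offs feature engineering requires.\n",
    "\n",
    "To be clear up front, the goal of this end-to-end example is not to build the best model for the dataset. It is to demonstrate several techniques from this book in practice, and to look more closely at whether each technique adds value to the modeling process.\n",
    "\n",
    "## Item-Based Collaborative Filtering\n",
    "\n",
    "Our task is to build a recommender for academic papers using a subsample of the Microsoft Academic Graph dataset. This should be very handy for anyone searching for citations without Google Scholar. Here are some relevant statistics about the dataset:"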
   ]
  },
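  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Microsoft Academic Graph Dataset\n",
    "The dataset contains 166,192,182 papers and is available through the Open Academic Graph.\n",
    "- It may be used for research purposes only.\n",
    "- The full dataset is 104 GB.\n",
    "- Each observation has 18 variables identifying the paper, including the title, abstract, author names, keywords, and field of study.\n",
    "\n",
    "**Note**: This dataset has already been downloaded and preprocessed. If you only want to run the code in this notebook, there is no need to download it again. The data is available on [Baidu Cloud](data/)."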
   ]
  },
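  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This dataset was designed to be easy to store in and read from a database. For machine learning models it may not be tidy enough, and it needs some basic data wrangling. Some instructors like to skip this step and let students jump straight to modeling and results, but we will not do that. We start everything from scratch.\n",
    "\n",
    "The first step is to wrangle a few variables into the right form and build an item-based collaborative filter, to see whether we can quickly and efficiently find papers that are very similar."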
   ]
  },
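  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Origins of Item-Based Collaborative Filtering\n",
    "This approach was originally developed by Amazon as an improvement on user-based product recommendation algorithms. Sarwar et al. describe in detail the difficulties and gains of switching a recommender from user-based to item-based (Sarwar et al., 2001)."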
   ]
  },
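  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Item-based collaborative filtering makes recommendations based on how similar items are to one another. The work has two stages: first compute similarity scores between items, then rank all the scores and take the top $N$ most similar items as recommendations."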
   ]
  },
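  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two stages concrete, here is a tiny hand-worked case. The scores are made up for illustration and are not taken from the dataset. Suppose a query paper $q$ has similarity scores $S(q,a)=0.8$, $S(q,b)=0.3$, and $S(q,c)=0.5$ with three other papers. Ranking in descending order gives $a, c, b$, so for $N=2$:\n",
    "\n",
    "$$\\text{top-}2(q)=(a,\\ c)$$"
   ]
  },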
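  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building an Item-Based Recommender\n",
    "An item-based recommender performs the following three tasks.\n",
    "\n",
    "1. Generate information about the “things”, or items.\n",
    "2. Score all items to find other items that are “similar” to a given item.\n",
    "3. Return the ranked scores together with the items."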
   ]
  },
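  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Data Import, Cleaning, and Feature Parsing\n",
    "Like all good science experiments, we start with a hypothesis. Here, we assume that papers published at about the same time and in the same field of study are the most useful to the user. We use a simple approach to parse these fields from a subsample of the full dataset. After generating simple sparse arrays, we run the item-based collaborative filter over the entire item array to see whether the results are satisfactory.\n",
    "\n",
    "An item-based collaborative filter compares items using a similarity score. In this example, cosine similarity gives a reasonable comparison between two nonzero vectors. The example below uses cosine distance, which is the complement of cosine similarity in positive space:\n",
    "\n",
    "$$D_C(A,B)=1-S_C(A,B)$$\n",
    "where $D_C$ is the cosine distance and $S_C$ is the cosine similarity."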
   ]
  },
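  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small worked example, take two hypothetical binary feature vectors (chosen for illustration, not drawn from the dataset), $A=(1,1,0,1)$ and $B=(1,0,1,1)$. They share two active features and each has three, so, using the standard definition of cosine similarity:\n",
    "\n",
    "$$S_C(A,B)=\\frac{A\\cdot B}{\\lVert A\\rVert\\,\\lVert B\\rVert}=\\frac{2}{\\sqrt{3}\\,\\sqrt{3}}=\\frac{2}{3},\\qquad D_C(A,B)=1-\\frac{2}{3}=\\frac{1}{3}$$\n",
    "\n",
    "Two papers with identical feature sets would have $D_C=0$, and two papers with no features in common would have $D_C=1$."
   ]
  },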
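  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Academic Paper Recommender: Naive Approach\n",
    "The first step is to import and inspect the dataset. In Example 9-1, we import the data and then select a subset of usable fields to begin the experiment. The retained fields still hold rich information, as shown in Figure 9-1."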
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-1: Importing and filtering the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20000, 19)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df = pd.read_json('data/mag_subset20K.txt', lines=True)\n",
    "\n",
    "model_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['abstract', 'authors', 'doc_type', 'doi', 'fos', 'id', 'issue',\n",
       "       'keywords', 'lang', 'n_citation', 'page_end', 'page_start', 'publisher',\n",
       "       'references', 'title', 'url', 'venue', 'volume', 'year'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20000, 9)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Original filter, kept for reference but disabled: English-only papers,\n",
    "# keeping abstract, authors, fos, keywords, year, title.\n",
    "# model_df = model_df[model_df.lang == 'en'].drop_duplicates(\n",
    "#     subset='title', keep='first').drop([\n",
    "#         'doc_type', 'doi', 'id', 'issue', 'lang', 'n_citation', 'page_end',\n",
    "#         'page_start', 'publisher', 'references', 'url', 'venue', 'volume'\n",
    "#     ],\n",
    "#                                        axis=1)\n",
    "# Active version: all languages are kept, and lang, references, and url are\n",
    "# retained as well, which leaves 9 columns.\n",
    "model_df = model_df.drop_duplicates(\n",
    "    subset='title', keep='first').drop([\n",
    "        'doc_type', 'doi', 'id', 'issue', 'n_citation', 'page_end',\n",
    "        'page_start', 'publisher', 'venue', 'volume'\n",
    "    ],\n",
    "                                       axis=1)\n",
    "model_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abstract</th>\n",
       "      <th>authors</th>\n",
       "      <th>fos</th>\n",
       "      <th>keywords</th>\n",
       "      <th>lang</th>\n",
       "      <th>references</th>\n",
       "      <th>title</th>\n",
       "      <th>url</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A system and method for maskless direct write ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[Electronic engineering, Computer hardware, En...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>[354c172f-d877-4e60-a7eb-c1b1cf03ce4d, 76cf106...</td>\n",
       "      <td>System and Method for Maskless Direct Write Li...</td>\n",
       "      <td>[http://www.freepatentsonline.com/y2016/021111...</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'name': 'Ahmed M. Alluwaimi'}]</td>\n",
       "      <td>[Biology, Virology, Immunology, Microbiology]</td>\n",
       "      <td>[paratuberculosis, of, subspecies, proceedings...</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>The dilemma of the Mycobacterium avium subspec...</td>\n",
       "      <td>[http://www.omicsonline.org/proceedings/the-di...</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            abstract  \\\n",
       "0  A system and method for maskless direct write ...   \n",
       "1                                                NaN   \n",
       "\n",
       "                            authors  \\\n",
       "0                               NaN   \n",
       "1  [{'name': 'Ahmed M. Alluwaimi'}]   \n",
       "\n",
       "                                                 fos  \\\n",
       "0  [Electronic engineering, Computer hardware, En...   \n",
       "1      [Biology, Virology, Immunology, Microbiology]   \n",
       "\n",
       "                                            keywords lang  \\\n",
       "0                                                NaN   en   \n",
       "1  [paratuberculosis, of, subspecies, proceedings...   en   \n",
       "\n",
       "                                          references  \\\n",
       "0  [354c172f-d877-4e60-a7eb-c1b1cf03ce4d, 76cf106...   \n",
       "1                                                NaN   \n",
       "\n",
       "                                               title  \\\n",
       "0  System and Method for Maskless Direct Write Li...   \n",
       "1  The dilemma of the Mycobacterium avium subspec...   \n",
       "\n",
       "                                                 url  year  \n",
       "0  [http://www.freepatentsonline.com/y2016/021111...  2015  \n",
       "1  [http://www.omicsonline.org/proceedings/the-di...  2016  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df.head(2)"
   ]
  },
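  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>Figure 9-1: The first two rows of the Microsoft Academic Graph dataset</center>\n",
    "\n",
    "Table 9-1 makes it clear how much data wrangling is needed to get the raw data into a form better suited to modeling. Lists and dictionaries are convenient for data storage, but without some unpacking they are not tidy and do not fit machine learning well (Wickham, 2014)."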
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>Table 9-1: Overview of the data in model_df</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|Field name|Description|Field type|# NaN|\n",
    "|:-:|:-:|:-:|:-:|\n",
    "|abstract|paper abstract|string|4393|\n",
    "|authors|author names and affiliations|list of dict, keys = name, org|1|\n",
    "|fos|fields of study|list of strings|1733|\n",
    "|keywords|keywords|list of strings|4294|\n",
    "|title|paper title|string|0|\n",
    "|year|published year|int|0|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Example 9-2, we focus first on two fields, converting them from a list and an integer into feature arrays, as shown in Figure 9-2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-2: Collaborative filtering stage 1: building the item feature matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9325"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Collect every distinct field of study. Missing fos lists are filled with the\n",
    "# string '0', so '0' itself becomes one entry in the vocabulary.\n",
    "unique_fos = sorted(list({ feature\n",
    "                          for paper_row in model_df.fos.fillna('0')\n",
    "                          for feature in paper_row }))\n",
    "\n",
    "unique_year = sorted(model_df['year'].astype('str').unique())\n",
    "\n",
    "len(unique_fos + unique_year)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13251"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df.shape[0] - pd.isnull(model_df['fos']).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9150"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(unique_fos)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Ancient history',\n",
       " 'Dentistry',\n",
       " 'Functional residual capacity',\n",
       " 'Hierarchical database model',\n",
       " 'Irrigation',\n",
       " 'Mycology',\n",
       " 'Noise temperature',\n",
       " 'Powder diffraction',\n",
       " 'Random test generator',\n",
       " 'Reaction intermediate',\n",
       " 'Rotating wave approximation',\n",
       " 'Schwarz lemma',\n",
       " 'Service delivery framework',\n",
       " 'Social environment',\n",
       " 'Transforming growth factor']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import random\n",
    "[unique_fos[i] for i in sorted(random.sample(range(len(unique_fos)), 15)) ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def feature_array(x, unique_array):\n",
    "    # One-hot encode the Series x against the vocabulary unique_array.\n",
    "    row_dict = {}\n",
    "    for i in x.index:\n",
    "        value = x[i]\n",
    "        if isinstance(value, list):\n",
    "            # list-valued field (e.g. fos): flag every vocabulary entry in the list\n",
    "            var_dict = {u: int(u in value) for u in unique_array}\n",
    "        else:\n",
    "            # scalar field (e.g. year): flag the entry equal to its string form\n",
    "            var_dict = {u: int(u == str(value)) for u in unique_array}\n",
    "        row_dict[i] = var_dict\n",
    "\n",
    "    # from_dict puts observations in columns, so transpose to get one row per paper\n",
    "    feature_df = pd.DataFrame.from_dict(row_dict, dtype='str').T\n",
    "\n",
    "    return feature_df"
   ]
  },
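  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `feature_array` produces, consider a tiny hand-worked case. The three-entry vocabulary and the sample paper are hypothetical, chosen only for illustration. With the vocabulary $U=(\\text{Biology},\\ \\text{Chemistry},\\ \\text{Virology})$ and a paper whose `fos` list is `[Biology, Virology]`, every vocabulary entry that appears in the list is flagged with 1 and the rest with 0:\n",
    "\n",
    "$$x_{\\text{fos}}=(1,\\ 0,\\ 1)$$\n",
    "\n",
    "For a scalar field such as `year`, exactly one entry matches. With the vocabulary `('2015', '2016')` and a paper published in 2016, the row is $(0,\\ 1)$."
   ]
  },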
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 1min\n"
     ]
    }
   ],
   "source": [
    "%time year_features = feature_array(model_df['year'], unique_year)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 58min 30s\n",
      "Size of fos feature array:  5856418904\n"
     ]
    }
   ],
   "source": [
    "%time fos_features = feature_array(model_df['fos'], unique_fos)\n",
    "\n",
    "from sys import getsizeof\n",
    "print('Size of fos feature array: ', getsizeof(fos_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9325"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "year_features.shape[1] + fos_features.shape[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 24.9 s\n",
      "Size of first feature array:  5969031694\n"
     ]
    }
   ],
   "source": [
    "# joined feature space: 20000 papers x 9325 features (transposed to features x papers)\n",
    "\n",
    "%time first_features = fos_features.join(year_features).T\n",
    "\n",
    "first_size = getsizeof(first_features)\n",
    "\n",
    "print('Size of first feature array: ', first_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will start with a simple example of building a recommender with just a few fields, building sparse arrays of the available features to compute the cosine similarity between papers. We will see if reasonably similar papers can be found in a timely manner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(9325, 20000)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_features.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>19990</th>\n",
       "      <th>19991</th>\n",
       "      <th>19992</th>\n",
       "      <th>19993</th>\n",
       "      <th>19994</th>\n",
       "      <th>19995</th>\n",
       "      <th>19996</th>\n",
       "      <th>19997</th>\n",
       "      <th>19998</th>\n",
       "      <th>19999</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0-10 V lighting control</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1-planar graph</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1/N expansion</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10G-PON</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 20000 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                        0     1     2     3     4     5     6     7     8      \\\n",
       "0                           0     0     0     0     0     0     0     0     0   \n",
       "0-10 V lighting control     0     0     0     0     0     0     0     0     0   \n",
       "1-planar graph              0     0     0     0     0     0     0     0     0   \n",
       "1/N expansion               0     0     0     0     0     0     0     0     0   \n",
       "10G-PON                     0     0     0     0     0     0     0     0     0   \n",
       "\n",
       "                        9      ...  19990 19991 19992 19993 19994 19995 19996  \\\n",
       "0                           0  ...      0     0     0     0     0     0     0   \n",
       "0-10 V lighting control     0  ...      0     0     0     0     0     0     0   \n",
       "1-planar graph              0  ...      0     0     0     0     0     0     0   \n",
       "1/N expansion               0  ...      0     0     0     0     0     0     0   \n",
       "10G-PON                     0  ...      0     0     0     0     0     0     0   \n",
       "\n",
       "                        19997 19998 19999  \n",
       "0                           0     0     0  \n",
       "0-10 V lighting control     0     0     0  \n",
       "1-planar graph              0     0     0  \n",
       "1/N expansion               0     0     0  \n",
       "10G-PON                     0     0     0  \n",
       "\n",
       "[5 rows x 20000 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_features.head()"
   ]
  },
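  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>Figure 9-2: Head of first_features. The indices of the observations (papers) in the original dataset are the columns, and the features are the rows</center>\n",
    "\n",
    "We have successfully turned a fairly small dataset (20,000 rows of raw data in this run) into roughly 6 GB of features (`getsizeof` reports 5,969,031,694 bytes). For a data exploration process that needs fast iteration, this approach is far too clunky. We need faster methods whose features take less computing resource and less experiment time.\n",
    "\n",
    "But hold on. First, let's see how good the recommendations are that these features give us in the next stage (see Example 9-3). We define a “good” recommendation as a paper that is similar to the input.\n"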
   ]
  },
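  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rough back-of-the-envelope check, using only the shape and the `getsizeof` value printed above, shows why this representation is so heavy. The per-cell figure is an estimate derived from those two numbers, not a separate measurement:\n",
    "\n",
    "$$9325\\times 20000=1.865\\times 10^{8}\\ \\text{cells},\\qquad \\frac{5\\,969\\,031\\,694\\ \\text{bytes}}{1.865\\times 10^{8}\\ \\text{cells}}\\approx 32\\ \\text{bytes per cell}$$\n",
    "\n",
    "Each cell only carries a 0/1 flag, so in principle a single bit would be enough. The pairwise comparison in the next stage is also expensive: scoring every pair of the 20,000 papers takes $20000^2=4\\times 10^{8}$ cosine evaluations."
   ]
  },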
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-3: Collaborative filtering stage 2: finding similar items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from scipy.spatial.distance import cosine\n",
    "\n",
    "\n",
    "# def item_collab_filter(features_df):\n",
    "#     item_similarities = pd.DataFrame(\n",
    "#         index=features_df.columns, columns=features_df.columns)\n",
    "#     item_similarities.fillna(0)  # added later: fill NaN with 0\n",
    "#     for i in features_df.columns:\n",
    "#         for j in features_df.columns:\n",
    "#             item_similarities.loc[i][j] = 1 - cosine(features_df[i],\n",
    "#                                                      features_df[j])\n",
    "\n",
    "#     return item_similarities"
   ]
  },
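  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.spatial.distance import cosine\n",
    "\n",
    "\n",
    "def item_collab_filter(features_df):\n",
    "    # Pairwise cosine similarity between all columns (papers) of features_df.\n",
    "    item_similarities = pd.DataFrame(\n",
    "        index=features_df.columns, columns=features_df.columns)\n",
    "    for i in features_df.columns:\n",
    "        for j in features_df.columns:\n",
    "            # .loc[i, j] writes to the frame itself; chained .loc[i][j] may write to a copy.\n",
    "            # Cast to float in case the 0/1 cells are stored as strings.\n",
    "            item_similarities.loc[i, j] = 1 - cosine(\n",
    "                features_df[i].astype(float).tolist(),\n",
    "                features_df[j].astype(float).tolist())  # .tolist() was added in this notebook\n",
    "\n",
    "    return item_similarities"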
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>19990</th>\n",
       "      <th>19991</th>\n",
       "      <th>19992</th>\n",
       "      <th>19993</th>\n",
       "      <th>19994</th>\n",
       "      <th>19995</th>\n",
       "      <th>19996</th>\n",
       "      <th>19997</th>\n",
       "      <th>19998</th>\n",
       "      <th>19999</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0-10 V lighting control</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1-planar graph</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1/N expansion</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10G-PON</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14-3-3 protein</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2-choice hashing</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20th-century philosophy</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2D Filters</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2D computer graphics</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2DEG</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3-D Secure</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3D computer graphics</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3D lookup table</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3D pose estimation</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3D radar</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3D reconstruction</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3D single-object recognition</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3G</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3G MIMO</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40-bit encryption</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5-HT1 receptor</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5-HT2 receptor</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5-HT5A receptor</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5052 aluminium alloy</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56-bit encryption</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6111 aluminium alloy</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78xx</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ABO blood group system</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AC motor</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1988</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1989</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1990</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1991</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1992</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1993</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1994</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1995</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1996</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1997</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1998</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1999</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2002</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2003</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2004</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2005</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2006</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2007</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2008</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>9325 rows × 20000 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                             0     1     2     3     4     5     6     7      \\\n",
       "0                                0     0     0     0     0     0     0     0   \n",
       "0-10 V lighting control          0     0     0     0     0     0     0     0   \n",
       "1-planar graph                   0     0     0     0     0     0     0     0   \n",
       "1/N expansion                    0     0     0     0     0     0     0     0   \n",
       "10G-PON                          0     0     0     0     0     0     0     0   \n",
       "14-3-3 protein                   0     0     0     0     0     0     0     0   \n",
       "2-choice hashing                 0     0     0     0     0     0     0     0   \n",
       "20th-century philosophy          0     0     0     0     0     0     0     0   \n",
       "2D Filters                       0     0     0     0     0     0     0     0   \n",
       "2D computer graphics             0     0     0     0     0     0     0     0   \n",
       "2DEG                             0     0     0     0     0     0     0     0   \n",
       "3-D Secure                       0     0     0     0     0     0     0     0   \n",
       "3D computer graphics             0     0     0     0     0     0     0     0   \n",
       "3D lookup table                  0     0     0     0     0     0     0     0   \n",
       "3D pose estimation               0     0     0     0     0     0     0     0   \n",
       "3D radar                         0     0     0     0     0     0     0     0   \n",
       "3D reconstruction                0     0     0     0     0     0     0     0   \n",
       "3D single-object recognition     0     0     0     0     0     0     0     0   \n",
       "3G                               0     0     0     0     0     0     0     0   \n",
       "3G MIMO                          0     0     0     0     0     0     0     0   \n",
       "40-bit encryption                0     0     0     0     0     0     0     0   \n",
       "5-HT1 receptor                   0     0     0     0     0     0     0     0   \n",
       "5-HT2 receptor                   0     0     0     0     0     0     0     0   \n",
       "5-HT5A receptor                  0     0     0     0     0     0     0     0   \n",
       "5052 aluminium alloy             0     0     0     0     0     0     0     0   \n",
       "56-bit encryption                0     0     0     0     0     0     0     0   \n",
       "6111 aluminium alloy             0     0     0     0     0     0     0     0   \n",
       "78xx                             0     0     0     0     0     0     0     0   \n",
       "ABO blood group system           0     0     0     0     0     0     0     0   \n",
       "AC motor                         0     0     0     0     0     0     0     0   \n",
       "...                            ...   ...   ...   ...   ...   ...   ...   ...   \n",
       "1988                             0     0     0     0     0     0     0     0   \n",
       "1989                             0     0     0     0     0     0     0     0   \n",
       "1990                             0     0     0     0     0     0     0     0   \n",
       "1991                             0     0     0     0     0     0     0     0   \n",
       "1992                             0     0     0     0     0     0     0     0   \n",
       "1993                             0     0     0     0     0     0     0     0   \n",
       "1994                             0     0     0     0     0     0     0     0   \n",
       "1995                             0     0     0     0     0     0     0     0   \n",
       "1996                             0     0     0     0     0     0     0     0   \n",
       "1997                             0     0     0     0     0     0     0     0   \n",
       "1998                             0     0     0     0     0     0     0     0   \n",
       "1999                             0     0     0     0     0     0     0     0   \n",
       "2000                             0     0     0     0     0     0     0     0   \n",
       "2001                             0     0     0     0     0     0     0     0   \n",
       "2002                             0     0     0     0     0     0     0     0   \n",
       "2003                             0     0     0     0     0     0     0     0   \n",
       "2004                             0     0     0     0     0     0     0     0   \n",
       "2005                             0     0     0     0     0     0     0     0   \n",
       "2006                             0     0     0     0     0     0     0     0   \n",
       "2007                             0     0     0     0     0     0     0     0   \n",
       "2008                             0     0     0     0     0     0     0     0   \n",
       "2009                             0     0     0     0     0     0     1     0   \n",
       "2010                             0     0     0     0     0     0     0     0   \n",
       "2011                             0     0     0     0     1     0     0     0   \n",
       "2012                             0     0     0     0     0     0     0     0   \n",
       "2013                             0     0     0     0     0     0     0     0   \n",
       "2014                             0     0     0     0     0     0     0     0   \n",
       "2015                             1     0     1     0     0     0     0     0   \n",
       "2016                             0     1     0     0     0     0     0     0   \n",
       "2017                             0     0     0     0     0     0     0     0   \n",
       "\n",
       "                             8     9      ...  19990 19991 19992 19993 19994  \\\n",
       "0                                0     0  ...      0     0     0     0     0   \n",
       "0-10 V lighting control          0     0  ...      0     0     0     0     0   \n",
       "1-planar graph                   0     0  ...      0     0     0     0     0   \n",
       "1/N expansion                    0     0  ...      0     0     0     0     0   \n",
       "10G-PON                          0     0  ...      0     0     0     0     0   \n",
       "14-3-3 protein                   0     0  ...      0     0     0     0     0   \n",
       "2-choice hashing                 0     0  ...      0     0     0     0     0   \n",
       "20th-century philosophy          0     0  ...      0     0     0     0     0   \n",
       "2D Filters                       0     0  ...      0     0     0     0     0   \n",
       "2D computer graphics             0     0  ...      0     0     0     0     0   \n",
       "2DEG                             0     0  ...      0     0     0     0     0   \n",
       "3-D Secure                       0     0  ...      0     0     0     0     0   \n",
       "3D computer graphics             0     0  ...      0     0     0     0     0   \n",
       "3D lookup table                  0     0  ...      0     0     0     0     0   \n",
       "3D pose estimation               0     0  ...      0     0     0     0     0   \n",
       "3D radar                         0     0  ...      0     0     0     0     0   \n",
       "3D reconstruction                0     0  ...      0     0     0     0     0   \n",
       "3D single-object recognition     0     0  ...      0     0     0     0     0   \n",
       "3G                               0     0  ...      0     0     0     0     0   \n",
       "3G MIMO                          0     0  ...      0     0     0     0     0   \n",
       "40-bit encryption                0     0  ...      0     0     0     0     0   \n",
       "5-HT1 receptor                   0     0  ...      0     0     0     0     0   \n",
       "5-HT2 receptor                   0     0  ...      0     0     0     0     0   \n",
       "5-HT5A receptor                  0     0  ...      0     0     0     0     0   \n",
       "5052 aluminium alloy             0     0  ...      0     0     0     0     0   \n",
       "56-bit encryption                0     0  ...      0     0     0     0     0   \n",
       "6111 aluminium alloy             0     0  ...      0     0     0     0     0   \n",
       "78xx                             0     0  ...      0     0     0     0     0   \n",
       "ABO blood group system           0     0  ...      0     0     0     0     0   \n",
       "AC motor                         0     0  ...      0     0     0     0     0   \n",
       "...                            ...   ...  ...    ...   ...   ...   ...   ...   \n",
       "1988                             0     0  ...      0     0     0     0     0   \n",
       "1989                             0     0  ...      0     0     0     0     0   \n",
       "1990                             0     0  ...      0     0     0     0     0   \n",
       "1991                             0     0  ...      0     0     0     0     0   \n",
       "1992                             0     0  ...      0     0     0     0     0   \n",
       "1993                             0     0  ...      0     0     0     0     0   \n",
       "1994                             0     0  ...      0     0     0     1     0   \n",
       "1995                             0     0  ...      0     0     0     0     0   \n",
       "1996                             0     0  ...      0     1     0     0     0   \n",
       "1997                             0     0  ...      0     0     0     0     0   \n",
       "1998                             0     0  ...      0     0     0     0     0   \n",
       "1999                             0     0  ...      0     0     0     0     0   \n",
       "2000                             0     0  ...      0     0     0     0     0   \n",
       "2001                             0     0  ...      0     0     0     0     0   \n",
       "2002                             0     0  ...      0     0     0     0     0   \n",
       "2003                             0     0  ...      0     0     0     0     0   \n",
       "2004                             0     0  ...      0     0     0     0     0   \n",
       "2005                             0     0  ...      1     0     0     0     0   \n",
       "2006                             0     0  ...      0     0     0     0     0   \n",
       "2007                             0     0  ...      0     0     0     0     0   \n",
       "2008                             0     0  ...      0     0     0     0     0   \n",
       "2009                             0     0  ...      0     0     0     0     0   \n",
       "2010                             0     0  ...      0     0     0     0     0   \n",
       "2011                             0     0  ...      0     0     0     0     0   \n",
       "2012                             0     1  ...      0     0     0     0     0   \n",
       "2013                             0     0  ...      0     0     0     0     0   \n",
       "2014                             0     0  ...      0     0     0     0     0   \n",
       "2015                             0     0  ...      0     0     0     0     0   \n",
       "2016                             0     0  ...      0     0     0     0     0   \n",
       "2017                             0     0  ...      0     0     0     0     0   \n",
       "\n",
       "                             19995 19996 19997 19998 19999  \n",
       "0                                0     0     0     0     0  \n",
       "0-10 V lighting control          0     0     0     0     0  \n",
       "1-planar graph                   0     0     0     0     0  \n",
       "1/N expansion                    0     0     0     0     0  \n",
       "10G-PON                          0     0     0     0     0  \n",
       "14-3-3 protein                   0     0     0     0     0  \n",
       "2-choice hashing                 0     0     0     0     0  \n",
       "20th-century philosophy          0     0     0     0     0  \n",
       "2D Filters                       0     0     0     0     0  \n",
       "2D computer graphics             0     0     0     0     0  \n",
       "2DEG                             0     0     0     0     0  \n",
       "3-D Secure                       0     0     0     0     0  \n",
       "3D computer graphics             0     0     0     0     0  \n",
       "3D lookup table                  0     0     0     0     0  \n",
       "3D pose estimation               0     0     0     0     0  \n",
       "3D radar                         0     0     0     0     0  \n",
       "3D reconstruction                0     0     0     0     0  \n",
       "3D single-object recognition     0     0     0     0     0  \n",
       "3G                               0     0     0     0     0  \n",
       "3G MIMO                          0     0     0     0     0  \n",
       "40-bit encryption                0     0     0     0     0  \n",
       "5-HT1 receptor                   0     0     0     0     0  \n",
       "5-HT2 receptor                   0     0     0     0     0  \n",
       "5-HT5A receptor                  0     0     0     0     0  \n",
       "5052 aluminium alloy             0     0     0     0     0  \n",
       "56-bit encryption                0     0     0     0     0  \n",
       "6111 aluminium alloy             0     0     0     0     0  \n",
       "78xx                             0     0     0     0     0  \n",
       "ABO blood group system           0     0     0     0     0  \n",
       "AC motor                         0     0     0     0     0  \n",
       "...                            ...   ...   ...   ...   ...  \n",
       "1988                             0     0     0     0     0  \n",
       "1989                             0     0     0     0     0  \n",
       "1990                             0     0     0     0     0  \n",
       "1991                             0     0     0     0     0  \n",
       "1992                             0     0     0     0     0  \n",
       "1993                             0     0     0     0     0  \n",
       "1994                             0     0     0     0     0  \n",
       "1995                             0     0     0     0     0  \n",
       "1996                             0     0     0     0     0  \n",
       "1997                             0     0     0     0     0  \n",
       "1998                             0     0     0     0     0  \n",
       "1999                             0     0     0     0     0  \n",
       "2000                             0     0     0     0     0  \n",
       "2001                             0     0     0     0     0  \n",
       "2002                             0     0     0     0     0  \n",
       "2003                             0     0     0     0     0  \n",
       "2004                             0     0     0     0     0  \n",
       "2005                             0     0     0     0     0  \n",
       "2006                             0     0     0     0     0  \n",
       "2007                             0     0     0     0     0  \n",
       "2008                             0     0     0     0     0  \n",
       "2009                             0     0     0     0     0  \n",
       "2010                             0     0     0     0     0  \n",
       "2011                             0     0     0     0     0  \n",
       "2012                             0     0     0     0     0  \n",
       "2013                             0     1     0     0     0  \n",
       "2014                             0     0     0     0     0  \n",
       "2015                             0     0     0     0     1  \n",
       "2016                             0     0     0     0     0  \n",
       "2017                             1     0     0     0     0  \n",
       "\n",
       "[9325 rows x 20000 columns]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 1h 3min 18s\n"
     ]
    }
   ],
   "source": [
    "%time first_items = item_collab_filter(first_features.loc[:, 0:1000])\n",
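    "# This step is very slow: about one hour of wall time on the machine used here.\n",
    "# Note: .loc slices by label and includes both endpoints, so 0:1000 selects 1001 columns."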
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
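    "Why does it take so long to calculate the item similarities using only two features? We are taking the dot product of a 10,399 × 1,000 matrix with nested for loops. Every observation we add to the model increases the time spent in each iteration. Remember that we filtered the data down to English-language papers only, which is just a subset of the full available dataset. Once we reach a result that looks 'good enough', we will need to go back and test it on the larger dataset to check whether it really is the best result.\n",
    "\n",
    "How can we make this faster? Since we only need one result at a time, we can change the function so that it computes a single item at a time and returns only the number of top results we ask for. We will do that later, because the experiments need to be iterated on continuously. For now, we stay with the full feature space to get a feel for what iteration costs when a brute-force algorithm is run on a real dataset.\n",
    "\n",
    "To get good recommendations, we need a better way to transform the features. Do we have enough observations to make improvements? Let's plot a heatmap (see Example 9-4) to see whether any papers are similar to one another. The result is shown in Figure 9-3.\n"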
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-4: Heatmap of paper recommendations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import numpy as np\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWIAAAEECAYAAAAS8T49AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzsvXl8VcX5B/xNIAlBCAgGwtKwJ6CCaHmtWgVRCpZFQdSKAtYqSi2iuL1W0BRawPV1xQVti4D8CmJZoggWUYqK2ABRMOHe7DGSjYQkhKzAff8IczPnnJk5c7a7hPPlcz/knjtn5pln5jxn5plnifD5fD64cOHChYugITLYBLhw4cLFuQ5XELtw4cJFkOEKYhcuXLgIMlxB7MKFCxdBhiuIXbhw4SLIcAWxCxcuXAQZriB24cKFiyDDFcQuXLhwEWS4gtiFCxcuDKK2thaTJ09GUVGR5rfMzEzcfPPNmDBhAhYuXIhTp07p1ictiE+ePImGhgZj1Lpw4cJFG8P333+PGTNmID8/n/n7448/jmeeeQY7duyAz+fDhg0bdOsUCuKTJ09i8eLFGDVqFEaNGoVLL70Uv/rVr/DUU0+hpqbGVCdcuHDhIpyxYcMGpKSkoEePHprffv75ZzQ0NGDkyJEAgJtvvhnbt2/XrbO96MeFCxeib9++2LJlCxISEgAAJSUlWL9+PZ544gm8/fbbZvrhwoULFyGFmpoa5uIyLi4OcXFximtLly7l1lNWVob4+Hj/9/j4eJSWluq2LxTEHo8Hr7zyiuJanz598Mgjj2DSpEm6lasRmzgD9YWLDd8XTIzdVo4vJsYrrm3IzcVtAwdy75m5uxhrx/RymrRzGg9/W4RXrujL/C3lQCEWX5YovD82MUU4F/V+txMy9NrVTs6JKNvmJuvZUGNPSRauSRhiqN6XDhXg0eH9ACRZoK5F3sji+cevwhtvvKG5Pm/ePDz44IPS9Zw5cwYRERH+7z6fT/GdB6FqIioqCj/99JPmemFhIdq3F8pwJtQTe+y2csN1OAERHayJJhLCAJgTfebuYs212MQU5nUZsO4zW5eTiE1MkSpnlHaeEAYgJdT0hGwgFwxqemV5JgtS3+LLEm1dINDPBo9mo0IYwFkhbB0REZHSn7vuuguff/655nPXXXcZajMhIQHl5a3y5NixY0wVhhpCQfzII4/gd7/7HebOnYu//OUvWLx4MR544AHccccdePTRRw0RyILobZpyoND/d0GtR7euNzPyNeVSDhTiaB3/3qxqry4dapQ3HJEuS4P1ANQXLtZ9MGg+0CiqbX0Rkn7z6uLVIQsZ/pfWH2GWFQm02uaj/vHh0f7Zz9kAgGs/Mf/STi3M8f9NxlwEml9q3h2s8DKv2wFCW33hYhTUelDVlK2gl/xN2palgYwBGSMnUF+4WPisifBDpf6YmEEEIqU/cXFx6Nu3r+ajVkvooU+fPoiJicH+/fsBAFu2bMHo0aP1adWLR1xZWYmvv/4axcXF8Pl86NWrF66++mp069bNEIEtcIbhNNIrvBjZPUnxd06NB4PikjVl1SoE+l7Wd9G9Mqhs9KBbjJaOgloP+nVSXs+s8mJY15a212TnYdbgAcK6q5py0DV6kOIa6z6jdPN4x6PV6L2hjOI6D3p1dI5+Ht9YsKruMtKWUTy2rwgv/oq/QzEPa/R26i+/mq3Nf99w/ddddx1Wr16Nvn37Ys6cOZg/fz6GDx+OI0eOYNGiRaitrcVFF12E5cuXIzo6WliXriC2F1pB7IQuzin9XiD1hi7CA+E4J8LnDMOaIO484G7psify/mmpLasIukNHfeFi23ViTj0YrHrHbivHgIcyLddtRV8sgp11ytYl4oeZsTbSB3VZu+dWW4CMEDZyfkPzXI/fdjwrsoiIiJD+BBtBXxE7haKTHvQ9z9i2Uk9VYRRmt7a8FYtoe8lSTejd4ySMthssOgGg
uikXXaLFB7BmQfrFGtNg9jlUoeSJNd50GXSfdNnqnJWW2rKKNiuI7UY4bkH1IGN+5ATCZ2vsIriwJoi7Dp4rXbYqO7g+EUFXTahht0nbw99qfcEB8Rbq7cx8zbVQFcJWVA/BEMJAy9bYVRm4EMEOOWDEaiLYCD4FKtgtHIi9qVogiwTr3GH9pepmCROe4JeFUbOocFlZqh8sJ15ssg/vS4cKDNdNj7UTLxFSp506fdaCwgpY/Za9ZhR2yAEjdsTBhquacOHCRYjCmmqie9J86bIV3tcstWUVwX8VUEg/ayyfWcUX2GZXDDk1+sbmpH3ed717eG0UmzR0l8Fj+4r8/KL5RmgR8VINVlm6T0bqMtoOqz2j98qMVyggs8rLpZU1lnp8V//OcsBh1asH0VhUNhp/nmRRdNKe5yWcVsRBpUDt6UMsFNQnyX9Oa3WzFm3F95Rk+eusbspT/CZyLCD3kPZrmgsV32mo66XLDIpLxvFGr4bmE83WzWNo7zD67xd/1dfPL5pvpL/0NZFnVWn9EazOjvV/Jzyg+Tasa5Km/2ZAaKo/dQwAUEF5K3aLaeXVnpIs7r0syFi4sHhQd6pcQwfQygO7QOrv0eEMl1bWWPbocEZYJylL5p7aOYhXb01zIZPHBKJnhuWYRNon83Nk9ySFMCZzp7opl3lvamEOKhqOGLZ24iEyop30J9gIedUEba3ghOVCWzzBbysWHgMeykTeq8OCTYYLm9AazEcW1lQTPYc9Ll22NPMFS21ZRfDX5DqgBYoR4SK7vXFKCJvdxsvEddCDLJ/UW0e7VA9WQG+Hgy2EN+ZpV25WoFYPyPLbzLjIqAWK6zwBHXM9IWy3Q5OrmuCAMJqlM7V7QpDtTU6NR6PrMtJWeoVXc//Os4Fo1CD1PraviLuFXpOdh8s/LOO2x9pW8mCVZ+rtMaFZTwA9tq8Ia7JbVRSZVV5bXiCAdjvM2sYCYv0loUWkm5+5u1ijilD/fssArZMHb+yLTnqwr4y/zQe06gHy/5sZ+czyJMAUay7pCa2R3ZM0/X8zI18xZ3p1TLbkUCIj7KuasqV1xWRRZJeOuEW8yX6Ci5BXTbQl8IL+BAtWPQf14GTQn0AEFAqloD96aIsqNquqiV4XLZQuW/wjP9h7IBDUV8FewQpC9BurrNkT2vQKr6G2zCK9wqsRwmO3lZumm6yI6PtZqyRR/2SFsCyN6vYHxSUzbXvp+jLOrtB4bZDTefI7+d+IEJYJfckCSwjbaQHDE8JmnBloIazuL5lnRvmQYWDHZcUBgz58BoDcE/bwODKivfQn2AiqIL6yR0vQaNYgkt9k61Gf0Jqhwwz2lmUpHARYgqWw1sMUel9MjPdfNyqUWaufxy4+obn24uHOlvoHtApsve0waZ/mh9owXz3WF54VRryXQreYZEMrdzUPx24rx5Au9q361cLZjADSG2ejzgyy9Rnhg5HFSXqF19+GHj9YzjRTErUxUuyAqyM2CLNeNGodqehh5ekUrW7NY9v58OjwfsisatElswTLU/v1g0sToWxU1xpDWd6sze7o/5v0N6HDaUP1iQLfq+t6bJ/Si5D0WXQo88XEeD+dLMHOGiealzEcS6Nl6QWasqQ9NWgdpEiIyax+zcxdepx4MKL/V/Mkup1W20j4QuoV6ceBlsVJTGRLPbTJIuEXbUdM85zmB4u3MlYTC/Z11S0jg3CKvhZwQUwf8shcF8GIfm1QXLJp92P11okGbftMtsvqvqwd00uqf2uy8xSHdazVhboemgd0cG5Ci9GA3fEdhiLtWOtqiPT94W+1wb/J9zXZedh1lH2IxQK5j7Wq11M58Mb8qZH9pGkhB7kZVV58fjQGAJj36OmHRWNKfqPLkL9pPtIrT3q1SPdTb+6oeSI68CVlu6vGGdC65w+KS8ZLhwrQJbo1uQCZ7zJnHbSAFj1DNNZk52HLuJ5SZfUQTivioB/WhVNgeBcu2gJm7i7GR7PeDoNnxNputd8ly6TLFnz/
lKW2rCLorwKzk4F48fDqDHZ0L1kTHNlyLx0q8K+MWHo2IwccPP2fDC2VjXJ5z8yaIDmZV40Hu0zv9EB4YpQ39plzteDZUTWoL1yM441e6bpFz5saZoIqWbmPh3BaEQecAp5LpcjVkoXzY8RvS5aANxuNSr2F08Oekiz0PS8ZXuqUmtc/2p1TXYbeKj46vB9mDR6APSVZTD3bwM7yVgSsw7vUwhw/Ld5qr5920nfCu24xQzX37inJ0iSAzDthbmr1jB2Ki1eVmrqXRwsPhbUepBbmoF+nZFMJLEVzlvxGlyE84bnw8rbvIl7S9cvOU9J+eYOSFtbzQWjSe95osOanDG2PDu9nWA6I4FpNcJByoBCHKqOYvyV24vvT24WrejYZzsKcVe2Ft1p5GlLVJNZBPr2/5bChorH1EEDdP5YpkZo3NyY26JaxC/TJ9QUdziDp7An7qAtahPZVPZsU5Wk+HqqMQs9YZf/MpFEnOPx7fR0hbxV7qDIKBbX6sQNSDhSiqikCacda+PlGRifN73oQzVnyG80H3tiRXcDQLuyDVdGY0/WTseKhtP6Igm/0/AS0YyyiCWidw6LszSReR++Ocs/3NQlDTGdK1yAiUv4TZARNR7y3LIu5MmNdlzVfMuqgwGqLRxdB7gkPc/Vp1jlC7z46iwavrNOOGTzQ7WZVe3HyVMsBzUuHCnB970ZdmqqbcvFz3Sm/pYlTyKr2+k239MYXUPKcl8XEiewmNG2sMSW/k7aN0kD44NR8yajy2jaWLX2dZKmOQb98Rbpszv6HLbVlFUFTTagfBt51QN/EjGwr9cqp9castljX6G0VTwVAt036QqsmrtnKdmum71OXefjbIsWDNrJ7EnPrZvdDRdOtVk2w2t1TkoX600rzNRmaukQPtF0Is1QTtP0sPb5ENQFAcQ/Nc56g+2JivGHVhN62+1hD6+PImk+EdkKT0RcB4QOp26valYlUEzIQjaWs2kQkB4winMzXgm41EWi4FhUuXIQLrL2kh4x6XbpsVtqDltqyiuArRyQgG5VJphwthM14Rdl9sqsHGesPu6NWmYHRFE8ATB2QmQFNm8yYy6RFcsIqh6ZNNKak7WBbBqlhNU2Y3YiIbCf9CTYCLoiXpRcwlfHEM4oF2WAmvHKFZw8o0o61BI4n3ltfTIyXMpeit2yPDu+nm52A9IWemLTHGO9+NQ9SC3M0q3e6DKnT7mAvrAhj6q0lvWVdll6A5C6nDLczopv+iqemmT0veJ6SOTUe/+ERmWeLL0v0/05v5186VOAfo3U5rVHeRKFXydjVFy5W0JBamIOqplaesOjjzXFC74qrjvuv0WOqrqu+cDEyqry6Ozu9Qy/1OLPUBzLxJsjzRfJDisoEFOETfC04qonjjV6NOUzRSY9tkflpEEHbM7bV7MpoW6X1RxT3Vzfloku0NkQiqVdU/96yLHir2+OuIQOYvxuBWZ7pHVilHcsSnsCvyc7D4LhT/jqKTnoQ086H+A5a0zarqG0+ik5RvU3dy5pnNKqb8hReYzI4fNyLi8/X1lnZeARl9ZEYakLnXd5whMk73jwD5IKsq/vPa8dp8A649WFNNZF05VvSZb17/2ipLasIyrvgxUOxmkMAs3anepj79fn46aRy62G0LVoIA8DxxmZmObWd6EIqXRLBlT2GYEyC1kzIDMzyTO8gJC5K/G6eNXgAukS3lul7XjL+3//px9Mwg7rTNabvLdda/ylQ3WR8HDq1Z/OmpD4SnXT4xgOPd7x5BgD/3N9Bt151/+M7DDVlp6sXl0IP5oSwDYiIkP8EGWF7WFfZeEThXFBQ61H42Afr7e8k1H2kUdNciLioROZvLoKDHyq9UuqXQEM0j3hQP2+BgcUV8dVvS5f1fjXXUltWEQLaEX2wIlGpJ0W/TsmKcrQQ5mXo0MvazDowKTrp0c3QYSdoGngPT2aVN+hC2IybMJ2cUx3JTf3dDEj0NJo2mVCjNM9n7i7W0JJZ5dXMDZarsJ4Q
Vs8XmjaZTOZ6h7S86HFGhTDA9qhUQyZTeiDhi4iQ/gQbIbUiJgr9bT/FYO6w/oZrN5JR4e3MfG4bdpi4OZ3dwYl2nKaZrl/Efz26Cms9SNQRJry+iO61s/8b83Jxy4CBUrSapcUOegmddtT98LdFmgO7/BMebC8y/jy38G2KoXvUGHLtSumyWV/eZ6ktqwiKIDbyEBI44ckkgowwPpdtkslhkdM8+K48C5fHWzfud2Evthfl4Ia+zgR0b4VFO+Kx70qXzfpijqW2rCIoqgkzq10nhDArYtlLhwpwvFFrGsSyHw53IUz332h6GnJiL+IBzzTQSDSxtiyERXwwE5/bbhpEoIUwodWIF57d9DARYeATZISUaiJUES4r30DvGuyAiLcDHspE3qvDAkxReIGlDrALwee/xRXxuPeky2btvNdSW1YRFod1LOh58djxdiYIFSEscnrJqPJKC2GjvDFzCCN7j4i3dgoBUQhUntMIAelLqHmOAWInCh5kx2b3UnvFQ8BjTbvmazyE54qYIFxWxi5ctA1YXBGP/7t02azP7rHUllWE7Yo4GAiUELaSmtwsjGTtDRTMZuUOBRDayf+BGFMZd+RQh61j3i5C/mMAqampmDhxIsaPH48PPvhA8/uPP/6I6dOn48Ybb8T999+Pmhp9p6SAC+Ih75Qwr9N2k07Y4wLGs4AQiFQCgPZghdBP23Gq+8SyASX2qkTFwKqXxz8Cs4cdtLfdxrzWuAuk7yLePbavCNd+Yk7QiLbJh44bD4L/2L4ivJmRL1X2z2k/+cdB9h4a6jGlY4iQTM3k/y8mxjNto+l5wBNCejbVhA6ZkKI0zeo5yArVKmuzzALdH71niMwDXpZuM3DCjri0tBQvv/wy1q1bh82bN2P9+vXIzlb6FSxduhTz58/H1q1bMWDAAPz97/orc1c1YQEsVQXr8OTiVaVSWSfOJTih5iGHlXqHTDJxGtSg6XVSRTVzd7FtQZzMmImKINtv+/hjTTUxePIq6bIH1t3MXLnGxcUhLq7VBX3Tpk343//+h2XLWhKTrlixAj6fD/PmzfOXuf322zF79mxMnDgRKSkpSEhIwB//KI5lERKqCbJtC/b2WN2+Hj0kSSlt2sY6PNl2S5Vu22O3lSO9wstMoSQCWXXwknoa2erplZVdFYlChZKxNvKgyvZh2agqpFd4FUKYpRKQFcJ0xDBeRDa7VQ6PXXzCtrqsCGFWv3hjRo/P2G3lwrENaBhZA+Zr77//Pq6//nrN5/3331dUWVZWhvj41kPxHj16oLRUmWPxySefxKJFi3D11Vfjm2++we23365PaltaEZtJAWM+MpQWGdQWkabFrHdVqCFYKZnaGuzmY9sdF4sr4hvf1y90FgfWTpNaEb/11ltobGzEww+3pFbasGEDDh8+jCVLlgAAGhoaMH36dCxfvhwjRozAP//5T+zduxcrV4q9/EJiRWwXzExGOyNDXdg1ya+no2lpC0IYsD8lkxlYCYJvRyD1Dbm5zOtGVnp28zEUxiUkYeCwLi4uDn379tV8aCEMAAkJCSgvb90tlJeXo0ePHv7vXq8XMTExGDFiBADgd7/7Hb777jtdUoMqiK2ckOo9kEY8kwJxOs86gJy5u9jQwSR9aMMKGOPUIacZEFpkBacs7Wb1p8vSC7hb5uomtnBl4baB7PjARnXOIph92fAOPtX1OR2cigdbveZk4IAd8VVXXYW9e/eisrIS9fX1+OyzzzB69Gj/7/369UNJSQlyz76wP//8cwwfPly33oAKYnUkK/pNbnRysB5Iuo5Zg9kBv+nJsCy9AEUnPcwVxY/HvUivaPmM+pdSB8TKfEDHbCUrL3pyDju7UiYviPQKL9aO6eW/vqckS3M6rn6ZvPirVv3zMNUJ+cjuSZprZkFbTRDQVhOZVV4FH1mWHIQWWcGppl3v5UiPNd0+ax4VnfTgqZEtgpLwdObuYn+fukQP9FtNkHbJ2PEimOmBRQdvcaCOXKfmmZrf
PAyKS2a2TeorqPUgs8rr57VaeMpaFZFyhFc5NR7dcV6WXiCdxMA2ge2AIO7ZsycWLFiA2bNnY+rUqZg8eTJGjBiBOXPm4NChQ+jSpQuWL1+Ohx9+GFOmTMFHH33kP9gTkhoOOuLH9hUphBAPQ94pwYe31GBk9yRLGT+sRLXKqfFgUFyy4uTYTH30AwMoT9PtPFmnocdnNU16ILwwgmXpBX6h6QRE+lR1/+yORkfqf2xfEWYOrpNWKejx3ei4EBTXefDSofOkni0ZkPGm5+eBY1m4TJDthcbM3cVYeMkJqi8WdcS3rpUum/3hTEttWUVABfHbmZ/Zak5jBqL0OepDNXJqrtbx1jQXIC6KLywCFQLTTgT7QNEoz0SB/2WCmOuZdtlt+sUCCUEZqKDrlY1H0Hg6wra5KTNnapuL0CnKmKBvfUYtCuLbtM4WPGRvuNNSW1YRUNUEmdgsUysjgcWNZAw+WteaTBIAzo9J4ralnlSJnZKR2ClZcT8AnPGxU9iQepvO8OnZdTQbt38hdsqQhZlg7DyQvqccKNQ123szIx+7jrYasRfUeizFESio5Qthnt5TlH2FCDWeSR/QMhdFWaR5QvggR2VS3nAEPx5n/8abryQOcDNnvlQ1sRMQiOoUofkMbF0gyLy4O0X1NWySKcozaAhu9DUeQucwyUUrnHJQcDIymB5iE1Pw8o67g74DCzbCOz6KNYE86I7/ky6bs26GpbasIqArYjvMhwKJsdvKmYbtLBMmOqKZzAlyMOJJ8ODUgxosIQy09IkIYZZpmd5cJL+LorYZxYCHMpnXZVa3Zh0h6LFNOVBoyfyP4OJVLYfXTs1h25w+2tKKuLS0FJ9++ilKS0sRGRmJhIQEXHvttfjFL35hojn2ijic39pjt5Ujf8cxyyEbYxNTMH3NXNsP4ew82JOtS+RibGasjfRBXdbpuRXOc1cEI7GtaZ7r8cNYjGOLK+JZ/5Ium7NG3/vNSQhXxDt37sStt94Kj8eDmJgYtG/fHhkZGbjjjjvw8ccfm2qQZcJTX7hYYbJi1nxF776MKq9Qb0hwvFH5whDpP7+YGI91C9tr6Cg66VHUo0dbfeFiTOijzH/O0tUazdwgK8ByT3j8NKYdy/LTTlb6GVVebl1rsvMUOuO8V4dx+ysjtMiKiPTfyIvk2VE1eGxfEW7aWcpsj85EklHl9be162i2Yd4WnfRw+0P6T9epVz8vS4roPqPPCl0+7ViW4jsdb5kIYZnVKRmf0vojGn6caP4JQOs8ynt1mPAMwt4MHfabrzkFoSB+6aWXsH79er9d3IIFC7B8+XJ8+OGHePPNN001uOTzWOb1E80RzL+NgDZXYw1oRUOE1Om0+rDg7x42zQTZNUpB/Mrh83CiOUJRj0yf1Lyh6yVbyiWfx9qyvVRjYOdkP435J9r5aT9U2UJDRYOSfpq/Sz6PxZEqJQ/Mmg7m1Hj8zhF0RDhZnGiOwMDOp7BlnDLIErGNpj0p12TH4vWNpwEASw524dqei9rS+40e08WPtBzS0na+tJCtaWLXx3tmAON8pmm+Y1Nnxff9ZdGa8gcrtdd46Bnb8mzR87NzVMvOmcwjgD+uM3cXm372mWgrqomJEydi27ZtzN8mT55sYlXcOgGJMbjaljK9wouYdloDf6MwYkdsh6++GVvPmbuL8dGst01tben6yN9m7UnNQt0eiyZZtUKgaQ9l8HimN095NthO2Z072UZLXydbqmPQ3Ruky+b88zZLbVmFUBA/+uijOO+883DbbbehV69eiIiIQFlZGdavX4+TJ0/i+eefN9gcW0dM9FF7y7K4b0vRb1ZgZ9CftganeO4UZOgN9T4ZWRTYFewno8qriWXMq5vmnx28FNdhrW8D7/1Qumzue7daassqhKqJpUuXonPnznj00UcxduxYjB49GvPnz0eHDh3wl7/8xXBjvC010UfRA6J2cRUN+JsZ+UyXUpkYEnpCmFUHy8WZRiBiPhhpw6wqg/BcLzA5
C7L06eVPI7a+ahpY9xF61fbVBbUef3lSRo8nvN/1xt4MaHrVwk9Ep1EhfO0n5cxxYQWU59VNP4ex7fQtX0W20KQ+tWu1bXriyAj5T5ARlnbEvJPXNdl5hvR8ZgKEB3pFJXPKbLTfTmDX0Wxc13uwoXuO1nnQOwAeiDRtMmNO85zHfycyHNO0icaU13ZqYQ6mJA5i3GEcZuaUne23wOKK+P6PpMvmvjPdUltWIVwR/+Mf/wAANDU1Yfny5bjuuuswYcIEvPrqq2huZnuXmYHR6FDqSUhWraKJw1pBPTq8n+HVq1oIz9xd7D9wYUVEA9hBdNQgkdjUqzm9hz2zyotRFzRr2if9NbOa5UGUpokWwqJTfjriHC2EyTXWOBlJo8VKSUXTJvPiJTwvrvNw+U+um9lt8PpA00bGVNS2uj4iBFkel6SM7Hwffn6zcLdCp4Wi29dL/SRjmeKuiFWYNm0aNm3ahOXLl6O4uBj3338/zpw5g1WrVuG8887zB0OWh9wkaAsHN1b7EAj7VLsDiqv7bCboj+w9Zuo2CidjhrTdYO52wuKK+E//li6bu+JmS21ZhZRn3TfffIMXX3wRF110EYYPH47nn38e+/btc4woJ4QwbSNpF1jeWaQds30gXlaBcBKwWxCo+0wLSlkvLFnhyipH2uB5sBHIem7RQpgeazs8RNW8Z4VOtQo7vQIBdr9lrwUF7SLlP0GGcEU8efJkbNmyBffccw9ef/11dO7cGQDQ2NiIadOmcU3b+HDmECvlQCEWX5aoW85N4unCRTjB2kJhwENbpMvmvXqTpbasQvgqiI6OxhVXXIG8vDz87W9/AwCkpaVh1qxZGDt2rK2EiFYCem9YGSEMIKyFcCBXGayVFIkvQMMJx5Jg4Lvy4Cat5YEV0yTQq81QioliGJEGPkGGkIR///vf+O9//4ulS5fihhtuAACcOHEC06ZNw3333WcrISJjcNY2XS9zglF3VYJApE0CjAsxkjGad9hC6JY56NDjDStiGeslJhozu8zXAoHkLu1sqYd1gGUFrLRMrGfBaFtGDnCJaamZTCVGniVH5kEYHdYJBfGPP/6IiRMn4o9//CM2b96M2tpajB07FjNmzMDvf/97x4iSibPLOkSh71NbUKjOb34LAAAgAElEQVRjCvPaMqo3VcdaJe3Q9bKiaw3qzD4VF8W1OJJxBzq2V2qSCmo9KKhtSfdUUOtBzFnbThYdBGZN3a79RH51JNKR0zTFKj2juePEutcudIlm56GTiUtCo11EqzuwepxokLRMAD/ympF+/vHr8xXfWXOItFlQ65HOyEGPxdtHlG7W1U36C51NBR2k2gGU88A2W+0wijUh1BHPmDEDc+fOxcUXX4zly5ejqKgIq1evRnR0NKZOnYrNmzcbbM7e1WYgXDfPNWzIzeUmyLQboRC5zEiUMReB5pdFHfET8iEY8p635k5tFcIVcUNDA8aMGYPu3bvjxRdfRI8ePfDnP//ZdiL0tum8rZdaCA95p8QW1YKRbVh6hVex1V+W3nIiT2//ZOozmtFZFkbqzKzyCoXwY/uKLKs+6LGWFcJG+8BrD9Bul2mhYob/dtoRm4WV+vRUXWqwhDBtA67mh9qzzqzK0Ax87SKkP8GGUBCfOXMGFRUV/u/PPfccsrOzsWLFCkTYuJwnApVkTVZPDllTsKz7EzSqBaJ7UuugWA+QevLJCnWy1aeDrhhNyEhndDYCWm9MC3/S3797OhquUw1S/z3JdcJASqScSPWxdkyvluSZHAHG0hXSfBHpN9MrtMk/1S9rkerJDP/t3pEZdbxggVZ9EBC+yT5bI7sn+RcVLOyjQlmSOgbFJSv4kV7hRdfowYqxllGLrcnOs+espq3oiP/whz9g6tSp2L17NwAgNjYWb731Fv7973/D67V/9TayexLzQZGdlKxyPLvUtWN6aYQ+aZvon1m0qFeEdBkz2Yf1+qa34jrww524bERLksR7kus0v7Ou8cB7KGUEVEGt
xx8BTg/3JNdh7ZheqGk2nndN1B/1eJFxpPWtMg84zXMe/2VWwrLp6QnoF4noBUra5tEwKbFRc83IPCD43cAG7m9DumgT7fEO3My8rGyxcW8rOmIAyMvLQ3R0NPr06eO/dvLkSXzwwQcmLCe8Uj7soRA7wQjs0puFCm/SjmVh1NkU6DLxA9Zk56FPx9OGY03owSxfZeNeZFR58elPMXh0eD9TsTJkQI+X2bGTuU829gVdFz3OADvHoJl4LCyYi0NhTRj3f+ZT6bL5S35rqS2rCMugPy7OXVg5oLXjcJB3mGmXwHJBw6IgTtkuXTZ/8Q2W2rKKEDBlDg+EjNumjQiWsb4VRxArOlk7LDR4h5muEA5BtBUdsdNw0nnCSAQnGTrIQ6xXL0tPVlznYepOrVhKsO4zWpeZbT/JBScLltBVC9PMKu0BbSBBYh6HCoy8qGR4FwgPSKttOOHQ4WsXKf0JNoJKgVohr16h0cFZRAkHWTCSy4vQoW6D1aa6XnUZ1uFgr47JzAMvs5YSQOsBGv0SoevKcMgUS50LTg8yK9hhXZNsCfQ0aPZBU/eN6GaubScWEv1Tcg2t+nm8o58lo7uIsdvKubsl+jo991ltGHlmHYmkF0YuzgHXEYfbQRxg/KCB9FF9GCJ7n9UygQAdID9UaDKKcDmscwpG52fgYe3l3G/5TumyBX8eZ6ktq3AP684RBNOLLRQ86FwEFvaMuUVB/Nzn0mUL/t/rLbVlFSGwKHcRCJCgQcFqW4RABVpyETiExIvXPawLPTgRGN6JdnhBYOxAMB8OkYUGy3jfzOGPk1YgdgeG59UfrMDwesH0gRAPAs9Am3FxdgqsqFYst0w7cLTOg6N1HoWhutEIXuqIYLzMtAW1Le2I6t91NBuvC/q6+LJEaWHsRCQyQP9lcO0n5QpX44JajzBqHGDcQoMc/hg5TVe3oRc9rbpJP5cgAXmJvZ6Rz3yhlTccYcZRIGMk4mlqYY6/TvrQ66BgpyAzR9ThTEVjJOMMwuq3zMtdHaFQBFvndFvyrLMX7hY0FOGUDpflqRUoxCam4OUddzNjK59LCG/9vDUdceKru6XLFj40xlJbVhH2qgn1djQcnRSCCdGDyuOlbF+NCmHRNtdMIH0zQtjMVjuUs1iEihDmqewcVW1EGPgEGWEtiDOrvJrtqNOxUnkOHaytNDmEMpPdIFDgPagb83KZvFyTnedYxDGR0HAq7rT6oNCM4LIy52xLHR8gmD1YpV/KtBqS8NsJh47ISPmPEaSmpmLixIkYP348PvjgA83vubm5mDVrFm688Ubcc889qK6u1qfVGAn2wuppuVUnANK+ETr0HEVow3Q6mlsoetaJcMsAtiuvETtYVnxaFuiobYFOnRTslPas+aTmmdVxtTPWNS86okxYWYIHLuyvueaEQ4cTKuLS0lK8/PLLWLduHTZv3oz169cjO7v1zMjn8+GPf/wj5syZg61bt2LYsGFYuXKlbr0h5VkXrPZpOpzaZqpfGrGJKYY962buLvZPeNZ9w7omSaeKDwTU8WlZIFtWOqZtuGJ7UY7/bzvjaejNEb22rHhwymBY1yTmOC/Ydz6jNBtOWDVFRkZIf2TxzTff4IorrkDXrl3RsWNHTJgwAdu3twYX+vHHH9GxY0eMHj0aADB37lzceeed+rQa7541bMzLRVVTjuZ60rsl/r/t3MqXNxzx58DyVuurCljbzJ0/K60kapqVwi7tWJai3o15LafxtPkQ+S3tWBYyq7zMLTC5T10vqWvtmF6YmlivqZOGHcFnaL0d6fvCtJ8UdKYdy/K3P3N3sWmzO3rLqja3SjuWhZ8Mbt2L6zx+11rWPKORWpiDtzPzcfJUsbQ7Lp1Zmeb/8UYvBsed8n9/4f+pAaAcU/X4EpC2C1UWA6Qt+j7CI5pXvJcdq/80zd+rYmzQY0zAWpjI5KsDWp+lwlqP5hlSg8wDO599IyvimpoaFBUVaT41NTWKOsvKyhAf3yojevTogdLS
1vgrhYWFuOCCC/DUU09h2rRpSElJQceO+skZgmY1kXKgEIsvS9SUGLGmFD/Msj/t/dE6D3ozEo5aAd2Haz8px5eT4g23U1DrQb9OreVL64+gZ+xQqbLBhpqeNzPy/dtO+m+gxYRpSBf5VZmZvhptwwjosebNXTvqt3OMKxuPoFuMdi7JtqGey6x+y14zipY6rLkdD3nnv9Jl5zd9jzfeeENzfd68eXjwwQf939966y00Njbi4YcfBgBs2LABhw8fxpIlSwAAW7duxdNPP421a9di+PDheOWVV1BSUoJnn31W2H5ABXFs4oyQOcWVBSs4uV6CTTepqfMQmcbJCAI9s65Amn05Idh57eSciLJtbsoE7t9TkoVrEozFs2iN7WztpZr0rrwgTvvdSM3qFwDi4uIQFxfn/75p0yakpaVh6dKlAIAVK1bA5/Nh3rx5AIC9e/di+fLl2Lp1KwAgOzsb8+fPx7Zt24Tth5wdcXVTLje9uRnUNhehU5T2gf2h0suNulV3qhQd29u/KncCNc2FiIty/iG2Gz9UepHcpSdi2nUJNilhjXAYf/KsnTxVgvPaJ0jd0yIHrAVrT35PXhB77h0tVa60tBQzZszAxo0bERsbi9tvvx1//etfMWLECAAtCZfHjRuH9957D0OHDsXKlSuRlZWFF154QVhvwHXEvDxe12wtAwCmEL54lbEYuDTUQpiEhxzRLUmhg6VhlxD2GvAoejszX5jjjMcDJx5Clq6QxyuzGNEtiSmEie4ztZCt3yU8onllJDcc71CIxCTmtSu6FzAXdtQM1POANf6y/DAyP0Ugc4PXLlnwiISwmu92LMacCDXRs2dPLFiwALNnz8bUqVMxefJkjBgxAnPmzMGhQ4fQoUMHrFixAosWLcKkSZOwb98+PPnkk7r1hsSKWL2Vz6zymjrlFd2XU+NhnsibbYtg5u5iTOjT4DfrIu3Q9cqoKmbuLsbCS06gY3ufIR0hMUsiJmCdo3zoe16yn47H9hUZzijNg+x2XRTukfTTyDjRvNQbL/XvVtRExXUefwJSHszULzMmRualuj6WDpjUJ1tvekXLXOJZsVQ2etAtJllDK82P9AqvxjJKJhRo0UnPWbM+a6qJi/4pvyL+8W65FbFTCAlBzINsQkQX5ybC233XhT6sCeKLV+2RLnv499dYassqQtqzjghhWZMWUTliwkZDbS4kA70tOstMR8+MSgSZCFqsfuuZC5nF8Ub2y1QvwA6L13WnWrbZPLMuPcgKYZo2mTGnec7jv5HIZrKgaRPNZZYJWyiA9YyxwDJ1lL3XCCIi5T/BRgiQoIXaC0hveyhTrqZJqwhK7MT2eBNBL6PBuD4tWR7oeutPneIV54LcrxcvIbPKi6qzfaPb/K48SnPNKjKrvDg/hr1KYZlJLUtvtbdOpLbKhCaii6e9+Oh7eDQYBU1booTah+Y5j/9zh/W35BWZXqHNNUfTVsWYr2qayP/qegpqtR6NpIwMzeRekZdjZaP2t/gOyjnA86xjWYjQ99rl9h1GwdeCr5qQTUMuYyrTVnEub8G/K8/C5fGhnM6nbUE2Yt72ohzc0Fc+fRgP4ufammrikrXyqonvZ57Dqom9ZVnSnmCyQthoklGz9xgFa3Uwdlu5oQzSevWFYqYLWZdxHu3nohC2283eSH2yEfPUQtgszU4urtpFyn+CjaCSQJJPEtiRxZmuk9yjZ1pE7jESUpP8JkvXyO5JKFPpwb6YGI+R3ZN0A2ez4keQ02giwCobj2hOqFn3ifpkJHCLbBnZB81q3JGMKq/pLM48mDlDUEM0P1h865/CjnrHGjfZeBZmhB1vTHlZnFltBGKBI0I4qSZC4F3QOqHIYBI9Fr1aVgttFtT6L3LPhV2TuPou+h4jITXJb7HtfIrMDKQdul5yGNWjA9t1eUiXJGGELNGuIaZdS1vdYoYq7s+p8eDR4f0UmTT0+sQyw6IF5JB3SjS/q8sAYGaqINCLBMbSEdPlefde2DUJ
Oasv1fxuJfiOjD5Zr37WvCVjwnr55C9u0Zer+8EaNzJe6jFmZbkwoiMmYD0zhI7KRo+/b3SdND9WZHbS3C+aGwTnoo44JATx1/+uUnw3a9cruo9nD2k1KtXnR2PwD2/rhCPt0PXyQkrS+PrfVUg7FmW4/WFdk/xt0W0SOi7p3my4Th6eub5evxCA+65bxf3t639XKehUG/I/NVL70qH5ojdeah6q55bdMFO/jF23kXmpro9lhz6saxJ2Hc2Wrndk9yRhJDxiQwwoeU7zg/Vip58VHvRCzcoiIjJC+hNsBP2wLhwQzJQ/LgKDQMV7cGEE1hZJl3/4lXTZ72692lJbVhESK+JQR1sRwqGc0ifYcIVw24OrmggRkAMHnn5YfSCh950G0YXxytB6M2Kcz7K9BJQ6PZZ+T0avxnIaUd/3xcR4SzpTnn6Rd13PJpgFsxk6nMjsYdbRRAYsO2IRQjknop226nYinKwmXNUEA7w4rkCLB5DacJ0GKx6xqL5Awo7tt17/zUAdu1gEK/F67eh/VVM2ukYP1lx3gi9tFayY2/Rz08pLa6qJX2+SV018PS24qglXELsIGM5lxxQXZmBNEF+9RV4Qf3VTcAVx+6C27uKcgiuEXQQSoaD7lUVQtSOhcngUCDpYOr7YxBTTuj/WfaGoR6Tz34kQirQHC7I8C1Z9wWrDKCIiIqQ/wYauaqKxsRF79uxBSUkJIiMjkZCQgMsvvxydOunbA2phr2rCTUkU3ggFVcW5HMPEDALLL2uqiWs/+Vq67JeTfm2pLasQrogPHjyI3/zmN1i9ejW+//57HDhwAKtWrcINN9yAvXv3OkqYTMYDWgjbdWoeqEwLZtKH59R4/P3k9be0XhtO0GqfjPJWlOWCRn3hYt2VFMmYbWV8jxrM2i0Cz/IFUFquOGHFwYNsW+tyrFuBGOEXmXd68y/tWJYt7uRqhJP5mnBFPGXKFLzwwgsYOlR5wnnkyBE88cQT/gR58rBPyE3bWYpN4/gpjfaWZUm5RYcTRNkNck94MLBz6GR4DjWkFuZgSqL1aGGhAlbUQlEf6bkjkyUjNGBtRXz9p/Ir4s9/G8Ir4jNnzmiEMAAMHToUZo0t7LA5nLm7mCmE6bplhLDsSoIup2fTm1Pj8dNB7IdlAtvvKcli8obWnc4aPIDLP7NC2GzENjoeQGaVl2n/LAurc0LmfiKg6P5WNBzxfzdKvx06bdFcyqzyctuYubvYL4Qzq1rtkS/syo97TQveWYMHoKLhiHBemhkTeu6budduOJGzzikIBXHv3r2xcuVKHD9+3H+tpqYG7777Lvr06WOqwWMN7F4bSQDJ0wuL/OjTjmVpMhqIfOlpPPZdV//feiuJ402RuO+/LeVJoPpeHZP9/Xv42yLkn9BOumsShuBYQ4Q/iSUpq+4rj39mwQo8QycPJRlJaN7tKclSxAO4779dUS0IZA6I1RWscTMyH8j9qYU52FOSpeChGnR/T56KwE8n2wGALv0kaWf+CQ/2lGRh7ZheQhplkpzOGjzAv22/eFWpn37SJzL26vvWjunl5ycda0Q9n0XJQbt3GIoTza19fvjbIsUY0zwlYCWwpa8NikvmPoN6mW0I7UbGXQ+RET7pT7AhVE1UVlZiyZIl+PLLL/0rYJ/Ph2uvvRYpKSno3r27weZcO+JQhfoQ5u3MfN3sICJYOUgNhUO8cEHbPrC2ppr47WfydsSfjg/hWBPdunXDK6+8gv379+Pzzz/Hf/7zHxw8eBCvvfaaCSEcPNCHXID8tou3beflg8us8lreYlnZ4luB+hDGrBAmIRnVwkFPBaJILSVxiMeCug3ZXId6UKsJWPPHauhGUr/Rrb1dQljGjT5cTCYJ2kf4pD/Bhq752vbt27F161aUlJSgXbt2SEhIwLhx43DTTTeZaM6eFXG4r5g25ObitoH6oTGDgWDwNlirukAd4IX7fA0erK2Ib9opnyppy7jgpkoSeta98cYb2LdvH26++WYkJCTA
5/OhrKwMH330EbxeLx5//HHbCEmv8EpnabBrUmdUedGhnc+1NqBQX7gYhbUeYVD0mbuLcWm3Juk0V3p47OITANiCuLLRo4h9aycuj7cvVnPuCQ+81e2ZedyueHue8N6sai+GdLEmdGRgtp1wtUAKgVg+0hCuiCdMmIDU1FRER0crrjc1NWHKlCnYsWOHwebkV8RGBPObGfkY27tJc1BgpI5wQ2aVVzrIt9UV52P7iqSCmdOQpS+nxiM8NP2h0osR3ZJM0UBQUOvBqTPyh7MiBDq4j527BSPBlQKJPSVZuCaBJeitPbvTP5dfEX90fQgnD42MjGSaqZ0+fRpRUcazSRiBEQH6wIX9/Q89MQw/3qgvhNVWFHaCpZ880fwTo2RLdDbW3zy8nZnPFXLHG7UvO5kHubDWw9WpmhGAsi8JtXBU699HdEsyTQNBv07JtghhQJsy3ghoc0ZZB4a1Y3rZputWC+Gqphxb6lbXYfS5IkLY7ucxIsIn/Qk2hCvid955Bzt27MDkyZORkJCAiIgIlJWVITU1FePHj8d9991nsDnXaiJYCKae0tWRnnuwZ8ytrYhv++K/0mU3jB1tqS2r0D2s2717N3bu3Ini4mL4fD707t0b119/Pa699loTzbmCOBzRtk2k9EH6f669UIKfIsyaIL7jy93SZdddO8ZSW1YhVE088sgjGDNmDB544AFUVFQgPT0dn3zyCVauXImSEnZGXzsx4KFMx9twoY9QFcKBivhF+n8uCWEg/FOEtRnPury8FtvC5cuXY/Lkydi/fz8OHDiASZMmYeHChY4Tl/fqMMfbcBG+oO2N7Qhlatbd20VoItLAJ9iQoiEvLw/33HOP//udd94ZkBUxDZ6LLMvtEghcFDUe7HTVVIPXZ6tQ0ywbRc1O6LnCAkq6yCrVjtCMTlvYsNyeeWX0rtkBb7XXcN00740+k3qwe163mRVxVVUV0tPT0a9fP+TktDL9559/RmSkfe8RmQeeZ3h/+Pc9/aEfKxpaLQ4uVJ3as6wJzEAt4EvrjyiuvXSoJWwjbY5T01yoW29qYY6CfhEO/54fdY6A9Fe2TgAaEyI1zysajki94FhlyBixxpqmcdQFWjMmdR9EThjqsk6/TIzUT/jLNtUC9zdReVmw5kFSlyRu3axwqoCS9+Rvuu7Uwhzh/Myo8jJjgRxv9ErNayMIp1gTQmk6YcIEPPvss/jvf/+L559/HgCwZcsW3HzzzSYsJpSgH1bWgyXzwBN34j/t7YpR/ypFd5VpER3z9/wYe1Y7agHfM3ao4hrLySEuSj9h5ZTEQRr6RRj1L/HqgfS3e4ehpmIf0yB87t5hKLrHnNEtr+YRAH+ySPVYF9Z6hP0+UuVF9w5DdftLoK5rSuIgHBWYixndOanjEQcytKaVXV73DkOFcZnVUCf3FNFA81yPHxd2TfKbJBKkHcuy7fmk0T5C/hNsCAXxk08+iX/9619IS0vD008/DaAlBObatWsxZcoUw42lHCjEmxn5ALQPqzrGAuthVoPYhm68LgFpt2vfpurDhh8qvSg3sELMqvYiq9qrmcBVTUpb111Hld+v/aRFXymyCc5SRcb6odLr5w2vXgD+Mqz+8mD00GVvmXK7Oigu2b+KYT2gb2bkm155ijz4dh3NxtCz80DdXzWvCApqPYhNTFHMp94dkzXtpBwoxA+VXqzP7QBAu7JNOcDexURGtFO0xQPrNzIv1OCtPgl4z4xRtIvwKfim7uMf9hzV3KMuc2HXJP9cIHNYFHCe7AZv2dWqylTPa/UuyMgzKkI4rYjdLM4uHDPLCqb5U2xiCl7ecbelCHJtAeFtcmftxfPAN19Il33zqrGW2rKKoB4YlnHefKJ0NMEGSd1jF0Khr6wHlecFaAR6QrjhdIXlNnioL1wsLYTrToktLmqbral2rID3jBDQ84dO1UQQSkLY7mdHD04d1qWmpmLixIkYP348PvjgA265L7/8Etddd50crcZIsBc9OLpBdZAX
2VB7RkPymTF5eteeXZMfegFtZGxlnQhF2DnqF4bKky2sEVq81c4JYhr09po15h3bKy0uaJ7HJqagU5T2heKEDTNNG+Ej6xkhbccmpijmT9fo4KaC0juLiIuyJ0iULJwwXystLcXLL7+MdevWYfPmzVi/fj2ys7UqxGPHjuG5556TrjdkVRMZVV6hTmznz9kY12cwgJbkkPWnlHELSuuPcA8cwhXL0gvw1MiWyawOllPdlIcu0ezsIXq81INeYB4r0Iv0VtNcgLiofpZoOFrnwaCh62xZHToZDc4OiNRBR+s86N0x2dHxFIH3TKYdy2Jay1hVTTz87S7psksuHIWamhrN9bi4OMTFxfm/b9q0Cf/73/+wbNkyAMCKFSvg8/kwb54ywt7cuXMxZcoUvPTSS9i1S58OYRjMYEJPcFQ1tb7HenfUTqrmMyFwFGoBFQ1HNBYAlY2tB0XqB8kHvjWD1UMe2YfWSEQ4ApEQBlpXUVYER++Oybjo+QdM308jlIVwTo0Hr1zBp488J01nIpjzy2nwnkm2ELYOI9YQ77//Pt544w3N9Xnz5uHBBx/0fy8rK0N8fOsOqkePHvjhhx8U96xevRoXXnghLrnkEnla5UkNLYzv2144mTpHRTOv2w3RytvKqpzVrz9fUsst33javti6IohW3kaFcCDBszIJxs6JnrcifhqF7MuqR4czpoWwFX6RPIfVTbnoEu18YgQjut+77roL06ZN01ynV8NAS0LliIjWin0+n+K71+vFZ599hlWrVhlyeguojthOvVpcVKJwMtk1uVk6xQ25reY6rElJ9Ht2P+Ci/gZKmNjFV6sQ6SN5pmc0yFzk8c3JOBb0OHaJHiBFrx1IOVCImbuLLa2EaX7JnLGwPPf0hDBxirIKI2Ew4+Li0LdvX81HLYgTEhJQXt7a7/LycvTo0cP/ffv27SgvL8f06dNx3333oaysDHfccYcurQEVxHo6OuLrr87bJZvHS1SOlUsuvcKruEcdayC9wqtxnS066VGkOWLFJ1AHybES81Xv8OuxfUX+PHF0X0h/Rf1TQ4/Pot+J3azMWJEyLI9DvZx/5F6WHpT0b/FlSgca2qaXlBHNxZm7i/2/i1Lam4WaR4Reej6SMRW1zctzx7JhzqzyYvFliVh4yQkpGtMrxPkXKxs9mmdDXT69wmvKK/B3AxsM38OCE1YTV111Ffbu3YvKykrU19fjs88+w+jRrSE058+fjx07dmDLli1YuXIlevTogXXr1unTaqaDVtA/hW38TWfTaDyt/I235VULFtHWeFBcssZof2T3JMU96lgD5Du9+qLTyJMy6jc4eUBIoOteHZOlAsqoefPSoQKNUFeXefFXff1B0+m+kG2qqH9qkLJ0gG7S99TCHAzrmoTcEy0PG/mf0PQPb6ymPb12WB6H9PaatTIS1T+yexLSK7yaFWY/Sg9N8+Der476V74j1rR67tE850WeWzuml3BMyW/0eJG/eX2oPx3h/40OhK8W+oQm8r+6vn4MvTspQz9b6hUtHethZPckDIpL1pQhzj4sXblaNULzWta7M73Cq3nGzMIJq4mePXtiwYIFmD17NqZOnYrJkydjxIgRmDNnDg4dOmSJ1oDi0wWnmNdjWs+hpIOvGA3S0jX6jKlsuw9eeFL4u9qtmaw6burX6L9G948HNW96xGoP4Hj8swL1g07TTVaeXaNbaCH5/eg8f58uOIWeHfRdn41ClBOPt/Jem90Rf0iqB6C/E7lr8EnsPXgnACB1WpVh+kRjSn7LX9y6e9Ibu9h2bAOmfUfMHzyrx5Z+Zv40TLk6fusGrdXAy786rvjOyl0nk7lc7xli0WcVTnnWTZkyBR9//DF27NiBOXPmAADeffddDB8+XFGub9++UhYTQAibr1nFS4cKbEtuaQZ2B1OnPaSc8pYKhQDwwQ9G7hwIf43y2alxeTszH1+Vxdhet9n5qb3PmlBefHCndNmUS8dZassqAr4i5m3n0iu8mhgH
VsASwnrxENTtk++iYCusA4v0Ci/WjunlDzRTWOuRUk2kV3gVMSjorTmZoOkVXsVktSOGLulDSzZlKBXCRIUAACAASURBVALkkL6nFuYIx04dO4O+1wisCOG9ZVkKWvTywpXWH/GPMYt+vftlVBN0GcJfnuAj9KtB7jMD0YHaxF80KmhhPR9mnkmWEJaZC/WFi22NCd3OwCfYCOqKeOy2clviyFpFIOhgrWpiE1Mwfc1cUysSVn2hsKJVQ3Z1FIq0Bwt273gCEW/CmTasrYiXpf9HuuxTI39jqS2rCIqLM3nL0sJPFMnKCo7WeRSRnwBtFCyjQvgg562tjghGR1+jhQyJ3lZfuNi08Fk7ppemPbN1qQ+3ZDJJqyNksSKIHa3zSD+cS3+p1E+S+WA25kXKgUJds7DqJn7UMB7UkfcIdh3NRnVTnuH6AC3vCM948wxgr+DV2FuWhZQDhQGJNyFqQ4ZWJxBOgeFDQkds54p0b1kW80DBallRHd+URtuij6YtR+yCnXXK1iXSz5sZayN9IGXJ/07vdkJlVxcq0OOH3tmN0h3f2rx9/gf5FfETI4K7Ig4JQQwAa7LzMGtwaDgLnOtg+f6zDtHsHrMBD2U6lqeQH8+gZfXNMvkKNg4f9+Li85XCyEkesRDcQ29rgvjlw/KCeMHF56BqggWnhbDVLBWBaidQXlYisAQW6xDNyJjJeGHRAsaMw4SoDVE8AxkhrI7IZjdInXS/1UIYMJZQl7YH1wMvYzothFn9DlQmbTNwA8Nz4QaGD0ec6wdppP/hHWTdOIJvSmhtRfx6xmfSZR+8cLyltqwiZFbELloRaquMc1kIA639NyOEyQo31MZUBsEQwnbG1g4n8zVXEKvAe2AC+SBZXXXJ0momML4dcCKQfajCihA/F2HnSz+crCbalCCWDQ4kAu+BIdfXZItNlJalF2ho2Zhn3EzKClh9YAk/o6f9mVVeUy7ivFgJdL3nEuzubyjyT/SyZZmqyrhJG0U46YhDQhDbtdoMRDxc9QHV2G3lioMOkkGDpuWWAfqxV2MTUxxZKZI67VhpLP2+s1RAFvXBD902a6z1xs0IX9Rlnd7JmKnf7nka7DjQNM8JP3jzbcBDmczDUScyhkRFyn+CDfewzoULFyEKay+Y97N2SJe9a8gES21ZRVDeBWnHtP7relt+OuyiEZTWH9F4Lslsr0XBqYlHlrqM7Lb98HF7Xkhm1AQyYI0PjTXZeX79MuGB2tPOLnxdajzWAaHleKOYz2Y84d7PYt9T2XgEm/LFsUx42HWU7a0nmicyz4O6/3aOkZGMzGafXasIJx1xyK6IXY+lwBvvu7AHbd3MLXDOV9ZWxOtytkuXvWPQDZbasooQ0I60HZxL1gAuXIQ6nAgM7xRCgQYm6NUwsUTQw7L0AlOhF9X3qLdweuEQCegDCppms1vCQK+GnTi5DgaOCgLC8+bH0TqPZp6ZHbdwWg2bGXPeapj1nLBUjqJwtHbOwXBSTQRcEA95h53ZVGSCQywR9FacT43sx0wdTw8urVclD576nvgOQ/EjpZ9bm91B2C4rhupTI/v5zdbizyZrrGwUTzI1b3h6cxYfqprM6SdpkJNrlrmd6AEZfONey23TSK/wYubuYuG5Ad3mkHdKFPSRtPEs7D8W5f/7z2mtkd3u+KIbnhrZTzGW8aokmzftLIVTYCXZNAtRvjv6OZu5u1hhrXDN1jJT7RHenzyllWhEaI/6V6nfbG1K4iBuXXZaT0RF+qQ/wUZI6YjVemEScCS9wov60xG6kdKMRhpTR1/T+w60HDzQaYLsiOAmgp6unAgOOyKsWXFlzqr2YkgXYzTYnVZ90OyDyFl9qeY6TZvMeNE85/HfiTOMvWVZmLGiHfIXDxTOZdK2k+coMv1W81IZOc2OZ8PanN5S8Kl02Zv6/dZSW1YRUoK4reHtzHzMHdbf8H1t/bDnXIQ7pnJQLgasCeLUQnlBPCUxuII4JFUTdngKsUy7zG7/9HTU
6i30kHdKkFnlVQhhmT4R3rC8+Ig6Ysg7JY4dChIaadUE6buadzR/h7xToglSbxZWdYSZVV5N4H+ArW75c9pP/szKrHtk2tL7jR5T4uxA30ePMS9NEO+ZMQO67UF3pyu+s1QTZuYa655R/9JX68zcXexPvGsHwklHHHIr4tTCHKEOySh4cWgvXlWKw7/vybzHW+1FksFtdrCwpyQL1yQ4pxohMLu650HEfxehAzueRzNj3dKutVXq9iL5FfENfc+xFbEau4uVxuyjE8QkESN8vaSGtc1FyD3h4cahFU0MlhA2Y40hC1aaobpT2hUE60Cu/Vl21Z86ZokGdXqk2uajiu96QlhNG8tZgi6j5n/dKfMBiGQOKgk9wXIucAqk77XNRVwHFSsHuXYsisy8cO1ot32ET/oTbARdEI/pNVjxvUu01jSG3r6R31mHAHS5TlF9FYdqLIi859QghxB2Zpom6Bk7VHOtY/uWyUvHbegarZ2cV/YYgjXZeYhtf4ElGrrFKGnoFNXb0P0HjiknM2sc1fTTZmYd25s/dGLxhcauo9l+ejbliy1gACXPeQHTedetgJ6PIosR0vaAhzL9fe8U1ZfJc0CfP3ZBL0t6oBFOqomAC2LeBKN1jmr9Ls9uUbYcQUaVV7Hyk00BQ69YWS8AtXAmdNEupnruyBlVXg1v9pZlaWyJWfyz28uJdnEmD5co84ieezoP/5cjFopmXnpFJz0al+Hrere+7Okxz6jy4qp/t+hF6XtonvNsufNeHSYcU9Zvenya1r/B/zc9pmqzOUKTGTtzmi61KztrjAl/CEQ7CtEqVlZI2+m23y5C/hNs6OqIGxsbsWfPHpSUlCAyMhIJCQm4/PLL0alTJxPNKQUTL5JXaf0R5irRCioajqB7B22dc7/+GW//uo+tbanx43EvLmKkvbEDn/yUg1EXNKNn7FBFX0h/P/kpB5N+Yc+KSD0u6rplx03Ec944ydwbLpAZEyPPgLq+538owBMjlIsMwjcj9f457ScsH/UL1J0q9e/QCGqbi9ApShw4XvSMq0GPe+t5hLVnZk/JJ9Jlr0mYZKktqxAK4oMHD+Khhx5C//790bNnT/h8PpSVlSE3NxcvvPACrrzySoPN6etZ7baN5KV7EZkT2X0w5STCNY2Ra87lQoQWOfBrS3V8XSoviH/dM7iCWKiaeOaZZ7By5UqsXr0aL7zwAl588UWsXr0a7733HpYvX24bEcTcZebuYoUQZplvGQURwmqzKJEQkBHCmVUt3l9qkyOr5lesoNki2CmEWR5ZTgUdn75mru11Drx0KwB9FYBo+8szIaPnH28ussy/ePXx6jcS31jkQQcAxQJXbxqsOcsa95m7i6X4QK7L9J1Xhx2LsXDSEQtXxJMmTcInn7DfKlOmTEFqaqrB5pQDEyoR1gJBB2vlGpuYgulr5poSpqz6QnF1LLvyDUXagwW7dwuB2H0404Y11cR35fIr4svjQ3hF3Lt3b6xcuRLHjx/3X6upqcG7776LPn2s6+mcFn4kZq5einvanZV1PwtGMzOsHdML24uUBxb1hYuxdkwvpBwoFN7LOqFXC629ZVmaa6z7jNKtxztAvFuRfTitCmEZOo3CSDp6M+DxjcUzK5lGAqECCkU1U4SBT7AhXBFXVlZiyZIl+PLLL0GK+Xw+XHvttUhJSUH37t0NNie3TZONGWE0tkQ4IRz7Fo40u9BHVVM2ukYP1i9oO6zNpbRj8iviUReE8Iq4W7dueOWVV7B//358/vnn+M9//oODBw/itddeMyGEleCtNgfNPij9MIfSQ09cZY2Cp0ez2jcnHVAIRDrAcyE2s6z+1y4YMeezkqGb3EvmUDCEsB0ZxttMPOKPP/64pVBkJLZt24ZHH30Uc+bMwcaNGy03zFNLsKJnmYWsasIM1FvF/MX6UcTUqgmgReDqqSb02ga0D+mFXZMccTqgQV4WROjSLw+rqga9B1Et6IOhmgjkQiA2McVQJDMraj9yLyuk
bKBgh9oyIsIn/Qk2hKqJadOmYdOmTVixYgW++uor3H333fD5fFi7di1++ctf4uGHHzbYXOsKgndAZiZ0Xla1FydPaR8Mp0NU0jC7LQ/0fXZCTQP93S76Khs96BaTHBL9DTbOJZM/O8zXvq/8WLrsJd0mW2rLKqRW5Tt27MC7776L8ePHY8KECXj33Xfx6afyATVY+GJiPNP8xozgHNIlifmQytZlR1JFPSHBSmS562g2+pnxiwGY9+klyzQCmZgM6j7T38nfvMSYAJBPtcGjvVtMMrMtO6EXp0Mdd8Np8HjGEsIyYy4aA7sg00ZNs/zOz5YVsYGPEaSmpmLixIkYP348PvjgA83vO3fuxE033YQbb7wRDzzwAKqrq3XrlBLEnTt3RlRUa2aD6OhotG/f3gDpbLz4K7Fnjp0Q2feqMzHY0Y7aDvP8mCRNho7reg/G+TFJCtthEk5Sz0b0/BitYDo/Jsk2F1G9OB0yyKnx+N2LWfrU/lQbpD88/snArM5WL05Hp6jeClWIHbbVatUK+Z5Z5cV1vQdLt8GaB2rQLt48yLips/T+5JpMG3FRibpl7IQTdsSlpaV4+eWXsW7dOmzevBnr169HdnbrS6i2thZ/+ctfsHLlSmzduhXJycl4/fXX9WkV/Xj06FHccMMNOH78OF544QUAgMfjwfz58zFy5Eh56iWhHmiZWK1OQNYQngWS6mUYQ79GVneAsq/9OrVef+DC/gDYL6nMKq9jDhakfjshk/ZG/YJk8U9NF28uvHi4s1ESpUHrvFljCyhTYem9SEl9ZB6Q73/3dBS2wYId46aOV6LmMbHzpudteoVXwRf1WBo5sHUiX6ITK+JvvvkGV1xxBbp27YqOHTtiwoQJ2L69NVt0c3MzUlJS0LNni0t4cnIyiov1+SAUxNu2bcNbb72Fe++9F8nJLQ/IkSNH0L9/fzz99NMGyG8Fb3DUgwooJwe9NeUd5IiE9aC4ZKQW5ii23LwDnl6qnGd0OZktO+mj6LBH3dfUwhwNb146VKDo67CuSVj6vVjYyPr2s0AeflYSSHXQFpoPM3cX6x448lQLPGFNHz6qhdLI7knMsV47ppfuC5uuN7Uwx89fowemY7eVa9pSvGgH12nuYY2veh6QF3B6hddPm7od9fwn/DFqKaOen6zAPIRfhE6aXjKmpN3GM0qRRpfVo83OXHUEERHyn5qaGhQVFWk+NTU1ijrLysoQH9+qNunRowdKS1uDMp1//vn4zW9+AwBoaGjAypUrMW7cOH1aRYd1F198MVJSUnDrrbcaZgIbchPlsX1FptQWmVVe7koip8bDHGyzbRGoD5FIOzQtRrzGCmo96NcpWdgXGo/tK8I9yXUY1jVJ0RdCh9X+0RjyTgmy7k8Q0iLTlqgcb5yMtmEHius8mpeyHZDpg+z4s+p7MyPfv7NSlzFS77L0An/iXjXIIaoIZg9Y12TnnV2EWTsXyKySP6zbuSYPb7zxhub6vHnz8OCDD/q/v/XWW2hsbPQbKmzYsAGHDx/GkiVLFPedOHECf/rTn9C3b18sW7ZMt32hIL7++usRHx+P7t2746mnnrLBmy54OevUiQ2dbsfsJDSThDNUQQdckrVgscI/J61kZBKKGoGaVlKnndYhhbUeJHay70XC6rez4QGs8cFbLS+IEyJGa1a/ABAXF4e4uDj/902bNiEtLQ1Lly4FAKxYsQI+nw/z5s3zlykrK8M999yDK664Ak899RQiIvSVH7rmax9++CFWrlyJVatW4frrr8f06dNx6aWXol27dtKdbMW5lTzUhQsXVmBNEGcZEMRDusiZr5WWlmLGjBnYuHEjYmNjcfvtt+Ovf/0rRowYAQA4ffo0br31VowbNw4PPPCAdPu6VhPt27fHAw88gB07dqB///7461//ilGjRmHChAnSjcgiq7pFUBuNQCaDo5wDODNJI41CnYbITryZke/nF90X0l+7knoC2nHR+86DiOe8caLvJfMkXCHDJyPP
gHqMWQ4/hOdG6r1lFz9pKS8tEw0j40SPu13mdk44dPTs2RMLFizA7NmzMXXqVEyePBkjRozAnDlzcOjQIezatQsZGRnYsWMHbrrpJtx0001YuHChPq2iFfHUqVOxefNmzfWamhoUFBRg+PDh0h1ogZerL7Wqj5O531vtRXwHOZMfGjt/zsa4PnzzHHWC0o15ubhlwEB8X+nFJd2SUNWUI5WuRs0bVuLTQEQpI3TTUCdUPd7o9fNx5u5i/GlYra1qgcJaD8oaIrk5B3l82JiXiz7nnZaiJbUwB//f4Th8MTEeKQcKsfgyY+ZVojlXXOfBieYIBc/Mjp2dY07TrI67vTDtJywd9QtFeZnY3OUNR3RNQHlxwcWwtiLOqZGPDjkoboqltqxCKIg3b96MqVOn2tic+A2p1je9dKgAjw7vh/QKL+pPR+g+XEb1a2o9nd53oMVagLaxddp7T08HR07U7dArWnngzei2q5ty0SVa3zVcFoNmH2S6yNO0yYyXjD7YCd3o3rIszFjRDvmLBwrnMmnbSf2sTL/VvFSfw1h/NqzN6dwT8oJ4YOcQFsT2w9yWUvakN7PKi6Xfd5YWJk6uLgN5um8H1A++1R1KZpUXm/JjuKfuPOTUeJByMC4kYhM7ZTUB2OcCbsQKwimI5rrRyG3K/ljrV74BQdzfFcQuXLhwwYI1QVxYKy+IEzsFVxCHQgQ4w8b0ZqB3CCQLdVyKrGqv4nBhXY42HKbMYV3KgUJbDylJf+2ss6DWIxWXg3XYQg5uWGOtR6ORPqjLOj23AjF37YDReWDkoI2uW48fvIM4u55PGkYcOoKNgAtiljvm4ssSpd1YAb53nsjVs3fHZOwpyVLEYpB1m6bdLxtPK0dtSJckhZ/9HYMG+ukgrtLdYvRjWYzr3Yi6U8q612Tnafoq687a++yWup8JO1KWi/eekiz065TMPJTJrPIqTu5ZcQeIjpZ1ICaisbopl/k7jw91pyIUgmHxZYmK7/SYVzQc8X/nCSo1/8mLaObuYmRWebkHfCz6jLgi0zkR1ffx5r/ITZjFQ/U47ylp9To8dXafLPOMnDrT2j6PH4Q2XkyK3g6ogMIpQ0dIrIhZE0ukP+PpD2V0ZbQLsBkdndqFOL3Cyw2YQusX9R7Cdzyd/HEGaFjVlVqNQ7AsvcBwXaIAMkYDxneJHshslzfWN66PU3yfubtYIYToMc+o0reFV/OfvIjUsSJ40OMZb2EwrGsS1o7phZHdkzTzgjcnjLoJf10axaWv8bR8PU1nXZsHxSXr8kMvuJCd8U7CKXlowAUxK1bC2jG90DW6VVVtJegOASs8YM9Yc+rwvBP8B3Zk9yQM63pKcW3p951RXOfB95WtNND9Y2HtmF4oaVC2o66X1G0kl5qRgxzC97KG1pl5eXwzgBbe0XXR/F36fWeUNrROJXUAGRp6LxZa9ZF2rGWFZqQPy8fX4R/eWKn2thd1wIJ95wOA4h5ZdI32cesn403P98kLWuJP0PObfrF3i2HPEfW8sAK67Vlj3lfMy+1FHfx/kxfWt2XRunXS47N2TC/m/Hw947yWNgVz4+3MfN3nxAjCaUUcFod1RqwmSLmikx7pADjqE2xiB2wFdltN0BYeTll7hIKlh16sCasQWSs4bYFA6pdphy7jFF3FdR4MHLoO9YWLqfgOxkHoI7Ep6PlpxGpi5u5iLLzkhG1WE6X1W6XL9oy90VJbVhEUQcwyEncqA8PD3xbhvqF1CvtGq23xzJrU9fKM4e0yizLbD2KfzYNMjAK1PbUZhwgZyASX4UGvH6X1R9Az1lgsat7Y5Z7woFOUDz1MxLbm8U40T2RsiFn9D0amE/P2xNboLGuQF8Q9OpyDgtiF8xj4ZjFyHwi+La4MAuEt6ARCOX2TFWcPo/c6F1DLWp3lBgRxfJAFcVAO6/RMoOgDIgKrWYlZMXYB/kkzL1Zxaf0R3TJqiPprNSB2To3HXwdd15eza2ypX90WAeGDmaSdappoIawe
p5pm5VxwIoC4WZgVwjJ9MNJPddmjdR6NIGXNER7IvbxnBlAeLPKEsJFnlm6LfsasIJyyOLsr4jYKc779LowglFfyVhKNhk6SUmsr4spG+RVxt5hzcEWsBsk4IEqhrk4XL/ubEajrkal37LZyvHRIu4I3ClbGBxnw7iFC2EidemVl6xLxQzTGVttl8dBMe7JYO6aX7fUbNe/jwYogrS9czIzgBiifCfI3zXM9ftjxrMgjfOwmgrIirmkuxCP72uO9q3trSqQW5mBK4iDNQYroYOWHSi+qmyJwTYLyQMDMYYyaDtkye0qyNO3LoKLhCLqbOOAx0p4VPgBsGqub8nDqTCO+KYvy84AuJ9svOoobCz8e9+Ki85NM87eF1lyc8Z0yHHWPhdrmInSKCs5O496vjjKfGVkYDbIkGpuTp4pxXnt7dgOsQ8WWvl5rqd6qpm3SZbtGT7TUllUERRCzGO/kNk99mEAfsqhP/2Uga/XAO7UPttWEHvRCH6ZXePH50RiF5YVTmUW+K8/C5fHmBLATVhMbcnNx20CtMMs94YG3uj1u6Ksf7lQNMxYnMpYIdmfoCDyszaeqpk+ly3aN/q2ltqzC1RGr4Gzql1aY0cOpM/6yoGeaFmpoS7rs7UU5fkEcSP1xINoyM1+NPEvseWBNEFc3bdcvdBZdom+w1JZVhKwgDpRADGUMeCgTea8OCzYZLgwidA67nIEV5w9jsCqId0iX7RJtf8YhIwhZQezChYtzHdYEcU3zf6TLxkX9xlJbVhESVhNmYdcJswt5BPbU23k4aVXRFhFe/Aofq4mACmK9QRSZiw2afVBzbe2YXraZrvHAql8vKJGMuZXTdNsB1ngFQ/8sa75GeCoTS5f0jaf+CqTAoek1srgwau6YVe2VvodXbtmoKv/fPBM3GhUS8avVyD1hj9NOpIF/wUZAKdDT+YpOgVm5yPTuoWHGA4xXv8ji4eFvi6QsGXh0h1KgcTt19EYFGy2QZC1DnkrrCgBS1ht6fVP/Tgsdu3diQ7okSQk1NfT4oo6CNqRLEveeAQ9l6tYdm5iimLcie2MCM6aZRq2Y+Agf37qAU8BzsXTadfWRi0/6wyoahcjVk9UO6Qvt2izr1nr/0Hr/99TCHGYZozDqHk7TXVjb4kIt4l1OjUeTYUFN54qrjhuiYfGlNYbKkzZYtPBwvNHr543ePfdvmOP/+4KYlmC9orFguRSLyt+/YQ4yqryafmdUeU0/G9MHNCi+q/tIj/Nrj0Yzx5ieOzQPRNdYkH2G7JQDERER0p9gI+CCmASRZl13Uueb2CkZoy4Y4g/EnWlggotsMckWjmzBEjsl+8M4nqL6KhPa8a3M89C7Y7I/OLbaoeSxfUWKemSDaLNiAajvpXlB003681kRPy7toLhkTYYFdX8vPBv+UW+MSZaIE83seULopuknf1/YNclPy2P72DsgOjD5+TFJft40n80yIfJUJHOHmFmp+0hnuCC/0WVEc+CVK/r66afBuqYHwg91NpXeHZMV/aPHeUriIIy6YIjiftI+TSOLbhIbhrU6JjzTs2d25tkPHx1xm7WaMBKP2M56STxdOoasHQ4cRuMRm4nrK0OzE3yl4yA7HY/YKcjwxUg8YtZ9dqO4zoOqpgjb6zZj18zO3m2NrrpTe6TLdmx/jaW2rCL4yhGHsD63g34hE9B72IgQoTMz2OFFR0/Qj2a9rVvejDCTodmJlxsdjJ5kcgg3yPCF8JeVpUbmPruxpSDGkbrNOJcMikuWmtdGEIF20p9gIyRXxKEc1SpQCEcehCPNLkIZ1lbEDaf3Spft0O5KS21ZRUiuiJ14mM1aTQS6HWI1EY4CTUSzFasJWThpckbrP81YOMjWb6eu1EhuQ7XVBAusfjvBC/vg6og5YK+I1e7MwXYR1Qt6o4dg0x9uCJzLbOAR7q76dsxl83Eqfm2p3cbT30mXjWl3uaW2rCLgK2JW9o0vJsYrTu0PH75Duj61aZbIVEvWfE0thPVM
b1ILcxT0P731D5oyMhYaat6wzNdY/LMC1gp+58/ZAFp4SVZVNO/UfVmWXoB1ObmmaeAJYbr/shYuy9ILcMuuEqmyacey/P0n9BvZ0RCaWHOO/Hai5oz/2qh/lWrqp/vIm7t2jjnNRzUt03aWasrLmqeJILI3VtNA6LPn5eWuiDlgT7RwXkGO3VaO/B3HbAnO44SO1c46ZesSBSsyG3VOtg+kLPnf6bkVznPXLtDjo8cPY4GsrOmIm86kSZeNjhxlqS2rCAlBzEI4bunsFqRu9LXwRFsXzuESfa35jDYsAg9RkWzP3UAhpA7r6PgLwRDCZlIl0WXsXs3KCGEz6ZWM1mVnG07CqfgdpP+yh4FWhHAo8ZqnKgkffb6rmuAgdCaZCxcuQh3WVsSnznwvXbZ95CWW2rKKkFoRm4F6lRKsMH3hGpJTZH7E42Uw+hqoNs2YYxmZc6Ft7uUceIegTvIjnGJNnNMr4r1lWegZe8bGaE/W4VTut3CBOr9gMBH+Od/CHdbmwWnfYemy7SIuttSWVYT9ipjAzIrpyh5DQkoIA3IhHNsyQkUIA/qBavRw8SqtOZiLwCHCwD8jSE1NxcSJEzF+/Hh88MEHmt8zMzNx8803Y8KECVi4cCFOnTqlW2dICGI7vN6C5YmWUeVl2vuyUNXELrcuJxcPf1tkKIQjD6X1ykDcsva3MqEyZesi/GDZX5sZa1a7PNtudVmnPSpF9R/+fU/ub1bCWwYCRkKnimyT1ZB9VuxARESk9EcWpaWlePnll7Fu3Tps3rwZ69evR3Z2tqLM448/jmeeeQY7duyAz+fDhg0b9GkNBdWETObhcMtOPOChTHz4dHt/aEEXoYlAzStZU8RgeBmmFuZoQq7KwA7eieuwtjvyQf5Fd6KmF2pqtDGw4+LiEBcX5/++adMm/O9//8OyZcsAACtWrIDP58O8efMAAD///DPuuusu7Ny5EwCQlpaG1157DatXrxa231704+nTp7Fq1Sps2bIFpaWlaNeuHRISs5jtaAAACydJREFUEnD99ddjzpw5iI7mx6hlg83YR4frM1ymTCgh79XwovdcRaDmlex8mDU48PNmSqK5Nu3gnZP8j4C8aun991/HG2+8obk+b948PPjgg/7vZWVliI9vNa3t0aMHfvjhB+7v8fHxKC3VV1EJBfHf/vY31NTUYNGiRUhISIDP50NZWRnWr1+PZ555Bs8++6xuAy5cuHAR6rjrrrswbdo0zXV6NQwAZ86cUVhZ+Hw+xXe933kQCuJvvvkGO3bsUFzr168fRo0ahYkTJ+pW7sKFCxfhALUKgoeEhASkpbW6TpeXl6NHjx6K38vLW80Zjx07pvidB6GWun379qiqqtJcP378ONq3F8pwFy5cuGhzuOqqq7B3715UVlaivr4en332GUaPHu3/vU+fPoiJicH+/fsBAFu2bFH8zoNQmt59992YOnUqrr/+eiQkJCAiIgJlZWXYtWsX/vSnP1nskgsXLlyEF3r27IkFCxZg9uzZaG5uxi233IIRI0Zgzpw5mD9/PoYPH44XX3wRixYtQm1tLS666CLMnj1bt15dqwmv14tdu3ahuLgYPp8PvXr1wnXXXYfk5NCyv3XhwoWLcIWufiEpKQmnT59GSUkJIiMjkZCQ4AphFy5cuLARQkGcm5uLhx56CHV1dejZs6ffaqJdu3Z47bXXMHToUNHtLly4cOFCAkLVxG233YZ58+ZplM179uzBq6++io0bNzpOoAsXLly0dQitJk6ePMk88bvmmmvQ2NjoGFEuXLhwcS5BqJo4//zzsW3bNo3N8LZt29C1a1fpRr788ku89NJLaGpqQnJyMpYtW4ZOnTqZozgMsGXLFvz9739HREQEYmNjsXDhQgwfPhw333wzGhoaEBUVBQCYMmUK7r33XtTX12PRokXIyMjAmTNn8Pjjj2PcuHFB7oU9ePbZZ7F9+3Z06dIFADBgwAC88soreOedd7Bp0yacPn0aN954I+bN
m4eIiAhUVlbiiSeewNGjRxEZGYklS5bgsssuC3IvrGPz5s345z//6f9+4sQJlJaWYvfu3Zg0aRISEhL8v91zzz248cYb2xwvfD4fnnzySSQlJeGee+7B6dOn8eyzz2LPnj04ffo0/vCHP2DGjBkAgPz8fCxcuBDHjx9Hx44d8dxzz2HQoBY37I0bN+If//gHTp06hSuvvBKLFi3yP1NhC58A+fn5vltuucX3y1/+0vfb3/7WN3HiRN+oUaN806dP9xUUFIhu9aOiosJ3xRVX+PLy8nw+n8/3/PPP+1JSUqTuDUfk5OT4fv3rX/tKS0t9Pp/P9+WXX/rGjBnjO3nypO+Xv/ylr6mpSXPPc88951u0aJHP5/P5fv75Z9/VV1/tKy4uDijdTuG2227z7d+/X3Htyy+/9N10002+kydP+hoaGnx33nmn75NPPvH5fD7f/PnzfW+99ZbP5/P5MjIyfFdffbWvrq4u4HQ7iaamJt9tt93m+7//+z9fTk6Ob/z48cxybYkX2dnZvlmzZvkuueQS33vvvefz+Xy+tWvX+u69915fc3Ozr6qqyjdhwgTf999/7/P5fL7p06f7tm7d6vP5WubLpEmTfGfOnPF5PB7f6NGjfRUVFb7Tp0/7FixY4Fu5cmXQ+mUXhKqJfv364cMPP8Qnn3yC5cuX429/+xs+/vhjbNy4EYmJiVKC/quvvsLw4cPRv39/AMCMGTOQmpoKXyBjDQUQ0dHR/397dx/S5BYHcPzrk6QtbTEr51AJIjMIIri1aknlM0KaoxpMKyiT3uzFCnr5x7kiLLHIKIL+sFCCVRZKDDLCKYH0Z4UEokVRE5xb72EibnP3j7En8nbVe6+36Tqf/zzbw3PO8Xg8z3n5PVRUVCinaRYtWsT79+958uQJKpWKXbt2YTabOXv2LAMDAwC4XC6sVisAOp0Og8HAgwcPolaG8TI4OEhHRwfXrl3DbDZTWlpKT08Pzc3N5Ofno1KpSEhIwGKx4HQ6CQQCPHr0iIKCAgAWLlzI3LlzaWtri3JJxldNTQ0ajYbNmzfz7NkzJEli69atmM1mrly5QjAYjLm6cDgcWK1W8vLylDSXy4XFYiE+Ph61Wo3JZMLpdOL1enn9+jUmkwmA1atX09/fT0dHBy0tLeTm5qLRaJAkicLCQpxOZ7SKNW5G3b72/PlzpXIi29eMRiN//DG2t5729vb+8Nil1Wrp6+vj27dvMTk9kZ6eTnp6OhB+FKusrCQ3N5fBwUH0ej1lZWVMmzaNY8eOceHCBcrKyvB4PKSlfQ/jmZqaSm/v2F4JP5F5vV6WL1/OkSNHmD9/PtevX2f//v2kpKSwYsUK5XtarRav18unT58YGhpCo9Eon8VKXUR8/PiR2tpaGhsbgXBgrZUrV3L06FECgQB79uwhKSkJk8kUU3Vht9sBePz4sZI2vN1rtVq6urrweDzMmTMHSfo+ToyU3ePxKH9fkWvGElRnohtxRHz79m1OnDiBWq0mJycHg8FAUlISdrudurq6Md1geBAM5cbShAiF/L/p7+/n8OHDuN1uKioqkGWZ8+fPM3PmTBISEti7d68SKi/0k8AgsVA/GRkZ1NTUkJWVRVxcHDt37sTtdv80MIokST9tK6FQiClTpvzqrP9v7ty5gyzLZGRkAOGdSeXl5ahUKmbMmEFxcTEul+u3qIvh7X4s7WD4k3TkmsluxBFxbW0td+/e/UswjO3bt2O1WtmxY8eoN0hLS6O9/ftL/LxeL2q1GpVK9e9yPAn09PRQUlLCvHnzuHHjBomJibS2tpKcnMzSpUuBcAOKxOtIS0vD5/Mxa9YsIBxKLxb2aHd2dtLZ2cnGjRuVtFAohE6nw+fzKWk+nw+tVktKSgqhUIjPnz8ri8E+n4/U1L8PsD7ZNDU1YbPZlJ/v3btHdna28vuOtIvfoS4i7T4i0g50Oh3v3r37oaOOfPZ310x2I/4rkSSJ5OTkv6RPnz59zKuUq1ator29nTdv3gDh
UbYsy/88p5NEX18f27ZtY926dVy8eJHExEQgPEVTVVXFwMCAEuc5shtFlmXq6+uV77W1tbF27dqolWG8SJLEmTNn6O7uBuDmzZssWLAAWZZxOp309/czODhIY2MjRqOR+Ph41qxZo7zRoLOzk1evXqHX66NZjHHz5csX3G43S5YsUdJevnzJ5cuXCQaDDAwM4HA4WL9+fczXBYTbfUNDA4FAgK9fv3L//n2MRiNarZbMzEyampqA8LkFSZLIysoiNzeX1tZWPnz4QCgUor6+PiZ2GI04Is7JyaGkpASLxaLM5fh8PhoaGjAYDGO6QUpKCpWVlRw6dAi/309mZiZVVVX/PecTlMPhUBakmpublfS6ujq6u7vZtGkTwWAQvV6vBE4qLS3l1KlTmEwmgsEgx48fH/Ni6ESWlZWFzWZj3759BINBtFot1dXV6HQ6Xrx4gdVqxe/3I8uyMmo+efIkNpuN/Px84uLiOHfu3E8HA5PR27dvmT179g+DmIMHD3L69GnMZjOBQIC8vDxl4TaW6wLCC/dut5sNGzbg9/spLCxk2bJlAFRXV1NeXs7Vq1eZOnUqly5dQpIksrOzOXDgAEVFRfj9fhYvXszu3bujXJL/bsSTdUNDQ9y6dYuWlhY8Hg9DQ0PodDpkWWbLli0xNV8lCIIQLb/4nXWCIAjCcKMu1o2kuLh4XDMjCILwOxqxI+7q6uLhw4c/bMIWBEEQxteoUxNFRUUUFBQop1wEQRCE8TXqTmi73c7Tp09/RV4EQRB+S2KxThAEIcom/9lAQRCESU50xIIgCFEmOmJBEIQoEx2xIAhClImOWBAEIcr+BJ0zbP7U0qoaAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x21863ac3860>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.set()\n",
    "ax = sns.heatmap(\n",
    "    first_items.fillna(0),\n",
    "    vmin=0,\n",
    "    vmax=1,\n",
    "    cmap=\"YlGnBu\",\n",
    "    xticklabels=250,\n",
    "    yticklabels=250)\n",
    "ax.tick_params(labelsize=12)"
   ]
  },
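  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>Figure 9-3: Heatmap of paper similarity based on two initial features: publication year and field of study</center>\n",
    "\n",
    "Darker pixels indicate items that are similar to each other. The dark diagonal shows that cosine similarity correctly identifies each paper as most similar to itself. The diagonal is broken, however, because one of the features contains many NaN values. Although most items are dissimilar to one another (which suggests the dataset is quite diverse), a few candidates have high similarity scores. Qualitatively, these may or may not be good recommendations, but they at least show that the approach is not useless."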
   ]
  },
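  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The broken diagonal can be reproduced on a toy matrix. The sketch below is a minimal illustration, not the notebook's actual similarity code: `cosine_similarity_matrix` and the `toy` array are hypothetical, and it assumes that NaN scores come from rows whose feature vector is all zeros.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def cosine_similarity_matrix(features):\n",
    "    # Pairwise cosine similarity between the rows of a 2-D array.\n",
    "    features = np.asarray(features, dtype=float)\n",
    "    norms = np.linalg.norm(features, axis=1)\n",
    "    with np.errstate(divide='ignore', invalid='ignore'):\n",
    "        # A row with zero norm gives 0/0, which NumPy reports as NaN.\n",
    "        return (features @ features.T) / np.outer(norms, norms)\n",
    "\n",
    "# Made-up feature rows; the last one has no feature values at all.\n",
    "toy = np.array([\n",
    "    [1.0, 0.0, 1.0],\n",
    "    [1.0, 0.0, 1.0],\n",
    "    [0.0, 1.0, 0.0],\n",
    "    [0.0, 0.0, 0.0],\n",
    "])\n",
    "sims = cosine_similarity_matrix(toy)\n",
    "print(sims)\n",
    "```\n",
    "\n",
    "Rows 0 and 1 get a similarity close to 1, rows 0 and 2 get 0, and every score involving the all-zero row 3 is NaN, including its own diagonal entry. After `fillna(0)`, such an entry would appear as a gap in the heatmap diagonal."
   ]
  },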
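  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example 9-5 shows how to turn item similarities into recommendations. Fortunately, we still have plenty of unused features, so there is a lot of room for improvement."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-5: Item-based collaborative filtering recommendations"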
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "def paper_recommender(paper_index, items_df):\n",
    "    # Note: relies on the global `model_df` for paper metadata.\n",
    "    print('Based on the paper: \\nindex = ', paper_index)\n",
    "    print(model_df.iloc[paper_index])\n",
    "    # Take the four highest similarity scores in this paper's row.\n",
    "    top_results = items_df.loc[paper_index].sort_values(\n",
    "        ascending=False).head(4)\n",
    "    print('\\nTop three results: ')\n",
    "    order = 1\n",
    "    # Keep the last three of the four. The query paper itself has\n",
    "    # similarity 1.0, so it can show up among them when scores tie.\n",
    "    for i in top_results.index.tolist()[-3:]:\n",
    "        print(order, '. Paper index = ', i)\n",
    "        print('Similarity score: ', top_results[i])\n",
    "        print(model_df.iloc[i], '\\n')\n",
    "        order += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the paper: \n",
      "index =  2\n",
      "abstract                                                    NaN\n",
      "authors       [{'name': 'Jovana P. Lekovich', 'org': 'Weill ...\n",
      "fos                                                         NaN\n",
      "keywords                                                    NaN\n",
      "lang                                                         en\n",
      "references                                                  NaN\n",
      "title         Should endometriosis be an indication for intr...\n",
      "url           [http://www.fertstert.org/article/S0015-0282(1...\n",
      "year                                                       2015\n",
      "Name: 2, dtype: object\n",
      "\n",
      "Top three results: \n",
      "1 . Paper index =  71\n",
      "Similarity score:  1.0\n",
      "abstract                                              NaN\n",
      "authors                    [{'name': 'Harold S. Wilson'}]\n",
      "fos                                                   NaN\n",
      "keywords                                              NaN\n",
      "lang                                                  NaN\n",
      "references                                            NaN\n",
      "title         Chapter III. “My Blood Is Like Champagne”62\n",
      "url                                                   NaN\n",
      "year                                                 2015\n",
      "Name: 71, dtype: object \n",
      "\n",
      "2 . Paper index =  798\n",
      "Similarity score:  1.0\n",
      "abstract                                                    NaN\n",
      "authors                              [{'name': 'غفاری، سعیده'}]\n",
      "fos                                                         NaN\n",
      "keywords      [جستجو در پایان نامه های دانشجویی دانشگاه فردو...\n",
      "lang                                                         fa\n",
      "references                                                  NaN\n",
      "title         تهیه نانوکامپوزیت پلیمری توسط نانوذرات اکسید ف...\n",
      "url           [http://thesis.um.ac.ir/rs/59364.pdf, http://t...\n",
      "year                                                       2015\n",
      "Name: 798, dtype: object \n",
      "\n",
      "3 . Paper index =  2\n",
      "Similarity score:  1.0\n",
      "abstract                                                    NaN\n",
      "authors       [{'name': 'Jovana P. Lekovich', 'org': 'Weill ...\n",
      "fos                                                         NaN\n",
      "keywords                                                    NaN\n",
      "lang                                                         en\n",
      "references                                                  NaN\n",
      "title         Should endometriosis be an indication for intr...\n",
      "url           [http://www.fertstert.org/article/S0015-0282(1...\n",
      "year                                                       2015\n",
      "Name: 2, dtype: object \n",
      "\n"
     ]
    }
   ],
   "source": [
    "paper_recommender(2, first_items)"
   ]
  },
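  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The good news is that the results include the very paper we searched on, with a similarity score of 1.0. The bad news is that, even with the features we selected, the other two papers seem far removed from what we were searching for.\n",
    "\n",
    "“Yes, yes,” you might say, “but this is the age of big data, and that will solve our problem! Can't we get better results with more data?” Perhaps, but even big data cannot make up for poor data and poor feature selection."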
   ]
  },
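  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because several papers tie at a score of 1.0, the query paper can show up in its own recommendation list. One possible adjustment, sketched here under assumptions, is to drop the query index before ranking. `top_similar` and the `toy` similarity table are hypothetical and not part of the original notebook.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def top_similar(items_df, paper_index, n=3):\n",
    "    # Similarity scores for the query paper, without the paper itself.\n",
    "    scores = items_df.loc[paper_index].drop(labels=paper_index)\n",
    "    # Coerce to numbers and discard NaN scores before ranking.\n",
    "    scores = pd.to_numeric(scores, errors='coerce').dropna()\n",
    "    return scores.sort_values(ascending=False).head(n)\n",
    "\n",
    "# Toy 4x4 similarity table (made-up values).\n",
    "toy = pd.DataFrame([\n",
    "    [1.0, 0.9, 0.2, 0.5],\n",
    "    [0.9, 1.0, 0.1, 0.3],\n",
    "    [0.2, 0.1, 1.0, 0.4],\n",
    "    [0.5, 0.3, 0.4, 1.0],\n",
    "])\n",
    "recs = top_similar(toy, 0, n=2)\n",
    "print(recs)\n",
    "```\n",
    "\n",
    "For paper 0 this returns papers 1 and 3, in that order, and never paper 0 itself. It does nothing about the quality of the remaining matches, which still depends on the features."
   ]
  },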
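  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/chapter9/9-4.png)\n",
    "<center>Figure 9-4: Machine learning (https://xkcd.com/1838/)</center>\n",
    "\n",
    "The current brute-force approach is too slow to support smart, iterative feature engineering. Next we try some new feature engineering techniques to see whether they speed up the computation and lead to better features and better search results."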
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: More feature engineering and a smarter model"
   ]
  },
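  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our first approach built a huge, sparse array and then brute-forced it through a filter. It can be improved in several ways. The next step focuses on applying better techniques to the two initial features and on modifying the item-based collaborative filter so that we can iterate faster.\n",
    "\n",
    "First, we try out some of the feature engineering techniques introduced earlier in the book on the two variables in our hypothesis. After examining the features more closely, we can choose the techniques that suit each variable and transform it into a “better” feature for the recommender.\n",
    "\n",
    "\n",
    "### Academic paper recommender: take 2\n",
    "We start with publication year. Section 2.2.2 (“Quantization or Binning”) explained why raw counts make poor features for methods that rely on similarity measures. Example 9-6 and Figure 9-5 explore how to transform `year` so that it better suits the model we have chosen.\n",
    "\n",
    "#### Example 9-6: Fixed-width binning + dummy coding (part 1)"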
   ]
  },
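  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why a raw year value is awkward for similarity-based methods, consider a toy comparison. The vectors below are made up for illustration, and `cosine_sim` is a hypothetical helper. Two papers published 215 years apart in different fields look almost identical under cosine similarity when the raw year is used, because the large year value dominates each vector.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def cosine_sim(a, b):\n",
    "    # Cosine similarity between two 1-D vectors.\n",
    "    a = np.asarray(a, dtype=float)\n",
    "    b = np.asarray(b, dtype=float)\n",
    "    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
    "\n",
    "# Layout: [year, field_A, field_B]\n",
    "raw_old = [1800, 1, 0]\n",
    "raw_new = [2015, 0, 1]\n",
    "raw_sim = cosine_sim(raw_old, raw_new)\n",
    "\n",
    "# Layout: [bin_1800s, bin_2010s, field_A, field_B]\n",
    "binned_old = [1, 0, 1, 0]\n",
    "binned_new = [0, 1, 0, 1]\n",
    "binned_sim = cosine_sim(binned_old, binned_new)\n",
    "\n",
    "print(raw_sim, binned_sim)\n",
    "```\n",
    "\n",
    "With the raw year the score is very close to 1, while with the binned, dummy-coded year it is 0, since the two papers share neither a time bin nor a field."
   ]
  },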
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Year spread:  1800  -  2017\n",
      "Quantile spread:\n",
      " 0.25    1992.0\n",
      "0.50    2005.0\n",
      "0.75    2012.0\n",
      "Name: year, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(\"Year spread: \", model_df['year'].min(), \" - \", model_df['year'].max())\n",
    "print(\"Quantile spread:\\n\", model_df['year'].quantile([0.25, 0.5, 0.75]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0,0.5,'Occurrence')"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZUAAAEPCAYAAACKplkeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X10VPWdx/H3PJAQTAImhAyIMS0uilCyWhDdUqCEYE4IBsKzFOmRFONSxFpBUCCGh1IerCA+gK2KUCogpE14hsCBRrbr0eIREcHdYEoV8kBIVkN4mGTu/mEzTWCASbgzySSf1zmcw9w7d+53vgz55Hfvnd+1GIZhICIiYgJrYxcgIiLNh0JFRERMo1ARERHTKFRERMQ0ChURETGNQkVEREyjUBEREdMoVERExDQKFRERMY1CRURETKNQERER0yhURETENAoVERExjd2XL24YBjNnzqRr165MmjSJixcvkpmZyaeffophGPTs2ZOMjAxat27NuXPnmDFjBqdPn8ZqtTJv3jzuu+8+AA4cOMCLL77I5cuXueuuu/j1r39NaGhovWopKzuPy9WwCZkjI0MpLa1o0LbNmfrimfrimfriWVPti9Vq4dZbb6n3dj4Llfz8fDIzMzly5Ahdu3YF4PXXX6e6upqcnBwMw2D69OmsXr2aadOmkZmZSa9evUhPT+fzzz9n8uTJ7NmzhwsXLjBr1izeffddYmNjWbp0KcuWLeOFF16oVz0ul9HgUKnZXq6mvnimvnimvnjWnPris8Nf69evZ9SoUSQmJrqX9e7dmyeeeAKr1YrNZqNbt26cPn2aqqoqDhw4wOjRowHo1q0bsbGx5OXl8f777/ODH/yA2NhYAMaNG8fWrVvRbWBERJoen41U5s6dC8ChQ4fcy/r27ev++9dff80777zD/PnzKSsrw+VyERER4V4fHR1NYWEhFy9exOFwuJc7HA4qKio4f/58vQ+BiYiIb/n0nMq1HD16lF/84hf89Kc/5Sc/+QlFRUVYLJY6zzEMA5vNhsvlumodgNVav0FWZOTNBVBUVNhNbd9cqS+eqS+eqS+eNae++D1Utm/fTmZmJnPmzGHo0KEAREZGYhgG5eXltGvXDoDi4mKio6MJDQ3lk08+cW9fVFRE27ZtadOmTb32W1pa0eDjllFRYZSUfNugbZsz9cUz9cUz9cWzptoXq9XSoF/G/XpJ8f79+1mwYAFvvvmmO1AA7HY7AwYMYNOmTQAcP36c/Px8+vTpQ9++ffnkk08oKCgAYMOGDcTHx/uzbBER8ZJfRyqLFy/GMAxmz57tXnbfffeRkZFBRkYGs2fPJjk5GYvFwpIlSwgL+25IuGjRIp588kmcTicxMTEsXrzYn2WLiIiXLEYLuYxKh7/Mp754pr54pr541lT7EhCHv0RExH/CwkMICw/x6z4b5eovERHxvdbB3/2I9+c4SCMVERExjUJFRERMo1ARERHTKFRERMQ0ChURETGNQkVEREyjUBEREdMoVERExDQKFRERMY1CRURETKNQERER0yhURETENAoVERExjUJFRERMo1ARERHTKFRERMQ0ChURETGNQkVEREyjUBEREdMoVERExDQKFRERMY1CRURETKNQERER0yhURETEND4PFcMwePbZZ3nzzTcBqK6uZuHChSQmJpKQkMC7777rfm5BQQHjx48nKSmJkSNHkp+f7163efNmkpKSGDx4MBkZGTidTl+XLiIi9eTTUMnPz2fixIns3r3bvWzDhg0UFBSwbds2Nm/ezDvvvMORI0cAeOaZZxg7diw7duxg6tSpTJs2DcMw+OKLL1i5ciV/+MMf2LVrF99++y1r1qzxZekiItIAPg2V9evXM2rUKBITE93LcnNzSU1NxW6307ZtW4YMGUJOTg5FRUWcPHmSIUOGANC/f38qKys5duwY+/btY+DAgURERGC1WhkzZgw5OTm+
LF1ERBrAp6Eyd+5chg4dWmfZmTNn6Nixo/uxw+GgsLCQM2fO0KFDB6zWf5UUHR3tXnflNkVFRb4sXUREGsDu7x0ahoHFYqnz2Gq14nK56iyvWWez2TAM46rltcPHG5GRoQ0vGoiKCrup7Zsr9cUz9cUz9cUzX/fFn333e6h07NiR4uJi9+Pi4mIcDgedOnWipKSkTujUrLvWNvVRWlqBy2Xc+IkeREWFUVLybYO2bc7UF8/UF8/UF8980Zew8BBaB//rx3tDXt9qtTTol3G/X1IcHx/Pli1bqKqq4ptvvmH79u0MGjQIh8NBTEwMO3bsACAvLw+r1UrXrl0ZOHAg+/fvp7S0FMMw2LhxI4MGDfJ36SIiAaF1sJ2hv8pulH37faQybtw4Tp06RUpKCk6nkzFjxnD//fcD8Nvf/pY5c+bw+uuvExQUxIoVK7Bardx9991MmTKFiRMn4nQ6iYuL4+c//7m/SxcRkRuwGFeesGimdPjLfOqLZ+qLZ+qLZ2b3pebQ19BfZbP1xRSgmR/+EhER36l9LqUxKFRERMQ0ChURETGNQkVEREyjUBEREdM07hkdERExxZVfeGwsGqmIiASgsPAQwsJD3I8b8wuPtTV+rImISL3VjEq+hTrh0tg0UhERCXBN4bBXDYWKiIiYRqEiIiKmUaiIiIhpFCoiImIahYqISIC67KxucnfTVKiIiASooFa2JvHdlNoUKiIiYhqFioiImEahIiIiplGoiIiIaRQqIiJiGoWKiIiYRqEiIiKmUaiIiIhpFCoiImIahYqIiJhGoSIiIqZRqIiIiGkaJVT27t3L0KFDSUlJ4dFHH+XUqVNUV1ezcOFCEhMTSUhI4N1333U/v6CggPHjx5OUlMTIkSPJz89vjLJFRJqEpnRP+iv5/cbGFy9eZPr06WRnZ3PHHXewZs0aFixYQP/+/SkoKGDbtm2cP3+eMWPG0L17d3r27MkzzzzDxIkTGTp0KAcPHmTatGls3boVi8Xi7/JFRBpdU7on/ZX8PlKprq7GMAy+/fZbAM6fP09wcDC5ubmkpqZit9tp27YtQ4YMIScnh6KiIk6ePMmQIUMA6N+/P5WVlRw7dszfpYuIyA34Pe5uueUWMjMzGTt2LO3atcPlcvHuu+/y+OOP07FjR/fzHA4HJ06c4MyZM3To0AGr9V/5Fx0dTWFhId27d/d3+SIich1+D5UTJ07w6quvsmPHDmJiYli7di1Tp07F5XLVOZxlGAZWq/Wq5TXrbDZbvfYbGRl6U3U3tburNRXqi2fqi2fqi2e+7os/++73UHn//fe57777iImJAWD8+PEsWrSIPn36UFxc7H5ecXExDoeDTp06UVJSgmEY7nCpWVcfpaUVuFxGg2qOigqjpOTbBm3bnKkvnqkvnqkvnjWkL/UNiYb03Wq1NOiXcb+fU7nnnnv48MMPOXv2LAC5ubl07tyZ+Ph4tmzZQlVVFd988w3bt29n0KBBOBwOYmJi2LFjBwB5eXlYrVa6du3q79JFROQG/D5SefDBB5k0aRITJkygVatWtG3bltdee43vfe97nDp1ipSUFJxOJ2PGjOH+++8H4Le//S1z5szh9ddfJygoiBUrVtQ5xyIiIk2DV6Hicrl46623+J//+R/mzJnD+vXrSUtLq/d5jRrjx49n/PjxVy1//vnnPT4/NjaWdevWNWhfIiLiP179ur9kyRK++OILjhw5Anx3CGrRokU+LUxERAKPV6Hy17/+ld/85jcEBwcTGhrKW2+9xaFDh3xdm4iIBBivQsVut9c5hxEUFITd3nS/0SkiIo3Dq2To2rUr69evp7q6mpMnT7JmzRruvvtuX9cmIiK1hIWHNOkpWsDLkcrzzz/PZ599RmlpKY888giVlZU899xzvq5NRERqaR1sZ+ivshu7jOvyKvJCQ0N54okn+PWvf01FRQWnTp3i1ltv9XVtIiISYLwaqaxbt47//M//BKCsrIypU6fy3nvv+bQwEREJPF6F
ysaNG933N7n99tv585//zNq1a31amIiIBB6vQqW6uprQ0H/NARMWFqZ7mYiIyFW8CpXvf//7LFu2jH/84x/84x//YMWKFcTGxvq4NBERCTRehUpmZiYFBQUMGzaMkSNHUlBQwAsvvODj0kREJNB4dfVX+/bteeWVV3xdi4iIBDivQuXkyZP87ne/o7y8HMP41z1JVq1a5bPCRETkX8LCQxq7BK94FSozZ86kZ8+e9O7dWyfoRUQaQVP/Jn0Nr6q8cOECs2fP9nUtIiIS4Lw6UX/HHXfUudWviIiIJ17fpCs5OZnu3bsTHBzsXq5zKiIiUptXoZKQkEBCQoKvaxERkQDnVagMHz6cwsJCTpw4Qd++fSkqKqJTp06+rk1ERAKMV+dUDh48yNixY8nMzKS0tJQhQ4aQm5vr69pERCTAeBUqr7zyCps2bSI8PJwOHTrwxz/+kZdfftnXtYmISIDxekLJDh06uB9369ZN31cREZGreHVOJSQkhNOnT7uD5KOPPqpzFZiIiPhGINxCuDavKv3Vr37FY489RklJCWPGjKGgoICVK1f6ujYRkRav5hbCW19MaexSvOJVqMTExLBp0yY+/vhjXC4XcXFxRERE+Lo2EREJMF6Fyk9/+lN27dpF//79fV2PiIgEMK9O1N92220cPnwYl8tlyk5PnDjBhAkTGDZsGKmpqRw9ehSA1atXk5iYSEJCAitXrnTPiHzu3DnS0tJISkoiOTmZw4cPm1KHiIiYy6uRSn5+Po888gh2u52goCAMw8BisTToh/uFCxeYNGkSCxcupH///uTm5vLMM88wa9Ysdu7cSVZWFjabjUmTJtGlSxeSkpLIzMykV69epKen8/nnnzN58mT27NlDSEhgTAUtItIQgTLdfW1ehcry5cuJiooyZYeHDh3i9ttvdx9Ki4+Pp3PnzvzhD38gOTmZNm3aAJCamkpOTg6DBw/mwIEDZGRkAN9dzhwbG0teXh6DBw82pSYRkaYokK76quH1/VR27dplyg6//PJLoqKieO655zh+/Djh4eFMnz6dM2fO8OCDD7qf53A4KCoqoqysDJfLVefCgOjoaAoLC02pR0REzONVqNScU/n3f/93rFavTsNcU1VVFQcPHmTt2rXExcWRm5vL5MmT+f73v1/nC5WGYWC1WnG5XFd90dIwDGw2W732GxkZelN1R0WF3dT2zZX64pn64pn64pmv++LPvvv9nEqHDh3o0qULcXFxAAwaNIjZs2djtVrr3LOluLgYh8NBZGQkhmFQXl5Ou3bt3Ouio6Prtd/S0gpcLuPGT/QgKiqMkpJvG7Rtc6a+eKa+eKa+eHa9vpgVBg3pu9VqadAv416Fyvr16+v9wtfSr18/Fi9ezNGjR+nRowcffvghFouFiRMn8sorrzB69GjsdjtZWVmkpqZit9sZMGAAmzZtYvLkyRw/fpz8/Hz69OljWk0iImIOr0KlvLzc4/Lbbrut3juMiori1VdfJTMzkwsXLhAUFMTKlSvp1asXX3zxBaNGjcLpdBIfH8+wYcMAyMjIYPbs2SQnJ2OxWFiyZAlhYRpGi4g0NV6FytSpU91/dzqdlJSU0KNHDzZv3tygnfbu3Zv33nvvquXp6emkp6dftbx9+/a6y6SISADwKlT2799f5/EHH3zA1q1bfVKQiIgErgZdytWnTx8+++wzs2sREZEA59VIpXaAGIbB0aNHuXjxos+KEhGRwFTvcyoWi4WIiAheeOEFX9UkIiIByutzKhUVFYSGhnLp0iUqKiqIjIz0dW0iIhJgvDqnsmPHDlJTUwE4ffo0ycnJV528FxER8SpUVq1axdq1awH43ve+R1ZWlu78KCLiQ4E4QzF4GSoulwuHw+F+3LFjR9PurSIiIlcLxBmKwctQiYiIYMOGDVRVVVFdXc3mzZtp3769r2sTEZEA41WozJs3j02bNhEXF0fPnj3ZtGmT+/4mIiIiNbwaX8XGxrJu3Tqqqqqw2WxcunRJV3+J
iMhVvL76a/jw4bRt25aSkhJd/SUiIh7p6i8RETGNrv4SERHT6OovERExjVcn6ufNm8fTTz/N/PnzAejevTvLli3zaWEiIi1RWHhIwH5HBbwYqRQVFbFu3Tqqq6vp0qULw4YNY8WKFcTExPijPhGRFqV1sJ2hv8pu7DIa7LqhcubMGUaNGoXNZuOpp55iypQpBAcHM2rUKL7++mt/1SgiIgHiumOs5cuX8/TTT7vvFQ/w0EMP0b17d5YvX87SpUt9XqCIiASO645Ujh07VidQaowYMYIjR474rCgREQlM1w0VwzCuuS4oKMj0YkREWrJAnZm4tuuGis1mo6io6KrlRUVFChUREZMF8lVfNa4bKmPHjuW5556joqLCvay0tJQZM2bwyCOP+Lw4EREJLNeNxXHjxnHq1Cl+/OMfc+edd1JVVUVBQQGPPvooI0aM8FeNIiISIG441nr22Wf52c9+xieffAJAXFwc0dHRPi9MREQCj1cH8KKjoxk8eLCvaxERkQDn1dxfIiIi3mjUUMnNzeXee+91P169ejWJiYkkJCSwcuVK9yXN586dIy0tjaSkJJKTkzl8+HBjlSwiYrqw8BCiosIauwxTNFqoFBQUsHjxYvfjgwcPsnPnTrKysti2bRsffPABO3fuBCAzM5NevXqxY8cOli5dyrRp07hw4UJjlS4iYqpAn++rtkYJlQsXLjB9+nRmzpzpXrZ3716Sk5Np06YNwcHBpKamkpOTQ1VVFQcOHGD06NEAdOvWjdjYWPLy8hqjdBERuY5G+abN3LlzGTNmDHfddZd72ZkzZ3jwwQfdjx0OB0VFRZSVleFyuYiIiHCvi46OprCwsF77jIwMvamam8vQ1Gzqi2fqi2fqS+PwZ9/9Hirr16/HbrczcuRIvvrqK/dywzCwWCx1HlutVlwuV53lNetsNlu99ltaWoHLde1pZ64nKiqMkpJvG7Rtc6a+eKa+eKa+eOaPH/gN6bvVamnQL+N+D5U//elPXLx4kZSUFJxOp/vv99xzD8XFxe7nFRcX43A4iIyMxDAMysvLadeunXudvisjItL0+P2cyubNm9m2bRvZ2dm88cYbtG7dmuzsbBISEsjJyaGyspLLly+TlZXFoEGDsNvtDBgwgE2bNgFw/Phx8vPz6dOnj79LFxEx3WVndWOXYKomM3vZwIED+eKLLxg1ahROp5P4+Hj3tPsZGRnMnj2b5ORkLBYLS5YsISxMx2ZFJLCFhYcQ1Kp+h/KbukYNlc6dO/Pxxx+7H6enp5Oenn7V89q3b8+qVav8WZqIiM81h1mJr6Rv1IuIiGkUKiIiYhqFioiImEahIiIiplGoiIiIaRQqIiJiGoWKiIiYRqEiIiKmaX7fvBERaeLCwkOa5RcfQSMVERG/a0435bqSQkVEREyjUBEREdMoVERExDQKFRERPwoLD2nsEnyqeV5+ICLSxDTnK75q00hFRMQPmvMVX7UpVERExDTNfywmItKIwsJDsFgsjV2G3yhURER8qCWcR6lNh79ERMQ0ChUREZOFhYc0+0uHr0WhIiJistbBdqxWC1FRYY1dit8pVERETHDl6CSola1FXEJ8JYWKiIgJWvLopDaFiojITaoZobTU0UltChURkZvU0i4bvp5GCZXs7GwefvhhUlJSGDt2LJ9++ikAq1evJjExkYSEBFauXIlhGACcO3eOtLQ0kpKSSE5O5vDhw41RtoiI3IDf4/XkyZMsXbqUrKwsOnTowMGDB5k6dSqZmZns3LmTrKwsbDYbkyZNokuXLiQlJZGZmUmvXr1IT0/n888/Z/LkyezZs4eQkJZ5yZ6ISFPl95FKUFAQCxYsoEOHDgD06NGDs2fPsmvXLpKTk2nTpg3BwcGkpqaSk5NDVVUVBw4cYPTo0QB069aN2NhY8vLy/F26iIjcgN9HKp07d6Zz584AGIbBokWLGDhwIMXFxfTt29f9PIfDQVFREWVlZbhcLiIiItzroqOjKSws9HfpIiJy
A412dqmyspKZM2dSWFjI73//e5566qk6k64ZhoHVasXlcl01GZthGNhstnrtLzIy9KbqbemXCV6L+uKZ+uKZ+tI4/Nn3RgmV06dPk56eTpcuXVi7di2tW7emY8eOFBcXu59TXFyMw+EgMjISwzAoLy+nXbt27nXR0dH12mdpaQUul9GgeqOiwigp+bZB2zZn6otn6otnzaEvNTMOBwfZuHipCgiMK78a0ner1dKgX8b9fk6loqKCCRMmMHjwYF566SVat24NQHx8PDk5OVRWVnL58mWysrIYNGgQdrudAQMGsGnTJgCOHz9Ofn4+ffr08XfpItLCtQ62Exz03XdRWgfbW8yNt+rD7xG7fv16Tp8+zd69e9m7d697+Zo1axg8eDCjRo3C6XQSHx/PsGHDAMjIyGD27NkkJydjsVhYsmQJYWEaRouINDV+D5XHH3+cxx9/3OO69PR00tPTr1revn17Vq1a5evSRESuqaXOOlxf+ka9iIgXrjx3ctlZ3UiVNG0KFRGRBghqVb8rUFsKhYqIiJhGoSIiIqZRqIiI1NKSbwVsBoWKiLRoV4ZI7ZttKVzqT6EiIi2apxCpudlWIHxbvqlRqIhIi3etELnsrCa8bRvNWVYPChURkWsIamVzT8si3lGoiEiLo5PxvqMDhiLSIoSFh9A62M7FS1Xuw1xXzt172VmtLzXeJI1URKRFqJlRuCZQLjurrzpXokC5eQoVEWmWbnSIq+bkvJhLoSIizU7Noa6aUUntcNFEkL6lcyoi0uzoEFfj0UhFRJotHeLyP41URCRg1RzW+vabC+7H+hZ849JIRUQC1pXnTXTP+ManUBGRZkEjlKZBoSIiAcPTZcKeTsZL41GoiEjA8HSZsE7GNy0aL4pIQNHIpGnTSEVEmpyw8JBr3iRLI5OmTaEiIk1OzVVcVqvFfR5Fo5PAoFAREb8LCw9x3/yqJjRqHoe3beN+XlArG1arRZcKBxCFioj4XFh4CJec1e7DWa2D7e6bX9WcfK95HBxUdxoVTasSWBQqIgL49sZVrYPtBP9z1HHlYSxN8Ni8BFSoHDhwgKFDh/LQQw/x5JNPUlFR0dgliQSMmkNMtYOj5lxFeNs2dS7XvXLdtcLmRkF05bkQTyfZNRJpXgImVM6dO8esWbNYuXIlu3fv5vbbb2fZsmWNXZaIz9T+ge3tD+/az7/y6qmaQ0w1o4WaIKl9yKnmct0r19XepnbItA62u9d52rfOhbQ8ARMq77//Pj/4wQ+IjY0FYNy4cWzduhXDMLza3mq1NPjPzW7fXP80dl/Cw0MIDw/x2XY3el54zW/y/3xOzePLzuqrXqP2c8NrjQBq/rgf13qt2j+wa//d/Zq1tmsdbGfSgj11tnU//ufzagS1sjFpwR53kHS4NeSG62ovrx0ytdfV7Kv2vq98/Ssfa53v1tX++838H68vi+HtT+VG9sYbb/DVV18xb948AKqqqujevTt/+9vfCA0NbeTqREQEAmik4nK5sFiuTk6rNWDegohIsxcwP5E7duxIcXGx+3FRURFt27alTZs219lKRET8KWBCpW/fvnzyyScUFBQAsGHDBuLj4xu3KBERqSNgzqkAHDx4kBdffBGn00lMTAyLFy+mXbt2jV2WiIj8U0CFioiING0Bc/hLRESaPoWKiIiYRqEiIiKmUaiIiIhpWmyoGIbBs88+y5tvvglAdXU1GRkZJCUlkZSUxOLFi91TwBQUFDB+/HiSkpIYOXIk+fn57tfZvHkzSUlJDB48mIyMDJxOZ6O8H7PUpy/79+/n/vvvJyUlxf2nZpLP5jb555V9KS8v56mnnuKhhx5i+PDhrFu3zv3clvx5uV5fWsrnJTs7m4cffpiUlBTGjh3Lp59+CsDq1atJTEwkISGBlStXuv8fnTt3jrS0NJKSkkhOTubw4cPu1wrIvhgt0P/+7/8aEyZMMOLi4ozf//73hmEYxpYtW4wJEyYYVVVVxuXLl43U1FRjx44dhmEY
xogRI4ycnBzDMAzjwIEDxpAhQwyXy2WcOHHC6Nevn1FaWmpUV1cbv/zlL4033nij0d7XzapvX5YtW2a8/vrrV71OaWmp8cADDxhffvmlYRiGsWTJEiMjI8Nfb8N0nvoyY8YMY9asWUZVVZVx6dIlIy0tzdi/f79hGC3783K9vrSEz0t+fr7xox/9yCgqKjIM47t///79+xsHDhwwUlJSjPPnzxsXL140xo8fb2zfvt0wDMN48skn3X05duyY0bdvX6OysjJg+9IiRyrr169n1KhRJCYmupdVV1dz4cIFLl++zOXLl3E6nQQHB1NUVMTJkycZMmQIAP3796eyspJjx46xb98+Bg4cSEREBFarlTFjxpCTk9NYb+um1acvAB9//DH//d//zcMPP8wjjzzChx9+CNz85J9Njae+fPbZZ6SkpGCz2QgKCmLAgAHs3r27xX9ertUXaBmfl6CgIBYsWECHDh0A6NGjB2fPnmXXrl0kJyfTpk0bgoODSU1NJScnh6qqKg4cOMDo0aMB6NatG7GxseTl5QVsX1pkqMydO5ehQ4fWWZaamkp4eDj9+vWjb9++3HHHHQwcOJAzZ87QoUOHOnOMRUdHU1hYyJkzZ+jYsaN7ucPhoKioyG/vw2z16QtAu3btGDt2LNnZ2Tz99NP84he/oLCwkMLCQhwOh/s1HA4HFRUVnD9/3q/vxyye+tKzZ0+ys7NxOp2cP3+e3bt3U1JS0uI/L9fqC7SMz0vnzp0ZMGAA8N2hwUWLFjFw4ECKi4s9/tuXlZXhcrmIiIhwr6v5vARqX1pkqHjyyiuvEBERwaFDh/jLX/5CeXk5b731lseJLA3DwGazXfUbg2EYzW6Cy2v1pWZdYmIiFouFXr16ce+993Lo0KEWMfnnzJkzsVgsDB8+nClTpvCjH/2IVq1atfjPy7X6Ai3r81JZWcm0adM4deoUCxYswDCMOu+x5t/+ep+XQO1L067Oj/bu3cuIESMICgoiLCyM4cOH88EHH9CpUydKSkrq/EAoLi7G4XBcNcllzfLm5Fp9+eabb1i1alWdvhiGgd1ubxGTf1ZUVDB9+nS2bdvGmjVrMAyDmJiYFv95uVZfWtLn5fTp04wdOxabzcbatWsJDw+/5r99ZGRPY4I2AAAF3UlEQVQkhmFQXl5eZ110dHTA9kWh8k/33HMPO3fuBMDpdLJ//37i4uJwOBzExMSwY8cOAPLy8rBarXTt2pWBAweyf/9+SktLMQyDjRs3MmjQoMZ8G6a7Vl9uueUW1q9fz549ewA4duwYR44c4cc//nGLmPxzw4YNvPzyywCcPXuW9957j+Tk5Bb/eblWX1rK56WiooIJEyYwePBgXnrpJVq3bg1AfHw8OTk5VFZWcvnyZbKyshg0aBB2u50BAwawadMmAI4fP05+fj59+vQJ2L606Lm/Zs6cyb/9278xadIkysrKmD9/PseOHcNms/Hggw8yY8YMgoKCKCgoYM6cOZSVlREUFMT8+fPp3r07AFu2bOHtt9/G6XQSFxfH/Pnz3SeyA5W3ffn0009ZsGAB58+fx2azMWvWLB544AGgeU7+WbsvFRUVzJgxg1OnTmEYBpMnTyYlJQWgRX9erteXlvB5Wb16NcuXL6dr1651lq9Zs4aNGzeydetWnE4n8fHxzJgxA4vFwtmzZ5k9ezZfffUVFouFZ599lr59+wKB2ZcWHSoiImIuHf4SERHTKFRERMQ0ChURETGNQkVEREyjUBEREdMoVESAzMxMRo8eTXV1tXtZdXU1Y8eO5aWXXvLZfsvLy5k3bx5Dhw4lJSWFYcOGkZWV5bP9bdiwgQ0bNvjs9UUUKiJ8912LCxcusHr1avey1atXY7PZePLJJ32yzwsXLjBhwgQ6d+7Mn//8Z7Kzs1m5ciWvvvoqf/rTn3yyz48++oiLFy/65LVFAOyNXYBIUxAcHMyyZcsYN24cP/nJTzAMgz/+8Y9s2bIFm80GQG5uLqtWraKqqoqQkBBm
zpxJXFwcxcXFzJ07l7KyMkpKSrjttttYsWIFERER9OvXjx/+8IccP36c6dOnuyfjBNi2bRtt27blsccecy+7/fbbWbFihXvEdOLECRYsWEB5eTkWi4W0tDQefvhh/uu//ovFixeTnZ0NUOfxSy+9RHFxMUVFRXz99dd06tSJpUuX8tFHH/GXv/yFDz74gODgYMaNG+fHDktLoVAR+ae77rqLX/7ylzz//PO4XC4WLlxIdHQ0APn5+bz88susW7eOtm3bcvz4cdLS0ti3bx/btm2jd+/eTJo0CZfLRVpaGlu3bmXixIkA3H333R4PoR09epT77rvvquU9evQAvpsW54knnuD5558nPj6ewsJCRo4c6Z4K/Xr+9re/kZWVRWhoKD//+c/ZuHEjU6ZMITc3lx49eihQxGcUKiK1TJgwgd27d9OlSxf69+/vXn7o0CGKiop49NFH3cssFgunTp3iscce48MPP+Ttt9+moKCA/Px8evfu7X7eD3/4Q4/7qpml9lry8/MxDMM935PD4SAhIYG8vDzuvffe676PBx54gNDQUOC7+dv+7//+78ZvXsQEChWRK3Tu3JmYmJg6y1wuF3379uXFF190Lztz5gzR0dH85je/4fjx4wwfPpw+ffpw6dKlOrPx3nLLLR73ExcXx5YtW65avmfPHo4cOUJSUtJVU5+7XC6qqqqwWCx19nHlbYmvnE9MszGJv+hEvYgXHnjgAfLy8vjyyy8B2LdvH8OGDePSpUu8//77/OxnPyMlJYVbb72Vv/71r9cdgdRISkqitLSUt99+230O5e9//zuLFy/mzjvv5M4778TlcrFv3z4ACgsLyc3N5T/+4z+49dZb+frrrzl37hyGYbB9+3av3ofdbqeqqqqBXRC5MY1URLxw991388ILL/DUU0+57wPy2muvERISwpQpU1i4cCEvvvgirVq1olevXvz973+/4WsGBQXxzjvvsGTJEoYOHYrNZsNutzN16lSGDRsGwGuvvcbChQtZvnw5LpeLadOmuQ+tjRgxghEjRtC+fXv69+/PiRMnbrjPfv36sXTpUgDS0tJuoiMinmmWYhERMY0Of4mIiGkUKiIiYhqFioiImEahIiIiplGoiIiIaRQqIiJiGoWKiIiYRqEiIiKm+X8xmHxkZXoAjgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x21863b797f0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot a histogram of publication years, with as many bins as the year range spans\n",
    "fig, ax = plt.subplots()\n",
    "model_df['year'].hist(ax=ax, bins=model_df['year'].max() - model_df['year'].min())\n",
    "ax.tick_params(labelsize=12)\n",
    "ax.set_xlabel('Year', fontsize=12)\n",
    "ax.set_ylabel('Occurrence', fontsize=12)"
   ]
  },
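  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>Figure 9-5: Raw distribution of publication year for the more than 10,000 academic papers in the dataset</center>\n",
    "\n",
    "The skewed distribution in Figure 9-5 shows that publication year is a good candidate for binning.\n",
    "\n",
    "We will base the bins on the range of the variable rather than on its number of unique values. To reduce the feature space further, we then dummy-code the binned result (see Example 9-7). Pandas has built-in functions for both tasks. The results are easy to interpret, so we can run a quick check on the transformed features (see Figure 9-6) before moving on."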
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-7. Fixed-width binning + dummy coding (part 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "217"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Base the bins on the range of the variable, not on its number of unique values\n",
    "model_df['year'].max() - model_df['year'].min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "# binning here (by 10 years)\n",
    "bins = int(round((model_df['year'].max() - model_df['year'].min()) / 10))\n",
    "\n",
    "temp_df = pd.DataFrame(index=model_df.index)\n",
    "temp_df['yearBinned'] = pd.cut(model_df['year'].tolist(), bins, precision=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We have reduced from 175 to 21 features representing the year.\n"
     ]
    }
   ],
   "source": [
    "# Compare the number of distinct years with the number of occupied bins (each bin spans about 10 years)\n",
    "print('We have reduced from', len(model_df['year'].unique()), 'to',\n",
    "      len(temp_df['yearBinned'].values.unique()),\n",
    "      'features representing the year.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>(1800.0, 1810.0]</th>\n",
       "      <th>(1810.0, 1820.0]</th>\n",
       "      <th>(1820.0, 1830.0]</th>\n",
       "      <th>(1830.0, 1839.0]</th>\n",
       "      <th>(1839.0, 1849.0]</th>\n",
       "      <th>(1849.0, 1859.0]</th>\n",
       "      <th>(1859.0, 1869.0]</th>\n",
       "      <th>(1869.0, 1879.0]</th>\n",
       "      <th>(1879.0, 1889.0]</th>\n",
       "      <th>(1889.0, 1899.0]</th>\n",
       "      <th>...</th>\n",
       "      <th>(1918.0, 1928.0]</th>\n",
       "      <th>(1928.0, 1938.0]</th>\n",
       "      <th>(1938.0, 1948.0]</th>\n",
       "      <th>(1948.0, 1958.0]</th>\n",
       "      <th>(1958.0, 1968.0]</th>\n",
       "      <th>(1968.0, 1978.0]</th>\n",
       "      <th>(1978.0, 1987.0]</th>\n",
       "      <th>(1987.0, 1997.0]</th>\n",
       "      <th>(1997.0, 2007.0]</th>\n",
       "      <th>(2007.0, 2017.0]</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 22 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   (1800.0, 1810.0]  (1810.0, 1820.0]  (1820.0, 1830.0]  (1830.0, 1839.0]  \\\n",
       "0                 0                 0                 0                 0   \n",
       "1                 0                 0                 0                 0   \n",
       "2                 0                 0                 0                 0   \n",
       "3                 0                 0                 0                 0   \n",
       "4                 0                 0                 0                 0   \n",
       "\n",
       "   (1839.0, 1849.0]  (1849.0, 1859.0]  (1859.0, 1869.0]  (1869.0, 1879.0]  \\\n",
       "0                 0                 0                 0                 0   \n",
       "1                 0                 0                 0                 0   \n",
       "2                 0                 0                 0                 0   \n",
       "3                 0                 0                 0                 0   \n",
       "4                 0                 0                 0                 0   \n",
       "\n",
       "   (1879.0, 1889.0]  (1889.0, 1899.0]        ...         (1918.0, 1928.0]  \\\n",
       "0                 0                 0        ...                        0   \n",
       "1                 0                 0        ...                        0   \n",
       "2                 0                 0        ...                        0   \n",
       "3                 0                 0        ...                        0   \n",
       "4                 0                 0        ...                        0   \n",
       "\n",
       "   (1928.0, 1938.0]  (1938.0, 1948.0]  (1948.0, 1958.0]  (1958.0, 1968.0]  \\\n",
       "0                 0                 0                 0                 0   \n",
       "1                 0                 0                 0                 0   \n",
       "2                 0                 0                 0                 0   \n",
       "3                 0                 0                 1                 0   \n",
       "4                 0                 0                 0                 0   \n",
       "\n",
       "   (1968.0, 1978.0]  (1978.0, 1987.0]  (1987.0, 1997.0]  (1997.0, 2007.0]  \\\n",
       "0                 0                 0                 0                 0   \n",
       "1                 0                 0                 0                 0   \n",
       "2                 0                 0                 0                 0   \n",
       "3                 0                 0                 0                 0   \n",
       "4                 0                 0                 0                 0   \n",
       "\n",
       "   (2007.0, 2017.0]  \n",
       "0                 1  \n",
       "1                 1  \n",
       "2                 1  \n",
       "3                 0  \n",
       "4                 1  \n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_yrs = pd.get_dummies(temp_df['yearBinned'])\n",
    "X_yrs.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "IntervalIndex([(1800.0, 1810.0], (1810.0, 1820.0], (1820.0, 1830.0], (1830.0, 1839.0], (1839.0, 1849.0] ... (1968.0, 1978.0], (1978.0, 1987.0], (1987.0, 1997.0], (1997.0, 2007.0], (2007.0, 2017.0]]\n",
       "              closed='right',\n",
       "              dtype='interval[float64]')"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_yrs.columns.categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0,0.5,'Counts')"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAFACAYAAACvE0uFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XlcVPX+P/DXDMyAySbILKBm5ZpL5s3sqn1TS01vXVzKb2VoWV7Nqy1qLogbmiJecivFJXl0y1xS09wyzWu5gF3UciltwRCUYQBZZkA25/z+8Md8JRU/h8NwGHg9H48eyZnzms/nc/jAmzkzn3M0kiRJICIiugut2h0gIiL3wIJBRERCWDCIiEgICwYREQlhwSAiIiEsGEREJIQFg4iIhLBgEBGREBYMIiISwoJBRERCWDCIiEgICwYREQlhwSAiIiGeaneguuTkFMDhuPXCu0FBPsjOtlf5eZXk61tWzbbdMatm2xyze2Rd2bZWq0GjRg1lPV+dKRgOh3TbglH+mNLnZrb2t+2OWTXb5pjdI6t22zfjKSkiIhLCgkFEREJYMIiISAgLBhERCWHBICIiISwYREQkhAWDiIiE1Jl1GERE9Z2vXwN4e1X8tR4c7Ov8d1FxGWz516r8/CwYRER1hLeXJ56duOOOj++MDYNNwfPzlBQREQlhwSAiIiEsGEREJIQFg4iIhLBgEBGREBYMIiISwoJBRERCXLYOw2KxYOnSpfD394ckSQgODsbly5dhs9kQERGB0tJSREdHw9/fHy1btsSwYcOwdu3aCvsEBga6qntERCSTywpGcnIyjh8/jvbt2+PBBx9EUlIS4uLikJiYiM2bN6O4uBjh4eHo3LkzRo0ahbCwsFv2GTNmjKu6R0REMrmsYJhMJnz88cdo0qQJRo4cCZPJ5NxutVpRWloKs9kMAPDz80N+fr7zFUX5PnIEBfnc8bGbl8ZXhZJ8fcuq2bY7ZtVsm2N2j2x15KvruVxWMNavX48hQ4ZAo9HA19cXly9fBnDjVJXBYIDD4YDFYoHZbEZeXh4MBgNyc3Mr7CNHdrb9tveuDQ72RWZm1RfDK8nXt6yabbtjVs22OWb3yMrNixSD8ufSajWV/qF9Oy4rGIMHD8ayZctgNpvRsWNH6HQ6zJ49G/n5+ZgzZw6KiooQHR2NL774An379oWnpye6du1aYR8iIqo9XFYw2rVrh7i4uDs+7uvri9jY2ArbRowY4aruEBGRQvxYLRERCWHBICIiISwYREQkhAWDiIiEsGAQEZEQFgwiIhLCgkFEREJYMIiISAgLBhERCWHBICIiISwYREQkhAWDiIiEsGAQEZEQFgwiIhLCgkFEREJYMIiISAgLBhERCWHBICIiIS67Rev69etx5swZlJaW4uTJk+jfvz9KSkpgt9sRFRWF5ORkrF69Gt7e3ujVqxf69OmDmJiYCvvo9XpXdY+IiGRy2SuMYcOGITo6GiaTCR988AHsdjsiIyPRpUsX7Nu3D+vWrcOUKVMwf/58bNiwAampqbfsQ0REtYfLXmEAwO+//w6bzYbi4mIYjUYAgMlkwvnz53H16lXnNo1Gg8zMzFv2kSMoyOeOjwUH+1ZxBMrz9S2rZtvumFWzbY7ZPbLVka+u53JpwdiwYQNef/116HQ6ZGRkAAAsFgsMBgOMRiOsVisMBgMkSYLZbL5lHzmys+1wOKRbtgcH+yIz01blMSjJ17esmm27Y1bNtjlm98jKzYsUg/Ln0mo1lf6hfTsuLRgpKSlo1qwZACAgIADz5s2D3W7H3Llz0bZtWyxcuBA6nQ7h4eEwm8237ENERLWHSwvGmjVrnP+eMGFChcdatGiB2NjYCtv+vA8REdUe/FgtEREJYcEgIiIhLBhERCSEBYOIiISwYBARkRAWDCIiEsKCQUREQlgwiIhICAsGEREJYcEgIiIhLBhERCSEBYOIiISwYBAR
kRAWDCIiEsKCQUREQlgwiIhICAsGEREJYcEgIiIhLBhERCTEZff0TktLw4oVKxAUFISGDRsiPz8fJSUlsNvtiIqKQnJyMlavXg1vb2/06tULffr0QUxMTIV99Hq9q7pHREQyuewVRnx8PEwmEzIzMxEUFAS73Y7IyEh06dIF+/btw7p16zBlyhTMnz8fGzZsQGpq6i37EBFR7eGyVxgpKSmYMGECWrZsiZEjR+Kxxx4DAJhMJpw/fx5Xr16F0WgEAGg0GmRmZjq/Lt9HjqAgnzs+FhzsW8VRKM/Xt6yabbtjVs22OWb3yFZHvrqey2UFIzg4GD4+PtDpdACAjIwMAIDFYoHBYIDRaITVaoXBYIAkSTCbzbfsI0d2th0Oh3SbfvgiM9OmYBxVz9e3rJptu2NWzbY5ZvfIys2LFIPy59JqNZX+oX07LisYr7/+OmJjYxEUFITnn38ev/32G+bNmwe73Y65c+eibdu2WLhwIXQ6HcLDw2E2mxEQEFBhHyIiqj1cVjAeeOABLF269I6Pt2jRArGxsRW2TZgwwVXdISIihfixWiIiEsKCQUREQlgwiIhICAsGEREJYcEgIiIhLBhERCSEBYOIiISwYBARkRAWDCIiEsKCQUREQlgwiIhICAsGEREJYcEgIiIhLBhERCSEBYOIiISwYBARkRAWDCIiEsKCQUREQoRv0ZqamoqmTZvi0KFDOHfuHIYPHw5f3zvfcPzKlSt444030LZtWwQHB+P69esoKSmB3W5HVFQUkpOTsXr1anh7e6NXr17o06cPYmJiKuyj1+urZZBERKSc0CuMmTNnYs2aNfj9998RGRmJtLQ0REREVJr5/vvv0bhxYwBA48aNYbfbERkZiS5dumDfvn1Yt24dpkyZgvnz52PDhg1ITU29ZR8iIqo9hF5hnD17Flu2bMHq1asxaNAgTJw4EYMHD64007FjR3Tr1g2NGzfGK6+8gq5duwIATCYTzp8/j6tXr8JoNAIANBoNMjMznV+X70NERLWHUMGQJAlarRZHjx7FmDFjAABFRUWVZn7++Wd06tQJWq0WkiQhLS0NAGCxWGAwGGA0GmG1WmEwGCBJEsxmMzIyMirsI0dQkM8dHwsOvvOpMxFK8vUtq2bb7phVs22O2T2y1ZGvrucSKhjNmjXDqFGjkJaWhkcffRQTJ05E69atK83ce++9iImJQWBgIAYMGID09HTMmzcPdrsdc+fORdu2bbFw4ULodDqEh4fDbDYjICCgwj5yZGfb4XBIt2wPDvZFZqZN1nNVV76+ZdVs2x2zarbNMdferF/APfDSedz2seLS68jPLay0rbsp74tWq6n0D+3bESoYCxYswP79+/GXv/wFOp0OjzzyCAYNGlRppn379li6dOkdH2/RogViY2MrbJswYYJId4iI6iwvnQdG7Tl528fWDOhcw72pSOhN73nz5iEsLAxNmjQBALz44ouYPHmySztGRES1S6WvMGbNmoWMjAycOHECV69edW4vKytDamqqyztHRES1R6UF47nnnsOvv/6KCxcuoF+/fs7tHh4e6NSpk8s7R0REtUelBaNDhw7o0KEDunXrBpPJVFN9IiKiWkjoTe/09HS8++67yMvLgyT93yeRdu7c6bKOERFR7SJUMGbOnInBgwfjwQcfhEajcXWfiIioFhIqGJ6ennj11Vdd3RciIqrFhD5W27JlS1y4cMHVfSEiolpM6BVGamoqhgwZgpCQEHh5eTm38z0MIqL6Q6hgvPPOO67uBxER1XJCBaNVq1au7gcREdVyQgXjscceg0ajgSRJzk9JBQcH47vvvnNp54iIqPYQKhg335uipKQEu3btwsWLF13WKSIiqn1k39Nbr9dj8ODBOHr0qCv6Q0REtZTQK4zc3FznvyVJwtmzZ5Gfn++yThERUe0j+z0MAAgKCsL06dNd2jEiIqpdZL+HQURE9ZNQwXA4HPjoo4/w3XffoaysDN27d8eYMWPg6SkUJyKiOkDo
Te/Y2FgkJiZixIgRePXVV3Hq1CnExMS4um9ERFSLCL1EOHz4MLZu3QqdTgcA6NmzJ/7+978jIiKi0tzEiRPRu3dvpKen4/Lly7DZbIiIiEBpaSmio6Ph7++Pli1bYtiwYVi7dm2FfQIDA5WPjoiIqo1QwZAkyVksgBsfrb3569uJj49Hw4YNAQBJSUmIi4tDYmIiNm/ejOLiYoSHh6Nz584YNWoUwsLCbtlnzJgxCoZFRETVTahgtGnTBvPnz8fLL78MjUaDTz75pNLLhRw8eBC+vr7o1KkTHA6H89WCyWSC1WpFaWkpzGYzAMDPzw/5+fm37CNXUJDPHR8LDvaV/XzVla9vWTXbdsesmm1zzO6Rre7nUpIXKhizZs3CvHnz8MILL8DhcODxxx/HjBkz7rj/l19+CT8/P+dq8PJXGhaLBQaDAQ6HAxaLBWazGXl5eTAYDM61HuX7yJWdbYfDId2yPTjYF5mZNtnPVx35+pZVs213zKrZNsdce7N3+4Ve2XOJFIPyvFarqfQP7duptGCUlJRgxowZeOqppxAdHQ0A+Mc//gEPDw/4+Ny5oSVLlgAAtm3bBi8vL2RlZWH27NnIz8/HnDlzUFRUhOjoaHzxxRfo27cvPD090bVr1wr7EBFR7VJpwVi2bBnsdjs6d+7s3DZ37lzMmTMHy5cvv+tlzwcPHnzb7b6+voiNja2wbcSIEaJ9JiIiFVT6sdpDhw4hNjYWQUFBzm1GoxExMTE4cOCAyztHRES1R6UFQ6fTwdvb+5btPj4+0Ov1LusUERHVPpUWDK1WC7vdfst2u92OsrIyl3WKiIhqn0oLxjPPPIPIyEgUFhY6txUWFiIyMhJ9+/Z1eeeIiKj2qLRgjBgxAr6+vujevTuGDh2K5557Dt27d4efnx/++c9/1lQfiYioFqj0U1JarRZz587FmDFjcO7cOWi1WnTs2LFK6ySIiMi9CS3cCw0NRWhoqKv7QkREtZjsW7QSEVH9xIJBRERCWDCIiEgICwYREQlhwSAiIiEsGEREJIQFg4iIhLBgEBGREBYMIiISwoJBRERCWDCIiEiI0LWkquKPP/5AbGwsGjdujA4dOuDq1au4fPkybDYbIiIiUFpaiujoaPj7+6Nly5YYNmwY1q5dW2GfwMBAV3WPiIhkclnBsNlsmDx5MkwmE0aPHg29Xo+4uDgkJiZi8+bNKC4uRnh4ODp37oxRo0YhLCwMSUlJFfYZM2aMq7pHREQyueyUVIcOHaDX6zF69Gg8+uijzlcLJpMJVqsVWVlZMJvNAAA/Pz/k5+ffsg8REdUeLnuF8fPPP8NkMmHdunUYP36885auFosFBoMBDocDFosFZrMZeXl5MBgMyM3NrbCPHEFBPnd8LDjYt+oDUZivb1k123bHrJptc8zuka3u51KSd1nBKC0txaxZsxAQEIAmTZrAZDJh9uzZyM/Px5w5c1BUVITo6Gh88cUX6Nu3Lzw9PdG1a9cK+8iRnW2HwyHdsj042BeZmbYqj0NJvr5l1WzbHbNqts0x197s3X6hV/ZcIsWgPK/Vair9Q/t2XFYwOnbsiGXLlt3xcV9fX8TGxlbYNmLECFd1h4iIFOLHaomISAgLBhERCWHBICIiISwYREQkhAWDiIiEsGAQEZEQFgwiIhLCgkFEREJctnCPiKi+ahTgBU+d3vn1zSuwy0pLkJNbrEa3FGPBICKqZp46PU58/e5tH/tL30UA3LNg8JQUEREJYcEgIiIhLBhERCSEBYOIiISwYBARkRAWDCIiEsKCQUREQlgwiIhICAsGEREJcdlK75MnT+Lf//437rnnHoSEhKCwsBAlJSWw2+2IiopCcnIyVq9eDW9vb/Tq1Qt9+vRBTExMhX30ev3dGyIiohrhsoKRn5+PefPmwcfHByNHjkSTJk0QFRWFrVu3Yt++fTh8+DCmTJkCo9GIkSNHok2bNs5CUb7Ps88+66ruERGRTC4rGD179oQkSVi5ciU6d+4MjUYD
ADCZTDh//jyuXr0Ko9EIANBoNMjMzHR+Xb6PHEFBPnd87OYLf1WFknx9y6rZtjtm1WybY665bG3qh5K8ywqG3W7H/Pnz8eyzz6J58+ZYuXIlAMBiscBgMMBoNMJqtcJgMECSJJjNZmRkZFTYR47sbDscDumW7cHBvsjMtFV5HEry9S2rZtvumFWzbY7Ztdm7/VKu7Llcmb05r9VqKv1D+3ZcVjDee+89pKSkYOvWrfDw8IDRaMS8efNgt9sxd+5ctG3bFgsXLoROp0N4eDjMZjMCAgIq7ENERLWHywrGggULKn28RYsWiI2NrbBtwoQJruoOEREpxI/VEhGREBYMIiISwoJBRERCWDCIiEgICwYREQlhwSAiIiEsGEREJIQFg4iIhLhs4R4Rkbvy9/eGXq+rsO3my26UlJQiL6+oprulOhYMIqI/0et1t1yJ4mYTJ04EUP8KBk9JERGREBYMIiISwoJBRERCWDCIiEgICwYREQlhwSAiIiEsGEREJIQFg4iIhLBgEBGREJeu9E5JScFbb72F7du3Y+3atbh8+TJsNhsiIiJQWlqK6Oho+Pv7o2XLlhg2bNgt+wQGBrqye0RUhzXybwBPfcVfcTdf3qOspAw5eddqultuzWUFIzMzE59//jkaNGiA4uJiJCUlIS4uDomJidi8eTOKi4sRHh6Ozp07Y9SoUQgLC7tlnzFjxriqe0RUx3nqPfHrv47c8fGWk3rUYG/qBpcVjODgYEyaNAmvvfYacnNzna8WTCYTrFYrSktLYTabAQB+fn7Iz8+/ZR85goJ8KumL7x0fE6EkX9+yarbtjlk1266PY64t/XDXY18jFx8MCgpCbm4uAMBiscBgMMDhcMBiscBsNiMvLw8Gg+GWfeTIzrbD4ZBu2R4c7IvMTFuV+64kX9+yarbtjlk1264PYxb5xXin51KSFcmrlb05r9VqKv1D+3ZqpGB4enqia9eumD17NvLz8zFnzhwUFRUhOjoaX3zxBfr27XvbfYiIqPZwecH46KOPAAAjRoyosN3X1/eWywf/eR8iIqo9+LFaIiISwoJBRERCWDCIiEgICwYREQlhwSAiIiEsGEREJIQFg4iIhNTIwj0ioqrw99ND7+VVYdvNq5lLiouRl19S092qt1gwiKjW0nt54YNpr97x8XEL4gGwYNQUnpIiIiIhLBhERCSEBYOIiISwYBARkRC+6U1ELuXv1wB6r9vfKrWkuAx5+bxNqrtgwSAil9J7eSJq4q7bPjYz9pka7g0pwVNSREQkhAWDiIiE8JQUUT3gG+AFb52+wrabV0wXlZbAlltc090iN1OrCkZGRgaio6Ph7++Pli1bYtiwYWp3iajWaOSvh6f+zpfJKCspRk7e7Vc9e+v0GLrpjTs+9+b/XQkb7lwwGvnq4en9f21XaLeoGDk2rrauD2pVwdi4cSPCw8PRuXNnjBo1CkOHDoVOpxPKarWaKj2m9LmZrT1t11TWx8cbXjd96ufmX57FxWWw24uEs3LynnovXPpgzB2fu9m4OGi1pXd8PPiewDs+BlR+DDy9vZA06vZtP7ImDtqCO7cLAP6NGlSpXQDwDQiq9PFK++3ndcfH7pb18/OrchYA9N6NqpwNaqC/42N3yxoqOdY356vy86KRJEmSnXKRGTNmYOzYsTCbzZg4cSKmT5+OwMDKJzkREdWMWvWmt9lshsViAQDk5eXdtcITEVHNqVWvMDIzMxEdHY2GDRuiffv2GDp0qNpdIiKi/69WFQwiIqq9atUpKSIiqr1YMIiISAgLBhERCWHBICIiISwYREQkhAWDiIiEsGAQEZGQWnUtqeqwZ8+e224fMGCAUP706dO33d6xY0eXZpX0W0lWSZ8BIDs7+7bbg4Iqv/6P0qxax1ppXq3vlZrHS0nbas0vtbK1XZ1buDdq1Cj87W9/w83D2rt3L1avXi2Uf+GFF9CjR48K244ePYoNGza4NKuk30qySvoMAM8++yzat29fYdvZs2ex
c+dOl2bVOtZK82p9r9Q8XkraVmt+qZUFgH79+qFZs2YVjndqair27dt31+zQoUPh7+9fIWuz2bBp0yahtu9KqmOuXLkiZWRkSGfPnpWsVqtzm6iff/5ZKisrk3JycqTr1687t7k6q6TfSrJK+ixJkpSYmCi0rbqzah1rpXm1vldqHi8lbas1v9TKSpIk7du3T2jb7WzevFloW1XVuVNSn3/+OdLT02EwGGC1WnHvvfdizJg7XxL6z06cOIHFixfDz88PeXl56NOnD55//nmXZ5X0W0lWSZ+BG3+9vPbaa9BoNJAkCcOHD8cTTzzh8qxax1ppXq3vlZrHS0nbas0vtbIAEBwcjLFjx6KkpAReXl4YO3Ys+vbtK5Tt0qULZs6cCavVCoPBgDfeeEPWz/NdVVvpqSXee++9Cl/PnTtXVn7OnDkVvp45c2aNZJX0W0lWSZ8lSZKmT59e4eupU6fWSFatY600r9b3Ss3jpaRtteaXWllJkqTJkydLRUVFkiRJ0rVr16QJEyYIZydNmiSlpqZKpaWl0qVLl6S3335bVtt3U+deYeTk5GD37t0wmUzIyMhAfn6+rHxubi5++OEH56XW7XZ7jWSV9FtJVkmfAeDatWvIyspC48aNkZWVhZIS8TuvKcmqdayV5tX6Xql5vJS0rdb8UitbztPT0/n/8n+L8Pb2RpMmTQAATZs2RcOGDWW3XZk696Z3YWEh9u7di6ysLJjNZvTr1w9eXpXfdetmGRkZ2LRpE7KyshASEoLnnnsOjRs3dnlWSb+VZJX0GQAuXLiAdevWIScnB0ajEa+88goeeOABl2fVOtZK82p9r9Q8XkraVmt+qZUFgISEBMTHxwMAvLy8MHz4cHTp0kUou2PHDmzfvh06nQ5arRZhYWHo37+/cNt3Va2vV2qhgwcPKsqfPn1alaySfivJKumzJElSenq6Klm1jrXSvFrfKzWPl5K21ZpfamUlSZJKS0sV5atTnV+4p9UqG+KFCxdUySrpt5Kskj4DwPbt21XJqnWslebV+l6pebyUtK3W/FIrCwDR0dFVzs6aNUtR239W505JlZaW4uuvv3Z+SuDpp5+Gh4eHrOf44YcfnPlOnTrVSLakpARffvklrl69ipCQEAwYMED4B1NJFgASExORnZ2NkJAQPPzww8K5cqmpqcjOzobZbIbRaKyxbFX7rfR4ZWZmolGjRti1axcKCgowcOBA4XPFSrIlJSXQ6/VISkpCQUEBHn/8cVlzpKpZJX1W2na5K1euoLCwEC1atJCVq2o2OTkZ999/v+y2lGYBID8/v9bentpj9uzZs9XuRHWKjIyE0WhE8+bNkZeXh08//RRPPfWUcD4qKgpWqxU6nQ4//fQTvvnmGzz++OMuzy5ZsgT3338/UlJS4O/vj40bN6JXr14uz8bFxcFms+HcuXPIysrC4cOH0a1bN6EsAHz22Wc4cuQIjh8/ju+//x5paWnChVJJVkm/lRwvAIiIiMD27dvRvHlzGI1GxMfHC88xJdmJEydi//79yMnJQVlZGbZt2ybcbyVZJX1W2va0adNw4sQJHDx4EH/88QcSEhLQvXt3l2fDwsKg1+uFr3hQXVkAeOqpp3D//fejefPmsrO9e/dG165dZb0PKUedOyXl4+ODwYMHo1u3bhg8eDB8fHxk5fV6PcaPH4/nn38e48ePr7GszWZD7969UVBQgEGDBsn6C0xJ1mq1Ijw8HJ6enhg3bhxsNpusficnJ2Pq1KkICgrCggULcPHixRrJKum3kuMFAE2aNEFQUBCGDh2K3r17y/prW0k2JCQEXl5emDBhAsLDw2W98awkq6TPStsOCAhATk4OFi1ahKlTp8r6xJGSbPfu3WEymfDaa69h69at+O2332okCwA9evTApUuX8OabbyIxMRHXrl0Tznbu3Bm7du3C7NmzkZqaKqtdEXXuY7V6vR6zZs1yfgTQ399fVt5ut2P16tUwmUywWCwoKiqqkWxBQQGio6MR
GBiIPXv2yPolpiSblZWFTz75BB4eHjh69CgKCgqEs8CNl/sHDx7EtWvXcO7cOeTk5NRIVkm/lRwvAPDw8MD333+PhIQEHDx4UNbHTJVkr127hm+++QYnT57E6dOnkZ6eXiNZJX2+XdsWi0U4a7PZcOTIEZw/fx5paWmyfgnm5+dXOQsATz75JHr27ImDBw9i+/btmDRpUo1kAWD48OEYOHAgtm3bhs2bN+P9998Xyul0OkyaNAnJycn4+OOPkZqailWrVslqu1Jqv+vuCpcuXZJOnjxZpU8nOBwOKSEhQdq5c6eUlJRU5eyJEydkZYuLi52XSyhftCMn+9NPP1Upm5OTI+3fv18qKSmRMjIyZH8iIyUlRYqPj5fy8vKkX3/9VcrLy5Odzc3NlX755RdZWSX9VnKsb+ZwOKRz585V6VMs169fl86ePSsr63A4pIKCAslut0v79u2TbDZblbJff/21rKySPt/cdkFBgey27Xa7dPHiRSkzM1OKj4+XLl++LDtrtVqldevWycoeO3ZMeN/qzEqSJO3evbvK2U8//VRR23dT597DAAB/f3+YzWbZp6MAYMuWLdiyZQu6d++ORx99FDExMcLnPS9cuIDDhw+jTZs2WLFiBQCgTZs2QtnExERs3rwZRUVFiI6OxtWrV9G5c2eh7Llz5+BwOJCRkYGFCxciNDRU+A3kjRs34oEHHsC0adNw6NAh+Pr6ynrD7sSJE3jsscewaNEinDp1Cq1bt0ZgYKBQdv78+Rg7dix8fX0RFBQk61TFV199hdDQUMydOxcHDx6EwWBA06ZNhbKJiYlYu3YtNm3ahH379sFoNDoXO4k4cOAA3nvvPezcuRPHjh1DYGCg7PPNGo0GBoNB1qubLVu2YMWKFQgJCUHPnj2xZMkSWXNzy5Yt8PLyQnx8PHQ6nfDcPH36NDIyMmC1WrFmzRo0bdpU1gcUPv74Y+j1ekRGRuLnn39Go0aNhOfYkSNHYDabERsbi+zsbHTo0EF4fs2YMQMDBgxAQEAAHn74Yfj6+gr3+cSJEyguLsacOXOwc+dONG7cWHh+5eXlYd68edixYwf27duHVq1aCfcZuPEq4f3338emTZvw/fffo02bNsJ9r+r7JqLq3Cmp8PBw6HQ35ZbVAAAX2klEQVQ6eHh4QJIkaDQarFmzRjiflJSEVatWYcGCBWjUqBGysrKEsx9//DFeeuklzJw5E1u3bsX06dMRFhYmlP3qq68wefJkjB07Fhs3bsSMGTOE2120aBFCQkLQtGlTXL58Gd99953wxLlw4QLOnz+PNWvWQK/XY8aMGXjyySeF205MTMT+/fsxbtw4BAQEYP78+ViwYIFQNjMzEzNnzsSTTz6J/v37Q6PRCLd78uRJbN++HUuXLoWPjw8iIyPx17/+VSh76NAhrFixArGxsZgwYQImT56Mrl27Crd96NAhfPTRR86vp02bJnytICXzMykpCXFxcdU2NwcOHCiUVTK/AGVzzB3n1/bt27F48WJ8+OGHeP311zF9+nQsXrxYuO0PP/wQb731FkwmE9LT0xETEyOcX7t2bYUr1QI3rjZcXepcwZgyZQqOHDki6+JoN7t+/Tq0Wi2mTJmCadOmITc3VzhbVlaGBx98EBEREdBqtXA4HMJZm80GT09PREVFoaSkRNY5+fj4eCxbtgydOnVCVlYWxo0bJ5xNSUmBwWBASUkJSktLUVhYKJwFbnxk0mg0wmg0Qq/X4/r168JZo9GIqKgobNu2DW+88QYaNGgg/IPh4eGBkpISNGjQABqNRtYbgxkZGbDb7bBarSgoKEBpaalwFlB26Qcl81OtualkfgHK5pg7zq+cnBxotVpcvnwZ99xzj+yP9Su5vIfD4UBRUREee+wxWW2KqnOnpAwGA/z9/at8sxKtVovz58+jdevWePjhh7F3714MGjRIOJuSkoIePXogISEBPj4+aNeunVDWaDTCYrHgoYcewg8//ID77rtP+DSHVqvFX//6
Vxw9ehSnTp0SflUDAM2aNUNxcTECAwNx6NAhPProo7j33nuF8x4eHvjPf/6D5cuX48CBAxg4cKDwZRASExPRq1cvtG/fHs888wyeeOIJ6HQ6oaynpyc0Gg10Oh2mTZuGAQMGCJ9i8fLywvLlyzF69Gj8/PPP6NGjh6xTLM2bN8eHH36Izz//HGfPnsWoUaOETzkYDAYEBATIOkVRTq25qWR+Acrm2J/nV1hYmPB6CrXmV05ODhYvXozw8HAkJSWhXbt2sk7z2mw2LFy4EHv27MHu3bvRp08ftGzZUij78MMP4+LFi3j66acRGhqK0NBQ4XZF1LmFe0oXGam1sKq83d27d8Nut9fYYrDqXJRlt9vxP//zPzWymExJu5mZmQgICMDu3burPOYvv/wS2dnZCA0NrbFFlkqykiQhMTHRmZW70HHnzp3ORZJyFzoqXSiZkJDgbFv0fT3gxpiPHz9e5UWpCQkJVTpeSo51uYyMDGRmZsJoNCI4OFhW9vr167DZbPDz81O8Kv/P6lzBeOutt1BYWIg+ffqgcePG+Prrr2UtrVeSf/vtt+Hh4YHQ0FAEBwfj119/RVRUlMvbVSsL/N+YmzRpgsaNG8sas5LjpaRdpWOOjY3Fww8/7HwleOrUKeG2y7OnTp3C/fffX2PZuLg4NGzYEMnJyQgKCkJxcTEmTpwoq92qjFdpXkm/3TELAMuXL8eVK1eqdP+R9evX49tvv4Wfnx/y8/Nl39/mburcwr0/LzKS+0kptRZWVediMDljrq5FWe+8806NLiZT0q7SMZcv/LPb7TW6yFKtBZpKxqs0r6Tf7pgFbhyvBQsW4J133sGCBQtkfbjh999/x+rVq/Gvf/0Lq1evxtmzZ2W1fTd17k3vPy8yysvLU5SXs0ipqKgIBw8erNICpepcDCZnzEoXZSkZs1pZpWNWa5FldS3QPHbsWI0udFSr3+6YBdS9v81duXSVh4qqushISV7Jwiol7aqZra7FZDWZLVfVMd+8UPLatWtukS1f6FhcXCxZLBbZCx2r2q7SvJJ+u2NWkiSpoKBA2rJlixQXFyft2LFDKi4uFs5aLBZp6dKl0owZM6SVK1dKWVlZstq+mzr3HgYRkTs7dOgQdu7cidGjR6NVq1ZYs2aN8FqK06dP48CBA3j55ZdhMBiwdetWDBkypNr6VudOSSlduKck/+abbzo/ry03q6RdtbKAsjGrlVVzjqiVddfj5Y7zS0kWuHHXvFmzZmHhwoUYN24cfv31V+Hs2rVrMXr0aERHR2PmzJk4fvx4tRaMOndK6syZM9LKlStVyX/33XdVvpaLknbVykqSsjGrlVVzjqiVddfj5Y7zS0lWkiRp2rRpkiTdODU1adIkafz48cLZyMhISZIkKSsrS5o6dao0ceLEKvfjdupcwZAkSfrll19UyycmJqrSrlpZSVI2ZrWyas4RtbLuerzccX4pyX722WfOCxD+9ttvUp8+fYSzy5cvd2aTkpKkbt26Vbkft8P3MIiI6ii73V6li7DeCQsGEREJqXML94iIyDXq3MUH/2zmzJnIyMhA+/btazw/f/58pKenC1/krbraVSsLKBuzWlk154haWXc9Xu44v5RkleZXr16NK1euoHXr1lVq+894SsqFiouLodfrZV2H390pGbNa2frIXY+XO84vpcdaST4tLQ2NGjWSffmbO6lz6zAOHDiADRs2QKPRQJIkDB8+XPjmNkrzp06dwpo1a1BSUgIvLy+MHTtW+K8CJe2qlQWUjVmtrJpzRK2sux4vd5xfSrJK8ykpKfjoo49gtVphMBjwxhtvVFuxAFD31mFMnz69wtdTp06tsfzkyZOd94i+du2aNGHChBppV62sJCkbs1pZNeeIWll3PV7uOL+UZJXmJ02aJKWmpkqlpaXSpUuXpLfffltW23dT5970Lr8bGgDZd0Orjrynp6fz/+X/dnW7amXLVXXMamXVnCNqfq/c8XgB7je/lGaV5Mvv1ufp6Sn7
bn0i6tx7GBcuXMC6deuQk5MDo9GIV155RfgOcErzCQkJiI+PB3Djrm7Dhw9Hly5dXN6uWllA2ZjVyqo5R9TKuuvxcsf5pSSrNL9jxw5s374dOp0OWq0WYWFh6N+/v3Dbd1PnCsafWSwWmEwmVfJlZWVV+utCabtqZQFlY1Yrq+YcUSvrrsfLHeeXkmx15KtTnTsl9Wfbt29XLS/nLm7V2a5aWUDZmNXKqjlH1Mq66/Fyx/mlJKs0P2vWLEVt/1mdfIVhtVqRmZkJg8Eg+364SvNK7qerpN3U1FRkZ2fDbDbDaDTWWBa48bG//Px8NGrUSPZfQmpllY5Zaf7KlSsoLCxEixYtajRbXFyMsrKyKp3bVtKu0rySftd0Nj8/H35+frLbqq68K9W5grFs2TKkp6dX6X64SvNK7qerpN3PPvsMly5dQnZ2Njw9PdG6dWu88sorLs8CwO7du/HNN9+gsLAQAPDMM8/gmWeeqdVZpWNWkp82bRoCAgKQlZWFoKAgeHh44N1333V5duHChTAajTh8+DB8fHzw0EMPYeTIkS5vV81+q5V9/PHHMXfuXPTs2VNo/+rM9+7dGytWrECbNm2q1Pbd1LlTUna7vcr3w1WaV3I/XSXtJicnY+rUqQgKCsKCBQtw8eLFGskCNz4z/v777+O+++5DXFwcEhISan1W6ZiV5AMCApCTk4NFixZh6tSpsj4xpCQLAD/88AM++ugjLF26FJcvX66xdtXqt1rZHj164NKlS3jzzTeRmJjovDdGTeQ7d+6MXbt2Yfbs2UhNTZXVroja8U5KNbr5frgWi0X2/ZrVup+ukn5fuXIFBw8exLVr13Du3Dnk5OTUSBYA0tPTceHCBeTl5SEtLU1Wv9XKKh2zkrzNZsORI0dw/vx5pKWlyfqhVpLNy8vDmTNnkJaWBpvNJusXoJJ21ey3WlkAGD58OAYOHIht27Zh8+bNeP/992skr9PpMGnSJCQnJ+Pjjz9GamoqVq1aJavtytS5U1KFhYXYu3ev8/xy37594eXlJTuflZUFs9mMfv36CeczMjKwadMmZGVlISQkBM8//zyCgoJktxsSEiKr35cuXcJ//vMfDBw40PkeiOg50EuXLuHgwYMYNGgQrFYrjEajrPOnP/74I3bt2oVXX30VV65cgdFoRNOmTWss+8orryA9PV1WtnzMgwcPdq6IlTPm8uNdfszk5AsKCpCZmQkfHx/s2rULffv2RUhIiKxsw4YNsWvXLvTr1084m5GRgeTkZJhMJqxfvx5DhgxB27ZtZbe7e/duWX2urn6bzWZ8+umnGDx4MB588EFZ2fIx11R2z549GDBggNC+1Z1fv349hg0bVuW276bOvcLIzMzEjz/+6PxBfuSRR2A2m4Xz99xzT5VvaZiRkYGysjKMGzdO9v108/LyoNFo8NRTT2Hx4sUIDAxE9+7dhbI6nQ6+vr7IysrCkiVL8OKLLwpnmzVr5jz//sUXX8g6lw/ceFU1evRoLFu2DKWlpbLeLwJu3M4yPj4ekiThpZdeEs59++23mDZtGrRaLUJDQ2W1mZaWhieffBLR0dG4du0aRo8eLatgaLVa/PTTTzh58iT8/f0xfvx44WxWVhbWrVvnnJ9yrg/UsGFD55uvr776qnAOuDE3ExMTMWzYMERGRmLr1q3CBSM/Px8nT57EQw89hKSkJLRs2VJWwbi53xqNRlb2/PnzaN++PZYuXYrS0lJZbz5bLBZ06NAB69atg6+vr6wPkmzatAnjxo2DVqtFZGSkcA64cQouNTUVK1eudM4vOe8pdOzYEVOmTEFRUZFzfon2vX///li1apXzD8eXXnqpWu+HUecuDaJ0aXxUVJQ0a9asCv+JGj9+vHT27FnpnXfekXJycqR3331XODtlyhRp79690t///nfJZrPJurWikmyvXr2k8PBw6bXXXpN69uwpvf7668JZSbpxmYfIyEjpl19+kaxWqzRlyhTh7OzZs6UZM2ZICQkJ0oUL
F2RdMuJ///d/pQkTJkinTp2S1V9JunHphZdfflm6ePGilJ+fL+v7JEk35ojNZpPi4uKktLQ0WXNMyfx0x7kpScrmGOeXvPn17rvvSomJidIff/whJSYmyr4syd3UuVcY5UvjAVRpaXyHDh2QmZlZpZeE/v7+aNeuHaZPn46FCxfC4XAIZz08PPD000/Dw8MDPj4+uOeee2oku379esTFxWHs2LFYsWIF5syZI5wFbpxuCA4ORsuWLQHcOP6iPD090aBBA3Tt2hUajUbWR2Pvu+8+REREYO3atVi9ejWaN2+OyZMnC2UDAgKQm5uLZs2aQavVyr4KaGFhIXx8fHDp0iWEhoaiQYMGwlkl89Md5yagbI5xfsmbXz4+PujatSsA4N5778VXX30lq+27qXMF45FHHsGrr75aYWm8HAMHDsTevXtln+YAAKPR6Dz/+Nxzz+HNN98UzrZr1w779u1Dv379kJCQIGslrJKs2WzGtGnT8K9//QtWq1U4V65Vq1Y4deoU/vvf/+KHH36Q3e/t27djyJAhCAgIwJNPPimrbV9fX7zzzju4fv06/vjjD+Fc27ZtcfXqVSQmJuKDDz6Q/fHF0NBQDB8+HC+++CK2bduGjh07CmeVzE93nJuAsjnG+SVvfoWGhmLMmDFo3LgxcnNzncWjutS5N73LXb58GYWFhc6/TOSqjoVVJpNJ9vlDtRZ0paWlobCwEK1atZKdLW+7oKBA9vGWJAlnzpyBl5eXrJu8lJaWQqfTKT5eVenzzfmqtJ2amur8cENNLrK8OduwYUNZc7M6FjpmZmbC4XDgkUcekZ1VMmaLxYKmTZvKKjYlJSXIyMhQbTGsknxBQQFsNhsMBoPsxcN3U+cKRvkioezsbAQGBtboIiN3zvJ41cyYb170p9Pp0KpVqxpZZFldWbl9VrPt2nCslR4vufn169fju+++g5+fH/Ly8mQtHhZR5xbulS8SiomJqfFFRu6c5fESpyR/86K/+fPn19giy+rKyu2zmm3XhmOt9HjJzf/+++9YtWoVFi1aJHvxsIg69x6GmouM6lvWXfut5pjVWmSp5uJOd+y3ux4vJYuHhVTrZ65qAbvdLl28eFGyWq3SunXrpMuXL9dYvr5l3bXfao45JSVFio+Pl3Jzc6VffvlFysvLq9NZd+23ux4vi8UiLV26VJoxY4a0cuVKKSsrS1bbd1Pn3sMgIqqvTp8+jQMHDuDll1+WvXhYRJ07JTV37lxcv369wrbZs2fXSL6+ZdVsm2N2j6yabbtjVml+7dq1GD16NKKjozFz5kwcP36cBaMyShY3Kc3Xt6yabXPM7pFVs213zCrNK1mgKaRaT3DVEnv27FEtX9+yarbNMbtHVs223TGrJL98+XJp9+7dkiRJUlJSktStWzdF/fizOlcwMjIybtlmsVhqJF/fsmq2zTG7R1bNtt0xW91t22w2WW3fTZ07JbV7927YbDZ06tQJOp0O33//PQwGA1588UWX5+tb1l37zTHzeNXWbHW3/d///hfBwcHCbd9NnfyU1B9//IFvv/0WGo0GPXv2RLNmzWosX9+y7tpvjpnHq7Zm1W67MnWyYBARUfWrc5cGISIi12DBICIiISwYVKekpaWhbdu2CAsLQ1hYGJ599lk8//zzOHHiBADgzJkzsu4FocTo0aOxbdu2Ctvy8vLwxBNPYP369RW2nz9/Ho888gguXLhQI30jqoo69ykpIm9vb+zYscP59Z49ezBt2jR8/fXX6NChA5YtW6Za3/z9/bFo0SKMHj0a3bp1w3333YeSkhK8++67mDx5sqx7ghDVNBYMqvNyc3MRHBwMADh+/Djmzp2LXbt2YerUqfDx8cGFCxdgsVjQunVrLFy4EA0bNkSHDh3wj3/8A0ePHoXVasXrr7+Ol156CQDw+eefY8OGDXA4HAgICMCMGTPwwAMPICMjA1OnToXVakVISAiys7Nv259HH30Uw4cPx5QpU7Bx40YsXrwYrVq1wtCh
QwHcuHlPTEwMTpw4gevXrztX7vr4+ODAgQNYu3YtSkpKcPXqVQwZMgTjx4/HsWPHEBMTAy8vLxQVFeHTTz9FREQELl26BK1Wiw4dOmDOnDmybxdKdDMWDKpzioqKnLc+zc/PR2ZmJj788MPb7nv27Fn8+9//hkajwdChQ/HVV19hyJAhKCkpQaNGjbBx40acPXsWL774IoYMGYIff/wR27dvx/r169GgQQMcOXIE48aNw969exEVFYWHHnoIb7/9NlJSUjBw4MA79nH8+PFISEhAREQEfvzxR3z++efOx1auXAlvb29s27YNGo0GMTExWLJkCSIiIhAfH49FixahadOmSE9PR+/evTF8+HAAwC+//IKDBw/CZDJh69atKCkpwY4dO1BWVoaZM2ciLS0NTZs2rcYjTfUNCwbVOX8+JXXs2DH885//xJdffnnLvo8//jj0ej2AG/ePzsvLcz5Wfg/odu3aoaSkBIWFhTh06BBSUlLwwgsvOPfLz89Hbm4ujh07hilTpgAA7r333krvp+zp6YnY2Fg89dRT2Lx5c4XbpR46dAiFhYU4fPgwgBu3oy2/3eaqVatw6NAh7NixA7/99hskSUJRURGAG/dzLr8NaZcuXbB06VIMHz4c3bp1w2uvvcZiQYqxYFCd161bNzRr1gxnzpxBUFBQhce8vb2d/9ZoNLh5WZKXl5dzO3Dj/uMOhwNhYWHOW7I6HA5YrVb4+/vfkvf0rPzHq/wX+J9/kV+/fh0zZ85E9+7dAQB2ux2lpaWw2+0YNGgQ+vXrh7/85S8YMmQI9u/f72zznnvucT5Hs2bNsH//fhw/fhyJiYkYMWIE3nvvPTzxxBMCR4zo9vgpKarzLl68iMuXL6Nt27aKn6tHjx7YvXs3rFYrAGDDhg0YMWIEgBuvVjZt2gTgxl3Tjh8/XuU2PvnkE5SWluL69euIiIjAkiVLcPHiRVy7dg1vvfUWevXqhYSEBJSVld1yKWwA+OSTTzBjxgw8/vjjmDx5Mh577DH89NNPVRw10Q18hUF1zs3vYQA3XgVERUXhvvvuc/6ir6oePXpg1KhRGDlyJDQaDXx8fPDBBx9Ao9Fg1qxZmDZtGvr37w+TyYQ2bdpUqY3x48dj4cKFGDhwoPNN78mTJ8Pb2xs9evRA//79odPp0KZNG9x///24dOnSLc8xaNAg/Pe//8Xf/vY3eHt7IzQ0FMOGDVM0diJeGoSIiITwlBQREQlhwSAiIiEsGEREJIQFg4iIhLBgEBGREBYMIiISwoJBRERCWDCIiEjI/wNeH8yveHMCrwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x21863c63898>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# let's look at the new distribution\n",
    "fig, ax = plt.subplots()\n",
    "X_yrs.sum().plot.bar(ax=ax)\n",
    "ax.tick_params(labelsize=8)\n",
    "ax.set_xlabel('Binned Years', fontsize=12)\n",
    "ax.set_ylabel('Counts', fontsize=12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>图 9-6：分箱之后新特征 X_yrs 的分布</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按照 10 年的宽度进行分箱，我们保留了原始变量中的基本分布。如果需要对其他分布进行分箱，可以修改一下分箱设置，改变变量在模型中的呈现方式。因为我们使用的是余弦相似度，所以这样做没有问题。下面接着处理模型中最初包含的另外一个特征。研究领域特征空间对初始模型的大小和处理时间有非常显著的影响。\n",
    "\n",
    "检查一下已经完成的工作。通过解析字符串列表，我们在第一关中创建了一个“短语袋”。既然已经有了一个非常好用的稀疏数组，就应该使用这个更高效的数据类型来表示这个短语袋。例 9-8 演示了将 Pandas 数据框转换为 NumPy 稀疏数组之后对计算时间的影响。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 例 9-8:将短语袋从 pd.Series 转换为 NumPy 稀疏数组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "unique_fos = sorted(list({ feature\n",
    "                          for paper_row in model_df.fos.fillna('0')\n",
    "                          for feature in paper_row }))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "def feature_array(x, unique_array):\n",
    "    row_dict = {}\n",
    "    for i in x.index:\n",
    "        var_dict = {}\n",
    "        \n",
    "        for j in range(len(unique_array)):\n",
    "            if type(x[i]) is list:\n",
    "                if unique_array[j] in x[i]:\n",
    "                    var_dict.update({unique_array[j]: 1})\n",
    "                else:\n",
    "                    var_dict.update({unique_array[j]: 0})\n",
    "            else:    \n",
    "                if unique_array[j] == str(x[i]):\n",
    "                    var_dict.update({unique_array[j]: 1})\n",
    "                else:\n",
    "                    var_dict.update({unique_array[j]: 0})\n",
    "        \n",
    "        row_dict.update({i : var_dict})\n",
    "    \n",
    "    feature_df = pd.DataFrame.from_dict(row_dict, dtype='str').T\n",
    "    \n",
    "    return feature_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 57min 51s\n"
     ]
    }
   ],
   "source": [
    "%time fos_features = feature_array(model_df['fos'], unique_fos)"
   ]
  },
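  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nested loop in `feature_array` visits every (paper, phrase) pair, which is why the timed call above takes so long. For comparison only, here is a minimal sketch of the same kind of indicator encoding using scikit-learn's `MultiLabelBinarizer` with `sparse_output=True`. This is not the approach used in this chapter: the toy Series `toy_fos` and the names `as_lists`, `mlb` and `X_toy` are made up for illustration, and the sketch assumes that a missing value should become an all-zero row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.preprocessing import MultiLabelBinarizer\n",
    "\n",
    "# Hypothetical stand-in for model_df['fos']: lists of phrases, with one missing value.\n",
    "toy_fos = pd.Series([['Physics', 'Optics'], float('nan'), ['Optics']])\n",
    "\n",
    "# Assumption of this sketch: anything that is not a list becomes an empty list.\n",
    "as_lists = toy_fos.apply(lambda v: v if isinstance(v, list) else [])\n",
    "\n",
    "# One indicator column per distinct phrase, returned as a SciPy sparse matrix.\n",
    "mlb = MultiLabelBinarizer(sparse_output=True)\n",
    "X_toy = mlb.fit_transform(as_lists)\n",
    "\n",
    "print(mlb.classes_)      # column order of the indicator matrix\n",
    "print(X_toy.toarray())   # dense view, for inspection only"
   ]
  },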
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>0-10 V lighting control</th>\n",
       "      <th>1-planar graph</th>\n",
       "      <th>1/N expansion</th>\n",
       "      <th>10G-PON</th>\n",
       "      <th>14-3-3 protein</th>\n",
       "      <th>2-choice hashing</th>\n",
       "      <th>20th-century philosophy</th>\n",
       "      <th>2D Filters</th>\n",
       "      <th>2D computer graphics</th>\n",
       "      <th>...</th>\n",
       "      <th>open</th>\n",
       "      <th>pH</th>\n",
       "      <th>photoperiodism</th>\n",
       "      <th>r-process</th>\n",
       "      <th>route</th>\n",
       "      <th>strictfp</th>\n",
       "      <th>string</th>\n",
       "      <th>van der Waals force</th>\n",
       "      <th>Ćuk converter</th>\n",
       "      <th>μ operator</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2 rows × 9150 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   0 0-10 V lighting control 1-planar graph 1/N expansion 10G-PON  \\\n",
       "0  0                       0              0             0       0   \n",
       "1  0                       0              0             0       0   \n",
       "\n",
       "  14-3-3 protein 2-choice hashing 20th-century philosophy 2D Filters  \\\n",
       "0              0                0                       0          0   \n",
       "1              0                0                       0          0   \n",
       "\n",
       "  2D computer graphics    ...     open pH photoperiodism r-process route  \\\n",
       "0                    0    ...        0  0              0         0     0   \n",
       "1                    0    ...        0  0              0         0     0   \n",
       "\n",
       "  strictfp string van der Waals force Ćuk converter μ operator  \n",
       "0        0      0                   0             0          0  \n",
       "1        0      0                   0             0          0  \n",
       "\n",
       "[2 rows x 9150 columns]"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fos_features.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_fos = fos_features.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Our pandas Series, in bytes:  5856418904\n",
      "Our hashed numpy array, in bytes:  112\n"
     ]
    }
   ],
   "source": [
    "# We can see how this will make a difference in the future by looking at the size of each\n",
    "print('Our pandas Series, in bytes: ', getsizeof(fos_features))\n",
    "print('Our hashed numpy array, in bytes: ', getsizeof(X_fos))"
   ]
  },
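  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two byte counts above are not directly comparable. `sys.getsizeof` includes a NumPy array's data buffer only when the array owns that buffer. When `.values` returns a view, the reported figure is little more than the array header. The sketch below shows one way to get a fairer comparison on a made-up toy table. The names `toy`, `dense` and `csr` are illustrative, and storing the table in SciPy's CSR format is an assumption of this sketch rather than a step the chapter takes here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sys import getsizeof\n",
    "from scipy import sparse\n",
    "\n",
    "# Hypothetical mostly-zero indicator table standing in for fos_features.\n",
    "rng = np.random.default_rng(0)\n",
    "toy = pd.DataFrame((rng.random((1000, 200)) < 0.01).astype(np.uint8))\n",
    "\n",
    "dense = toy.values  # may be a view on the DataFrame's own buffer\n",
    "\n",
    "# getsizeof counts the buffer only if the array owns it; nbytes always counts it.\n",
    "print('getsizeof(dense):', getsizeof(dense))\n",
    "print('dense.nbytes    :', dense.nbytes)\n",
    "\n",
    "# A CSR matrix stores only the non-zero entries plus two index arrays.\n",
    "csr = sparse.csr_matrix(dense)\n",
    "csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes\n",
    "print('CSR bytes       :', csr_bytes)"
   ]
  },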
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "真是太棒了！把两个特征组合在一起，一起输入过滤器（见例 9-9），重新运行推荐器（见例 9-10）看看能否得到更好的结果。在过滤器中，我们使用了 scikit-learn 的余弦相似度函数。我们还是每次只对一个物品进行推荐，目的是节省计算时间。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  例 9-9：协同过滤阶段 1+2：建立项目特征矩阵，搜索相似项目"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9172"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_yrs.shape[1] + X_fos.shape[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 2.06 s\n",
      "Size of second feature array, in bytes:  1467520112\n"
     ]
    }
   ],
   "source": [
    "# now looking at 10399 x  7623 array for our feature space\n",
    "\n",
    "%time second_features = np.append(X_fos, X_yrs, axis = 1)\n",
    "\n",
    "second_size = getsizeof(second_features)\n",
    "\n",
    "print('Size of second feature array, in bytes: ', second_size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The power of feature engineering saves us, in bytes:  -665280615\n"
     ]
    }
   ],
   "source": [
    "print(\"The power of feature engineering saves us, in bytes: \", 802239497 - second_size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "\n",
    "\n",
    "def piped_collab_filter(features_matrix, index, top_n):\n",
    "\n",
    "    item_similarities = 1 - cosine_similarity(features_matrix[index:index + 1],\n",
    "                                              features_matrix).flatten()\n",
    "    related_indices = [\n",
    "        i for i in item_similarities.argsort()[::-1] if i != index\n",
    "    ]\n",
    "\n",
    "    return [(index, item_similarities[index])\n",
    "            for index in related_indices][0:top_n]"
   ]
  },
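  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on the ordering used in `piped_collab_filter`, the sketch below compares two rankings on a tiny hand-made matrix: sorting by cosine similarity, and sorting by `1 - similarity`, both in descending order. The matrix `toy` and the names `query`, `by_similarity` and `by_distance` are hypothetical and exist only for this illustration. Row 1 shares two features with row 0, while row 2 shares none."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "\n",
    "# Hypothetical toy item-feature matrix: row 1 overlaps with row 0, row 2 does not.\n",
    "toy = np.array([[1, 1, 0, 0],\n",
    "                [1, 1, 1, 0],\n",
    "                [0, 0, 0, 1]], dtype=float)\n",
    "\n",
    "query = 0\n",
    "sims = cosine_similarity(toy[query:query + 1], toy).flatten()\n",
    "\n",
    "# Descending sort of the similarity: nearest neighbours first.\n",
    "by_similarity = [int(i) for i in sims.argsort()[::-1] if i != query]\n",
    "# Descending sort of 1 - similarity (cosine distance): farthest rows first.\n",
    "by_distance = [int(i) for i in (1 - sims).argsort()[::-1] if i != query]\n",
    "\n",
    "print('ranked by similarity    :', by_similarity)  # [1, 2]\n",
    "print('ranked by 1 - similarity:', by_distance)    # [2, 1]"
   ]
  },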
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 例 9-10：基于物品的协同过滤推荐：第 2 轮"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "def paper_recommender(items_df, paper_index, top_n):\n",
    "    if paper_index in model_df.index:\n",
    "\n",
    "        print('Based on the paper:')\n",
    "        print('Paper index = ', model_df.loc[paper_index].name)\n",
    "        print('Title :', model_df.loc[paper_index]['title'])\n",
    "        print('FOS :', model_df.loc[paper_index]['fos'])\n",
    "        print('Year :', model_df.loc[paper_index]['year'])\n",
    "        print('Abstract :', model_df.loc[paper_index]['abstract'])\n",
    "        print('Authors :', model_df.loc[paper_index]['authors'], '\\n')\n",
    "\n",
    "        # define the location index for the DataFrame index requested\n",
    "        array_ix = model_df.index.get_loc(paper_index)\n",
    "\n",
    "        top_results = piped_collab_filter(items_df, array_ix, top_n)\n",
    "\n",
    "        print('\\nTop', top_n, 'results: ')\n",
    "\n",
    "        order = 1\n",
    "        for i in range(len(top_results)):\n",
    "            print(order, '. Paper index = ',\n",
    "                  model_df.iloc[top_results[i][0]].name)\n",
    "            print('Similarity score: ', top_results[i][1])\n",
    "            print('Title :', model_df.iloc[top_results[i][0]]['title'])\n",
    "            print('FOS :', model_df.iloc[top_results[i][0]]['fos'])\n",
    "            print('Year :', model_df.iloc[top_results[i][0]]['year'])\n",
    "            print('Abstract :', model_df.iloc[top_results[i][0]]['abstract'])\n",
    "            print('Authors :', model_df.iloc[top_results[i][0]]['authors'],\n",
    "                  '\\n')\n",
    "            if order < top_n: order += 1\n",
    "\n",
    "    else:\n",
    "        print('Whoops! Choose another paper. Try something from here: \\n',\n",
    "              model_df.index[100:200])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the paper:\n",
      "Paper index =  2\n",
      "Title : Should endometriosis be an indication for intracytoplasmic sperm injection (ICSI) in fresh IVF cycles\n",
      "FOS : nan\n",
      "Year : 2015\n",
      "Abstract : nan\n",
      "Authors : [{'name': 'Jovana P. Lekovich', 'org': 'Weill Cornell Medical College, New York, NY'}, {'name': 'G.D. Palermo', 'org': 'Weill Medical College of Cornell University, New York, NY'}, {'name': 'Nigel Pereira', 'org': 'The Ronald O. Perelman and Claudia Cohen Center, New York, NY'}, {'name': 'Zev Rosenwaks', 'org': 'Weill Cornell Medical College, New York, NY'}] \n",
      "\n",
      "\n",
      "Top 3 results: \n",
      "1 . Paper index =  13686\n",
      "Similarity score:  1.0\n",
      "Title : Electromagnetic System Design and Visual Simulation\n",
      "FOS : ['Simulation', 'Systems engineering', 'Engineering', 'Engineering drawing']\n",
      "Year : 2004\n",
      "Abstract : An electromagnet design system is developed using Visual C++ language and OpenGL technology to visualize parametric 3D model. The system consists of primary design, optimization design, dynamic and static characteristics, and visual simulation. All empirical parameters and curves used in design process are stored in database. Through human-computer interactions, an electromagnetic system can be designed conveniently with the results and characteristics curves displayed in graphic model. Using this system can greatly shorten the process of product design, and the results satisfy technical requirements.\n",
      "Authors : [{'name': 'Niu Chunping', 'org': \"School of Electrical Engineering, Xi'an Jiaotong University, Xi'an 710049, China)\"}] \n",
      "\n",
      "2 . Paper index =  4857\n",
      "Similarity score:  1.0\n",
      "Title : An Investigation of the Action and Haemolytic Effect of Glyceryl Guaiacolate in the Horse\n",
      "FOS : ['Respiration', 'Carbon dioxide', 'Oxygen', 'Surgery']\n",
      "Year : 1978\n",
      "Abstract : SUMMARY#R##N##R##N#Glyceryl guaiacolate (GGE) was found to be a useful and safe casting agent when given by rapid intravenous infusion. It was administered to premedicated horses under controlled conditions at various concentrations from 10 to 20 per cent GGE solution. The onset and degree of relaxation was dependent only on the speed of infusion. For casting adult horses 350 to 450 ml of 15 per cent solution must be given within 30 to 60 seconds. A slight transient hypoxaemia occurred which seemed to be related to the animal being in lateral recumbency rather than the depressive action of GGE on respiratory function. At the higher concentrations (20 per cent solution) 2 cases of aseptic thrombosis of the jugular vein occurred.#R##N##R##N##R##N##R##N#There were no significant changes in the blood picture associated with GGE administration apart from some transient haemolysis in horses dosed with 20 per cent solution. However, if the solution was adequately stabilised this did not occur. The haemolytic threshold without stabilizing substances was found to lie between 16 and 20 per cent. For daily use a prepacked 15 per cent stable solution was recommended.#R##N##R##N##R##N##R##N#RESUME#R##N##R##N#L'ester glycerique du gaiacol s'est reveleetre un facteur relaxeur utile lorsqu'on l'injecte par voie intraveineuse rapide. On l'a administrea des chevaux premediques a des concentrations variables en solution de 10 %a 20 %. L'apparition et le degre de relaxation dependent seulement de la vitesse de perfusion. Pour coucher des chevaux adultes 350 a 450 ml de la solution doivent etre injectes en 30 a 60 secondes. Une legere et passagere hypoxie fut constatee qui semble n'etre provoquee que par le decubitus lateral des animaux plus que due a une action depressive du medicament sur la fonction respiratoire. 
A des concentrations elevees (20%) deux cas de trombose aseptique de la veine jugulaire furent constates.#R##N##R##N##R##N##R##N#On ne constata point de modifications significatives dans la formule sanguine a la suite de l'injection de glyceryl gaiacolate a l'exception d'une hemolyse passagere lors de concentrations de l'ordre de 20 %.#R##N##R##N##R##N##R##N#Ceci n'apparut d'ailleurs que lors de l'emploi de solutions non stabilisees. Le seuil d'apparition de l'hemolyse se situerait pour des concentrations de 16 a 20 %.#R##N##R##N##R##N##R##N#L'emploi de solutions a 15 % est recommande.#R##N##R##N##R##N##R##N#ZUSAMMENFASSUNG#R##N##R##N#Guiacolglyzerin-Aether (GGE) wurde als nutzliches und sicheres Mittel zum Niederlegen von Pferden befunden, wenn es rasch intravenos infundiert wird. Praemedizierte Pferde erhielten GGE in Konzentrationen von 10 bis 20 % unter kontrollierten Bedingungen. Eintritt und Grad der Relaxierung hingen nur von der Infusionsgeschwindigkeit ab. Um erwachsene Pferde niederzulegen, mussen 350 bis 450 ml der 15 prozentigen Losung innert 30 bis 60 Sekunden gegeben werden. Eine leichte, vorubergehende Hypoxaemie trat auf, die anscheinend eher der Seitenlage des Pferdes als der atmungsbeeintrachtigenden Wirkung des GGE zuzuschreiben war. Bei hoheren Konzentrationen (20 %) waren zwei Falle von aseptischer Thrombose der Jugularvenen zu verzeichnen.#R##N##R##N##R##N##R##N#Das Blutbild wies keine signifikanten Veranderungen auf, ausser einer vorubergehenden Haemolyse mit der 20 prozentigen Losung. Wenn die Losung adaequat stabilisiert war, trat die Haemolyse nicht auf. Die haemolytische Schwelle ohne Stabilisatoren liegt zwischen 16 und 20 Prozent. Fur den taglichen Gebrauch wird eine vorverpackte, 15 prozentige stabile Losung empfohlen.\n",
      "Authors : [{'name': 'Schatzmann U', 'org': 'Klinik fur Nutztiere und Pferde, University of Berne, Langgasstrasse 124, Berne, Switzerland'}, {'name': 'Tschudi P', 'org': 'Klinik fur Nutztiere und Pferde, University of Berne, Langgasstrasse 124, Berne, Switzerland'}, {'name': 'James P. Held', 'org': 'Klinik fur Nutztiere und Pferde, University of Berne, Langgasstrasse 124, Berne, Switzerland'}, {'name': 'B. Muhlebach', 'org': 'Klinik fur Nutztiere und Pferde, University of Berne, Langgasstrasse 124, Berne, Switzerland'}] \n",
      "\n",
      "3 . Paper index =  11762\n",
      "Similarity score:  1.0\n",
      "Title : Spongiosis and sclerosis of the frontal sinus\n",
      "FOS : nan\n",
      "Year : 1954\n",
      "Abstract : nan\n",
      "Authors : [{'name': 'Schlosshauer B'}] \n",
      "\n"
     ]
    }
   ],
   "source": [
    "paper_recommender(second_features, 2, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "说实话，我并不认为这次特征选择的效果有多么好。这些字段中有很多缺失值。下面继续看一下能否找出一些信息更丰富的特征。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 找到你的位置\n",
    "在 Pandas 数据框和 NumPy 矩阵之间的转换中，索引会令人迷惑——索引的大小相同， 但其分配的位置却不一样。为了解决这个问题，Pandas 提供了 `.iloc`、`.loc` 和 `.get_loc` 三种方法，如例 9-11 所示。\n",
    "\n",
    "- `.loc` 返回基于初始 Pandas 数据框的索引，可以让我们引用具体的论文。\n",
    "- `.iloc` 使用整数位置，和 NumPy 数组的索引是一样的。\n",
    "- `.get_loc` 可以帮助我们在已知数据框索引时找出整数位置。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 例 9-11:在转换时维护索引分配"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "abstract      A microprocessor includes hardware registers t...\n",
       "authors                        [{'name': 'Mark John Ebersole'}]\n",
       "fos           [Embedded system, Parallel computing, Computer...\n",
       "keywords                                                    NaN\n",
       "lang                                                         en\n",
       "references    [1bdfcbc2-29de-4c81-836a-eb672338a081, 1da9e4c...\n",
       "title         Microprocessor that enables ARM ISA program to...\n",
       "url           [http://www.freepatentsonline.com/y2013/030501...\n",
       "year                                                       2013\n",
       "Name: 21, dtype: object"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df.loc[21]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "abstract      A microprocessor includes hardware registers t...\n",
       "authors                        [{'name': 'Mark John Ebersole'}]\n",
       "fos           [Embedded system, Parallel computing, Computer...\n",
       "keywords                                                    NaN\n",
       "lang                                                         en\n",
       "references    [1bdfcbc2-29de-4c81-836a-eb672338a081, 1da9e4c...\n",
       "title         Microprocessor that enables ARM ISA program to...\n",
       "url           [http://www.freepatentsonline.com/y2013/030501...\n",
       "year                                                       2013\n",
       "Name: 21, dtype: object"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df.iloc[21]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_df.index.get_loc(30)"
   ]
  },
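  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the outputs above, each label happens to equal its position, so the three accessors are hard to tell apart. The following minimal sketch uses a made-up frame, `toy_df`, whose labels differ from its positions. Both the frame and its values are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy frame: labels 10, 20, 30 sit at positions 0, 1, 2.\n",
    "toy_df = pd.DataFrame({'title': ['a', 'b', 'c']}, index=[10, 20, 30])\n",
    "\n",
    "print(toy_df.loc[20]['title'])    # label 20 -> 'b'\n",
    "print(toy_df.iloc[1]['title'])    # position 1 -> 'b'\n",
    "print(toy_df.index.get_loc(20))   # label 20 is at position 1"
   ]
  },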
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第三步：更多特征=更多信息\n",
    "到此为止，我们的实验并没有支持初始假设，即仅靠出版年份和研究领域就足以推荐出相似的论文。既然如此，有以下几种选择。\n",
    "\n",
    "- 使用原始数据集中更多的数据，看看能否得到更好的结果。\n",
    "- 花费更多时间去探索数据，看看能否找到一个足够密集的集合来提供好的推荐。\n",
    "- 添加更多特征，继续迭代当前模型。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一种选择假设问题在于我们对数据的抽样，这是有可能的，但这样做和图 9-4 中翻动数据垃圾堆找寻更好答案的比喻是一样的。\n",
    "\n",
    "第二种选择可以更好地理解原始数据。这应该在数据探索过程中，根据特征和模型选择决策的变化不断地重新进行。在本例中，初始的子样本选择就反映了这个步骤。因为在数据集中还有更多变量可用，所以我们就不再重新进行这个步骤了。\n",
    "\n",
    "最后看第三种选择，添加更多特征，在当前模型的基础上继续前进。加入更多关于每个项目的信息可以改善相似度评分，进而得到更好的推荐。\n",
    "\n",
    "根据我们最初的探索，下一步的工作将集中在信息量最丰富的两个字段上：论文摘要和作者姓名。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 学术论文推荐器：第3轮"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "回顾一下第 4 章，我们可以知道论文摘要非常适合使用 tf-idf 来过滤掉噪声并找出显著相关的单词。我们在例 9-12 中实现了 tf-idf。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 9-12: Stopwords + tf-idf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "# need to fill in NaN for sklearn\n",
    "filled_df = model_df.fillna('None')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    A system and method for maskless direct write ...\n",
       "1                                                 None\n",
       "2                                                 None\n",
       "3                                                 None\n",
       "4    早期発見と治療成績の向上で担癌患者の生存期間が延びており，それに伴い重複癌を経験する機会も増...\n",
       "Name: abstract, dtype: object"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# abstract: stopwords, frequency based filtering (tf-idf?)\n",
    "filled_df['abstract'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<20000x99896 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 592505 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "\n",
    "vectorizer = TfidfVectorizer(\n",
    "    sublinear_tf=True, max_df=0.5, stop_words='english')\n",
    "X_abstract = vectorizer.fit_transform(filled_df['abstract'])\n",
    "\n",
    "X_abstract"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "n_samples: 20000, n_features: 99896\n"
     ]
    }
   ],
   "source": [
    "print(\"n_samples: %d, n_features: %d\" % X_abstract.shape)"
   ]
  },
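  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the weighting behind the `TfidfVectorizer` call above more concrete, the following is a minimal pure-Python sketch of sublinear tf-idf with a `max_df` cutoff. It is an illustration under stated assumptions, not scikit-learn's implementation: the function name `sublinear_tfidf`, the whitespace tokenizer, and the three toy documents are hypothetical, and stopword removal is left out. The formulas follow the documented behavior of `sublinear_tf=True` (tf becomes `1 + ln(tf)`), smoothed idf, and L2 row normalization.\n",
    "\n",
    "```python\n",
    "import math\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def sublinear_tfidf(docs, max_df=0.5):\n",
    "    # Toy tokenizer (assumption): lowercase and split on whitespace\n",
    "    tokenized = [doc.lower().split() for doc in docs]\n",
    "    n_docs = len(tokenized)\n",
    "    # Document frequency: in how many documents each term occurs\n",
    "    df = Counter(term for tokens in tokenized for term in set(tokens))\n",
    "    # max_df: drop terms found in more than this fraction of documents\n",
    "    vocab = {term for term, count in df.items() if count <= max_df * n_docs}\n",
    "    rows = []\n",
    "    for tokens in tokenized:\n",
    "        counts = Counter(term for term in tokens if term in vocab)\n",
    "        weights = {}\n",
    "        for term, count in counts.items():\n",
    "            tf = 1 + math.log(count)  # sublinear term frequency\n",
    "            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed idf\n",
    "            weights[term] = tf * idf\n",
    "        # L2-normalize the row so long and short abstracts are comparable\n",
    "        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0\n",
    "        rows.append({term: w / norm for term, w in weights.items()})\n",
    "    return rows\n",
    "\n",
    "\n",
    "docs = ['gene therapy gene trial', 'galaxy survey trial', 'galaxy spiral arms trial']\n",
    "rows = sublinear_tfidf(docs)\n",
    "for row in rows:\n",
    "    print(row)\n",
    "```\n",
    "\n",
    "In this toy corpus, `trial` and `galaxy` occur in more than half of the documents, so `max_df=0.5` removes them and only the rarer, more distinctive terms keep a weight."
   ]
  },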
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import sparse  # stay sparse: .toarray() on the 20000 x 99896 tf-idf matrix ran out of memory\n",
    "third_features = sparse.hstack([sparse.csr_matrix(second_features), X_abstract], format='csr')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "paper_recommender(third_features, 2, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
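    "The author field is messy and inconsistent. We tidy it into dictionaries and then one-hot encode it, which reduces the computational load, as shown in Example 9-13.\n",
    "\n",
    "#### Example 9-13: One-hot encoding with scikit-learn's DictVectorizer"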
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>authors</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[{'name': 'Ahmed M. Alluwaimi'}]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[{'name': 'Jovana P. Lekovich', 'org': 'Weill ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[{'name': 'P. M. Voltes'}]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[{'name': '高田和外'}, {'name': 'ほか'}]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             authors\n",
       "0                                               None\n",
       "1                   [{'name': 'Ahmed M. Alluwaimi'}]\n",
       "2  [{'name': 'Jovana P. Lekovich', 'org': 'Weill ...\n",
       "3                         [{'name': 'P. M. Voltes'}]\n",
       "4                 [{'name': '高田和外'}, {'name': 'ほか'}]"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "authors_df = pd.DataFrame(filled_df.authors)\n",
    "authors_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "authors_list = []\n",
    "\n",
    "for row in authors_df.itertuples():\n",
    "    # create a dictionary from each Series index\n",
    "    if type(row.authors) is str:\n",
    "        y = {'None': row.Index}\n",
    "    if type(row.authors) is list:\n",
    "        # add these keys + values to our running dictionary    \n",
    "        y = dict.fromkeys(row.authors[0].values(), row.Index)\n",
    "    authors_list.append(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'None': 0},\n",
       " {'Ahmed M. Alluwaimi': 1},\n",
       " {'Jovana P. Lekovich': 2, 'Weill Cornell Medical College, New York, NY': 2},\n",
       " {'P. M. Voltes': 3},\n",
       " {'高田和外': 4}]"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "authors_list[0:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       ...,\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.]])"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.feature_extraction import DictVectorizer\n",
    "v = DictVectorizer(sparse=False)\n",
    "D = authors_list\n",
    "X_authors = v.fit_transform(D)\n",
    "\n",
    "X_authors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "n_samples: 20000, n_features: 25154\n"
     ]
    }
   ],
   "source": [
    "print(\"n_samples: %d, n_features: %d\" % X_authors.shape)"
   ]
  },
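  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the loop above keeps only the first author of each paper, turns that author's `org` value into a feature as well, and stores the row index as the feature value, so the resulting columns are not strictly 0/1 (`DictVectorizer` uses numeric values as given). As a hedged alternative, and not the book's code, the sketch below builds one dictionary per paper with every author name mapped to 1. The helper name `authors_to_dict` and the `author=` key prefix are hypothetical choices made for this illustration.\n",
    "\n",
    "```python\n",
    "def authors_to_dict(authors):\n",
    "    # Missing values were filled with the string 'None' earlier in the notebook\n",
    "    if not isinstance(authors, list):\n",
    "        return {'author=None': 1}\n",
    "    # One indicator per author name; other keys such as 'org' are ignored\n",
    "    return {\n",
    "        'author=' + entry['name']: 1\n",
    "        for entry in authors\n",
    "        if isinstance(entry, dict) and 'name' in entry\n",
    "    }\n",
    "\n",
    "\n",
    "# Three rows shaped like the `authors` column shown above\n",
    "sample = [\n",
    "    'None',\n",
    "    [{'name': 'Ahmed M. Alluwaimi'}],\n",
    "    [{'name': '高田和外'}, {'name': 'ほか'}],\n",
    "]\n",
    "encoded = [authors_to_dict(authors) for authors in sample]\n",
    "print(encoded)\n",
    "```\n",
    "\n",
    "A list built this way could be passed to `DictVectorizer` in place of `authors_list`. Doing so would change the number of author features, so the shapes printed above would no longer apply."
   ]
  },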
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# append the 20000 x 25154 one-hot author columns to the feature space,\n",
    "# keeping everything sparse so the combined matrix fits in memory\n",
    "from scipy import sparse\n",
    "\n",
    "fourth_features = sparse.hstack(\n",
    "    [sparse.csr_matrix(third_features), sparse.csr_matrix(X_authors)], format='csr')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
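    "Now we can run the recommender to see how these new features perform. Example 9-14 gives the results.\n",
    "\n",
    "#### Example 9-14: Item-based collaborative filtering recommendations: round 3"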
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "paper_recommender(fourth_features, 2, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
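    "Even though some fields have missing data, the top three results from this last round of feature engineering all point us to other papers in the medical field.\n",
    "\n",
    "The papers in this dataset cover a very wide range of topics. For example, a random sample of papers can include the following fields of study: “Coupling constant,” “Evapotranspiration,” “Hash function,” “IVMS,” “Meditation,” “Pareto analysis,” “Second-generation wavelet transform,” “Slip,” and “Spiral galaxy.” Given that these 10,000-plus papers span 7,604 unique fields of study, these final results suggest that we are heading in the right direction. We can be confident that our work is gradually getting closer to a useful model.\n",
    "\n",
    "Continuing to iterate on more of the text variables, for example by finding noun phrases in the paper titles or stemming the keywords, could bring us closer still to the “best” recommendation.\n",
    "\n",
    "Keep in mind that “best” here is an ideal that every recommender and search engine pursues. We are searching for the result that is most helpful to the user, and that is not necessarily expressed directly in the data. Feature engineering lets us abstract the salient features and turn them into a representation from which an algorithm can surface both the explicit and the implicit information they contain.\n"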
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
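    "## Summary\n",
    "As you have seen, building a machine learning model is easy, but building a good model that produces useful results takes a lot of time and work. In this chapter we examined candidate sets of variables and experimented with several feature engineering methods in pursuit of better results. Here, “better” means more than good results in training and testing: it also means a simpler model and less time spent iterating on each experiment.\n",
    "\n",
    "As we said at the start of this book, mastering a subject requires a deep understanding of its principles, so that you can build intuition and apply what you know effectively in your work. We hope this book has given you the methods and tools you need to work more efficiently and effectively, has stretched your mathematical and computing skills, and has helped you understand why feature engineering is a fundamental skill for developing useful machine learning models.\n"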
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. “Item-Based Collaborative Filtering Recommendation Algorithms.” Proceedings of the 10th International Conference on the World Wide Web (2001): 285–295.\n",
    "\n",
    "Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. “An Overview of Microsoft Academic Service (MAS) and Applications.” Proceedings of the 24th International Conference on the World Wide Web (2015): 243–246.\n",
    "\n",
    "Tang, Jie, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. “ArnetMiner: Extraction and Mining of Academic Social Networks.” Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008): 990–998.\n",
    "\n",
    "Wickham, Hadley. “Tidy Data.” The Journal of Statistical Software 59 (2014)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
