NLTK comes with many corpora, toy grammars, trained models, etc. A complete list is posted at: http://nltk.org/nltk_data/
To install the data, first install NLTK (see http://nltk.org/install.html), then use NLTK’s data downloader as described below.
NLTK is a community-driven project and is available for Linux, Mac OS X and Windows. To get started, install NLTK for your Python environment using the steps below. Downloading Jupyter and NLTK will take a few minutes. When the download is complete, you may wish to change to your home directory so that Jupyter notebooks can be opened and saved there.
Apart from individual data packages, you can download the entire collection (using “all”), or just the data required for the examples and exercises in the book (using “book”), or just the corpora and no grammars or trained models (using “all-corpora”).
Interactive installer
For central installation on a multi-user machine, do the following from an administrator account.
- On Ubuntu, you can install NLTK from the system packages with sudo apt install python3-nltk. Developing against the system Python environment is a little risky, though: as you upgrade to newer Ubuntu releases, these packages will be upgraded too.
- NLTK version: 3.2.1. If you already have the StanfordParser compressed files, you don’t have to download them again. If you’re dual-booting Windows and Linux, just copy the StanfordParser zip packages or uncompressed files to your Linux drive.
- NLTK finds third-party software through environment variables or through path arguments passed to API calls. The Installing Third-Party Software page lists installation instructions and their associated environment variables. To search for Java binaries (jar files), NLTK checks the Java CLASSPATH variable; however, there are usually …
- I want to install NLTK 3.0 on Ubuntu 13.10. I have been running Ubuntu for a few weeks (my first time using Linux), and I have just downloaded Python 3.4.0; Python 3.3 is also present because it was installed with the operating system.
Run the Python interpreter and type import nltk, then nltk.download().
A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to C:\nltk_data (Windows), /usr/local/share/nltk_data (Mac), or /usr/share/nltk_data (Unix). Next, select the packages or collections you want to download.
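The three central locations above depend only on the platform. As a minimal sketch, the hypothetical helper central_nltk_data_dir (not part of NLTK) picks the recommended directory from Python's sys.platform value, assuming that values starting with "win" mean Windows, "darwin" means Mac, and anything else is treated as Unix:

```python
import sys


def central_nltk_data_dir(platform: str = sys.platform) -> str:
    """Return the recommended central nltk_data directory (hypothetical helper)."""
    if platform.startswith("win"):
        return r"C:\nltk_data"                 # Windows
    if platform == "darwin":
        return "/usr/local/share/nltk_data"    # Mac
    return "/usr/share/nltk_data"              # other Unix-like systems


print(central_nltk_data_dir())
```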
If you did not install the data to one of the above central locations, you will need to set the NLTK_DATA environment variable to specify the location of the data. (On a Windows machine, right-click on “My Computer”, then select Properties > Advanced > Environment Variables > User Variables > New...)
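NLTK reads NLTK_DATA when it is imported and adds the listed directories to its data search path. The sketch below is a simplified, hypothetical stand-in for that lookup (the function name is made up for illustration). It assumes several directories may be listed, separated by os.pathsep (":" on Mac/Unix, ";" on Windows):

```python
import os
from typing import List, Mapping, Optional


def nltk_data_dirs_from_env(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the directories listed in NLTK_DATA, in order (hypothetical helper).

    Entries are split on os.pathsep; empty entries are ignored.
    """
    env = os.environ if environ is None else environ
    return [d for d in env.get("NLTK_DATA", "").split(os.pathsep) if d]


# Directories NLTK_DATA currently points to (an empty list if it is unset).
print(nltk_data_dirs_from_env())
```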
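Test that the data has been installed by starting the Python interpreter and typing from nltk.corpus import brown, then brown.words(); this should display a list beginning with 'The', 'Fulton', 'County'. (This assumes you downloaded the Brown Corpus.)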
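Loading the corpus in the interpreter is the authoritative test. If you only want to confirm that the files landed where you expect, a plain filesystem check works without importing NLTK. The helper below is hypothetical and simplified: it only looks for an unzipped folder such as corpora/brown in the directories you pass it, and does not reproduce NLTK's own resource lookup.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_unzipped_resource(relative: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing folder <dir>/<relative>, or None (hypothetical helper)."""
    for base in search_dirs:
        candidate = Path(base) / relative
        if candidate.is_dir():
            return candidate
    return None


# Self-contained demo: fake an nltk_data tree holding an unzipped Brown Corpus folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "corpora" / "brown").mkdir(parents=True)
    found = find_unzipped_resource("corpora/brown", ["/nonexistent", tmp])
    missing = find_unzipped_resource("corpora/gutenberg", [tmp])
    print(found, missing)
```

For a real check, pass the central locations listed above, for example ["/usr/share/nltk_data", "/usr/local/share/nltk_data"].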
Installing via a proxy web server
If your web connection uses a proxy server, specify the proxy address with nltk.set_proxy before calling nltk.download(). In the case of an authenticating proxy, also specify a username and password. If the proxy is set to None, nltk.set_proxy will attempt to detect the system proxy.
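NLTK handles the proxy itself through nltk.set_proxy. The block below is not that function: it is a generic standard-library sketch of the same idea (an HTTP proxy, optionally with basic-auth credentials) built with urllib, which may help when fetching package zips by hand as described under Manual installation. The function name, proxy URL and credentials are illustrative placeholders.

```python
import urllib.request
from typing import Optional


def build_proxy_opener(proxy_url: str, user: Optional[str] = None,
                       password: str = "") -> urllib.request.OpenerDirector:
    """Return a urllib opener that sends HTTP and HTTPS requests through proxy_url.

    Generic standard-library sketch; this is NOT nltk.set_proxy.
    """
    handlers = [urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})]
    if user is not None:
        # Authenticating proxy: register the credentials for this proxy URL.
        passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        passwords.add_password(None, proxy_url, user, password)
        handlers.append(urllib.request.ProxyBasicAuthHandler(passwords))
    return urllib.request.build_opener(*handlers)


# Placeholder proxy and credentials; opener.open(url) would route through the proxy.
opener = build_proxy_opener("http://proxy.example.com:3128", "USERNAME", "PASSWORD")
```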
Command line installation
The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist, it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is C:\nltk_data (Windows); /usr/local/share/nltk_data (Mac); and /usr/share/nltk_data (Unix). You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).
Run the command python -m nltk.downloader all. To ensure central installation, run the command sudo python -m nltk.downloader -d /usr/local/share/nltk_data all.
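The same command-line invocations can be assembled from Python, for example to script an installation. In this sketch, downloader_command is a hypothetical helper; the arguments it produces follow the [sudo] python -m nltk.downloader [-d DIR] PACKAGE form used above, where PACKAGE can be a collection such as all, book, popular or all-corpora, or an individual package.

```python
import sys
from typing import List, Optional


def downloader_command(package: str = "all", download_dir: Optional[str] = None,
                       use_sudo: bool = False, python: str = sys.executable) -> List[str]:
    """Build argv for: [sudo] python -m nltk.downloader [-d DIR] PACKAGE (hypothetical)."""
    cmd = [python, "-m", "nltk.downloader"]
    if download_dir is not None:
        # -d chooses the install directory; if NLTK does not search it by default,
        # set NLTK_DATA to point at it.
        cmd += ["-d", download_dir]
    cmd.append(package)
    if use_sudo:
        cmd.insert(0, "sudo")  # central installation from a non-root account
    return cmd


# Central installation, matching the sudo command above (printed, not executed).
print(downloader_command("all", "/usr/local/share/nltk_data", use_sudo=True))
```

To actually run it, pass the list to subprocess.run(..., check=True); the sudo variant will prompt for a password.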
Windows: use the “Run…” option on the Start menu. Windows Vista users first need to enable this option via Start > Properties > Customize, by checking the box that activates the “Run…” option.
Test the installation: check that the user environment and privileges are set correctly by logging in to a user account, starting the Python interpreter, and accessing the Brown Corpus (see the previous section).
Manual installation
Create a folder nltk_data, e.g. C:\nltk_data or /usr/local/share/nltk_data, and subfolders chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers and tokenizers.
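The folder skeleton can also be created with pathlib. create_nltk_data_tree is a hypothetical helper; the subfolder names are taken from the list above, and the demo uses a temporary directory so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Subfolder names from the manual-installation step above.
SUBFOLDERS = ["chunkers", "grammars", "misc", "sentiment", "taggers",
              "corpora", "help", "models", "stemmers", "tokenizers"]


def create_nltk_data_tree(root: str) -> Path:
    """Create root and its standard subfolders; safe to rerun (hypothetical helper)."""
    root_path = Path(root)
    for name in SUBFOLDERS:
        # parents=True also creates root itself; exist_ok=True tolerates reruns.
        (root_path / name).mkdir(parents=True, exist_ok=True)
    return root_path


# Demo in a temporary directory; for real use pass e.g. "/usr/local/share/nltk_data".
with tempfile.TemporaryDirectory() as tmp:
    tree = create_nltk_data_tree(str(Path(tmp) / "nltk_data"))
    created = sorted(p.name for p in tree.iterdir())
    print(created)
```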
Download individual packages from http://nltk.org/nltk_data/ (see the “download” links). Unzip them to the appropriate subfolder. For example, the Brown Corpus, found at https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip, is to be unzipped to nltk_data/corpora/brown.
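The unzip step can be done with Python's zipfile module. This sketch assumes, as with brown.zip, that each archive contains a single top-level folder named after the package, so extracting it into nltk_data/corpora produces nltk_data/corpora/brown. install_package_zip is a hypothetical name, and the demo uses a tiny stand-in archive instead of the real download.

```python
import tempfile
import zipfile
from pathlib import Path


def install_package_zip(zip_path: str, nltk_data: str, subfolder: str) -> Path:
    """Unzip a downloaded package into <nltk_data>/<subfolder> (hypothetical helper).

    Assumes the archive holds one top-level folder named after the package,
    so corpora/brown.zip extracts to <nltk_data>/corpora/brown.
    """
    target = Path(nltk_data) / subfolder
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target


# Offline demo with a tiny stand-in for brown.zip.
with tempfile.TemporaryDirectory() as tmp:
    fake_zip = Path(tmp) / "brown.zip"
    with zipfile.ZipFile(fake_zip, "w") as zf:
        zf.writestr("brown/README", "stand-in file")
    corpora = install_package_zip(str(fake_zip), str(Path(tmp) / "nltk_data"), "corpora")
    brown_ok = (corpora / "brown" / "README").is_file()
    print(brown_ok)
```

The real archive can be fetched first with urllib.request.urlretrieve(url, "brown.zip"), using the URL above.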
Set your NLTK_DATA environment variable to point to your top-level nltk_data folder.
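Because NLTK reads NLTK_DATA when it is imported, the variable has to be set before the Python process that uses NLTK starts; changing os.environ after import nltk does not update the search path. For a permanent setting, use your shell profile or the Windows dialog described earlier. For a one-off run, here is a sketch with a hypothetical helper, run_with_nltk_data, that launches a child interpreter with the variable set:

```python
import os
import subprocess
import sys


def run_with_nltk_data(nltk_data_dir: str, code: str) -> str:
    """Run a Python snippet in a child process with NLTK_DATA set (hypothetical helper)."""
    env = dict(os.environ, NLTK_DATA=nltk_data_dir)  # a copy; the parent is unchanged
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


# The child sees the variable. With NLTK installed, code could instead be
# "from nltk.corpus import brown; print(brown.words())".
out = run_with_nltk_data("/usr/local/share/nltk_data",
                         "import os; print(os.environ['NLTK_DATA'])")
print(out.strip())
```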
NLTK requires Python versions 3.5, 3.6, 3.7, or 3.8.
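A setup script can check this requirement up front. The sketch below simply encodes the version list stated on this page (it is not read from NLTK's package metadata); python_supported is a hypothetical helper.

```python
import sys
from typing import Sequence

# The versions stated on this page.
SUPPORTED_PYTHON = {(3, 5), (3, 6), (3, 7), (3, 8)}


def python_supported(version: Sequence[int] = sys.version_info) -> bool:
    """True if the major.minor part of version is in SUPPORTED_PYTHON (hypothetical)."""
    return tuple(version[:2]) in SUPPORTED_PYTHON


print(python_supported())
```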
Windows users are strongly encouraged to follow this guide to install Python 3 successfully: https://docs.python-guide.org/starting/install3/win/#install3-windows
Setting up a Python Environment (Mac/Unix/Windows)
Before you install NLTK, please read this guide to learn how to manage virtual environments: https://docs.python-guide.org/dev/virtualenvs/
Alternatively, you can use the Anaconda distribution installer, which comes “batteries included”: https://www.anaconda.com/distribution/
Mac/Unix
Install NLTK: run pip install --user -U nltk
Install Numpy (optional): run pip install --user -U numpy
Test installation: run python, then type import nltk
For older versions of Python it might be necessary to install setuptools (see http://pypi.python.org/pypi/setuptools) and to install pip (sudo easy_install pip).
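The install and test steps can also be scripted. This sketch assumes that running pip as python -m pip with the current interpreter is equivalent to the pip command above (it installs into that interpreter's environment), and uses importlib.util.find_spec to check whether import nltk would succeed without actually importing it. Both helper names are hypothetical.

```python
import importlib.util
import sys
from typing import List


def pip_user_install_command(package: str = "nltk", python: str = sys.executable) -> List[str]:
    """argv equivalent of 'pip install --user -U <package>' (hypothetical helper)."""
    return [python, "-m", "pip", "install", "--user", "-U", package]


def is_importable(module: str = "nltk") -> bool:
    """True if 'import <module>' would find the module, without importing it."""
    return importlib.util.find_spec(module) is not None


print(pip_user_install_command())  # pass to subprocess.run(..., check=True) to install
print("nltk importable:", is_importable())
```

Inside a virtual environment, drop --user: pip refuses user installs there because the user site-packages directory is not visible.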
Windows
These instructions assume that you do not already have Python installed on your machine.
32-bit binary installation
Install Python 3.7: http://www.python.org/downloads/ (avoid the 64-bit versions)
Install Numpy (optional): https://www.scipy.org/scipylib/download.html
Install NLTK: http://pypi.python.org/pypi/nltk
Test installation: Start > Python38 (or the Start-menu entry for whichever version you installed), then type import nltk
Installing Third-Party Software
Please see: https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software
Installing NLTK Data
After installing the NLTK package, please install the datasets/models that specific functions need in order to work.
If you’re unsure which datasets/models you’ll need, you can install the “popular” subset of NLTK data: on the command line, type python -m nltk.downloader popular, or in the Python interpreter, type import nltk; nltk.download('popular').
For details, see http://www.nltk.org/data.html