File: //usr/lib/python3/dist-packages/chardet/__pycache__/universaldetector.cpython-39.pyc
a
    �n�_�0  �                   @   s�   d Z ddlZddlZddlZddlmZ ddlmZmZm	Z	 ddl
mZ ddlm
Z
 ddlmZ dd	lmZ G d
d� de�ZdS )a  
Module containing the UniversalDetector detector class, which is the primary
class a user of ``chardet`` should use.

:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco
�    N�   )�CharSetGroupProber)�
InputState�LanguageFilter�ProbingState)�EscCharSetProber)�Latin1Prober)�MBCSGroupProber)�SBCSGroupProberc                	   @   sn   e Zd ZdZdZe�d�Ze�d�Ze�d�Z	dddd	d
ddd
d�Z
ejfdd�Z
dd� Zdd� Zdd� ZdS )�UniversalDetectoraq  
    The ``UniversalDetector`` class underlies the ``chardet.detect`` function
    and coordinates all of the different charset probers.
    To get a ``dict`` containing an encoding and its confidence, you can simply
    run:

    .. code::

            u = UniversalDetector()
            u.feed(some_bytes)
            u.close()
            detected = u.result
    g�������?s   [�-�]s   (|~{)s   [�-�]zWindows-1252zWindows-1250zWindows-1251zWindows-1256zWindows-1253zWindows-1255zWindows-1254zWindows-1257)z
iso-8859-1z
iso-8859-2z
iso-8859-5z
iso-8859-6z
iso-8859-7z
iso-8859-8z
iso-8859-9ziso-8859-13c                 C   sN   d | _ g | _d | _d | _d | _d | _d | _|| _t�	t
�| _d | _| �
�  d S )N)�_esc_charset_prober�_charset_probers�result�done�	_got_data�_input_state�
_last_char�lang_filter�loggingZ	getLogger�__name__�logger�_has_win_bytes�reset)�selfr   � r   �;/usr/lib/python3/dist-packages/chardet/universaldetector.py�__init__Q   s    zUniversalDetector.__init__c                 C   sV   dddd�| _ d| _d| _d| _tj| _d| _| jr>| j�	�  | j
D ]}|�	�  qDdS )z�
        Reset the UniversalDetector and all of its probers back to their
        initial states.  This is called by ``__init__``, so you only need to
        call this directly in between analyses of different documents.
        N�        ��encoding�
confidence�languageF�    )r   r   r   r   r   �
PURE_ASCIIr   r   r   r   r
   )r   �proberr   r   r   r   ^   s    
zUniversalDetector.resetc                 C   s>  | j r
dS t|�sdS t|t�s(t|�}| js�|�tj�rJdddd�| _nv|�tj	tj
f�rldddd�| _nT|�d�r�dddd�| _n:|�d	�r�d
ddd�| _n |�tjtjf�r�dddd�| _d| _| jd
 dur�d| _ dS | j
tjk�r.| j�|��rtj| _
n*| j
tjk�r.| j�| j| ��r.tj| _
|dd� | _| j
tjk�r�| j�s^t| j�| _| j�|�tjk�r:| jj| j�� | jjd�| _d| _ n�| j
tjk�r:| j�s�t | j�g| _| jt!j"@ �r�| j�#t$� � | j�#t%� � | jD ]:}|�|�tjk�r�|j|�� |jd�| _d| _  �q&�q�| j&�|��r:d| _'dS )a�  
        Takes a chunk of a document and feeds it through all of the relevant
        charset probers.

        After calling ``feed``, you can check the value of the ``done``
        attribute to see if you need to continue feeding the
        ``UniversalDetector`` more data, or if it has made a prediction
        (in the ``result`` attribute).

        .. note::
           You should always call ``close`` when you're done feeding in your
           document if ``done`` is not already ``True``.
        Nz	UTF-8-SIG�      �?� r   zUTF-32s   ��  zX-ISO-10646-UCS-4-3412s     ��zX-ISO-10646-UCS-4-2143zUTF-16Tr   ���)(r   �len�
isinstance�	bytearrayr   �
startswith�codecs�BOM_UTF8r   �BOM_UTF32_LE�BOM_UTF32_BE�BOM_LE�BOM_BEr   r   r#   �HIGH_BYTE_DETECTOR�search�	HIGH_BYTE�ESC_DETECTORr   Z	ESC_ASCIIr   r   r   �feedr   ZFOUND_IT�charset_name�get_confidencer!   r
   r	   r   ZNON_CJK�appendr
   r   �WIN_BYTE_DETECTORr   )r   Zbyte_strr$   r   r   r   r6   o   s�    
�
��
�
�
�
��
�
zUniversalDetector.feedc           	   	   C   st  | j r| jS d| _ | js&| j�d� n�| jtjkrBdddd�| _n�| jtjkr�d}d}d}| j	D ]"}|sjq`|�
� }||kr`|}|}q`|r�|| jkr�|j}|j�
� }|�
� }|�d	�r�| jr�| j�||�}|||jd�| _| j�� tjk�rn| jd
 du �rn| j�d� | j	D ]`}|�s�qt|t��rP|jD ] }| j�d|j|j|�
� � �q,n| j�d|j|j|�
� � �q| jS )
z�
        Stop analyzing the current document and come up with a final
        prediction.

        :returns:  The ``result`` attribute, a ``dict`` with the keys
                   `encoding`, `confidence`, and `language`.
        Tzno data received!�asciir%   r&