Slixfeed/slixfeed/fetch.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""

FIXME

1) feed_mode_scan doesn't find feed for https://www.blender.org/
   even though it should be according to the pathnames dictionary.

TODO

1) Support Gemini and Gopher.

2) Check also for HTML, not only feed.bozo.

3) Add "if utility.is_feed(url, feed)" to view_entry and view_feed

4) Refactor view_entry and view_feed - Why "if" twice?

5) Replace sqlite.remove_nonexistent_entries by sqlite.check_entry_exist
   Same check, just reverse.

"""

from aiohttp import ClientError, ClientSession, ClientTimeout
from asyncio import TimeoutError
# from asyncio.exceptions import IncompleteReadError
# from bs4 import BeautifulSoup
# from http.client import IncompleteRead
# from lxml import html
import slixfeed.config as config
# from xml.etree.ElementTree import ElementTree, ParseError


# async def dat():

# async def ftp():
    
# async def gemini():

# async def gopher():

# async def http():

# async def ipfs():

async def download_feed(url):
    """
    Download content of given URL.

    Parameters
    ----------
    url : str
        URL of the document to download.

    Returns
    -------
    msg : list
        [document, status] on success, or [False, error message] on failure.
    """
    user_agent = (
        config.get_value(
            "settings", "Network", "user-agent")
        ) or 'Slixfeed/0.1'
    headers = {'User-Agent': user_agent}
    # Pass None rather than '' when no proxy is configured.
    proxy = (config.get_value(
        "settings", "Network", "http_proxy")) or None
    timeout = ClientTimeout(total=10)
    async with ClientSession(headers=headers) as session:
    # async with ClientSession(trust_env=True) as session:
        try:
            async with session.get(url, proxy=proxy,
                                   # proxy_auth=(proxy_username, proxy_password),
                                   timeout=timeout
                                   ) as response:
                status = response.status
                if status == 200:
                    try:
                        doc = await response.text()
                        # print (response.content_type)
                        msg = [doc, status]
                    except Exception:
                        # msg = [
                        #     False,
                        #     ("The content of this document "
                        #      "doesn't appear to be textual."
                        #      )
                        #     ]
                        msg = [
                            False, "Document is too large or is not textual."
                            ]
                else:
                    msg = [
                        False, "HTTP Error: " + str(status)
                        ]
        except ClientError as e:
            # print('Error', str(e))
            msg = [
                False, "Error: " + str(e)
                ]
        except TimeoutError as e:
            # print('Timeout:', str(e))
            msg = [
                False, "Timeout: " + str(e)
                ]
    return msg
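
The two-item list returned by download_feed can be unpacked by a caller roughly as follows. This is a minimal sketch, not part of Slixfeed: the helper name summarize_result is hypothetical, and it assumes only the return convention visible above, namely [document, status] on success and [False, error message] on failure.

```python
def summarize_result(result):
    """Describe a download_feed-style result (hypothetical helper)."""
    document, detail = result
    # Compare against False explicitly: an empty document ("") is a
    # successful download, yet it would fail a plain truthiness test.
    if document is False:
        return "failed: " + str(detail)
    return "ok: HTTP {} with {} characters".format(detail, len(document))


if __name__ == "__main__":
    print(summarize_result(["<rss/>", 200]))
    print(summarize_result([False, "HTTP Error: 404"]))
```

Testing with `is False` instead of `not document` keeps an empty but valid response from being reported as a failure.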