How do I remove/skip collecting news headlines that are in an invalid format (such as HTML)?

I am currently running a Python program that uses get_news_headlines(). I have been using it for a while and it has almost always collected news articles correctly. Today I ran my code to collect news created this weekend and got an error. I believe the error relates to a news article that is in HTML format and for some reason cannot be collected. The error halted the whole program, which is a problem because I eventually need to collect real-time data. Is there a way to fix this, for example by skipping news articles that are in an invalid format?

The error is as follows:

1637668010766.png



Hi @azzopardic,

What is the complete code snippet and filter query that you are using? The error message shown above is not enough for us to determine the cause.


Hello @azzopardic ,

I fully agree with @Gurpreet. As the error did not occur before and has only started appearing now, I would suggest printing the payload before parsing it. If you see the issue again, you will then have the specific news headline that triggered it and can paste it into this question, which may be helpful in addition to the code.
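Since the exception is raised inside the headline call itself, one possible way to record which request triggered it, and to keep the rest of the program running, is to wrap the call. The following is only a minimal sketch: fetch_or_skip and the logger name are hypothetical helpers, not part of the Eikon Data API, and the function being wrapped is passed in as an argument.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("news_collector")  # hypothetical logger name


def fetch_or_skip(fetch: Callable[..., Any], query: str, **kwargs: Any) -> Optional[Any]:
    """Call fetch(query, **kwargs); on any error, log the inputs and return None."""
    try:
        return fetch(query, **kwargs)
    except Exception:
        # logger.exception records the full traceback together with the
        # query and keyword arguments that were used for the failing request.
        logger.exception("Headline request failed: query=%r kwargs=%r", query, kwargs)
        return None


# Assumed usage with the call from this thread (not executed here):
# df = fetch_or_skip(ek.get_news_headlines, ftr,
#                    date_from=date_from, date_to=date_to, count=100)
# if df is None:
#     pass  # skip this filter and move on to the next one
```

A None result means the request failed and was logged, so the caller can skip that filter instead of halting. Note that this skips the whole request rather than a single headline.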


Hi, thank you for looking into my query,

As I stated, this issue takes place in the rare case that a news article is of invalid format. I cannot pinpoint the exact article that is causing this issue, as this error is being raised on attempting to collect the news. I only have access to the storyID of a news article once I have the news DataFrame which I collect through this function.

Here is the code for your perusal:

def get_news(ftr, curr, date_from, date_to):

    ek.set_app_key(xxx)

    cfg = cp.ConfigParser()
    cfg.read('eikon.cfg')

    news_curr = pd.DataFrame()

    news_curr = ek.get_news_headlines(ftr,
                                      date_from=date_from,
                                      date_to=date_to,
                                      count=100)
...

And another part of the error:

1637739154878.png

Note:

I tried looking for the news article that is causing the problem, but when I ran the same dates and filter again, I could not recreate the error :/




Hi @azzopardic ,

The call is a standard news headlines API call, and I have never had an issue with it. Most likely the problem is something in your code. You will need to provide the complete code and filter query for us to help you.


Hi @azzopardic

Looking at the get_news() function above, I noticed that it performs what appears to be application initialization, i.e. set_app_key() and reading a config file. How often are you calling get_news()? I would suggest moving these lines out of get_news(), as they may be initializing your Eikon session every time you try to get news. I don't know whether this is related to the issue reported, but it may eliminate side effects of doing this over and over unnecessarily.
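As a rough illustration of separating the one-time setup from the per-request call, here is a minimal sketch. NewsClient is a hypothetical helper, not part of the Eikon Data API; the setup and fetch functions are passed in so that the setup runs only once, however many times get_news() is called.

```python
from typing import Any, Callable


class NewsClient:
    """Hypothetical wrapper that runs the session setup once, on first use."""

    def __init__(self, init_session: Callable[[], None], fetch: Callable[..., Any]):
        self._init_session = init_session
        self._fetch = fetch
        self._ready = False

    def get_news(self, ftr: str, date_from: str, date_to: str, count: int = 100) -> Any:
        if not self._ready:
            # One-time setup (e.g. setting the app key, reading the config file).
            self._init_session()
            self._ready = True
        return self._fetch(ftr, date_from=date_from, date_to=date_to, count=count)


# Assumed usage with the calls from this thread (not executed here):
# client = NewsClient(lambda: ek.set_app_key(APP_KEY), ek.get_news_headlines)
# df = client.get_news('R:EURAUD=', date_from, date_to)
```

The simpler alternative is to call set_app_key() and read the config file once at module level, before the first get_news() call; the class form just makes the "once only" behaviour explicit.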


Noted with thanks.


Hi @Gurpreet,

I have never had a problem like this either, so I doubt it comes from the code, since running it with different dates and filters returns no errors. Just in case, please find below the variables used as the parameters of the function call.

date_from = '2021-11-19T15:18:17'
date_to = '2021-11-22 07:47:51'
all_filters = {
        'EUR/AUD':'R:EURAUD=',
        'EUR/CAD':'R:EURCAD=',
        'EUR/CHF': 'R:EURCHF=',
        'EUR/GBP': 'Topic:FRX AND R:EURGBP=',
        'EUR/JPY': 'R:EURJPY=',
        'EUR/NOK': 'R:EURNOK=',
        'EUR/NZD': 'R:EURNZD=',
        'EUR/SEK': 'R:EURSEK= AND Topic:FRX',
        'EUR/USD': 'Topic:FRX AND R:EUR=',
        'GOLD': 'R:XAU=', 
        'SILVER': 'R:XAGEUR=R',
        'OIL': 'Topic:CRU' 
    }

ftr is one of the values in all_filters

Thank you for all your help


Hello @azzopardic and @Gurpreet ,

In case it is helpful, I have run a quick test on the above, this way:

date_from = '2021-11-19T15:18:17'
date_to = '2021-11-22 07:47:51'
all_filters = {
        'EUR/AUD':'R:EURAUD=',
        'EUR/CAD':'R:EURCAD=',
        'EUR/CHF': 'R:EURCHF=',
        'EUR/GBP': 'Topic:FRX AND R:EURGBP=',
        'EUR/JPY': 'R:EURJPY=',
        'EUR/NOK': 'R:EURNOK=',
        'EUR/NZD': 'R:EURNZD=',
        'EUR/SEK': 'R:EURSEK= AND Topic:FRX',
        'EUR/USD': 'Topic:FRX AND R:EUR=',
        'GOLD': 'R:XAU=', 
        'SILVER': 'R:XAGEUR=R',
        'OIL': 'Topic:CRU' 
    }
for key, ftr in all_filters.items():
    print(key, '->', ftr)
    df = ek.get_news_headlines(ftr,
                               date_from=date_from,
                               date_to=date_to,
                               count=100)
    print(df)

and was not able to reproduce the issue that you observe.

Some of the results returned were empty and others contained up to 100 headlines, but I was not able to reproduce the error.

If this is not what you are doing, @azzopardic, please advise what is different in your code and how to reproduce the issue you are facing, so that we can try to see the same.


I had another issue recently with invalid formats of news articles, and the same thing happened: I could not recreate the issue using the same code I had been using before. I believe this is a problem stemming from the articles themselves.

I've just tried it again myself, and I couldn't find the article either. Unfortunately, I do not know how to solve this problem. I don't think that this problem comes from my code, as I have been using it for a couple of months now and I have never had this or similar errors until now, and the error is raised on the line containing the call to get_news_headlines().

Thank you for your assistance.

@azzopardic,

My inclination is that there is a bug in the code. We are unable to reproduce any errors.

Can you please run your code with DEBUG logging enabled and show us the logs the next time this exception happens?
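For the application side, a minimal sketch of capturing DEBUG-level logs to a file with Python's standard logging module is shown below. The function name enable_debug_logging and the file name are assumptions for illustration. The eikon package may also have its own log-level setting; please check its documentation for the exact call, as it is not shown here.

```python
import logging


def enable_debug_logging(log_path: str) -> logging.Handler:
    """Send DEBUG-and-above records from all loggers to log_path."""
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)  # let DEBUG records through the root logger
    root.addHandler(handler)
    return handler


# Assumed usage, once at program start-up (file name is hypothetical):
# enable_debug_logging("news_debug.log")
```

With this in place, any traceback logged around the get_news_headlines() call ends up in the file and can be attached to the question the next time the exception occurs.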

I will try, thank you.