
Why does the downloaded data take so much space?
 Post subject: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Sun 05 Jul, 2009, 04:41 

User rating: 0
Joined: Tue 30 Jun, 2009, 20:38
Posts: 17
I tried to run a test from 2008.01.01 to 2009.06.25 on a 15-minute period and on all 22 instruments, but I ran out of disk space halfway through the data download. I checked what was going on and found that the data was occupying about 5.5 gigabytes. This is absolutely absurd. If I were to download the same data from metatrader (which I have), it would occupy more like 50 megabytes. It looks like the platform has converted the data it downloaded into every other timeframe, and that takes up a LOT more space (and time to convert it?).

Is there a way that I can get around this? Or is it a requirement for the strategy tester (in its current incarnation) to have data for all time frames? If this is the case, can this be changed in the future? I only have 10 GB of storage on the virtual server I am renting to run my strategies, and so I do not have enough space to store enough data for my strategy.
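
A rough back-of-the-envelope check of the figures in this post. The record layout (48 bytes per bar) and the five-day trading week are assumptions made for illustration only; they are not taken from JForex's actual cache format.

```python
from datetime import date

# Assumptions (not from JForex documentation): a bar is a timestamp plus
# open/high/low/close/volume, 8 bytes each, stored for both bid and ask.
BYTES_PER_BAR = 6 * 8
OFFER_SIDES = 2
INSTRUMENTS = 22
BARS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute bars in a 24-hour day


def estimate_bar_storage_bytes(start: date, end: date) -> int:
    """Estimate raw storage for 15-minute bars over the inclusive range."""
    calendar_days = (end - start).days + 1
    trading_days = calendar_days * 5 // 7  # assume about 5 trading days per week
    bars = trading_days * BARS_PER_DAY
    return bars * BYTES_PER_BAR * OFFER_SIDES * INSTRUMENTS


size = estimate_bar_storage_bytes(date(2008, 1, 1), date(2009, 6, 25))
print(f"{size / 1024**2:.0f} MiB")
```

Under these assumptions the uncompressed 15-minute bars alone come to well under 100 MB, so a 5.5 GB cache would have to contain considerably more than 15-minute bars.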


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Tue 07 Jul, 2009, 11:31 

User rating:
Joined: Fri 31 Aug, 2007, 09:17
Posts: 6139
"This is absolutely absurd."
That is the real slice of life. :)

"If I were to download the same data from metatrader (which I have) it occupies more like 50 megabytes."
The downloaded historical data is saved on your HDD in an uncompressed format.

You can change the path to the saved historical data in the Tools\Preferences\Advanced tab.


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Tue 07 Jul, 2009, 17:25 

User rating: 0
Joined: Tue 30 Jun, 2009, 20:38
Posts: 17
So no explanation for why it takes so much space? And no plans to fix this in the future? This may be a deal breaker for me unless I can find a way around it.

Surely it wouldn't be too difficult to allow the strategy tester to run on only a single period and only a single offerside? Not only would this cut down on the hard disk space utilization on your clients' systems, but it would also reduce the amount of data your server has to serve out, and it would allow your clients to download the data much more quickly and therefore complete their tests more quickly.

And I'm guessing that when the platform says "Downloading Data", really it is doing more than that? Even after I have downloaded the data (and it is cached) the platform says "Downloading Data" for almost an hour before actually running the test. I would guess that during this time it is doing something related to converting the data to different time frames. If I am correct, then this time would also be eliminated by only testing on one timeframe and one offerside.

By allowing testing on only one timeframe and one offerside the tester will not have to be run on so many data points, and this means that the time it takes to complete a test could be dramatically reduced as well.

You could even retain the current level of functionality by allowing the user to select more than one timeframe and both offersides to test on - there is no reason that it has to be limited to one. I suppose some strategies may rely on data from multiple time frames...

I looked more closely at the data I got from metatrader, and it looks like it may be compressed (though that is fine with me, in fact I may prefer it). For my current strategy I only want to test on 15 minute bars from the last 1.5 years on all instruments. This data from metatrader would occupy 150 MB if I had a complete set of data, but some instruments only go back a couple months so the actual space it occupies is more like 100 MB. So, metatrader requires ~150MB while jForex requires ~10GB in order to run the same test. Does this not seem at least a tad outrageous?


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Wed 08 Jul, 2009, 13:03 

User rating:
Joined: Fri 31 Aug, 2007, 09:17
Posts: 6139
The Historical Tester simulates a test environment that is pretty close to the real market and the Dukascopy trade server. This cannot be simulated correctly if only a single period is downloaded. We are going to optimize historical data downloading to reduce network load in the near future. But currently we do not see any way to reduce the size of the disk cache. Such an optimization could harm other clients who need testing that is as precise as possible.


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Wed 08 Jul, 2009, 13:51 

User rating: 3
Joined: Wed 18 May, 2011, 16:25
Posts: 331
Location: Switzerland
Hello triwebb1,

If you have a filesystem that supports compression, you could change the location of the cache to a directory that has compression configured. This should shrink the data to about 10% of the original size.

I agree with Pavel, nothing should be considered that would reduce the precision of the historic datafeed.

Best, RR.


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Wed 08 Jul, 2009, 14:00 

User rating:
Joined: Fri 31 Aug, 2007, 09:17
Posts: 6139
It is a brilliant idea to put your cache in a compressed folder! I suppose it will be compressed by much more than 10%.


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Wed 08 Jul, 2009, 18:48 

User rating: 3
Joined: Wed 18 May, 2011, 16:25
Posts: 331
Location: Switzerland
Depending on the compression type used, the compression rate is about 90%, so the file size shrinks to about 10% of its original size (1 GB will shrink to about 100 MB).
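
The relation between compression rate and resulting size can be checked on sample data. The sketch below is only an illustration: it compresses synthetic, hypothetical tick records with `zlib`, which is not necessarily what a compressing filesystem uses, and the record layout is an assumption rather than the real cache format. The rate achieved on an actual cache may differ considerably.

```python
import random
import struct
import zlib


def make_synthetic_ticks(count: int, seed: int = 1) -> bytes:
    """Build hypothetical uncompressed tick records: (time_ms, bid, ask)."""
    rng = random.Random(seed)
    time_ms = 1_230_768_000_000
    price = 135_000  # price in units of 0.00001, i.e. 1.35000
    records = []
    for _ in range(count):
        time_ms += rng.randint(100, 2000)
        price += rng.randint(-3, 3)
        # Three big-endian 64-bit integers per tick, 24 bytes in total.
        records.append(struct.pack(">qqq", time_ms, price, price + 2))
    return b"".join(records)


raw = make_synthetic_ticks(100_000)
packed = zlib.compress(raw, 9)

# Compression rate is the fraction of the original size that was saved.
rate = 1 - len(packed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, rate={rate:.0%}")
```

A rate of 90% corresponds to a compressed size of 10% of the original, which is the 1 GB to 100 MB figure quoted above.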


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Wed 08 Jul, 2009, 19:52 

User rating: 0
Joined: Tue 30 Jun, 2009, 20:38
Posts: 17
I will definitely try that, RoadRunner, thank you for the suggestion.

My proposed solution (allowing the user to select which time frames and which offerside to test on) does not have to reduce the functionality of the tester at all. You know how the interface allows you to select different/multiple instruments? Do the same thing for the period. Maybe you cannot test on only one offerside, I can see how that may be a problem, but that depends on the format of the data.

If you make the periods selectable (like the instruments) then the user could select all periods to test on and nothing would change. But for someone like me who is only interested in the open or close prices of 15 minute bars then we could just select the 15 minute timeframe and not have to download or test on ALL data, only the 15 minute bars.

I understand that it is not as simple as just opting out of downloading the data, you will have to make some changes to the strategy tester's code so that it does not generate nor require data from the other periods, but I can't imagine that this would be too difficult, and it seems like the benefits of doing this make it well worth it.

Again, if you do what I am proposing then you are not taking away any functionality from the tester; rather, you are adding to it and in a lot of cases dramatically increasing its efficiency.

I don't even understand why the tester runs on all timeframes every time - I see no need for it. If I tell it to run on 15 minute bars, why do I care about ticks? The tester should be completely oblivious to ticks in that case. Similarly, if I am running on 15 minute bars, then why do I care about 4 hour bars? Again, the tester should be completely oblivious to the 4 hour bars since I have told it that I want it to run on 15 minute bars.

So you may say that I have not told it to run on 15 minute bars, rather I told it to download 15 minute data and interpolate/extrapolate the rest of the data from that. Ok, so this may be a useful function for some strategies, but why not make this functionality optional, therefore greatly increasing the flexibility of this platform and greatly increasing the testing efficiency for all strategies that don't require this functionality?
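
For illustration, deriving a longer period from 15 minute bars is simple aggregation; going the other way (recovering ticks or shorter bars from 15 minute bars) is not possible without guessing. A minimal sketch, assuming a plain OHLC bar with hypothetical field names; it says nothing about how JForex actually builds its cache.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bar:
    start_minute: int  # minutes since an arbitrary epoch (assumed representation)
    open: float
    high: float
    low: float
    close: float


def aggregate(bars: List[Bar], period_minutes: int) -> List[Bar]:
    """Merge time-ordered bars into bars of a longer period."""
    merged: List[Bar] = []
    for bar in bars:
        bucket_start = bar.start_minute - bar.start_minute % period_minutes
        if merged and merged[-1].start_minute == bucket_start:
            current = merged[-1]
            current.high = max(current.high, bar.high)
            current.low = min(current.low, bar.low)
            current.close = bar.close  # the last bar in the bucket sets the close
        else:
            # The first bar in a bucket sets the open.
            merged.append(Bar(bucket_start, bar.open, bar.high, bar.low, bar.close))
    return merged


# 32 synthetic 15 minute bars (8 hours) collapse into two 4 hour bars.
m15 = [
    Bar(i * 15, 1.0 + i * 0.001, 1.0 + i * 0.001 + 0.0005,
        1.0 + i * 0.001 - 0.0005, 1.0 + i * 0.001 + 0.0002)
    for i in range(32)
]
h4 = aggregate(m15, 240)
print(len(h4), h4[0].start_minute, h4[1].start_minute)
```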


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Fri 10 Jul, 2009, 19:43 

User rating: -
"I don't even understand why the tester runs on all timeframes every time - I see no need for it. If I tell it to run on 15 minute bars, why do I care about ticks? The tester should be completely oblivious to ticks in that case."

this will create a partial disconnect between backtesting and reality
this happens a lot in Metatrader: your system makes a trade live, but when you backtest it over the same period it does nothing
or vice versa

also think about the case when a 15M bar has lots of volatility in it and both your TP and SL are on that single bar
which one of them should be hit first? if ticks are ignored it will be 50/50

i really wish the platform could just download actual historical uncompressed tick data
for maximum backtesting accuracy
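
The TP/SL case can be shown with a small sketch. The prices and function names here are made up for illustration: two different tick paths produce exactly the same bar (same open, high, low and close), yet a long position exits at a different level in each.

```python
from typing import Iterable, Optional


def both_levels_inside_bar(low: float, high: float,
                           take_profit: float, stop_loss: float) -> bool:
    """True when a single bar's range covers both exit levels."""
    return low <= stop_loss <= high and low <= take_profit <= high


def first_exit_from_ticks(ticks: Iterable[float],
                          take_profit: float, stop_loss: float) -> Optional[str]:
    """Walk the ticks in order and report which level a long position hits first."""
    for price in ticks:
        if price >= take_profit:
            return "TP"
        if price <= stop_loss:
            return "SL"
    return None


# A long position with TP above and SL below the entry price.
take_profit, stop_loss = 1.4050, 1.3950

# Both paths give the bar: open 1.4000, high 1.4060, low 1.3940, close 1.4010.
path_up_first = [1.4000, 1.4030, 1.4060, 1.3990, 1.3940, 1.4010]
path_down_first = [1.4000, 1.3970, 1.3940, 1.4020, 1.4060, 1.4010]

for path in (path_up_first, path_down_first):
    ambiguous = both_levels_inside_bar(min(path), max(path), take_profit, stop_loss)
    print(ambiguous, first_exit_from_ticks(path, take_profit, stop_loss))
```

With only the bar, a tester sees that both levels were touched but has to guess the order; with the ticks, the order is known.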


 
 Post subject: Re: Why does the downloaded data take so much space? Post rating: 0   New post Posted: Tue 14 Jul, 2009, 00:01 

User rating: 0
Joined: Tue 30 Jun, 2009, 20:38
Posts: 17
The strategy I am developing right now only looks at the open prices of bars. That is all it cares about, so I have no need for any other information. I understand that there are a lot of other strategies that do not operate this way. In fact, I would say that most do not; however, it would be nice if jForex were flexible enough to handle my style of strategy development efficiently. This does not mean that it has to work only for my style and not anybody else's - I'm sure it would be quite simple to allow the user to choose to test ONLY on open prices, or ONLY using open/close prices, or to go ahead and use full tick data. If this were done then everybody would be happy. Right now people that write strategies the way I do are not happy because jForex is EXTREMELY inefficient and wastes bucket-loads of resources when testing a strategy that does not require full tick data / simulated tick data.


 
