Estimating file size

rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Estimating file size

Post by rollin_eng »

I am posting this as a follow up to this thread:

viewtopic.php?f=26&t=32370

It seems some people think it is impossible to estimate (to a useful degree) file size while encoding, and I would like to test to see if this is actually the case.

I have a few ripped Blu-rays I plan on testing this out on and would like people (musicvid, I'm looking at you) to suggest a few more if possible to get a good selection.

I currently have:

Transformers 4 (Terrible movie but lots of action)
X-Men DOFP
Godzilla (2014)
Aliens
The Hurt Locker
Harry Potter 5
Jaws

I plan on using the Apple TV preset, as that's what I use (I am aware other settings could change things), and I will post the results here to see what happens.

Thanks.
musicvid
Veteran User
Posts: 3666
Joined: Sat Jun 27, 2009 1:19 am

Re: Estimating file size

Post by musicvid »

I am thankful you started your own thread. I'm entering finals month, and this will be my only reply on your topic for the remainder of the spring.
I can understand you not being able to visualize the many scenarios where a mid-encode estimate cannot work; however, these can be easily demonstrated.

Here is a three-minute segment analysis of a BluRay-formatted video I produced in 2011. It was subsequently encoded in HandBrake using the High preset, with no other changes. This segment consists of five short clips of varying complexity, followed by seven stills that are separated by small fades. This graphic begs the question, "At what point in a single pass will any estimating scheme approach the final ABR, and thus allow a usable file size estimate?" The answer for this example, and for most of my purchased movies, is, "The estimate will converge on the actual ABR near the end of the sample, and not sooner."

So with the analysis already done, what is the average bitrate for this clip? What is the finished file size? (Look at the solid line on the histogram to estimate the answer, which I've obscured.) How do we know the actual bitrate before the end? Do we start logging data at the beginning, or wait until there is a change? At what point will any algorithm know a reasonable finished estimate? Is it worth adding a separate analysis pass in order to find out? If such a thing were possible, why wouldn't I want my estimate before the lengthy encoding step? Streaming content providers need to control this stuff up front -- that's why we don't use Constant Quality for that.

The bitrate ratio in this particular example is astronomical -- more than 2500:1, it appears. It is simple to see that by mid-encode, the cumulative ABR (file size predictor) is off by nearly 200% from the final ABR. At 2/3 completion, it's closer, but still off by nearly 150% of final ABR. That kind of low predictability is simply unacceptable, so the elegant solution is simply to leave it out.
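To make the mid-encode numbers concrete, here is a minimal sketch of the cumulative-ABR extrapolation described above. The bitrates are made up for illustration (they are not the data from the posted clip), and `running_size_estimates` and `percent_error` are hypothetical helper names, not anything HandBrake provides.

```python
from itertools import accumulate


def running_size_estimates(bits_per_second):
    """Return the extrapolated final size (in bits) after each second of output.

    Each estimate is: cumulative bits so far / seconds so far * total seconds.
    """
    total_seconds = len(bits_per_second)
    return [
        written / (i + 1) * total_seconds
        for i, written in enumerate(accumulate(bits_per_second))
    ]


def percent_error(estimate, actual):
    """Signed error of an estimate relative to the actual value, in percent."""
    return (estimate - actual) / actual * 100.0


# Hypothetical 180-second clip: 60 s of complex video, then 120 s of stills.
samples = [8_000_000] * 60 + [200_000] * 120
final_bits = sum(samples)
estimates = running_size_estimates(samples)

for fraction in (1 / 3, 1 / 2, 2 / 3, 1.0):
    second = round(len(samples) * fraction)
    error = percent_error(estimates[second - 1], final_bits)
    print(f"{fraction:4.0%} done: estimate off by {error:+.0f}%")
```

With front-loaded content like this, the extrapolation overshoots for most of the encode and only settles on the real figure at the very end. A constant-bitrate series would give the correct answer from the first second, so how useful the estimate is depends entirely on how the bitrate is distributed over time.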

I, along with cohorts from Massachusetts to Essex to Brisbane to Thailand, already have hundreds of hours of research invested in this as a part of a larger (since completed) tutorial project on preparing video for local and internet streaming delivery; not once since that volunteer project began in 2009 have I been accused of substituting anecdote for data. In fact, the comments we receive from other editors are exactly the opposite -- our redundant analyses and methodology are just too complex to be easily understood. I hope when going forward with your own inquiry, you will be able to adhere to your own standards in that respect. Just a note of caution: asking for data alone is not Empiricism, nor is it scientific, as one naturally ignores or trivializes everything that does not fit within their preconceptions. In addition to bulletproof methodology and testing, informed peer review and replicability are absolutely essential. In light of that, I'd rather not see any criticism of my testing, until we've seen your own. Best of luck.

[Image: bitrate histogram for the three-minute segment]
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

musicvid wrote:I am thankful you started your own thread. I'm entering finals month, and this will be my only reply on your topic for the remainder of the spring.
This is good to know as I don't think I will get to this quickly myself.
musicvid wrote: I can understand you not being able to visualize the many scenarios where a mid-encode estimate can not work, however these can be easily demonstrated.

Here is a three minute segment analysis of a BluRay-formatted video I produced in 2011. It was subsequently encoded in Handbrake using High preset, with no other changes. This segment consists of five short clips of varying complexity, followed by seven stills that are separated by small fades. This graphic begs the question, "At what point in a single pass will any estimating scheme approach the final ABR, and thus allow a usable file size estimate?" The answer for this example, and most of my purchased movies is, "The estimate will converge on the actual ABR near the end of the sample, and not sooner."
The problem we have here is that this 3 minute segment is just a single data point that you created, thus we are back to anecdotes.
musicvid wrote: So with the analysis already done,
Not really, is it?
musicvid wrote: what is the average bitrate for this clip. What is the finished file size? (Look at the solid line on the histogram to figure the answer.) How do we know the actual bitrate before the end? Do we start logging data at the beginning, or wait until there is a change? At what point will any algorithm know a reasonable finished estimate? Is it worth adding a separate analysis pass in order to find out? If such a thing were possible, why wouldn't I want my estimate before the lengthy encoding step? Streaming content providers need to control this stuff up front -- that's why we don't use Constant Quality for that.

The bitrate ratio in this particular example is astronomical -- more than 2500:1, it appears. It is simple to see that by mid-encode, the cumulative ABR (file size predictor) is off by nearly 200% from the final ABR. At 2/3 completion, it's closer, but still off by nearly 150% of final ABR. That kind of low predictability is simply unacceptable, so the elegant solution is simply to leave it out.
Again, this clip is nice, but it's still just a single data point that you created.
musicvid wrote: I, along with cohorts from Massachusetts to Essex to Brisbane to Thailand, already have hundreds of hours of research invested in this as a small part of a larger (since completed) project on preparing video for local and internet streaming delivery; not once since that project began in 2009 have I been accused of substituting anecdote for data. In fact, the comments we receive from other editors are exactly the opposite -- our redundant analyses and methodology are just too complex to be easily understood. I hope when going forward with your own inquiry, you will be able to adhere to your own standards in that respect. Just a note of caution: asking for data alone is not Empiricism, nor is it scientific, as one naturally ignores or trivializes everything that does not fit with their preconceptions. In addition to bulletproof methodology and testing, informed peer review and replicability is absolutely essential. In light of that, I'd rather not see any criticism of my testing, until I've seen your own. Best of luck.
This is awesome. Please post the data and results here (or provide a link) so we can take a look. I am sure you have data for this and not anecdotes, but until I see it, I'm sure you can understand that it is anecdotal to me. Maybe this process could even help supplement your findings.

All my testing/findings will be posted here and I will encourage others to join in. I have no control over these boards and thus cannot ignore or trivialize anything.

I am sorry that you feel I have criticized your testing on this subject; however, I have not actually seen any testing or results yet to criticize (I don't consider a 3-minute clip to be much of a test).

Thanks
mduell
Veteran User
Posts: 7180
Joined: Sat Apr 21, 2007 8:54 pm

Re: Estimating file size

Post by mduell »

Another consideration in addition to the bitrate variation is HB doesn't reliably know the length of the source video, which is of course critical to the extrapolation of filesize. This was one of the drivers of the elimination of the target filesize option, so you may want to review the various threads regarding missed target file sizes that were attributed to incorrect video lengths in the scan.
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

mduell wrote:Another consideration in addition to the bitrate variation is HB doesn't reliably know the length of the source video, which is of course critical to the extrapolation of filesize. This was one of the drivers of the elimination of the target filesize option, so you may want to review the various threads regarding missed target file sizes that were attributed to incorrect video lengths in the scan.
This really doesn't matter to me, as HandBrake reports a % completed, so that is what I will use. If that is wrong, then HandBrake has other issues.
JohnAStebbins
HandBrake Team
Posts: 5583
Joined: Sat Feb 09, 2008 7:21 pm

Re: Estimating file size

Post by JohnAStebbins »

rollin_eng wrote:
mduell wrote:Another consideration in addition to the bitrate variation is HB doesn't reliably know the length of the source video, which is of course critical to the extrapolation of filesize. This was one of the drivers of the elimination of the target filesize option, so you may want to review the various threads regarding missed target file sizes that were attributed to incorrect video lengths in the scan.
This really doesn't matter to me, as HandBrake reports a % completed, so that is what I will use. If that is wrong, then HandBrake has other issues.
% complete is not accurate when the detected duration is not accurate.
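A small sketch of why this matters for a size estimate: any estimate built on % complete inherits the error in the detected duration. The numbers and the `estimate_final_bytes` helper below are hypothetical, and the encode is assumed to be constant bitrate so that duration is the only source of error.

```python
def estimate_final_bytes(bytes_written, seconds_encoded, detected_duration):
    """Extrapolate the final size from the fraction of the source encoded so far."""
    if seconds_encoded <= 0 or detected_duration <= 0:
        raise ValueError("durations must be positive")
    fraction_done = seconds_encoded / detected_duration
    return bytes_written / fraction_done


# Hypothetical constant-bitrate encode whose scan under-reported the duration.
BYTES_PER_SECOND = 1_000_000
TRUE_DURATION = 7_200      # seconds actually present in the source
DETECTED_DURATION = 6_000  # seconds reported by the (inaccurate) scan

seconds_encoded = 3_000
bytes_written = BYTES_PER_SECOND * seconds_encoded

estimate = estimate_final_bytes(bytes_written, seconds_encoded, DETECTED_DURATION)
actual = BYTES_PER_SECOND * TRUE_DURATION
print(f"estimated {estimate / 1e9:.1f} GB, actual {actual / 1e9:.1f} GB")
```

Even with a perfectly steady bitrate, the estimate is wrong by the same ratio as the duration (detected / true), and it stays wrong until the encode runs past the point where it was expected to finish.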
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

JohnAStebbins wrote:
rollin_eng wrote:
mduell wrote:Another consideration in addition to the bitrate variation is HB doesn't reliably know the length of the source video, which is of course critical to the extrapolation of filesize. This was one of the drivers of the elimination of the target filesize option, so you may want to review the various threads regarding missed target file sizes that were attributed to incorrect video lengths in the scan.
This really doesn't matter to me, as HandBrake reports a % completed, so that is what I will use. If that is wrong, then HandBrake has other issues.
% complete is not accurate when the detected duration is not accurate.
Perhaps we should remove all stats from the bottom then, as you cannot be sure they are correct.

If the % is not accurate then nothing will be, correct?
JohnAStebbins
HandBrake Team
Posts: 5583
Joined: Sat Feb 09, 2008 7:21 pm

Re: Estimating file size

Post by JohnAStebbins »

Correct. These stats are accurate most of the time. But there are degenerate cases where they are completely wrong.
Djfe
Bright Spark User
Posts: 178
Joined: Tue May 13, 2014 8:01 pm

Re: Estimating file size

Post by Djfe »

About the unreliable length of the source video: that may have to do with video sources like transport streams, which can contain errors and can be cut by leaving out certain parts, right?
Djfe
Bright Spark User
Posts: 178
Joined: Tue May 13, 2014 8:01 pm

Re: Estimating file size

Post by Djfe »

What about taking guesses from the file size that is left, rather than the video duration? That should be more accurate, right?
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

JohnAStebbins wrote:Correct. These stats are accurate most of the time. But there are degenerate cases where they are completely wrong.
Of course, I assume there will always be outliers that will not be accurate.
JohnAStebbins
HandBrake Team
Posts: 5583
Joined: Sat Feb 09, 2008 7:21 pm

Re: Estimating file size

Post by JohnAStebbins »

Djfe wrote:About the unreliable length of the source video: that may have to do with video sources like transport streams, which can contain errors and can be cut by leaving out certain parts, right?
Transport streams are indeed the primary example where the duration is often inaccurate. This is primarily because transport streams have no header that identifies the duration, and transport streams can and often do have timestamp discontinuities (i.e. timestamps jump forward or backward in time by large amounts). This makes estimating the duration based on sampling of a few timestamps difficult.
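As an illustration of the discontinuity problem (not HandBrake's actual scan logic), here is a minimal sketch comparing a duration taken from the first and last timestamps with one that sums only plausible frame-to-frame steps. The timestamps, the `max_gap` threshold, and both function names are assumptions made for the example.

```python
def naive_duration(timestamps):
    """Duration from the first and last timestamps only."""
    return timestamps[-1] - timestamps[0]


def continuous_duration(timestamps, max_gap=1.0):
    """Sum the steps between consecutive timestamps, skipping implausible jumps.

    Steps that go backward or exceed max_gap seconds are treated as
    discontinuities and contribute nothing to the total.
    """
    total = 0.0
    for previous, current in zip(timestamps, timestamps[1:]):
        step = current - previous
        if 0 <= step <= max_gap:
            total += step
    return total


# Hypothetical stream: 10 frames at 0.5 s spacing, a jump of about 1000 s,
# then 10 more frames at the same spacing.
timestamps = [i * 0.5 for i in range(10)] + [1000.0 + i * 0.5 for i in range(10)]

print(f"naive: {naive_duration(timestamps):.1f} s")
print(f"continuous: {continuous_duration(timestamps):.1f} s")
```

The second approach only works because it walks every timestamp. A scan that samples a few timestamps cannot tell a jump from real elapsed time, which is the difficulty described above.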
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

JohnAStebbins wrote:
Djfe wrote:About the unreliable length of the source video: that may have to do with video sources like transport streams, which can contain errors and can be cut by leaving out certain parts, right?
Transport streams are indeed the primary example where the duration is often inaccurate. This is primarily because transport streams have no header that identifies the duration, and transport streams can and often do have timestamp discontinuities (i.e. timestamps jump forward or backward in time by large amounts). This makes estimating the duration based on sampling of a few timestamps difficult.
If a file is not reporting data correctly, then all stats will be wrong; I do not expect the file size calculator to be any different.
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

Djfe wrote:What about taking guesses from the file size that is left, rather than the video duration? That should be more accurate, right?
I don't know, but we can find out.
JohnAStebbins
HandBrake Team
Posts: 5583
Joined: Sat Feb 09, 2008 7:21 pm

Re: Estimating file size

Post by JohnAStebbins »

Djfe wrote:What about taking guesses from the file size that is left, rather than the video duration? That should be more accurate, right?
You folks are welcome to run whatever experiments you need to satisfy your curiosity. But we already did all this long ago. HandBrake used to have a "target file size" option. It was inaccurate often enough that the forums were full of complaints about missing the target file size. So we (the HandBrake developers) already know from doing the experiment on a massive scale (millions of HandBrake users) that it produces unsatisfying results.
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

JohnAStebbins wrote:
Djfe wrote:What about taking guesses from the file size that is left, rather than the video duration? That should be more accurate, right?
You folks are welcome to run whatever experiments you need to satisfy your curiosity. But we already did all this long ago. HandBrake used to have a "target file size" option. It was inaccurate often enough that the forums were full of complaints about missing the target file size. So we (the HandBrake developers) already know from doing the experiment on a massive scale (millions of HandBrake users) that it produces unsatisfying results.
I don't want a target file size. I want to know if it's reasonable to estimate the file size of an ongoing encode, just like the current time estimate.
Djfe
Bright Spark User
Posts: 178
Joined: Tue May 13, 2014 8:01 pm

Re: Estimating file size

Post by Djfe »

I totally forgot that the source is likely to be encoded differently, or maybe isn't encoded at all (raw), so even simple scenes might take up lots of space in the source but little time to encode.

So you would need a pre-analysis for that (two-pass encoding).
mduell
Veteran User
Posts: 7180
Joined: Sat Apr 21, 2007 8:54 pm

Re: Estimating file size

Post by mduell »

rollin_eng wrote:This really doesn't matter to me, as HandBrake reports a % completed, so that is what I will use. If that is wrong, then HandBrake has other issues.
Indeed HB has other issues, but you're extending the use of known bad data to new features. Not wise.
Djfe wrote:What about taking guesses from the file size that is left, rather than the video duration? That should be more accurate, right?
Only for inputs that are themselves constant bitrate.
rollin_eng wrote:I don't want a target file size. I want to know if it's reasonable to estimate the file size of an ongoing encode, just like the current time estimate.
I suppose HB could display the current output size as the estimated file size, which would also converge on the correct answer by the time the encode finished. For example, for the encode whose analysis musicvid posted, this method would be closer to the actual output size than a bitrate-based estimate after about 2/3 of the encode was complete.
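The comparison between the two estimators can be sketched as follows. The bitrate series is invented for the example (it is not the data behind the posted analysis, so the crossover point will differ), and `estimator_errors` is a hypothetical helper.

```python
from itertools import accumulate


def estimator_errors(bits_per_second):
    """For each elapsed second, return (elapsed, current_size_error, extrapolation_error).

    current_size_error: error if the bits written so far are shown as the estimate.
    extrapolation_error: error if the cumulative average bitrate is scaled to the
    full duration. Both errors are absolute, in bits.
    """
    total_seconds = len(bits_per_second)
    final_bits = sum(bits_per_second)
    rows = []
    for i, written in enumerate(accumulate(bits_per_second)):
        elapsed = i + 1
        extrapolated = written / elapsed * total_seconds
        rows.append((elapsed, abs(final_bits - written), abs(extrapolated - final_bits)))
    return rows


# Hypothetical 180-second clip: 60 s at 3 Mbit/s, then 120 s at 1 Mbit/s.
samples = [3_000_000] * 60 + [1_000_000] * 120
rows = estimator_errors(samples)

for elapsed in (10, 60, 100, 150, 180):
    _, current_error, extrapolation_error = rows[elapsed - 1]
    closer = "current size" if current_error < extrapolation_error else "extrapolation"
    print(f"{elapsed:3d} s: closer estimate is {closer}")
```

The current output size can never exceed the final size, so it only beats the extrapolation while the extrapolation is overshooting by more than the amount still left to write. Which estimator is closer at a given moment therefore depends on the content, and both reach the true size only at the end.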
musicvid
Veteran User
Posts: 3666
Joined: Sat Jun 27, 2009 1:19 am

Re: Estimating file size

Post by musicvid »

There are exactly 180 data points in the demonstration I posted, numbered 0-179.
That's more than enough data from which to draw conclusions, your latest attempt at trivialization notwithstanding. My past research is not germane to this discussion and is software-specific, except for the points already noted. If you'd like to start yet another thread on production for streaming content delivery, I'll be happy to respond this summer, time permitting.
As I've already indicated, I'm deferring comment on this topic until I've seen your test results and methodology, both of which are completely unknown at this time.
I've actually posted a couple of anecdotes on the forum, having identified them as such at the time. I'll be sure to do likewise in the future.
There are some very capable individuals responding to your thread, so I defer with gratitude. Respect them, they really know their stuff.
As for me, I deal with background noise and sophomorisms every day of the week . . .
musicvid
Veteran User
Posts: 3666
Joined: Sat Jun 27, 2009 1:19 am

Re: Estimating file size

Post by musicvid »

Oh yeah, what's the file size for the example above?
Remember, I said it's fifth grade math…
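For anyone following along, the arithmetic being referred to is presumably average bitrate times duration, converted from bits to bytes. A minimal sketch follows; the clip's real ABR is obscured in the post, so the 5,000 kbit/s figure is only a placeholder, and `file_size_bytes` is a hypothetical helper that ignores audio tracks and container overhead.

```python
def file_size_bytes(average_kbps, duration_seconds):
    """Size in bytes of a stream with the given average bitrate in kbit/s."""
    bits = average_kbps * 1000 * duration_seconds
    return bits / 8


# Placeholder bitrate for a three-minute (180 s) clip.
size = file_size_bytes(5_000, 180)
print(f"{size / 1_000_000:.1f} MB")
```

The formula is trivial once the final ABR is known. The open question in this thread is whether that ABR can be known, to a useful degree, before the encode finishes.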
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

mduell wrote:
rollin_eng wrote:This really doesn't matter to me, as HandBrake reports a % completed, so that is what I will use. If that is wrong, then HandBrake has other issues.
Indeed HB has other issues, but you're extending the use of known bad data to new features. Not wise.
If HandBrake is producing bad data, perhaps you can help correct that. Maybe you could look into this and have it corrected while I am investigating file size estimation.
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

musicvid wrote:There are exactly 180 data points in the demonstration I posted, numbered 0-179.
That's more than enough data from which to draw conclusions, your latest attempt at trivialization notwithstanding. My past research is not germane to this discussion and is software-specific, except for the points already noted. If you'd like to start yet another thread on production for streaming content delivery, I'll be happy to respond this summer, time permitting.
As I've already indicated, I'm deferring comment on this topic until I've seen your test results and methodology, both of which are completely unknown at this time.
I've actually posted a couple of anecdotes on the forum, having identified them as such at the time. I'll be sure to do likewise in the future.
There are some very capable individuals responding to your thread, so I defer with gratitude. Respect them, they really know their stuff.
As for me, I deal with background noise and sophomorisms every day of the week . . .
Sigh. If you want to stand by the 3-minute video that you created as being the only data needed to prove a point, then I will leave that with you. Obviously I could create a video that would give different results, so I don't see what this proves.

You brought up your previous research as relevant but now state it is not. I don't know what to make of this, so again I'll just let this go.

Yes, I haven't completely decided what my methodology will be, but fear not, I will post it here.

I have read some of your previous anecdotes, but I'm not sure what that has to do with this topic.

I am not sure what your last 3 sentences are supposed to convey. I plan on listening to everyone who has useful information to add to the subject.

You seem to think I have some kind of agenda here, and I can assure you there is none. I would like to know if it's possible to 'estimate' the final size of a video encode, that is all. By hopefully using many data sources, we will see if this is possible. If so, great; if not, then the data will be there to show other people who may ask the same question in the future.

If you or someone else has already tried this then please post the data/results so I can save myself a bunch of time.

Thanks
rollin_eng
Veteran User
Posts: 3512
Joined: Wed May 04, 2011 11:06 pm

Re: Estimating file size

Post by rollin_eng »

musicvid wrote:Oh yeah, what's the file size for the example above?
Remember, I said it's fifth grade math…
I'm not sure why you want me to guess the size of the file you created; obviously any estimation of file size will have more input than a screenshot.

And again, the internet tough guy routine does not impress anyone.

I will guess at 2,600MB.

Thanks
musicvid
Veteran User
Posts: 3666
Joined: Sat Jun 27, 2009 1:19 am

Re: Estimating file size

Post by musicvid »

For all the incorrigible speculation, I merely wanted to see if you actually know how to figure a media file size based on all the necessary information (explained above).

It appears you may not. Good luck with those tests.